Invented by Bita Darvish Rouhani, Tara Javidi, Farinaz Koushanfar, Mohammad Samragh Razlighi, University of California

As the use of deep learning continues to grow, so does the potential for adversarial attacks. Adversarial attacks are a type of cyber attack that involves manipulating the input data of a deep learning system in order to cause it to produce incorrect or unexpected outputs. This can have serious consequences in a variety of fields, from healthcare to finance to national security. As a result, there is a growing market for the detection of adversarial deep learning, comprising a range of products and services designed to help organizations protect their deep learning systems from these types of attacks.

One key area of the market is software tools for detecting adversarial attacks. These tools use a variety of techniques, such as analyzing the distribution of inputs to a deep learning system or monitoring the output of the system for unexpected behavior. Some tools also use machine learning algorithms to learn from past attacks and improve their ability to detect future ones.

Another area of the market is consulting and training services. Many organizations lack the expertise to effectively protect their deep learning systems from adversarial attacks, and so they turn to consulting firms and training providers for help. These services can include everything from risk assessments to customized training programs for employees.

There are also a number of research organizations and academic institutions working on new techniques for detecting adversarial attacks. This research is critical to staying ahead of attackers and to developing new tools and methods for protecting deep learning systems.

Overall, the market for detection of adversarial deep learning is still relatively new and rapidly evolving. As the use of deep learning continues to grow, this market is likely to expand and become increasingly important for organizations across a range of industries.

The University of California invention works as follows

A method of detecting and/or preventing an adversarial attack against a target machine learning model can be provided. The method can include training a defender machine learning model on at least a set of training data to enable the defender model to identify malicious input samples. The trained defender model can be deployed at the target machine learning model, where it operates alongside the target model to determine whether an input sample received at the target model is malicious or legitimate. Related systems and articles of manufacture, including computer program products, are also provided.
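As a rough sketch of that flow, assuming purely for illustration that an off-the-shelf anomaly detector stands in for the defender model and that the target model is any callable classifier (the names train_defender, guarded_predict, and target_model are illustrative, not taken from the patent), the guarded prediction path might look like this:

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def train_defender(legitimate_inputs: np.ndarray) -> IsolationForest:
    """Train a stand-in defender on legitimate training data only."""
    defender = IsolationForest(contamination="auto", random_state=0)
    defender.fit(legitimate_inputs)
    return defender


def guarded_predict(target_model, defender: IsolationForest, x: np.ndarray) -> dict:
    """Run the target model only if the defender deems the input legitimate."""
    is_legitimate = defender.predict(x.reshape(1, -1))[0] == 1  # 1 = inlier
    if not is_legitimate:
        return {"verdict": "malicious", "inference": None}
    return {"verdict": "legitimate", "inference": target_model(x)}
```

The point mirrored here is only the overall arrangement: the defender is trained separately and then sits alongside the target model, vetting each input before an inference is trusted.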

Background for Detection of Adversarial Deep Learning

Machine learning models can be trained to perform a variety of cognitive tasks such as object recognition, natural language processing, information retrieval, and speech recognition. A deep learning model, such as a neural network, a belief network, or a restricted Boltzmann machine, can be trained to perform a regression task. The regression task may require the deep learning model to predict changes in a dependent variable based on variations in one or more independent variables. A deep learning model can also be trained to perform a classification task by assigning input samples to one or more categories. The model can be trained for the classification task using training data that has been labeled in accordance with the known category membership of each sample in the training data.

Systems, methods, and articles of manufacture, including computer program products, are provided for detecting and preventing adversarial deep learning. In some example embodiments, a system is provided that includes at least one processor and at least one memory. The memory can include program code that, when executed by the at least one processor, provides operations. These operations can include: training a first defender machine learning model on at least a set of training data to enable it to identify malicious input samples; and deploying the trained first defender model at a target machine learning model to determine whether an input sample received at the target model is malicious or legitimate.

In some variations, one or more of the following features can optionally be included in any feasible combination. The first defender machine learning model can respond to the input sample received at the target machine learning model by at least generating a first output indicating whether the input sample is malicious or legitimate. The first output can be combined with a second output from a second defender machine learning model that has also been trained and deployed at the target model; the second output can indicate whether the second defender model has determined the input sample to be malicious or legitimate. The first output and the second output can be combined to generate a metric indicating the legitimacy of an output inference generated by the target machine learning model in processing the input sample.
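The patent text does not specify how the two outputs are combined; the sketch below assumes each defender emits a legitimacy score in [0, 1] and uses a simple weighted average as the combined metric, with all names (legitimacy_metric, output_is_trustworthy) being illustrative:

```python
import numpy as np


def legitimacy_metric(defender_scores, weights=None) -> float:
    """Combine per-defender legitimacy scores (each in [0, 1]) into one metric.

    A weighted average is used here purely for illustration; the aggregation
    rule is not fixed by the patent text.
    """
    scores = np.asarray(defender_scores, dtype=float)
    weights = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    return float(np.average(scores, weights=weights))


def output_is_trustworthy(defender_scores, threshold: float = 0.5) -> bool:
    """Accept the target model's inference only if the combined metric clears a threshold."""
    return legitimacy_metric(defender_scores) >= threshold
```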

In some variations, a layer of the target machine learning model can be identified based on the instability and/or sensitivity of that layer to perturbations in one or more input samples processed by the target model, and the first defender machine learning model can be deployed at that layer. The trained first defender model can be deployed at the identified layer of the target machine learning model, and a second defender machine learning model can be deployed at a layer below it. The first defender and second defender machine learning models can be configured to be negatively correlated, for example by training the second defender model on training data that contains perturbations not present in the training data used to train the first defender model. In this way, the second defender model can identify malicious input samples that are able to bypass the first defender model.
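One plausible way to identify the most perturbation-sensitive layer is sketched below in PyTorch, under the assumption that the target model is an nn.Sequential and that "instability" is measured as the relative change in a layer's activations under small random input noise; this is a heuristic chosen for illustration, not a formula prescribed by the patent.

```python
import torch


def layer_instability(model: torch.nn.Sequential, x: torch.Tensor,
                      noise_scale: float = 0.01, trials: int = 8) -> list:
    """Estimate each layer's sensitivity to small input perturbations.

    Returns the mean relative change of each layer's activations; the layer
    with the largest value is a candidate site for deploying a defender.
    """
    scores = []
    with torch.no_grad():
        # Clean forward pass, keeping every intermediate activation.
        clean = []
        h = x
        for layer in model:
            h = layer(h)
            clean.append(h)

        # Perturbed passes: compare each layer's activation to the clean one.
        for idx in range(len(model)):
            diffs = []
            for _ in range(trials):
                h = x + noise_scale * torch.randn_like(x)
                for layer in model[: idx + 1]:
                    h = layer(h)
                rel = torch.norm(h - clean[idx]) / (torch.norm(clean[idx]) + 1e-12)
                diffs.append(rel.item())
            scores.append(sum(diffs) / trials)
    return scores
```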

In some variations, the layer can be the input layer of the target machine learning model. The first defender machine learning model can be deployed at the input layer of the target model in order to identify malicious input samples before they are processed by the target model.

In some variations, the layer can be an intermediate layer of the target machine learning model. The first defender machine learning model can be deployed at the intermediate layer of the target model in order to identify malicious input samples based on the latent response triggered by those samples at the intermediate layer. The intermediate layer can be a core computation layer, a normalization layer, and/or a non-linearity layer.
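A minimal sketch of an intermediate-layer defender, assuming a PyTorch target model and a placeholder score_fn that maps a latent response to a legitimacy probability (how such a score could be learned is illustrated in the density-estimation sketches further below):

```python
import torch


class LatentDefender:
    """Watches the latent response at one intermediate layer of the target model.

    `score_fn` is any function mapping a latent tensor to a legitimacy
    probability; it is an illustrative placeholder, not part of the patent text.
    """

    def __init__(self, layer: torch.nn.Module, score_fn, threshold: float = 0.5):
        self.score_fn = score_fn
        self.threshold = threshold
        self.flagged = False
        # The hook fires automatically during the target model's forward pass.
        self._handle = layer.register_forward_hook(self._inspect)

    def _inspect(self, module, inputs, output):
        # Flag the current input if its latent response scores below threshold.
        self.flagged = self.score_fn(output.detach()) < self.threshold
```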

In some variations, the training data can include a plurality of legitimate input samples, and the first defender machine learning model can be trained to learn a probability density function of those legitimate input samples. The training data can exclude malicious input samples, such that the first defender machine learning model is trained, in an unsupervised manner, on training data containing no malicious samples.

In some variations, the trained first defender machine learning model can determine whether the input sample is malicious or legitimate by determining a probability that the input sample originates from the explored subspace of the target machine learning model. The input sample can be classified as malicious if that probability falls below a threshold value.
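As one concrete, hedged reading of this check: fit a density model to legitimate samples so that its high-density region approximates the "explored subspace", then flag inputs whose density falls below a threshold. The Gaussian mixture and the threshold convention below are illustrative choices, not the patent's:

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_explored_subspace(legitimate_inputs: np.ndarray, components: int = 5) -> GaussianMixture:
    """Approximate the explored subspace with a density model fit to legitimate data."""
    pdf_model = GaussianMixture(n_components=components, random_state=0)
    pdf_model.fit(legitimate_inputs)
    return pdf_model


def is_malicious(pdf_model: GaussianMixture, x: np.ndarray, log_density_threshold: float) -> bool:
    """Flag inputs whose density under the legitimate-data PDF is too low."""
    log_density = pdf_model.score_samples(x.reshape(1, -1))[0]
    return log_density < log_density_threshold
```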

In some variations, the first defender machine learning model can be trained to learn a dictionary of features associated with legitimate input samples. The trained first defender model can determine whether an input sample is malicious or legitimate by reconstructing the input sample based on the dictionary, for example by evaluating the peak signal-to-noise ratio (PSNR) of the reconstruction.
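A small sketch of the dictionary-and-PSNR check, assuming inputs are flattened feature vectors scaled to [0, 1] and using scikit-learn's dictionary learner with illustrative sparsity settings; the 20 dB threshold is an arbitrary placeholder:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning


def learn_dictionary(legitimate_inputs: np.ndarray, atoms: int = 64) -> MiniBatchDictionaryLearning:
    """Learn a dictionary of features from legitimate input samples only."""
    learner = MiniBatchDictionaryLearning(n_components=atoms, alpha=1.0, random_state=0)
    learner.fit(legitimate_inputs)
    return learner


def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio of a reconstruction, in decibels."""
    mse = float(np.mean((original - reconstructed) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def is_legitimate(learner: MiniBatchDictionaryLearning, x: np.ndarray,
                  psnr_threshold: float = 20.0) -> bool:
    """Legitimate inputs reconstruct well from the dictionary; malicious ones tend not to."""
    codes = learner.transform(x.reshape(1, -1))
    reconstruction = codes @ learner.components_
    return psnr(x, reconstruction.ravel()) >= psnr_threshold
```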

In some variations, the first defender machine learning model can be a replica of the target machine learning model, having the same topographical structure and/or the same parameters as the target model. The target machine learning model and/or the first defender machine learning model can be deep learning models.

In some variations, a defender machine learning model can be deployed at each layer of the target machine learning model. Each defender machine learning model can be trained to determine whether an input sample received at the target machine learning model is malicious or legitimate.

Implementations of the current subject matter include, but are not limited to, methods consistent with the descriptions provided herein, as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers) to perform operations implementing one or more of the described features. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a non-transitory computer-readable or machine-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer-implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processing systems residing in a single computing system or across multiple computing systems. Such multiple computing systems can be connected and can exchange data, commands, or other instructions via one or more connections, including, for example, a connection over a network (e.g., the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), or via a direct connection between the computing systems.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description, the drawings, and the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes, they are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.

Machine learning models can be trained to perform a variety of cognitive tasks, such as classification, regression, feature extraction, and pattern recognition. However, a conventional machine learning model can be vulnerable to adversarial attacks in which the model is presented with malicious input samples designed to trick it. In a discriminative attack, for example, a target machine learning model can be presented with a malicious input sample containing one or more perturbations. These perturbations are often imperceptible, yet they can be sufficient to cause the target machine learning model (e.g., a discriminative machine learning model) to generate an incorrect output inference. In a generative attack, a malicious machine learning model (e.g., a discriminative and/or generative machine learning model) can interact with the target model to identify the features present in legitimate input samples that cause the target model to generate certain output inferences. The malicious machine learning model can thereby learn to generate malicious input samples that mimic legitimate input samples. Accordingly, in some example embodiments, the target machine learning model can be coupled with one or more defender machine learning models configured to detect malicious input samples and thereby prevent adversarial deep learning attacks.
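To make the threat model concrete, the snippet below shows one well-known way such a perturbation can be crafted, the fast gradient sign method; it is included only as background illustration and is not drawn from the patent text:

```python
import torch


def fgsm_perturbation(model: torch.nn.Module, x: torch.Tensor,
                      label: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    """Craft a small, near-imperceptible perturbation that pushes the model
    toward a wrong prediction (FGSM). `label` holds the true class indices."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss the fastest.
    return (x + epsilon * x.grad.sign()).detach()
```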

In some example embodiments, the target machine learning model can be a deep neural network having a plurality of layers, including, for example, core computation layers, normalization layers, non-linearity layers, and/or pooling layers. The target machine learning model can be coupled with one or more defender machine learning models having the same topographical structure and/or parameters (e.g., weights, biases, and/or the like) as the target model. In some embodiments, a defender machine learning model can be deployed at each layer of the target machine learning model. Alternatively and/or additionally, defender machine learning models can be deployed at the layers that are most vulnerable to malicious input samples, for example the layers that exhibit the greatest instability and/or sensitivity to perturbations in the input samples processed by the target model.

In some example embodiments, a defender machine learning model can be deployed at the input layer of the target machine learning model. A defender machine learning model deployed at the input layer can be configured to determine whether an input sample is malicious or legitimate before the input sample is processed by the target model. Alternatively and/or additionally, a defender machine learning model can be deployed at one or more intermediate layers of the target machine learning model. A defender machine learning model deployed at an intermediate layer can be configured to determine, based on at least the latent response observed at that layer, whether the input sample triggering the latent response is malicious or legitimate.

In some example embodiments, the defender machine learning models deployed at the target machine learning model can be negatively correlated. This can be achieved, for instance, by training the defender models as a sequence (e.g., a Markov chain) in which perturbations are added to the training data at successive layers of the target model. The training data for a defender model deployed at one layer of the target machine learning model can thus include perturbations that were not present in the training data used to train the defender model at an earlier layer. In this way, each of the defender machine learning models can be trained to detect different malicious input samples, such that a malicious input sample that is able to bypass one defender model can be caught by another defender model.
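A hedged sketch of that chained training scheme, in which each successive defender is fit on data carrying fresh perturbations that the earlier defenders never saw; the Gaussian-noise perturbations and the make_defender factory are stand-ins for whatever layer-wise perturbations and defender architecture the patent actually contemplates:

```python
import numpy as np


def train_defender_chain(make_defender, legitimate_inputs: np.ndarray,
                         num_defenders: int = 3, noise_scale: float = 0.05,
                         seed: int = 0) -> list:
    """Train a sequence of defenders, each on data the earlier ones never saw.

    `make_defender` is any factory returning an object with a `fit` method
    (illustrative placeholder).
    """
    rng = np.random.default_rng(seed)
    defenders = []
    data = legitimate_inputs.copy()
    for _ in range(num_defenders):
        defender = make_defender()
        defender.fit(data)
        defenders.append(defender)
        # Add fresh perturbations before training the next defender so the
        # defenders are decorrelated rather than learning identical boundaries.
        data = data + noise_scale * rng.standard_normal(data.shape)
    return defenders
```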

In some example embodiments, the defender machine learning models can be trained to learn a probability density function (PDF) associated with the legitimate input samples of the target machine learning model. For example, a defender machine learning model deployed at the input layer can learn the probability density function of the legitimate input samples themselves, while a defender model deployed at an intermediate layer can learn the probability density function of the latent responses triggered by those legitimate input samples. The probability density function of the legitimate input samples can correspond to the explored subspace of the target machine learning model, which contains the input samples that the target model encounters frequently during training. Malicious input samples, by contrast, are typically drawn from the unexplored subspace of the target machine learning model, which contains input samples that the target model encounters infrequently during training. For example, a malicious input sample can be created by manipulating the non-critical (e.g., nuisance) features of input samples from the unexplored subspace of the target machine learning model. Accordingly, the defender machine learning model can be configured to determine, based on at least the probability density function, a probability for an input sample and/or for the latent response triggered by that input sample, and to identify the input sample as malicious if that probability does not exceed a threshold value.
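The same density-threshold idea shown earlier for raw inputs can be applied to latent responses; the brief sketch below assumes latent responses have already been collected from one intermediate layer for a set of legitimate samples and uses a kernel density estimator as an illustrative choice of PDF model:

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def fit_latent_pdf(latent_responses: np.ndarray, bandwidth: float = 0.5) -> KernelDensity:
    """Estimate the PDF of latent responses triggered by legitimate samples."""
    return KernelDensity(bandwidth=bandwidth).fit(latent_responses)


def latent_is_suspicious(pdf: KernelDensity, latent: np.ndarray,
                         log_density_threshold: float) -> bool:
    """Flag latent responses falling in low-density (unexplored) regions."""
    return pdf.score_samples(latent.reshape(1, -1))[0] < log_density_threshold
```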

FIG. 1 depicts a schematic diagram illustrating a machine learning model 100, in accordance with some example embodiments. Referring to FIG. 1, the machine learning model 100 can be a deep learning model such as, for example, a neural network and/or the like. As shown in FIG. 1, the machine learning model 100 can be trained to perform a classification task in which an input image is assigned to one or more categories.

As noted, a machine learning model can include a plurality of layers such as, for example, core computation layers, normalization layers, non-linearity layers, and/or pooling layers. To further illustrate, the machine learning model 100 shown in FIG. 1 includes, for instance, convolution layers, pooling layers, and fully connected layers. Meanwhile, Table 1 below provides examples of the layers that may be present in a machine learning model such as a deep neural network.
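For readers less familiar with these layer types, the toy classifier below strings together the kinds of layers named above (convolution, non-linearity, pooling, normalization, fully connected), assuming 3x32x32 image inputs; it does not reproduce the specific architecture of FIG. 1 or Table 1:

```python
import torch.nn as nn

# Minimal classifier illustrating the layer types discussed above.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # core computation (convolution) layer
    nn.ReLU(),                                   # non-linearity layer
    nn.MaxPool2d(2),                             # pooling layer
    nn.BatchNorm2d(16),                          # normalization layer
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # fully connected layer (for 32x32 inputs)
)
```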

Click here to view the patent on Google Patents.