
Vulnerabilities of machine learning models and proposals for protecting them





Recently, experts have increasingly raised the issue of the security of machine learning models and proposed various ways to protect them. It is high time to examine in detail the potential vulnerabilities and defenses in the context of popular traditional modeling systems, such as linear and tree-based models trained on static datasets. Although the author of this article is not a security expert, he closely follows topics such as debugging, explanations, fairness, interpretability, and privacy in machine learning.



In this article, we describe several probable attack vectors against a typical machine learning system in a typical organization, suggest tentative defenses, and discuss some common problems and the most promising practices.



1. Data poisoning attacks



Data poisoning means that someone systematically alters your training data in order to manipulate your model's predictions (such attacks are also called “causative” attacks). To poison data, an attacker must have access to some or all of your training data, and in many companies, in the absence of proper controls, various employees, consultants, and contractors have exactly that kind of access. An attacker outside the security perimeter can also gain unauthorized access to some or all of the training data.


A direct poisoning attack may involve changing the labels of a dataset. Whatever the commercial application of your model, an attacker can then steer its predictions, for example by changing labels so that your model learns to issue large loans, large discounts, or small insurance premiums to the attackers. Forcing a model to make false predictions in the attacker's interest is sometimes called a violation of the model's “integrity”.



An attacker can also use data poisoning to train your model to deliberately discriminate against a group of people, depriving them of the large loan, large discount, or low insurance premium they rightfully deserve. In spirit, this attack resembles a denial-of-service attack. Forcing a model to make false predictions in order to harm others is sometimes called a violation of the model's “availability”.



Although it may seem easier to poison data by changing values in existing dataset rows, poisoned data can also be introduced by appending seemingly harmless or superfluous columns to a dataset. Altered values in these columns can then trigger changed model predictions.



Now let's look at some possible defensive and forensic solutions for data poisoning:





Disparate impact analysis, residual analysis, and self-reflection can be carried out during training and as part of real-time model monitoring.
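For illustration, here is a minimal residual-analysis sketch for poisoning forensics. It assumes a fitted scikit-learn-style binary classifier and a pandas DataFrame of training data; the column names ("approved", "edited_by") and the 0.9 residual threshold are hypothetical choices, not part of any standard API. Rows whose labels the trained model strongly disagrees with, grouped by whoever last touched them, are a natural starting point for a forensic review.

```python
# A rough sketch of residual analysis for spotting possibly poisoned labels.
# Assumes a fitted binary classifier `model` with predict_proba() and a pandas
# DataFrame `train`; column names ("approved", "edited_by") are hypothetical.
import numpy as np

def flag_suspicious_rows(model, train, feature_cols, label_col="approved",
                         editor_col="edited_by", threshold=0.9):
    """Flag training rows whose labels strongly disagree with the fitted model."""
    proba = model.predict_proba(train[feature_cols])[:, 1]
    residual = np.abs(train[label_col].to_numpy() - proba)   # large = suspicious label
    flagged = train.assign(residual=residual).loc[residual > threshold]
    # Group by who last touched each row: useful input for a forensic review.
    return (flagged.groupby(editor_col)["residual"]
                   .agg(["count", "mean"])
                   .sort_values("count", ascending=False))
```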



2. Watermark Attacks



“Watermark” is a term borrowed from the deep learning security literature, where it often refers to adding special pixels to an image in order to get the desired result from a model. It is entirely possible to do the same with customer or transaction data.



Consider a scenario in which an employee, consultant, contractor, or outside attacker has access to the production scoring code of your model, which makes predictions in real time. Such a person can change the code so that it recognizes a strange or unlikely combination of input variable values and produces the desired prediction. Like data poisoning, watermark attacks can be used to violate your model's integrity or availability. For example, to violate integrity, an attacker could insert a “payload” into the production scoring code so that it recognizes the combination of an age of 0 and 99 years at an address, which would lead to some kind of positive prediction for the attacker. And to block the model's availability, he could insert an artificially discriminatory rule into the scoring code that prevents the model from producing positive outcomes for a certain group of people.



Defensive and forensic approaches to watermark attacks may include:





Anomaly detection, data integrity constraints, and disparate impact analysis can be used during training and as part of real-time model monitoring.
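As a concrete example of a data integrity constraint, the sketch below rejects scoring rows with implausible combinations, such as the age-0 with 99-years-at-address watermark mentioned above. The field names and bounds are hypothetical and would need to be adapted to your own schema.

```python
# A minimal sketch of data integrity constraints applied before scoring.
# Field names ("age", "years_at_address") and bounds are hypothetical.
def integrity_violations(row: dict) -> list:
    violations = []
    age = row.get("age", -1)
    years_at_address = row.get("years_at_address", 0)
    if not 18 <= age <= 120:
        violations.append("age outside plausible range")
    if years_at_address > age:
        violations.append("years at address exceeds age")
    return violations

row = {"age": 0, "years_at_address": 99}   # the watermark-like combination above
if integrity_violations(row):
    # Route to manual review or a fallback model instead of scoring blindly.
    print("refusing to score:", integrity_violations(row))
```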



3. Inversion by surrogate models



Usually, “inversion” refers to obtaining unauthorized information out of a model rather than placing information into it. Inversion can also be an example of an “exploratory reverse-engineering attack”. If an attacker can obtain many predictions from your model's API or another endpoint (website, application, etc.), he can train his own surrogate model. Simply put, this is a simulation of your predictive model! Theoretically, an attacker can train a surrogate model on the inputs he used to generate the predictions and on the predictions he received back. Depending on how many predictions can be obtained, the surrogate can become a fairly accurate simulation of your model. Once the surrogate is trained, the attacker has a sandbox in which to plan impersonation (i.e., mimicry) or adversarial example attacks against your model's integrity, or the potential to start reconstructing aspects of your confidential training data. Surrogate models can also be trained from external data sources that are somehow aligned with your predictions, as ProPublica did, for example, with the COMPAS recidivism model.
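To make the mechanics concrete, here is a minimal sketch of how an attacker might fit a surrogate from harvested predictions. Everything here is hypothetical: `query_endpoint` is a placeholder standing in for repeated calls to a public scoring API, and the probe ranges and tree depth are made up.

```python
# A minimal sketch of fitting a surrogate model from harvested predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def query_endpoint(x):
    """Hypothetical stand-in for one call to a public scoring API."""
    return float(x[0] > 0.5)     # placeholder; a real attacker receives your model's output

rng = np.random.default_rng(0)
X_probe = rng.uniform(0.0, 1.0, size=(5000, 10))            # attacker-chosen probe inputs
y_hat = np.array([query_endpoint(x) for x in X_probe])      # harvested predictions

surrogate = DecisionTreeRegressor(max_depth=6).fit(X_probe, y_hat)
# The tree's split rules now approximate the decision boundary and can be
# inspected offline to plan impersonation or adversarial example attacks.
print("fit quality on probes:", surrogate.score(X_probe, y_hat))
```

Authentication and prediction throttling, discussed in section 7, directly limit how many probe predictions such an attacker can harvest.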



To protect your model against inversion by surrogate models, you can use the following approaches:





4. Adversarial example attacks



In theory, a purposeful attacker can learn, say through trial and error (i.e., “exploration” or “sensitivity analysis”), through inversion with a surrogate model, or through social engineering, how to game your model to obtain a desired prediction outcome or to avoid an unwanted one. Attempting to achieve such goals with a specially crafted data row is called an adversarial example attack (sometimes an exploratory integrity attack). An attacker could use an adversarial example to obtain a large loan or a low insurance premium, or to avoid denial of parole based on a high criminal risk score. Some call the use of adversarial examples to avoid an undesirable prediction outcome “evasion”.
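The trial-and-error search described above can be reproduced defensively with plain sensitivity analysis. The sketch below, assuming a fitted scikit-learn-style classifier and purely hypothetical step sizes, nudges each feature of one row until the decision flips; this is the kind of white-hat probing you can run against your own model before an attacker does.

```python
# A minimal sketch of trial-and-error sensitivity analysis: perturb one row's
# features and record which small change flips the decision. `model` is any
# fitted scikit-learn-style classifier; step sizes are hypothetical.
import numpy as np

def find_decision_flips(model, x_row, feature_names, step=0.1, n_steps=10):
    base_pred = model.predict(x_row.reshape(1, -1))[0]
    flips = []
    for j, name in enumerate(feature_names):
        for k in range(1, n_steps + 1):
            x_adv = x_row.astype(float).copy()
            x_adv[j] += k * step
            if model.predict(x_adv.reshape(1, -1))[0] != base_pred:
                flips.append((name, k * step))      # smallest nudge that flips the outcome
                break
    return flips
```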



Try the methods below to defend against or detect an adversarial example attack:





Activation analysis or comparative models can be used during training and as part of real-time model monitoring.



5. Impersonation



A goal-oriented attacker can find out, again through trial and error, a surrogate model, or social engineering, which inputs, or which specific people, receive the desired prediction outcome. The attacker can then impersonate such a person in order to benefit from the prediction. Impersonation attacks are sometimes called “mimicry” attacks, and from the model's point of view they resemble identity theft. As with an adversarial example attack, in impersonation the inputs to your model are artificially altered. But unlike an adversarial example attack, in which a potentially random-looking combination of values can be used for deception, impersonation uses information associated with another modeled entity (for example, a convict, customer, employee, financial transaction, patient, product, etc.) in order to receive the prediction associated with that type of entity. Suppose an attacker can find out which characteristics your model relies on when granting large discounts or benefits. He can then falsify the information you use so as to receive such a discount, and he can share his strategy with others, which can lead to large losses for your company.



If you use a two-stage model, beware of an “allergy” attack: an attacker can mimic a row of ordinary input data for the first stage of your model in order to attack its second stage.



Defensive and forensic approaches to impersonation attacks may include:





Activation analysis, duplicate checking, and notification features for potential threats can be used during training and as part of real-time model monitoring.
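As one illustration of duplicate checking, the sketch below measures how close an incoming scoring row is to rows the system has already seen; the assumption of standardized inputs and the 0.05 distance threshold are hypothetical tuning choices.

```python
# A minimal sketch of near-duplicate checking at scoring time. `X_known` holds
# historical rows on a standardized scale; the 0.05 threshold is hypothetical.
import numpy as np

def near_duplicate(x_new, X_known, threshold=0.05):
    dists = np.linalg.norm(X_known - x_new, axis=1)
    nearest = float(dists.min())
    # A suspiciously close match to another customer's record, arriving under a
    # different identity, is a candidate impersonation attempt worth reviewing.
    return nearest < threshold, nearest
```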



6. Common problems



Some common patterns of machine learning use also entail more general security problems.



Black boxes and unnecessary complexity. Although recent advances in interpretable models and model explanations make it possible to use accurate and transparent nonlinear classifiers and regressors, many machine learning workflows are still centered on black-box models. Such models are only one type of often unnecessary complexity in a typical commercial machine learning workflow. Other examples of potentially harmful complexity include overly exotic feature engineering or a large number of package dependencies. This can be a problem for at least two reasons:



  1. An assertive, goal-oriented attacker can, over time, learn more about your overly complex black-box modeling system than you or your team know (especially in today's overheated and rapidly changing data science job market). To do this, an attacker can use a variety of newer model-agnostic explanation techniques and classical sensitivity analysis, among many other more common hacking tools. This imbalance of knowledge can potentially be exploited to carry out the attacks described in sections 1-5, or for other, as yet unknown, types of attacks.

  2. Machine learning in research and development environments depends heavily on a diverse ecosystem of open source software packages. Some of these packages have many contributors and users; others are highly specialized and needed only by a small circle of researchers and practitioners. It is well known that many packages are maintained by brilliant statisticians and machine learning researchers who focus on mathematics or algorithms rather than software engineering, and certainly not on security. It is not uncommon for a machine learning pipeline to depend on dozens or even hundreds of external packages, any of which can be compromised to conceal a malicious “payload”.



Distributed systems and models. For better or worse, we live in the age of big data, and many organizations today use distributed data processing and machine learning systems. Distributed computing presents a large attack surface for attackers inside or outside the organization. Data can be poisoned on only one or a few worker nodes of a large distributed storage or processing system. A “back door” for a watermark can be coded into just one model of a large ensemble. Instead of debugging one simple dataset or model, practitioners now have to examine data or models scattered across large computing clusters.



Distributed denial-of-service (DDoS) attacks. If a predictive modeling service plays a key role in your organization's operations, make sure you account for even conventional DDoS attacks, in which attackers flood the prediction service with an enormous number of requests in order to delay or stop the delivery of predictions to legitimate users.



7. General solutions



You can use several general best practices, old and new, to reduce your security vulnerabilities and to increase the fairness, accountability, transparency, and trustworthiness of machine learning systems.



Authorized access and prediction throttling. Standard safeguards, such as additional authentication and throttling of prediction frequency, can be very effective in preventing a number of the attack vectors described in sections 1-5.
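For illustration, here is a minimal sketch of per-caller prediction throttling over a sliding one-minute window. In practice this usually lives in an API gateway rather than application code, and the limit of 60 requests per minute is a hypothetical choice.

```python
# A minimal sketch of per-caller prediction throttling with a sliding one-minute
# window. The limit of 60 requests per minute is a hypothetical choice.
import time
from collections import defaultdict, deque

class PredictionThrottle:
    def __init__(self, max_per_minute=60):
        self.max_per_minute = max_per_minute
        self.calls = defaultdict(deque)

    def allow(self, caller_id: str) -> bool:
        now = time.time()
        window = self.calls[caller_id]
        while window and now - window[0] > 60.0:
            window.popleft()                      # drop calls older than one minute
        if len(window) >= self.max_per_minute:
            return False                          # deny, delay, or log for review
        window.append(now)
        return True
```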



Comparative models. As a comparative (benchmark) model for determining whether a prediction has been manipulated, you can use an older, proven modeling pipeline or another highly transparent, interpretable prediction tool. Manipulation includes data poisoning, watermark attacks, or adversarial example attacks. If the difference between the prediction of your proven model and the prediction of a more complex and opaque model is too large, record such cases and send them to analysts, or take other measures to investigate or correct the situation. Serious precautions must be taken to ensure that your comparative model and pipeline remain secure and unchanged relative to their original, trusted state.
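A minimal sketch of such a benchmark check is shown below: a trusted, transparent model is scored alongside the production model, and large disagreements are logged for review. The model objects are assumed to follow the scikit-learn `predict_proba` convention, and the 0.25 disagreement threshold is a hypothetical choice.

```python
# A minimal sketch of a comparative (benchmark) model check at scoring time.
# `benchmark` is a trusted transparent model, `production` the complex one.
import logging

log = logging.getLogger("benchmark_check")

def score_with_benchmark(benchmark, production, x_row_2d, threshold=0.25):
    p_prod = float(production.predict_proba(x_row_2d)[:, 1][0])
    p_bench = float(benchmark.predict_proba(x_row_2d)[:, 1][0])
    if abs(p_prod - p_bench) > threshold:
        # Large disagreement: record the case for analysts before acting on it.
        log.warning("benchmark disagreement: prod=%.3f bench=%.3f", p_prod, p_bench)
    return p_prod
```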



Interpretable, fair, or private models. Methods now exist (for example, monotonic GBM (M-GBM), scalable Bayesian rule lists (SBRL), explainable neural networks (XNN)) that provide both accuracy and interpretability. These accurate and interpretable models are easier to document and debug than classic machine learning “black boxes”. Newer types of fair and private models (for example, LFR, PATE) can also be trained to downplay outwardly observable demographic characteristics that could be observed, socially engineered in an adversarial example attack, or used for impersonation. Considering a new machine learning workflow in the future? Consider building it on lower-risk interpretable, private, or fair models. They are easier to debug and are potentially more resistant to changes in the characteristics of individual entities.
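As one concrete example of an interpretability-oriented constraint, XGBoost exposes a `monotone_constraints` parameter. The sketch below, with made-up features and toy data, forces the predicted outcome to rise only with income and fall only with debt ratio, which makes the model's behavior easier to reason about and harder to game in counterintuitive directions.

```python
# A minimal sketch of a monotonically constrained GBM with XGBoost. The two
# columns (income, debt_ratio) and the toy data are made up for illustration.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))                       # columns: income, debt_ratio
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=1000) > 0).astype(int)

mgbm = XGBClassifier(
    n_estimators=100,
    max_depth=3,
    monotone_constraints=(1, -1),   # +1: rises with income; -1: falls with debt_ratio
)
mgbm.fit(X, y)
```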



Debugging models for security. The new field of model debugging is devoted to detecting and correcting errors in the mechanisms and predictions of machine learning models. Debugging tools such as surrogate models, residual analysis, and sensitivity analysis can be used in white-hat exercises to identify your own vulnerabilities, or in forensic exercises to understand any potential attacks that may have occurred or be occurring.



Model documentation and explanation techniques. Model documentation is a risk-mitigation strategy that has been used in banking for decades. It allows knowledge about complex modeling systems to be preserved and transferred as the owners of a model change. Documentation has traditionally been applied to highly transparent linear models. But with the advent of powerful, accurate explanation tools (such as tree SHAP and derivative-based local feature attributions for neural networks), pre-existing black-box modeling workflows can be at least somewhat explained, debugged, and documented. Documentation should now obviously also include all security objectives, including any known, fixed, or expected vulnerabilities.



Monitoring and managing models with security in mind. Serious practitioners understand that most models are trained on static “snapshots” of reality in the form of datasets, and that in real time prediction accuracy degrades as the present drifts away from the information collected earlier. Today, most model monitoring aims to detect such a shift in the distribution of input variables, which will eventually lead to a drop in accuracy. Model monitoring should also be designed to track the attacks described in sections 1-5 and any other potential threats uncovered while debugging your model. Although this is not always directly related to security, models should also be evaluated for disparate impact in real time. Along with model documentation, all modeling artifacts, source code, and associated metadata should be managed, versioned, and audited for security, like the valuable commercial assets they are.
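As one example of monitoring for the distribution shift mentioned above, the sketch below computes a population stability index (PSI) for a single numeric feature between the training data and current scoring data. The quantile binning and the 0.2 alert threshold are common rules of thumb, not standards, and the column name in the usage comment is hypothetical.

```python
# A minimal sketch of input-drift monitoring with the population stability index
# (PSI) for one numeric feature. The 0.2 alert threshold is a rule of thumb.
import numpy as np

def psi(expected, actual, bins=10):
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    # Clip live data into the training range so extreme values land in edge bins.
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Example usage (hypothetical column name):
# if psi(train_df["income"], live_df["income"]) > 0.2: notify_analysts(...)
```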



Features that notify of possible threats. Features, rules, and pre- or post-processing stages can be included in your models or pipelines and instrumented to alert you to possible threats: for example, the number of similar rows the model has seen, whether the current row represents an employee, contractor, or consultant, or whether the values in the current row are similar to those obtained in white-hat adversarial example attacks. These features may or may not be needed when a model is first trained, but reserving room for them may one day prove very useful when scoring new data or when later retraining the model.



System anomaly detection. Train an autoencoder-based anomaly detection meta-model on the operating statistics of your entire predictive modeling system (number of predictions over a period of time, latency, CPU, memory and disk usage, number of concurrent users, etc.), and then closely monitor this meta-model for anomalies. An anomaly can indicate that something has gone wrong. Follow-up investigations or dedicated mechanisms will be needed to trace the exact cause of the problem.
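A rough sketch of this idea follows: a small neural network trained to reconstruct its own input stands in for a full autoencoder, the metrics matrix is a random placeholder for your logged operational statistics, and the 99th-percentile alert threshold is a hypothetical choice.

```python
# A minimal sketch of autoencoder-style anomaly detection over system metrics
# (prediction counts, latency, CPU, memory, concurrent users). An MLP trained to
# reconstruct its own input stands in for a full autoencoder; the data here is a
# random placeholder and the 99th-percentile threshold is hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

metrics = np.random.rand(10_000, 5)                  # placeholder for logged metrics
scaler = StandardScaler().fit(metrics)
Z = scaler.transform(metrics)

autoencoder = MLPRegressor(hidden_layer_sizes=(3,), max_iter=500).fit(Z, Z)
reconstruction_error = np.mean((autoencoder.predict(Z) - Z) ** 2, axis=1)
alert_threshold = np.quantile(reconstruction_error, 0.99)

def is_anomalous(new_metrics_row):
    z = scaler.transform(new_metrics_row.reshape(1, -1))
    return float(np.mean((autoencoder.predict(z) - z) ** 2)) > alert_threshold
```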



8. References and further reading



A large share of the current academic literature on machine learning security focuses on adversarial learning, deep learning, and encryption. However, the author does not yet know of practitioners who are actually doing all of this. Therefore, in addition to recently published articles and posts, articles from the 1990s and early 2000s on network intrusion, virus detection, spam filtering, and related topics were also useful sources. If you want to learn more about the fascinating topic of protecting machine learning models, here are the main links, from the past and the present, that were used to write this post.





Conclusion



Those involved in the science and practice of machine learning worry that the threat of machine learning hacks, combined with the growing threats of privacy violations and algorithmic discrimination, could amplify growing public and political skepticism about machine learning and artificial intelligence. We should all remember the hard times AI has gone through in the recent past. Security vulnerabilities, privacy violations, and algorithmic discrimination could potentially combine to reduce funding for research in this field or to provoke draconian regulation of the area. Let's keep discussing and solving these important problems in order to prevent a crisis rather than clean up its consequences.

Source: https://habr.com/ru/post/458892/


