
Towards robust and verified AI: specification testing, robust training and formal verification

Bugs and software have gone hand in hand since the very beginning of computer programming. Over time, developers have built up a set of practices for testing and debugging programs before deployment, but these practices are poorly suited to modern deep learning systems. Today, the dominant practice in machine learning is to train a model on one dataset and then test it on another. This tells us the average performance of a model, but it is just as important to guarantee robustness, that is, acceptable performance in the worst case. In this article we describe three approaches for rigorously identifying and eliminating errors in trained predictive models: adversarial testing, robust learning and formal verification.



Machine learning systems are not robust by default. Even systems that outperform humans in a particular domain can fail at simple problems when subtle differences are introduced. Consider, for example, the problem of image perturbations: a neural network that classifies images better than a human can easily be made to believe that a sloth is a race car by adding a small amount of carefully computed noise to the image.





An adversarial input, overlaid on an ordinary image, can confuse the AI. The two outer images differ by at most 0.0078 in each pixel. The first is classified as a sloth with 99% confidence; the second as a race car with 99% confidence.
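To make this concrete, below is a minimal sketch of how such a perturbation can be computed with the fast gradient sign method, a standard technique from the literature rather than necessarily the one used to produce the image above; the `model` classifier and the specific epsilon value are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, image, true_label, epsilon=0.0078):
    """Add a small, carefully computed perturbation that raises the loss.

    `model` is any differentiable classifier returning logits; `image` is a
    (1, C, H, W) tensor with values in [0, 1]; `true_label` is a length-1
    LongTensor; `epsilon` bounds the change applied to each pixel.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # One signed gradient step per pixel: a barely visible pattern of noise
    # chosen to increase the classification loss as much as possible.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```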



This problem is not new. Programs have always had bugs. Over the decades, programmers have accumulated an impressive toolbox of techniques, from unit testing to formal verification. These methods work well on traditional software, but adapting them to rigorously test machine learning models is extremely difficult because of the scale and lack of structure in models that can contain hundreds of millions of parameters. This points to the need for new approaches to ensuring the reliability of machine learning systems.


From a programmer's point of view, a bug is any behaviour that does not match the specification, that is, the intended functionality of the system. As part of our AI research, we study techniques for evaluating whether machine learning systems conform not only to the training and test sets, but also to a list of specifications describing desired properties of the system. Such properties may include robustness to sufficiently small changes in the input data, safety constraints that prevent catastrophic failures, or consistency of predictions with the laws of physics.



In this article we discuss three important technical challenges the machine learning community faces as we work towards ML systems that robustly and reliably satisfy their desired specifications:



  1. Efficient specification testing. We study efficient ways to verify that ML systems conform to the properties (such as robustness or invariance) required of them by their designers and users. One approach to finding cases where a model may violate these properties is a systematic search for worst-case outcomes.
  2. Training ML models to satisfy specifications. Even with a large amount of training data, standard ML algorithms can produce predictive models that do not satisfy the desired specifications. We need to revise training algorithms so that they not only fit the training data well, but also satisfy the desired specifications.
  3. Formally proving that ML models satisfy the desired specifications. We need algorithms that confirm a model satisfies the desired specification for all possible inputs. Although the field of formal verification has studied such algorithms for several decades, despite impressive progress these approaches do not scale easily to modern ML systems.


Testing whether models conform to desired specifications



Robustness to adversarial examples is a fairly well-studied problem in deep learning. One of the main conclusions is the importance of evaluating a network against strong attacks and of building transparent models that can be analysed efficiently. Together with other researchers, we have found that many models appear robust when evaluated against weak adversarial examples, yet show almost 0% accuracy against stronger ones ( Athalye et al., 2018 , Uesato et al., 2018 , Carlini and Wagner, 2017 ).
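As an illustration of what a "stronger" attack means in practice, the sketch below runs a multi-step projected gradient attack; this is a generic method from the literature cited above, written here with illustrative parameter values.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, image, label, epsilon=0.03, step_size=0.007, steps=40):
    """Multi-step projected gradient attack inside an L-infinity ball.

    Models that look robust under a single weak gradient step often drop to
    near-zero accuracy when evaluated against many such steps.
    """
    original = image.clone().detach()
    # Start from a random point inside the allowed perturbation set.
    adv = original + torch.empty_like(original).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        adv = adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), label)
        loss.backward()
        with torch.no_grad():
            adv = adv + step_size * adv.grad.sign()
            # Project back onto the epsilon-ball and the valid pixel range.
            adv = original + (adv - original).clamp(-epsilon, epsilon)
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```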



While most of this work has focused on rare failures in supervised learning (mostly image classification), there is a need to extend these ideas to other settings. In recent work on adversarially finding catastrophic failures, we apply them to testing reinforcement learning agents intended for safety-critical applications. One of the challenges of developing autonomous systems is that, because a single mistake can have serious consequences, even a tiny probability of failure is unacceptable.



Our goal is to design an "adversary" that can find these errors in advance, in a controlled environment. If the adversary can efficiently identify the worst-case inputs for a given model, we can catch rare failure cases before the model is deployed. As with image classifiers, evaluating against a weak adversary gives a false sense of security at deployment time. This approach is similar to red-teaming in software development [red-teaming: bringing in an external team that plays the role of attackers to uncover vulnerabilities - translator's note], but goes beyond failures caused by malicious attackers and also covers errors that arise naturally, for example due to insufficient generalisation.



We have developed two complementary approaches for adversarial testing of trained reinforcement learning agents. In the first, we use derivative-free optimisation to directly minimise the agent's expected reward. In the second, we learn an adversarial value function which predicts, from experience, in which situations the agent is likely to fail. We then use this learned function for optimisation, focusing the evaluation on the most problematic inputs. These approaches are only a small part of a rich and growing space of potential algorithms, and we are excited about the future development of this area.
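As a rough sketch of the first, derivative-free approach, the code below uses plain random search (the simplest derivative-free optimiser) over environment settings to surface low-reward episodes; `run_episode` and `sample_env_params` are hypothetical placeholders for whatever the task actually provides.

```python
def find_failure_cases(run_episode, sample_env_params, budget=1000, threshold=0.0):
    """Search for environment settings on which a trained agent fails.

    `run_episode(params)` runs the agent in an environment configured by
    `params` and returns the episode reward; `sample_env_params()` draws a
    candidate configuration (e.g. a maze layout or a start position).
    """
    results = []
    for _ in range(budget):
        params = sample_env_params()
        results.append((run_episode(params), params))
    # The lowest-reward configurations are candidate failure cases to inspect
    # before deployment; a stronger adversary would search this space more
    # cleverly than uniform sampling does.
    results.sort(key=lambda pair: pair[0])
    return [params for reward, params in results if reward <= threshold]
```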



Both approaches already show significant improvements over random testing. With our method, failures that would previously have taken a whole day to find, or might never have been found at all, can be found in minutes ( Uesato et al., 2018b ). We also found that adversarial testing can reveal qualitatively different behaviour of agents compared to what one would expect from evaluation on a random test set. In particular, using our method we found that agents that navigate 3D maps, and usually do so at human level, fail to find the goal in surprisingly simple mazes ( Ruderman et al., 2018 ). Our work also highlights the need to design systems that are safe against natural failures, not only against adversaries.





With random sampling we find almost no maps with a high failure probability, whereas adversarial testing shows that such maps exist. The failure probability remains high even after many walls are removed, i.e. on maps simpler than the original.



Training models that satisfy specifications



Adversarial testing tries to find a counterexample that violates the specification, and so it often overestimates how consistent a model is with that specification. Mathematically, a specification is a relationship that must hold between the inputs and outputs of a network. It may take the form of upper and lower bounds on certain key input and output quantities.
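For example, the robustness property from the sloth example above can be written as such an input-output relationship (our notation, for illustration):

```latex
\forall\, x' \;:\; \|x' - x_0\|_\infty \le \epsilon
\;\Longrightarrow\;
\operatorname*{argmax}_k f_k(x') = y
```

Here f is the network, x0 the original image, y its correct label and epsilon the per-pixel perturbation budget (0.0078 in the sloth example).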



Inspired by this observation, several researchers ( Raghunathan et al., 2018 ; Wong et al., 2018 ; Mirman et al., 2018 ; Wang et al., 2018 ), including our team at DeepMind ( Dvijotham et al., 2018 ; Gowal et al., 2018 ), have worked on training algorithms that are agnostic to the adversarial testing procedure. This can be described geometrically: we can bound ( Ehlers 2017 , Katz et al. 2017 , Mirman et al., 2018 ) the worst specification violation by bounding the set of outputs reachable from a given set of inputs. If this bound is differentiable with respect to the network parameters and can be computed quickly, it can be used during training. The bound on the inputs can then be propagated through each layer of the network.
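A minimal numpy sketch of interval bound propagation through a toy fully-connected network is shown below; the layer encoding is ours and purely illustrative.

```python
import numpy as np

def interval_bound_propagation(layers, x, epsilon):
    """Propagate elementwise lower/upper bounds through a feed-forward network.

    `layers` is a list of ('linear', W, b) or ('relu',) tuples; the input set
    is the L-infinity ball of radius `epsilon` around `x`. Returns bounds that
    every reachable output of the final layer must satisfy.
    """
    lower, upper = x - epsilon, x + epsilon
    for layer in layers:
        if layer[0] == 'linear':
            _, W, b = layer
            centre = (upper + lower) / 2.0
            radius = (upper - lower) / 2.0
            # An affine layer maps the interval's centre through (W, b) and
            # stretches its radius by the absolute values of W.
            centre, radius = W @ centre + b, np.abs(W) @ radius
            lower, upper = centre - radius, centre + radius
        else:  # 'relu'
            # ReLU is monotone, so it maps interval bounds to interval bounds.
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    return lower, upper
```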







We show that interval bound propagation is fast, efficient and, contrary to what was previously believed, achieves strong results ( Gowal et al., 2018 ). In particular, we show that it can provably reduce the error rate (i.e. the maximum number of errors any adversary can cause) compared with the most advanced image classifiers on the MNIST and CIFAR-10 datasets.
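Roughly, the bound can enter training as follows: build the most adversarial logit vector consistent with the propagated bounds and penalise its loss. The helper names are ours, and this is a simplified sketch of the idea rather than the exact training code.

```python
import numpy as np

def worst_case_logits(lower, upper, true_label):
    """Most adversarial logits consistent with the bounds: the true class
    takes its lower bound, every other class takes its upper bound."""
    z = upper.copy()
    z[true_label] = lower[true_label]
    return z

def verified_cross_entropy(lower, upper, true_label):
    """Cross-entropy on the worst-case logits; training on a mix of this and
    the ordinary loss encourages bounds tight enough to certify predictions."""
    z = worst_case_logits(lower, upper, true_label)
    z = z - z.max()  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[true_label]
```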



Our next goal is to find the right geometric abstractions for computing tighter over-approximations of the reachable output space. We also want to train networks to reliably satisfy more complex specifications describing desired behaviour, such as the invariances mentioned above and consistency with physical laws.



Formal verification



Careful testing and training go a long way towards building reliable ML systems. However, no amount of testing can formally guarantee that a system will behave as we intend. In large-scale models, enumerating all possible outputs for a given set of inputs (for example, all minor perturbations of an image) is intractable because of the astronomical number of possible perturbations. However, as with training, more efficient approaches can be found by placing geometric bounds on the set of outputs. Formal verification is an active topic of research at DeepMind.



The ML community has developed several interesting ideas for computing precise geometric bounds on a network's output space (Katz et al. 2017, Weng et al., 2018 ; Singh et al., 2018 ). Our approach ( Dvijotham et al., 2018 ), based on optimisation and duality, formulates verification as an optimisation problem that tries to find the largest violation of the property being verified. The problem becomes tractable by using ideas from duality in optimisation. The result is a set of additional constraints that tighten the bounds computed by interval bound propagation, using so-called cutting planes. This approach is sound but incomplete: there may be cases where the property of interest holds, yet the bound computed by the algorithm is not tight enough to prove it formally. However, when the bound shows no violation, we obtain a formal guarantee that the property cannot be violated. The figure below illustrates the approach graphically.
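In rough terms (our notation, simplified), the verification problem and the dual bound look like this:

```latex
p^{*} \;=\; \max_{x \in S}\; c^{\top} f(x)
\qquad\text{and, by weak duality,}\qquad
p^{*} \;\le\; g(\lambda)\ \ \text{for any dual variables } \lambda .
```

Here S is the allowed input set, c encodes the property being verified and g is the dual function. If g(lambda) <= 0 for some choice of lambda, the property is certified; if every bound we compute stays positive, the result is inconclusive rather than a proof of failure, which is exactly the sound-but-incomplete behaviour described above.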







This approach lets us extend the applicability of verification algorithms to more general networks (activation functions, architectures), more general specifications, more complex deep learning models (generative models, neural processes, etc.) and specifications that go beyond adversarial robustness ( Qin , 2018 ).



Perspectives



Deploying ML in high-stakes situations brings its own unique challenges, and requires evaluation techniques that reliably uncover unlikely failures. We believe that training consistently against specifications can significantly improve on the case where specifications arise only implicitly from the training data. We look forward to the results of ongoing research on adversarial evaluation, robust learning and verification of formal specifications.



Much more work is needed before we can build automated tools that ensure AI systems in the real world "do the right thing". In particular, we are excited about progress in the following directions:



  1. Learning-based adversarial evaluation and verification. As AI systems scale and grow more complex, it will become increasingly difficult to design adversarial evaluation and verification algorithms that are well matched to the AI model. If we can use the power of AI itself for evaluation and verification, the process can scale.
  2. Developing publicly available tools for adversarial evaluation and verification. It is important to give engineers and other users of AI easy-to-use tools that shed light on the possible failure modes of an AI system before those failures lead to widespread negative consequences. This will require some standardisation of adversarial evaluation and verification algorithms.
  3. Broadening the range of adversarial examples. So far, most of the work on adversarial examples has concentrated on robustness of models to small perturbations, usually of images. This has been an excellent testbed for developing approaches to adversarial evaluation, robust training and verification. We have begun to study specifications for properties directly relevant to the real world, and look forward to the results of future research in this direction.
  4. Learning specifications. Specifications describing the "correct" behaviour of AI systems are often hard to state precisely. As we build increasingly intelligent systems capable of complex behaviour and of operating in unstructured environments, we will need to create systems that can use partially stated specifications and learn further specifications from feedback.


DeepMind is committed to having a positive impact on society through the responsible development and deployment of ML systems. To make sure that developers' contributions are reliably positive, we need to overcome many technical obstacles. We intend to contribute to this area and are happy to work with the community on solving these problems.

Source: https://habr.com/ru/post/450520/


