# Thoughts on "An Abstraction-Refinement Approach to Verification of Artificial Neural Networks"

Neural networks (NNs) have become a ubiquitous machine learning technique in a wide range of domains. In 2003, some researchers surveyed the literature on verification of NNs and concluded that they “represent a class of systems that do not fit into the current paradigms of software development and certification”. Wondering if that was still the case after the NN renaissance that’s taken place over the last ten years or so, I recently began looking for recent work on verifying interesting properties of neural networks. The first relevant paper I found in the software verification literature was “An Abstraction-Refinement Approach to Verification of Artificial Neural Networks”, by Luca Pulina and Armando Tacchella, which appeared in CAV 2010. Here are some thoughts on that paper.

## Background on neural networks

Like other machine learning techniques, instead of being programmed to compute a function, an NN learns to approximate the function based on a set of training data. NNs are made up of layers of processing units (or neurons), each of which takes some number of inputs and produces an output that is passed on to the next layer. At each unit, the inputs are combined using parameters (or weights) that determine how each input contributes to the result, and it is these parameters that must be learned during the training process.

The typical approach to training an NN is to separate a set of available data into training, validation, and test sets. Each of these data sets is made up of examples that pair an input (say, an array of pixels that represent a picture) with an expected output (say, the label “pizza”), and the idea is for the NN to learn some unknown function $f$ from the data. During training, the NN learns to approximate $f$ from the input-output pairs in the training set by adjusting the parameters and biases associated with each unit and layer in the network, to minimize error when run on the training set — an iterative process that may take quite a long time. The validation set is used to tune the so-called hyperparameters of a network, which include properties like the number of units in each layer. Finally, the test set is used to assess the trained network’s performance.1

The NN may turn out to underfit $f$, meaning that it consistently deviates from $f$, or it may overfit $f$, meaning that it approximates $f$ closely only for examples that it has already seen. Underfitting and overfitting result in poor performance, and if few or no training examples correspond to an input that the NN sees in practice, then the behavior of the NN for that input will be difficult or impossible to predict. And all this is to say nothing of inputs that are intentionally designed to fool the NN.

## Using abstraction-refinement to verify the safety of a neural network

Pulina and Tacchella address the question of verifying whether an NN will produce output that falls within particular bounds. This is a meaningful question to ask about NNs trained to predict a single continuous value. In the domain of autonomous driving, for instance, the input might be a collection of sensor readings from a car, and the single continuous value to be predicted might be, for instance, a steering angle.

In the paper, Pulina and Tacchella define a “safe” NN as one whose output will always fall within certain specified bounds. The key to their approach to verifying that an NN is safe is an abstract interpretation-inspired technique for sound approximation of the NN. The approximation yields an abstract NN such that, if the abstract NN is safe, so is the concrete NN. Furthermore, it’s possible to verify the safety of the abstract NN relatively easily using an off-the-shelf model checker, HySAT.2 Hence sound approximation opens the way to feasible verification of NNs.

Importantly, the approximation is sound but not complete. In this setting, soundness of the approximation means that if the abstract NN produces a safe output given a particular input, then the concrete NN will produce a safe output for that input, too. But incompleteness of the approximation means that it is possible for the abstract NN to produce an unsafe output for some inputs, even if the concrete NN produces a safe output for them. If these false negatives, or “spurious counterexamples” as Pulina and Tacchella call them, crop up during the verification process, we can automatically refine the abstraction, producing an abstract NN that is a little closer to the concrete NN, and try again, in a process that Pulina and Tacchella call counterexample-triggered abstraction-refinement (CETAR).3 The process continues until either the concrete NN has been shown to be safe (because the abstract NN passes the safety check), or until a true counterexample input (an input for which both the abstract NN and the concrete NN produce an unsafe output) has been found.

## Training on spurious counterexamples to make NNs more robust

An interesting thing about spurious counterexamples is that they could be good for more than just refining the abstract NN during verification. Pulina and Tacchella observe that spurious counterexamples could be especially valuable for training the concrete NN in the first place, because they identify “weak spots” in the abstract NN that could point to corresponding (hidden!) weak spots in the concrete one. Therefore the CETAR approach suggests a way not only to demonstrate the safety (or lack thereof) of NNs, but also to improve safety by discovering spurious counterexamples that can be trained on.4 Pulina and Tacchella offer empirical evidence that supplementing the original training set with spurious counterexamples results in a more robust NN — that is, one in which outputs fall within an interval that is closer to the safe interval — and they show how to automate this NN “repair” process as part of the CETAR loop.

They evaluate their appproach with a case study involving control of an industrial robot arm: a 6-degrees-of-freedom manipulator with revolute joints. They use an NN trained to predict the final position of the robot’s end effector (the device at the end of the robot arm) based on its joint angles. The examples for training the NN are vectors encoding six angle measurements and the coordinate position of the end effector. A certain region is considered safe for the robot arm’s motion. If the training examples only encode positions in which the end effector is in a safe position, then we might expect the trained NN to only produce safe outputs, but there is no guarantee that this is the case. Using CETAR, they can prove that the NN is in fact safe.

## Limitations and next steps

So far, all this seems promising. So, what are the limitations of the approach described in the paper, and where do we go from here?

First, the robot-arm case study in the paper uses a three-layer NN (an input layer, an output layer, and one inner or hidden layer), with three units in the hidden layer, trained on 141 training examples. Both the size of the network and the number and size of the training examples are tiny by today’s standards. For instance, the MNIST handwritten digit classification tutorial that comes with TensorFlow shows how to build a network with multiple hidden layers and tens or hundreds of units per hidden layer, and train it on 60,000 examples, each of which has a 784-element vector as its input part. And that’s just for a “Hello World”-style tutorial. The ImageNet challenge problems, for instance, have training sets of millions of examples. The teams that do well in those challenges build NNs with dozens of layers and thousands of units per layer, resulting in tens of millions of parameters to be learned. It’s not unusual for training to take days or weeks.

For the robot-arm case study, since the network is so small and since the training examples are so few, the training time can be quite short.5 It needs to be, because the automated NN repair algorithm in the paper calls the “train” function repeatedly in a loop. For networks that take a long time to train, it’s hard to imagine the CETAR repair approach being practical. However, the CETAR verification approach would still be valuable for figuring out whether an NN is safe in the first place.

Second, verifying that an NN’s outputs fall within a specified range seems kind of boring to me. If you need to ensure that outputs are in a certain range, then practically, you could just have a postprocessing step that clamps the output to that range. Pulina and Tacchella published a 2011 follow-up paper, “Challenging SMT solvers to verify neural networks”, that I haven’t yet had a chance to read closely, but at a quick glance, it looks like it acknowledges this point and considers a more subtle safety property, “local safety”. I don’t entirely understand local safety yet, but it looks like it has to do with ensuring that if an input to an NN is “close” to that of some examples on which it was trained, then the output will be “close” to that example’s output. This raises interesting questions about what it means to be “close”, which is something I’ve been thinking about lately in the context of adversarial examples. In fact, it seems like adversarial examples can be explained by a conflict between different metrics for closeness. Long story short, it would be interesting to see whether local safety is still a useful notion when dealing with really high-dimensional inputs.

Third, the robot-arm case study in particular seems odd because, as far as I can tell, the NN is being trained to approximate a function that could be worked out analytically. If you know all of the robot’s joint angle measurements (which you do, because they’re the input to the NN), you should be able to figure out the exact position of the end effector by adding vectors, without doing any machine learning at all. Of course, this doesn’t mean that there’s necessarily anything wrong with CETAR, but it would be nice to see if the approach could be used to verify an NN that does something that would be hard to do without machine learning, like, say, recognizing handwritten digits, or telling different kinds of irises apart based on petal and sepal measurements. (I should note that in the 2011 follow-up paper from Pulina and Tacchella, there’s a new robot-arm case study, and in that one the function that the NN is approximating looks like one that would be much harder to work out analytically: the inputs are joint positions and velocities and the outputs are internal forces and torques, accounting for “components from gravity, Coriolis forces, and manipulator inertia”.)

And finally, I wonder if the CETAR verification approach is applicable to NNs that do classification tasks. Checking that outputs fall within a specified range makes sense for NNs that do regression tasks, like “What is the position of the robot’s end effector?” or “What should the steering angle be?”, but not those that do classification tasks, like “Is this a picture of a pizza?” or “Is there a pedestrian in the crosswalk?” It’s not clear to me how one would define what safety means for classification tasks, or what a sound approximation of an NN that does classification would look like. Of course, it’s still worthwhile to have a verification approach that only works for NNs that do regression tasks. (Alex points out that regression tasks are harder than classification tasks “because there’s an infinite number of ways to be wrong”.) Still, it would be really interesting to explore what safety means in the setting of NNs that do classification. It’s possible that the aforementioned local safety property, or something like it, would be a good place to start.

1. By “performance”, I mean quality of results. This usage is confusing if, like me, you come from the parallel computing world, where “performance” usually means how long something takes. When they want to talk about how long it takes to train a network or to how long it takes to use it once it’s trained, ML people will speak of “training time” and “inference time”, respectively.

2. It’s worth pointing out that since 2010 there have been not one but two successors to the HySAT model checker, iSAT and iSAT3, and iSAT3 apparently demonstrates speedups over iSAT of between one and two orders of magnitude on average. (However, improvements in the speed of the model checker don’t address the issue of NN training being slow.)

3. Pulina and Tacchella call their approach counterexample-triggered abstraction-refinement rather than counterexample-guided abstraction-refinement because the refinement isn’t done in a way that’s particular to the counterexample that was found; the existence of the spurious counterexample just indicates that some refinement should occur. How much to refine is determined by an “abstraction parameter” $p$, and, as with choosing too small a learning rate for neural network training, choosing too small a $p$ means you’re going to be waiting around for a while.

4. I’m curious about the relationship between this kind of training and adversarial training.

5. If I understand section 2 of the paper correctly, the time to train the original network was only 0.64 seconds!