Fooling AI


Let’s take a trip

We start by taking a short trip to Sydney to review one of the foremost conferences on machine learning, the International Conference on Machine Learning (don’t worry, things won’t get too technical). While the conference certainly lacks a creative name, we turn our focus to the paper that won “Best Paper” there, titled “Understanding Black-box Predictions via Influence Functions.” Essentially, the authors developed methods to determine which training data points were most “influential” in producing a given prediction. One of the problems with machine learning algorithms right now is that as these systems get more complicated, it gets harder to interpret how the machine arrived at its answer. To counter this trend, the authors built a model to predict whether an image contained a dog or a fish, then made predictions with this model and tried to find the training examples that were most responsible for making a dog a dog.

Here is a good analogy. Say you have watched every NFL game on TV and there is one game remaining, the Super Bowl. Your office has a pool to see who can come closest to the exact score, and you need to make a prediction. The two teams in the Super Bowl haven’t played each other before, so you have to extrapolate. In making your prediction, a few games that each team played would probably stand out in your mind as reasons why one team would beat the other, and these games would heavily influence your decision. This is what the authors are doing, except with dogs and fish.
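For readers who do want a peek under the hood, here is a minimal sketch of the idea of "which past games influenced my prediction." It is not the authors' method: it simply refits a toy line-fitting model with each training point left out and measures how much the prediction moves, which is, as I understand it, the effect that influence functions try to approximate without retraining. The data and the function names (`fit_line`, `leave_one_out_influence`) are made up for illustration.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Evaluate the fitted line at x."""
    a, b = model
    return a * x + b


def leave_one_out_influence(train, x_test):
    """For each training point, return how much the prediction at x_test
    changes when the model is refit without that point."""
    full_prediction = predict(fit_line(train), x_test)
    influences = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        influences.append(predict(fit_line(reduced), x_test) - full_prediction)
    return influences


# Hypothetical toy data: four "ordinary" points and one that stands out.
train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 9.0)]
influences = leave_one_out_influence(train, x_test=6.0)

# The most influential point is the one whose removal moves the prediction most.
most_influential = max(range(len(train)), key=lambda i: abs(influences[i]))
print("most influential training point:", train[most_influential])
```

In this toy data the last point is the memorable game: dropping it changes the prediction far more than dropping any of the others. Refitting once per training point is fine for five points but hopeless for a large image model, which is why an approximation is attractive.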

Except, well…

The authors then did something very interesting: once you know which training example has the biggest effect on a prediction, you can go back, change that example slightly, retrain the model, and get a different result. Getting back to our football analogy, it would be like going back in time and somehow changing the outcome of a game, maybe by wearing a very bright shirt that caused the quarterback to throw 6 interceptions and lose. Knowing this, you would probably change your prediction. But if I went onto the field and cut one blade of grass a little shorter than every other blade, which had no effect on the game, you probably wouldn’t change your prediction.

Yet this blade-of-grass change completely fools a computer. Instead of making a drastic change, the authors make one so small that humans cannot notice it, and the result completely changes. The model goes from predicting a dog is a dog with 97% confidence to predicting a dog is a fish with 97% confidence.

That’s really odd

Yes! It is! In fact, there is an entire active area of AI research right now that involves “attacking” different machine learning methods. When I say “attacking”, I mean making changes to examples that completely screw everything up. An often-cited example is from a paper by Ian Goodfellow and colleagues, where, starting from a state-of-the-art image recognition model, they introduce a slight change that is not noticeable to the human eye, and suddenly a panda becomes a gibbon.

In that paper, Goodfellow essentially says that machine learning models are not really learning what a panda is, but some other abstraction of it:

“classifiers based on modern machine learning techniques, even those that obtain excellent performance on the test set, are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village that works well on naturally occurring data, but is exposed as a fake when one visits points in space that do not have high probability in the data distribution.”
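To make the "tiny change, huge effect" idea concrete, here is a minimal sketch of the fast gradient sign method, which I believe is the technique behind the panda example. Everything else here is an assumption for illustration: instead of a real image network, it uses a toy logistic "dog vs. fish" classifier over 10,000 fake pixels, with hand-picked weights and a made-up step size `epsilon`.

```python
import math


def sign(value):
    """Return -1.0, 0.0, or 1.0 depending on the sign of value."""
    return (value > 0) - (value < 0) + 0.0


def dog_probability(weights, bias, pixels):
    """Toy logistic classifier: probability that the image is a dog."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_perturb(weights, bias, pixels, true_label, epsilon):
    """Nudge every pixel by at most epsilon in the direction that increases
    the classifier's loss. For logistic regression with cross-entropy loss,
    the gradient of the loss with respect to the input is (p - y) * w."""
    p = dog_probability(weights, bias, pixels)
    return [
        pixel + epsilon * sign((p - true_label) * w)
        for pixel, w in zip(pixels, weights)
    ]


# Hypothetical setup: 10,000 pixels on a 0-1 scale, weights of alternating sign.
n_pixels = 10_000
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(n_pixels)]
bias = 0.0
# A mid-gray image that leans very slightly toward "dog" in every pixel.
image = [0.5 + 0.0035 * sign(w) for w in weights]

epsilon = 0.007  # maximum change allowed per pixel
adversarial = fgsm_perturb(weights, bias, image, true_label=1, epsilon=epsilon)

p_clean = dog_probability(weights, bias, image)
p_adversarial = dog_probability(weights, bias, adversarial)
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))

print(f"dog probability before: {p_clean:.3f}")
print(f"dog probability after:  {p_adversarial:.3f}")
print(f"largest pixel change:   {largest_change:.4f}")
```

No single pixel moves by more than 0.007 on a 0-to-1 scale, yet the toy model flips from being very confident the image is a dog to being very confident it is not. The trick is that thousands of tiny nudges, each chosen to push in the worst possible direction, add up.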

In another interesting example, researchers printed out an image of a washer/dryer, took a photo of the printout, and asked an algorithm to identify it. When the researchers moved the picture ever so slightly, the prediction changed completely, from a washer to a safe.

Yikes, and AI is making critical decisions for us every day?

Of course, there are a lot of areas (medical diagnosis, mortgage approvals, job matching) where AI is starting to affect our daily lives, and one must worry about AI getting it wrong. A pretty relevant example is autonomous cars. A company focusing on autonomous driving was profiled on its efforts and the challenges of solving autonomous driving. The profile talks about overpasses and how the company’s system originally thought they were obstacles because of the shadows they cast. If you have a car stopping short on a highway right before an overpass, that’s a big problem.

A bigger problem is the Tesla Autopilot accident in 2016, in which a Tesla collided with a truck, killing the driver. Shortly after the crash, Tesla had this to say:

“What we know is that the vehicle was on a divided highway with Autopilot engaged when a tractor trailer drove across the highway perpendicular to the Model S. Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied.”

After the NTSB investigated the incident, it concluded the Tesla was not at fault. However, shortly after the crash, Tesla upgraded Autopilot, and TechCrunch said, “Version 8.0 of Autopilot includes huge changes to how the object detection system works, using radar to help detect things that might not get picked up by the camera vision sensors on the vehicle.” While Jalopnik surmised it could be a blind spot in the Autopilot, don’t rule out the possibility that the system misclassified a truck as sunny, open space.

If AI is going to make more and more crucial decisions, we should hope that researchers are stress-testing their models more and more.
