December 24th, 2015
Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks. Using standard techniques for supervised learning, the researchers trained the neural network to a weighting that correctly loaded the training set—output "yes" for the 50 photos of camouflaged tanks, and output "no" for the 50 photos of forest. This did not ensure, or even imply, that new examples would be classified correctly. The neural network might have "learned" 100 special cases that would not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees. They had used only 50 of each for the training set. The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly. Success confirmed! The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos.
It turned out that in the researchers' dataset, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest.
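The failure mode in the story is easy to reproduce on made-up data, whether or not the story itself is true. Below is a minimal sketch in Python, not anything from the original accounts. It assumes each "photo" is reduced to two invented numbers, `brightness` and a noisy `tank_signal`, and it swaps the neural net for a one-feature threshold classifier (a decision stump) to keep things short. All names and values here are hypothetical.

```python
import random

def make_photos(n, rng, confounded):
    """Return a list of ((brightness, tank_signal), has_tank) pairs."""
    photos = []
    for i in range(n):
        has_tank = i % 2 == 0  # half the photos contain a tank
        # Confounded set: every tank photo is taken on a cloudy day.
        # Otherwise the weather is a coin flip, unrelated to the tank.
        cloudy = has_tank if confounded else rng.random() < 0.5
        brightness = rng.uniform(0.2, 0.4) if cloudy else rng.uniform(0.6, 0.8)
        # Weak, noisy evidence of the tank itself (assumed for illustration).
        tank_signal = rng.gauss(0.6 if has_tank else 0.4, 0.2)
        photos.append(((brightness, tank_signal), has_tank))
    return photos

def predict(stump, features):
    feature, threshold, tank_if_above = stump
    return (features[feature] > threshold) == tank_if_above

def accuracy(stump, photos):
    return sum(predict(stump, x) == y for x, y in photos) / len(photos)

def fit_stump(photos):
    """Pick the single feature, threshold, and direction that best fit the photos."""
    best, best_acc = None, -1.0
    for feature in range(2):
        values = sorted({x[feature] for x, _ in photos})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            for tank_if_above in (True, False):
                stump = (feature, (lo + hi) / 2, tank_if_above)
                acc = accuracy(stump, photos)
                if acc > best_acc:
                    best, best_acc = stump, acc
    return best

rng = random.Random(0)
train = make_photos(100, rng, confounded=True)    # 50 tank, 50 no tank
holdout = make_photos(100, rng, confounded=True)  # same confound as training
fresh = make_photos(1000, rng, confounded=False)  # weather unrelated to tanks

stump = fit_stump(train)
holdout_acc = accuracy(stump, holdout)
fresh_acc = accuracy(stump, fresh)
print("feature used:", ("brightness", "tank_signal")[stump[0]])
print("held-out accuracy:", holdout_acc)
print("fresh-photo accuracy:", fresh_acc)
```

In this toy setup the classifier settles on brightness, gets every held-out photo right, and does about as well as a coin flip on the fresh photos. The held-out set shares the confound with the training set, so it has no way to catch it.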
I was curious where this story came from. Did this actually happen, or did someone make it up to illustrate a point? I found a 1998 version that says "this story might be apocryphal" and sets it in the 1980s. I also found this comment from pedanterrific on LessWrong:
It's almost certainly not the actual source of the "parable", or if it is the story was greatly exaggerated in its retelling (admittedly not unlikely), but [November 1993 Fort Carson RSTA Data Collection: Final Report] (pdf) may well be the original study (and is probably the most commonly-reused data set in the field).
Unfortunately, there's a version of the story published in 1992, in "What Artificial Experts Can and Cannot Do" (pdf), which means the 1993 dataset can't be the source:
For an amusing and dramatic case of creative but unintelligent generalization, consider the legend of one of connectionism's first applications. In the early days of the perceptron the army decided to train an artificial neural network to recognize tanks partly hidden behind trees in the woods. They took a number of pictures of a woods without tanks, and then pictures of the same woods with tanks clearly sticking out from behind trees. They then trained a net to discriminate the two classes of pictures. The results were impressive, and the army was even more impressed when it turned out that the net could generalize its knowledge to pictures from each set that had not been used in training the net. Just to make sure that the net had indeed learned to recognize partially hidden tanks, however, the researchers took some more pictures in the same woods and showed them to the trained net. They were shocked and depressed to find that with the new pictures the net totally failed to discriminate between pictures of trees with partially concealed tanks behind them and just plain trees. The mystery was finally solved when someone noticed that the training pictures of the woods without tanks were taken on a cloudy day, whereas those with tanks were taken on a sunny day. The net had learned to recognize and generalize the difference between a woods with and without shadows!
This paper calls it a "legend," though, and doesn't make any attempt at sourcing. Weirdly, Dreyfus also included the passage nearly word-for-word in his same-year "What Computers Still Can't Do: A Critique of Artificial Reason," except that he dropped the qualifier "the legend of," to just say "consider one of connectionism's first applications." The book doesn't cite anything either.
A few things make me think it's pretty likely this story was made up to illustrate a common pitfall of machine learning:
- Even the earliest sources referring to this call it a "legend" or "possibly apocryphal," even though the time period when it could have happened would have been only 5-10 years before the 1992 paper.
- The story is appealing because you can feel superior to the experts of the day, but that same quality is a reason to be skeptical since, well, they were experts.
- Many people would have been working on a project like this, but even though the story is widely known, no one has come forward saying "hey, let me set the record straight..."
(Dreaded Anomaly's digging was very helpful here.)
Update 2017-10-01: A helpful commenter found a paper that looks like it could be it: Kanal and Randall, 1964. My "even though the time period when it could have happened would have been only 5-10 years before the 1992 paper" was wrong: people were working in this direction much earlier than I thought they were.