
  Detecting Tanks

    December 24th, 2015
    machine_learning
    There's a story that gets passed around to illustrate how machine learning can pick up on features in your dataset that you didn't expect. It probably gained the most exposure through Yudkowsky using it in "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (pdf, 2008):
    Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks. Using standard techniques for supervised learning, the researchers trained the neural network to a weighting that correctly loaded the training set—output "yes" for the 50 photos of camouflaged tanks, and output "no" for the 50 photos of forest. This did not ensure, or even imply, that new examples would be classified correctly. The neural network might have "learned" 100 special cases that would not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees. They had used only 50 of each for the training set. The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly. Success confirmed! The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos.

    It turned out that in the researchers' dataset, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest.
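
    For concreteness, here's a minimal toy sketch of the failure mode the story describes: a held-out test set drawn from the same biased photo collection looks great even though the model has only learned the lighting. Everything in it is synthetic and made up for illustration (the "photos" are just vectors of pixel intensities, and the brightness numbers are arbitrary); it's a sketch of the pitfall, not a reconstruction of the original experiment, which is the whole question below.

        # Toy illustration (synthetic data, not the original experiment): a classifier
        # trained on a collection where all tank photos are cloudy (dark) and all
        # no-tank photos are sunny (bright) aces a held-out split of that same
        # collection, then drops to roughly chance once lighting is uncorrelated.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        N_PIXELS = 100  # each fake "photo" is a flat vector of 100 pixel intensities

        def fake_photo(tank, sunny):
            base = 0.7 if sunny else 0.3             # sunny photos are brighter overall
            pixels = rng.normal(base, 0.1, N_PIXELS)
            if tank:
                pixels[:5] += 0.05                   # a faint tank cue, much weaker than the lighting
            return pixels

        # Biased collection: every tank photo is cloudy, every no-tank photo is sunny.
        X = np.array([fake_photo(tank=True, sunny=False) for _ in range(100)]
                     + [fake_photo(tank=False, sunny=True) for _ in range(100)])
        y = np.array([1] * 100 + [0] * 100)

        # Hold out half of the same collection for testing, as in the story.
        idx = rng.permutation(200)
        train, test = idx[:100], idx[100:]
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        print("held-out accuracy, same biased collection:", clf.score(X[test], y[test]))  # ~1.0

        # New photos where lighting is no longer correlated with tanks.
        X_new = np.array([fake_photo(tank=t, sunny=s)
                          for t in (True, False) for s in (True, False) for _ in range(50)])
        y_new = np.array([1] * 100 + [0] * 100)
        print("accuracy on unconfounded photos:", clf.score(X_new, y_new))  # near chance

    The held-out photos come from the same confounded collection, so they can't catch the problem; only photos where the confound is broken can.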

    I was curious about the source. Did this actually happen, or did someone make it up to illustrate a point? I found a 1998 version that says "this story might be apocryphal" and sets it in the 1980s. I also found pedanterrific commenting on LessWrong to say:

    It's almost certainly not the actual source of the "parable", or if it is the story was greatly exaggerated in its retelling (admittedly not unlikely), but [November 1993 Fort Carson RSTA Data Collection: Final Report] (pdf) may well be the original study (and is probably the most commonly-reused data set in the field).

    Unfortunately, there's a version of the story published in 1992, Dreyfus's "What Artificial Experts Can and Cannot Do" (pdf), which means that 1993 dataset can't be the source:

    For an amusing and dramatic case of creative but unintelligent generalization, consider the legend of one of connectionism's first applications. In the early days of the perceptron the army decided to train an artificial neural network to recognize tanks partly hidden behind trees in the woods. They took a number of pictures of a woods without tanks, and then pictures of the same woods with tanks clearly sticking out from behind trees. They then trained a net to discriminate the two classes of pictures. The results were impressive, and the army was even more impressed when it turned out that the net could generalize its knowledge to pictures from each set that had not been used in training the net. Just to make sure that the net had indeed learned to recognize partially hidden tanks, however, the researchers took some more pictures in the same woods and showed them to the trained net. They were shocked and depressed to find that with the new pictures the net totally failed to discriminate between pictures of trees with partially concealed tanks behind them and just plain trees. The mystery was finally solved when someone noticed that the training pictures of the woods without tanks were taken on a cloudy day, whereas those with tanks were taken on a sunny day. The net had learned to recognize and generalize the difference between a woods with and without shadows!

    This paper calls it a "legend," though, and doesn't make any attempt at sourcing. Weirdly, Dreyfus also included this nearly word-for-word in "What Computers Still Can't Do: A Critique of Artificial Reason," from the same year, except that he dropped the qualifier "the legend of" to just say "consider one of connectionism's first applications." It still doesn't cite anything, though.

    A few things make me think it's pretty likely this story was made up to illustrate a common pitfall of machine learning:

    • Even the earliest sources referring to this call it a "legend" or "possibly apocryphal," even though the time period when it could have happened would have been only 5-10 years before the 1992 paper.
    • The story is appealing because you can feel superior to the experts of the day, but that same quality is a reason to be skeptical since, well, they were experts.
    • Many people would have been working on a project like this, but even though the story is widely known, no one has come forward saying "hey, let me set the record straight..."

    So I think it's very likely, though not certain, that this didn't actually happen.

    (Dreaded Anomaly's digging was very helpful here.)

    Update 2017-10-01: A helpful commenter found a paper that looks like it could be it: Kanal and Randall, 1964. My "even though the time period when it could have happened would have been only 5-10 years before the 1992 paper" was wrong: people were working in this direction much earlier than I thought they were.

