Friday, June 9, 2017

Still a Few Bugs In the System: "DeepMind Shows AI Has Trouble Seeing Homer Simpson's Actions"

From IEEE Spectrum:
The best artificial intelligence still has trouble visually recognizing many of Homer Simpson’s favorite behaviors such as drinking beer, eating chips, eating doughnuts, yawning, and the occasional face-plant. Those findings from DeepMind, the pioneering London-based AI lab, also suggest the motive behind why DeepMind has created a huge new dataset of YouTube clips to help train AI on identifying human actions in videos that go well beyond “Mmm, doughnuts” or “Doh!”

The most popular AI used by Google, Facebook, Amazon, and other companies beyond Silicon Valley is based on deep learning algorithms that can learn to identify patterns in huge amounts of data. Over time, such algorithms can become much better at a wide variety of tasks such as translating between English and Chinese for Google Translate or automatically recognizing the faces of friends in Facebook photos. But even the most finely tuned deep learning relies on having lots of quality data to learn from. To help improve AI’s capability to recognize human actions in motion, DeepMind has unveiled its Kinetics dataset consisting of 300,000 video clips and 400 human action classes.

“AI systems are now very good at recognizing objects in images, but still have trouble making sense of videos,” says a DeepMind spokesperson. “One of the main reasons for this is that the research community has so far lacked a large, high-quality video dataset.”

DeepMind enlisted the help of online workers through Amazon’s Mechanical Turk service to help correctly identify and label the actions in thousands of YouTube clips. Each of the 400 human action classes in the Kinetics dataset has at least 400 video clips, with each clip lasting around 10 seconds and taken from separate YouTube videos. More details can be found in a DeepMind paper on the arXiv preprint server.

The new Kinetics dataset seems likely to represent a new benchmark for training datasets intended to improve AI computer vision for video. It has far more video clips and action classes than the HMDB-51 and UCF-101 datasets that previously formed the benchmarks for the research community. DeepMind also made a point of ensuring it had a diverse dataset—one that did not include multiple clips from the same YouTube videos....

The first part of the headline is a ripoff of a 2008 headline at Wired:

Aug. 7, 1944: Still a Few Bugs in the System

Computer honchos work on a section of Harvard's Mark I in 1944. The whole apparatus measured 55 feet long.

Courtesy Computer History Museum

1944: Harvard and IBM dedicate the Mark I computer. Also known as the IBM Automatic Sequence Controlled Calculator, or ASCC, the pioneering computer was notable for producing reliable results and its ability to run 24/7....
Which was itself stolen from a computer reference in Doonesbury, 1970: