Automatic face tracking in television, film and other complex video has emerged in recent years as an important topic in computer vision research. It underlies numerous applications, such as finding an individual within a collection of security footage, or tracking changes in viewer response while a certain TV character is on screen. But creators of face-tracking algorithms often have a hard time assessing the performance of their software because they lack an accurate benchmark dataset for comparison. Computer scientists from France’s Rennes Research and Innovation Center hope to solve that problem with Hannah, a face-tracking dataset extracted manually from over 150,000 frames in the 1986 Woody Allen film Hannah and Her Sisters.
Each face is tagged from the moment it enters the frame until the moment it leaves, except when more than half of the face is occluded. Each of the film’s 52 named characters is tagged along with the face tracks. Audio annotation, which is considerably easier to collect, was added to the dataset as well.
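
The tagging rule described above can be pictured with a small sketch. It is only an illustration, not the researchers' actual tooling: the function name `face_tracks`, the per-frame input format, and the choice to split a track wherever the face is more than half hidden are all assumptions made here for clarity.

```python
from typing import List, Optional, Tuple


def face_tracks(occluded_fraction: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return inclusive (first_frame, last_frame) spans in which one face is tagged.

    Hypothetical input format: occluded_fraction[i] is None when the face is
    out of frame in frame i, otherwise the fraction of the face that is
    hidden (0.0 to 1.0).
    """
    tracks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, frac in enumerate(occluded_fraction):
        # A frame is tagged only if the face is present and no more than
        # half of it is occluded.
        tagged = frac is not None and frac <= 0.5
        if tagged and start is None:
            start = i  # the face has just become taggable
        elif not tagged and start is not None:
            tracks.append((start, i - 1))  # close the span at the previous frame
            start = None
    if start is not None:
        # The face was still tagged when the sequence ended.
        tracks.append((start, len(occluded_fraction) - 1))
    return tracks


# A face enters at frame 1, is mostly hidden at frame 3, and leaves after frame 4.
example = [None, 0.0, 0.2, 0.7, 0.1, None]
print(face_tracks(example))
```

In this sketch a heavily occluded frame ends one span and a later visible frame starts another; whether the real dataset treats such a gap as one track or two is not stated in the description above.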