Researchers at Google, Stanford University, and Queen Mary University of London have created a dataset to improve translation models. The dataset contains 20 hours of recorded audio of English speakers in India, Nigeria, and the United States playing more than 3,600 image- or word-guessing games, along with transcriptions of the conversations totaling 200,000 words. Researchers can use the dataset to train translation models to understand different dialects of English.
Image credit: Flickr user Jackson Lanier