Nvidia and Mozilla have updated a crowdsourced speech dataset, which now contains 13,905 hours of speech in 76 languages. The newest version features 182,000 unique voices, includes speaker demographic information such as age, gender, and accent, and adds 16 new languages: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani, and Hausa.
Image credit: Flickr user Drestwn