Training Speech-Enabled Applications

by Morgan Stevens

Nvidia and Mozilla have updated their dataset of crowdsourced speech data, which now contains 13,905 hours of speech in 76 languages. The newest version features 182,000 unique voices, includes speaker demographic information such as age, gender, and accent, and adds 16 new languages: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani, and Hausa.

Get the data.

Image credit: Flickr user Drestwn
