An Open Dataset for Multilingual Speech Research 

by Cassidy Chansirik

Facebook AI has released Multilingual LibriSpeech (MLS), a multilingual audio dataset intended to improve speech research for AI-powered services such as voice assistants. MLS extends the English-only LibriSpeech dataset, drawing on LibriVox audiobook recordings to provide more than 50,000 hours of audio across eight languages: English, German, Dutch, French, Spanish, Italian, Portuguese, and Polish. MLS also provides language-model training data and pretrained language models, which help researchers compare different automatic speech recognition (ASR) systems.

Get the data.

Image credit: Mahesh Patel 
