Facebook AI has released Multilingual LibriSpeech (MLS), a multilingual audio dataset intended to support speech research for AI-powered services such as voice assistants. MLS extends LibriSpeech, an English-only corpus built from LibriVox audiobooks, to provide more than 50,000 hours of audio across eight languages: English, German, Dutch, French, Spanish, Italian, Portuguese, and Polish. MLS also provides language-model training data and pretrained language models, so that researchers can compare different automatic speech recognition systems.
Image credit: Mahesh Patel