Training Language Translation Systems

by Morgan Stevens

Meta has created the largest open dataset to date for training AI systems that translate between languages in speech and text. The dataset contains more than 443,000 hours of speech paired with text and around 29,000 hours of speech paired with speech. The company used it to train an AI system that recognizes speech in around 100 languages, translates speech-to-text and text-to-text in nearly 100 languages, and translates speech-to-speech and text-to-speech from nearly 100 input languages into 36 output languages.

