Training Language Translation Systems

by Morgan Stevens, August 24, 2023

Meta has created the largest open dataset to date for training AI systems to translate languages in speech and text. The dataset contains more than 443,000 hours of speech paired with text and around 29,000 hours of speech paired with speech. The company used the dataset to train an AI system that recognizes speech in around 100 languages, translates speech-to-text and text-to-text in nearly 100 languages, and translates speech-to-speech and text-to-speech from nearly 100 input languages into 36 output languages.

Get the data.

Image credit: Flickr user Eyesplash

Morgan Stevens

Morgan Stevens is a Research Assistant at the Center for Data Innovation. She holds a J.D. from the Sandra Day O'Connor College of Law at Arizona State University and a B.A. in Economics and Government from the University of Texas at Austin.
