Improving Language Translation Models

by Morgan Stevens

Google has created a multilingual dataset to improve language translation models. It contains 4 billion documents, totaling 100 billion sentences, in 419 languages. The dataset improves on earlier multilingual datasets because researchers manually audited the text to remove unusable, misaligned, or mislabeled data.

