Improving Language Translation Models

by Morgan Stevens

Google has created a multilingual dataset to improve language translation models. It contains 4 billion documents with 100 billion sentences in 419 languages. The dataset improves upon past multilingual datasets because researchers manually audited the text to remove unusable, misaligned, or mislabeled data.

Get the data.

Image credit: Flickr user Ivan Radic
