Facebook has released CCMatrix, a dataset of 4.5 billion parallel sentences: sentences in one language paired with their translations in another. The dataset covers more than 500 language pairs. CCMatrix can help advance the development of translation systems, particularly for languages with relatively little digitized material.