OpenAI has released a new multilingual dataset to help AI developers evaluate how well large language models (LLMs) perform in 14 languages, including Arabic, French, German, Spanish, and Simplified Chinese. To create it, OpenAI had professional human translators translate the test set of Massive Multitask Language Understanding (MMLU), a commonly used general-knowledge benchmark for evaluating LLM performance. By making it easier to measure performance beyond English, this open-source dataset could help developers make LLMs more accurate and accessible for people who speak a variety of languages.
Image credit: Jonathan Kemper