Testing AI Performance in Multiple Languages

by Martin Makaryan September 26, 2024

OpenAI has released a new multilingual dataset to help AI developers evaluate how well large language models (LLMs) perform in 14 languages, including Arabic, French, German, Spanish, and Simplified Chinese. To create the dataset, OpenAI used professional human translators to translate the test set of Massive Multitask Language Understanding (MMLU), a commonly used benchmark that evaluates the general knowledge of LLMs. This open-source dataset can help developers make LLMs more accurate and accessible for people who speak a variety of languages.

Get the data.

Image credit: Jonathan Kemper

Martin Makaryan

Martin Makaryan is a research assistant specializing in digital policy. Makaryan is currently a master's student at the School of Advanced International Studies (SAIS) at Johns Hopkins University, where he studies security and strategy with a focus on the intersection of security, policy, and emerging technologies. He holds a B.A. in Political Science and Global Studies from UCLA and previously worked in government affairs and policy research in California in both the non-profit and government sectors. His academic and professional interests include the impact of innovation and technology on foreign policy and national security policy, as well as automation and AI, cybersecurity, and digital policy.
