Translating Classical Japanese Literature

by Michael McLaughlin December 14, 2018

Researchers from the Japanese government, academia, research institutes, and Google have published three datasets of Japanese script to help preserve Japanese cultural knowledge. The datasets contain nearly 500,000 images of characters written in Kuzushiji, a classical Japanese cursive script that most native Japanese speakers cannot read because it is no longer part of the official school curriculum. The researchers classified the images by modern equivalent, spanning 4,000 characters. Millions of classical Japanese books are written in Kuzushiji, and these datasets could promote the development of machine learning algorithms that translate Kuzushiji into the modern Japanese writing system.

Get the data.

Image: mxbi

Michael McLaughlin

Michael McLaughlin is a research analyst at the Center for Data Innovation. He researches and writes about a variety of issues related to information technology and Internet policy, including digital platforms, e-government, and artificial intelligence. Michael graduated from Wake Forest University, where he majored in Communication with minors in Politics and International Affairs and Journalism. He received his Master’s in Communication from Stanford University, specializing in Data Journalism.
