
Building a Dataset of Islamicate Texts

by Michael McLaughlin

Researchers from Knowledge, Information Technology, and the Arabic Book (KITAB), a project that builds digital tools for analyzing Arabic writing, have released a dataset of more than 4,000 Arabic texts to help construct the first machine-readable corpus of premodern Islamicate texts. The texts represent the work of nearly 2,000 authors and together contain more than a billion words. Researchers can use the dataset to develop algorithms that identify relationships between ideas within Arabic texts.

Get the data.

Image: Wellcome Images
