Capturing India’s Linguistic Diversity

by Aswin Prabhakar

Researchers at the Indian Institute of Technology Madras, India, have created IndicVoices, a dataset of natural, spontaneous speech that captures the cultural, linguistic, and demographic diversity of India. The dataset contains about 7,300 hours of audio from 16,000 speakers across 145 Indian districts, covering 22 languages. It should support the development of speech recognition systems for these languages and help make essential services more accessible to people across India, particularly in remote areas where language barriers have long been a significant hurdle.

Get the data.

Image credits: Unsplash user Rohan Solankurkar