Documenting Historical Newswire Articles

by Martin Makaryan

Researchers at Harvard University have created a dataset of almost three million articles published between 1878 and 1977 by newswire services, which distribute news stories and other content to media outlets. The researchers built the dataset by extracting roughly 140 million articles from the front pages of local U.S. newspapers and using a deep learning model to analyze their image scans. For each newswire article, the dataset lists the newspapers that published it, the publication dates, the dispatch location, the people mentioned in the text, and the general topic.

Get the data.

Image credit: Annie Spratt
