Researchers at Harvard University have created a dataset of almost three million newswire articles published between 1878 and 1977. Newswire services distribute news stories and other content to media outlets. The researchers built the dataset by using a deep learning model to analyze image scans of the front pages of local U.S. newspapers, from which they extracted roughly 140 million articles. For each newswire article, the dataset lists the newspapers that ran it, the publication dates, the dispatch location, the people mentioned in the text, and the general topic.
Image credit: Annie Spratt
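
For readers who want to picture the per-article information described above, here is a minimal Python sketch of what one record might look like. The class name, field names, helper function, and the example values are all hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NewswireArticle:
    """One newswire article. All field names here are hypothetical."""
    text: str
    newspapers: list[str]          # newspapers that ran the article
    publication_dates: list[date]  # dates on which it appeared
    dispatch_location: str         # where the wire story was filed from
    people_mentioned: list[str] = field(default_factory=list)
    topic: str = "unknown"         # general topic label


def articles_mentioning(articles, person):
    """Return the articles whose list of people includes `person`."""
    return [a for a in articles if person in a.people_mentioned]


# Made-up record, for illustration only (not taken from the dataset).
example = NewswireArticle(
    text="(article text)",
    newspapers=["Example Daily Herald", "Sample Evening Post"],
    publication_dates=[date(1925, 3, 14)],
    dispatch_location="Washington, D.C.",
    people_mentioned=["Jane Doe"],
    topic="politics",
)
```

With records shaped like this, a call such as `articles_mentioning([example], "Jane Doe")` would filter a collection down to the articles that name a given person.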