Documenting Historical Newswire Articles

by Martin Makaryan July 8, 2024

Researchers at Harvard University have created a dataset of almost three million newswire articles published between 1878 and 1977. Newswire services distribute news stories and other content to media outlets. The researchers built the dataset by extracting roughly 140 million articles from the front pages of local U.S. newspapers and using a deep learning model to analyze their image scans. For each newswire article, the dataset lists the newspapers that carried it, the publication dates, the dispatch location, the people mentioned in the text, and the general topic.

Get the data.

Image credit: Annie Spratt

Martin Makaryan

Martin Makaryan is a research assistant specializing in digital policy. He is currently a master's student at the School of Advanced International Studies (SAIS) at Johns Hopkins University, where he specializes in security and strategy, with a focus on the intersection of security, policy, and emerging technologies. He holds a B.A. in Political Science and Global Studies from UCLA and previously worked in government affairs and policy research in California in both the nonprofit and government sectors. His academic and professional interests include the impact of innovation and technology on foreign policy and national security policy, as well as automation and AI, cybersecurity, and digital policy.
