Making Scholarly Articles More Accessible for Machine Learning

by Cassidy Chansirik

ArXiv, an open-access digital repository of scholarly articles maintained by Cornell University in New York, has made all of its 1.7 million research articles available on Kaggle, a public online platform for machine learning training datasets. For each article, the dataset includes information such as the author, article title, category, abstract, and citations, as well as a link to the full-text PDF. With the dataset, researchers can more easily use data from arXiv articles to perform trend analysis, create algorithms that group scholarly papers by topic, and improve search engines for scholarly papers.
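To give a sense of what a simple trend analysis on this kind of metadata might look like, here is a minimal Python sketch that counts papers per year for one category. It assumes the metadata arrives as one JSON object per line, and the field names `categories` and `update_date`, the category labels, and the sample records are all placeholders made up for illustration; check the actual dataset for its real layout before adapting this.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def papers_per_year(records: Iterable[dict], category: str) -> Counter:
    """Count records tagged with `category`, grouped by year.

    Assumed (hypothetical) fields:
      "categories":  space-separated category labels
      "update_date": date string starting with a four-digit year
    """
    counts: Counter = Counter()
    for rec in records:
        if category in rec.get("categories", "").split():
            year = rec.get("update_date", "")[:4]
            if year:
                counts[year] += 1
    return counts


# Made-up placeholder records, not real entries from the dataset.
SAMPLE_LINES = [
    '{"title": "Paper A", "categories": "cs.LG stat.ML", "update_date": "2019-05-01"}',
    '{"title": "Paper B", "categories": "cs.LG", "update_date": "2020-01-15"}',
    '{"title": "Paper C", "categories": "hep-th", "update_date": "2020-03-02"}',
    '{"title": "Paper D", "categories": "cs.LG cs.CL", "update_date": "2020-07-30"}',
]

trend = papers_per_year(iter_records(SAMPLE_LINES), "cs.LG")
for year in sorted(trend):
    print(year, trend[year])
```

To run this against a real download, the same `iter_records` function can be given an open file handle instead of `SAMPLE_LINES`, since a file object also yields one line at a time.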

Get the data. 

Image: Susan Yin
