by Morgan Stevens

The Macrocosm Consortium, an international AI research organization, has created a dataset of embeddings for the title and abstract of every research paper on arXiv, a U.S.-based repository for research papers. Embeddings represent text as numerical vectors, so a search can match documents by meaning rather than by exact wording. For example, with embeddings, a search for "dog" returns results similar to those for "puppy." Researchers can use this dataset to improve search engines and identify similar research papers.
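
To make the dog-and-puppy example concrete, here is a minimal sketch of how embedding-based matching is commonly done: each term is mapped to a vector, and closeness is scored with cosine similarity. The three-dimensional vectors and the names `toy_embeddings` and `rank_by_similarity` are invented for illustration. They are not taken from the Macrocosm dataset, whose format and dimensions are not described here, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked vectors (an assumption for illustration only).
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "galaxy": [0.1, 0.2, 0.95],
}


def rank_by_similarity(query, embeddings):
    """Rank every other term by cosine similarity to the query term."""
    query_vec = embeddings[query]
    scores = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("dog", toy_embeddings)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

With these toy vectors, "puppy" ranks well above "galaxy" for the query "dog," which is the behavior a semantic search engine relies on when it compares a query embedding against paper embeddings.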

