by Michael McLaughlin

GitHub and Microsoft have released a dataset of code search queries and annotated results, intended to advance the development of search engines that can locate specific code. The dataset includes 99 search queries, each paired with about ten likely results that experts annotated for relevance to the query, yielding 4,000 annotations in total. The results are drawn from a corpus of six million functions of open-source code written in six programming languages, including Python and Java.

