Reddit users have created a machine-readable data set of more than 200,000 Jeopardy questions. The data, which the creators scraped from the fan-created question repository J!-Archive, includes each question’s answer, category, dollar value, air date, and other details. One analysis of the data set showed how diverse Jeopardy’s question categories are: the 100 most commonly used categories account for only 11 percent of all questions asked. The creator of that analysis noted that this extreme variation “has given me a lot of sympathy for IBM’s Jeopardy!-playing robot Watson.”
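For readers curious how a figure like that 11 percent could be computed, here is a minimal sketch in Python. It assumes the category of each question has already been loaded into a plain list of strings. The function name `top_category_share` and the sample values are hypothetical and are not taken from the data set or from the analysis described above.

```python
from collections import Counter


def top_category_share(categories, n=100):
    """Return the fraction of questions that fall in the n most common categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        # No questions at all: define the share as zero rather than dividing by zero.
        return 0.0
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / total


# Tiny made-up sample, for illustration only (not from the real data set).
sample = ["HISTORY", "SCIENCE", "HISTORY", "POTPOURRI", "OPERA", "HISTORY"]
share = top_category_share(sample, n=1)
print(f"Top category covers {share:.0%} of the sample questions")
```

Run against the full list of categories with `n=100`, the same function would give the share covered by the 100 most common categories.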