Cybersecurity research firm Endgame has published a dataset called the Endgame Malware BEnchmark for Research (EMBER) to foster the development of AI systems that can detect malware. The dataset includes representations of 1.1 million Portable Executable (PE) files, each categorized as a training or test sample and dated according to when the file was discovered in 2017. Endgame has also provided a benchmark model that detects malware by identifying the characteristics of files labeled malicious in the dataset, giving researchers a baseline to improve upon. This kind of training data can be very valuable for cybersecurity, but privacy and intellectual property concerns limit its availability to researchers.