Training AI Systems to Code

by Morgan Stevens

BigCode, a project led by U.S. AI research company Hugging Face and Canadian AI research company ServiceNow Research, has created a dataset of permissively licensed code from GitHub. The dataset contains over 300 million code files in 30 programming languages, such as Java, Python, and Dockerfile, along with information on each file’s repository, size, and content. Researchers can use the dataset to train AI systems that generate code.

Get the data.

Image credit: Flickr user CyberHades
