Training AI Systems to Code

by Morgan Stevens November 18, 2022

BigCode, a project led by U.S. AI research company Hugging Face and Canadian AI research company ServiceNow Research, has created a dataset of permissively licensed code from GitHub. The dataset contains over 300 million code files in 30 programming languages, such as Java, Python, and Dockerfiles, as well as information on each file's repository, size, and content. Researchers can use the dataset to train AI systems that generate code.
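Because each file comes with metadata such as its repository, size, and language, a common preparation step is to select a subset of files before training. The sketch that follows illustrates that idea on a few in-memory records. It is not the dataset's actual schema or loading API: the field names (`repository`, `path`, `language`, `size`, `content`), the byte unit for `size`, and the helper functions are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodeFile:
    """One code file record. Field names are hypothetical, modeled on the
    metadata the dataset is described as carrying (repository, size, content)."""
    repository: str
    path: str
    language: str
    size: int  # assumption: size of the content in UTF-8 bytes
    content: str


def make_file(repository, path, language, content):
    """Build a record, deriving size from the UTF-8 encoded content."""
    return CodeFile(repository, path, language, len(content.encode("utf-8")), content)


def select_training_files(files, languages, max_size):
    """Keep files in the requested languages whose size is at most max_size bytes."""
    wanted = set(languages)
    return [f for f in files if f.language in wanted and f.size <= max_size]


def language_counts(files):
    """Count files per language, e.g. to check the balance of a training mix."""
    return Counter(f.language for f in files)


def sample_files():
    """Tiny hypothetical sample; not taken from the real dataset."""
    return [
        make_file("example/app", "main.py", "Python", "print('hello')\n"),
        make_file("example/app", "Dockerfile", "Dockerfile", "FROM python:3.11\n"),
        make_file("example/lib", "Util.java", "Java", "class Util {}\n"),
        make_file("example/big", "gen.py", "Python", "x = 1\n" * 10_000),
    ]


if __name__ == "__main__":
    subset = select_training_files(sample_files(), ["Python", "Java"], max_size=1_000)
    print([f.path for f in subset])
    print(language_counts(subset))
```

Filtering on metadata rather than on file contents keeps the selection cheap; a real pipeline would typically add further steps such as deduplication, which this sketch leaves out.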

Get the data.

Image credit: Flickr user CyberHades

Morgan Stevens

Morgan Stevens is a Research Assistant at the Center for Data Innovation. She holds a J.D. from the Sandra Day O'Connor College of Law at Arizona State University and a B.A. in Economics and Government from the University of Texas at Austin.
