Building a More Precise Image Dataset

by Michael McLaughlin, January 17, 2019

A man parachuting against a blue sky background in a Red Bull outfit.

Chinese technology company Tencent has released Tencent ML-Images, a dataset containing 18 million images across 11,000 categories. The dataset combines images from previously released datasets, removing images labeled with abstract categories such as “event” or “summer” and sorting other images into more fine-grained categories, for example separating images of dogs by breed. On average, there are nearly 1,450 images per category. In addition, images in Tencent ML-Images have an average of eight labels each. Many image datasets assign only a single label to each image, which wastes visual information that could be used to train models, because a single label often cannot describe all of the important objects in an image.

Get the data.

Image: Tencent

Michael McLaughlin

Michael McLaughlin is a research analyst at the Center for Data Innovation. He researches and writes about a variety of issues related to information technology and Internet policy, including digital platforms, e-government, and artificial intelligence. Michael graduated from Wake Forest University, where he majored in Communication with minors in Politics and International Affairs and in Journalism. He received his Master’s in Communication from Stanford University, specializing in Data Journalism.
