Improving Document Understanding

by Morgan Stevens

Google has curated a dataset of business documents, such as receipts, insurance quotes, and financial statements, to help train systems to automatically extract data from such documents. The dataset is drawn from publicly available data: 641 invoices and receipts for political advertisements and 1,915 documents containing information about foreign agents registering with the United States government, along with annotations labeling parts of each document. Researchers can use the dataset to train AI systems to identify and extract information from business documents that feature more than plain text.

Get the data.

Image credit: Flickr user ben_osteen
