Researchers at the Massachusetts Institute of Technology (MIT) have published the first database of annotated English sentences written by non-native English speakers, intended to help train natural language processing systems. English is the most-used language on the Internet, but most people who use English online are non-native speakers. The grammatical errors and quirks common in their writing can make it difficult for language processing algorithms to analyze large amounts of such text. The database consists of 5,124 English sentences written by native speakers of 10 different languages. Every sentence contains at least one grammatical error. Each one is annotated with the parts of speech used and the relationships between its words, and is paired with a corrected version for comparison.