10 Bits: The Data News Hot List
This week’s list of data news highlights covers May 17-23 and includes articles about an English city that plans to deploy an ambitious Internet of Things network and a database that will help zoos breed rare animals.
New York City’s Health Department is piloting the use of Yelp review data to track health code violations in the city. Using software developed at Columbia University, the agency mined around 300,000 restaurant reviews from July 2012 to March 2013, looking for reviews that mentioned diarrhea or vomiting after a meal. Only three restaurants were ultimately investigated as part of the pilot program, but each one was found to have serious health code violations.
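The review-mining step can be illustrated with a minimal sketch. This is not the Columbia University software, whose design the item does not describe; it assumes a plain keyword match on the two symptoms named above, and the function names and sample reviews are hypothetical.

```python
import re

# Assumption: a simple keyword match on the two symptoms the item names.
# The real software may use a far more sophisticated classifier.
SYMPTOM_PATTERN = re.compile(r"\b(diarrhea|vomit\w*)\b", re.IGNORECASE)


def flag_reviews(reviews):
    """Return the (restaurant, text) pairs whose text mentions a symptom."""
    return [(name, text) for name, text in reviews if SYMPTOM_PATTERN.search(text)]


def count_by_restaurant(flagged):
    """Count flagged reviews per restaurant to help prioritize follow-up."""
    counts = {}
    for name, _ in flagged:
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical sample data, invented for illustration only.
sample_reviews = [
    ("Cafe A", "Great pasta, friendly staff."),
    ("Cafe B", "I was vomiting all night after the oysters."),
    ("Cafe B", "Had diarrhea the next morning."),
    ("Cafe C", "Long wait but worth it."),
]

flagged = flag_reviews(sample_reviews)
counts = count_by_restaurant(flagged)
```

A keyword match like this would also flag reviews that mention a symptom for unrelated reasons, so in practice the flagged reviews would still need human review before any inspection.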
The city of Milton Keynes, England, plans to install a citywide public network for Internet of Things devices. The network, which will span 15 base stations across the city, will be able to pick up signals from sensors that detect when parking spaces are filled and when garbage bins are overflowing. After a trial period, supported devices will extend to smart rodent traps, soap dispensers, water meters, and central heating systems. The city will use the data to improve its services, and the data will also be publicly available on the city’s website.
About 150 zoos and aquariums across Japan plan to centralize information about their animals in a single database to help breeders of rare species identify and borrow animals from other sites. Breeders will be able to input information on an animal and quickly find potential mates, expediting the critical and often time-sensitive process of breeding rare animals. The database, which will launch in June 2014 and include information on about 60,000 species, is modeled after a similar animal data sharing collaboration between Australia, New Zealand, and South Pacific island nations.
Whistle, a company that makes wearable devices for dogs, hopes its product will help promote better canine health. Dog owners can set a daily goal for walking and exercise, adjusting how often they walk their dog based on how active it is when they are not around. Owners also have the option of logging food and medication intake, as well as demographic information about the dog, the combination of which makes for a powerful canine health database. Researchers are already using the data to study the effects owners and their pets have on one another’s health.
A 15-year-old Boston high school student won first place at Intel’s International Science and Engineering Fair earlier this month for a machine learning software tool that analyzes mutations of a gene linked with breast cancer. Using open scientific data, Nathan Han trained an algorithm to differentiate between disease-causing and benign mutations of the gene; the tool achieved an 81 percent accuracy rate in identifying cancer threats.
Human-computer interaction researchers are developing systems to integrate smell and taste into devices, including LCD advertising screens that incorporate products’ smells and a smartphone accessory that signals alarms via scent. These and other efforts are part of a larger trend toward “data visceralization,” the act of communicating data to a user through all sensory channels, not just the visual and sonic. Wearable devices will likely accelerate efforts to develop technologies that incorporate data visceralization, as they permit individuals to interact with information without relying on a visual display.
Accommodation rental service Airbnb is preparing for its initial public offering. An important part of the company’s value lies in its use of data to drive crucial business decisions. For example, the company initially specified a simple set of rules governing search results, but it found that it could increase customer bookings and satisfaction if it changed search results based on past user data. In another case, the company’s data science team redesigned its site for users in Asian countries based on data showing that certain aspects of the interface led to a poor turnover rate in those countries.
The U.S. Navy is facing a flood of data collected by unmanned aerial vehicles and should adopt a cloud storage system, according to a report released by the RAND Corporation earlier this month. The report warns that the Navy could reach a point as early as 2016 when the data will become so voluminous that intelligence analysts will be unable to effectively use current procedures. The report’s authors recommend that the Navy institute a comprehensive metadata tagging system so that analysts querying the envisioned cloud-based database need to search only metadata, not raw data.
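The idea of searching metadata instead of raw data can be sketched as follows. This is an illustration only, not the design proposed in the RAND report; the tag names, record fields, and file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    recording_id: str
    tags: frozenset  # hypothetical metadata tags, e.g. platform or sensor type
    raw_path: str    # pointer to the raw data; never opened during a search


def build_tag_index(recordings):
    """Map each tag to the set of recording IDs that carry it."""
    index = {}
    for rec in recordings:
        for tag in rec.tags:
            index.setdefault(tag, set()).add(rec.recording_id)
    return index


def search(index, *required_tags):
    """Return the IDs of recordings that carry every required tag."""
    if not required_tags:
        return set()
    matches = [index.get(tag, set()) for tag in required_tags]
    return set.intersection(*matches)


# Hypothetical catalog entries, invented for illustration only.
catalog = [
    Recording("rec-001", frozenset({"uav", "video", "region-a"}), "/raw/rec-001.bin"),
    Recording("rec-002", frozenset({"uav", "infrared", "region-a"}), "/raw/rec-002.bin"),
    Recording("rec-003", frozenset({"uav", "video", "region-b"}), "/raw/rec-003.bin"),
]

index = build_tag_index(catalog)
video_in_region_a = search(index, "video", "region-a")
```

Because the search touches only the small tag index, its cost depends on the number of tagged recordings rather than on the volume of raw sensor data behind them.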
The U.S. Defense Logistics Agency wants to ensure that counterfeit parts do not make it into military supply chains by using a digital watermark containing data about parts’ provenance. The agency has issued a requirement for digital authentication marks and procurement guidelines for parts with these features. One approach to storing this information is through custom-built DNA sequences that encode data and can be stored on paint, metal, and other surfaces.
The U.S. Postal Service has developed an in-house analytics tool to pinpoint targets for potential fraud investigations. The tool, known as the Risk Assessment Data Repository, merges data from a variety of internal sources and attempts to automatically identify potentially fraudulent actions. So far, the agency has applied the tool to workers’ compensation and health care fraud.