10 Bits: The Data News Hot List
This week’s list of data news highlights covers March 22-28 and includes articles on an initiative to evaluate U.S. cities’ open data efforts and an international database to help law enforcement officials investigate child abuse.
The U.S. Open Data Census launched this week, scoring 36 cities on their open data efforts across a variety of variables and data types. The project, a joint initiative of the Open Knowledge Foundation, the Sunlight Foundation, and Code for America, looks at the quality of each city’s data on crime, transit, and emergency management, among other things. Perhaps unsurprisingly, the Census’s top-ranked city was San Francisco, but smaller cities like Louisville and Anchorage also received high marks.
The New York Public Library is partnering with New York-based startup Zola Books to offer personalized book recommendations. The library currently offers a basic recommendation system based on what other people are reading, but the new system will also take into account an individual reader’s own searches and interests. The library hopes the technology, which was originally developed for publishers, will help its patrons more easily explore its large collections.
Law enforcement agencies in several countries may soon use a cloud-based image database to help with child sexual abuse investigations. The database, called Project Vic, will allow police forces to automatically check images seized in raids against images that have already been collected, to identify children who have not been seen before. The database is coordinated by the U.S. Department of Homeland Security and the International Centre for Missing and Exploited Children, and is being tested by forces in the U.K., Canada, New Zealand, and Australia.
The Centers for Disease Control and Prevention (CDC) has begun to experiment with using shopper card data to inform investigations into public health outbreaks. After ensuring that they have permission to use the data, CDC researchers can match food purchases among infected individuals to help identify which product likely started the outbreak. The CDC is also hoping to begin using real-time hospital antibiotics prescription data to aid in research about the spread of antibiotic-resistant bacteria.
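The purchase-matching idea can be illustrated with a minimal Python sketch. This is not the CDC's actual method, which the report does not describe. The function name `rank_common_purchases`, the case identifiers, and the product lists are all hypothetical and made up for illustration. The sketch simply counts how many infected individuals bought each product and ranks products by that share.

```python
from collections import Counter


def rank_common_purchases(purchases_by_case):
    """Rank products by the share of cases that bought them (highest first)."""
    counts = Counter()
    for products in purchases_by_case.values():
        # Count each product at most once per case, however often it was bought.
        counts.update(set(products))
    total = len(purchases_by_case)
    return [(product, n / total) for product, n in counts.most_common()]


# Hypothetical, made-up purchase histories for three infected individuals.
cases = {
    "case_a": ["spinach", "milk", "eggs"],
    "case_b": ["spinach", "bread"],
    "case_c": ["spinach", "milk"],
}

ranking = rank_common_purchases(cases)
for product, share in ranking:
    print(f"{product}: bought by {share:.0%} of cases")
```

A product bought by nearly every case is only a lead, not a conclusion: a real investigation would also need to compare against how often people who were not infected buy the same product.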
Facebook, LinkedIn, Twitter and Google have teamed up to create a web-scalable version of the venerable MySQL database, which will be available as open source. The database, called WebScaleSQL, will combine MySQL’s rapid operations with the flexibility of a distributed system.
A Washington state initiative to track emergency room patients in a statewide database has helped reduce ER visits by 10 percent, according to a report released last week. The database allows ER doctors to see data on a patient’s past visits, which helps streamline the complicated and costly process of diagnosing and treating ER patients, and helps doctors pass patients on to outpatient care rather than emergency care.
The states of Illinois, Iowa and Tennessee are looking to begin sharing corrections data. With the help of the federal Cross Boundary Corrections Information Exchange Policy Academy, the states hope to one day use the data sharing system to reduce recidivism. The initial plan for the exchange will include establishing access protocols and agreeing on data standards for correctional information.
The U.S. Department of Health and Human Services’ strategy for implementing health information exchanges is lacking in several key respects, according to a Government Accountability Office report released this week. The report highlighted the department’s lack of specific prioritized actions and milestones to address obstacles such as insufficient standards, privacy rules that vary across states, and difficulties in matching patients to records.
Startups are taking novel approaches to the data collection business, starting out by selling smart light fixtures or agricultural drones in the hopes of one day making money from the data those devices collect. Sensity Systems, which specializes in fixtures for LED bulbs, hopes it will one day be able to help customers manage everything from parking spots to air quality levels. PrecisionHawk, which makes agricultural drones, plans to offer a software and data service to farmers, with the aircraft serving only as a means of collecting the data.
DARPA’s Mining and Understanding Software Enclaves (MUSE) program will apply large-scale data analytics to hundreds of billions of lines of open source code to identify the properties and behaviors that characterize good and bad software. The program, tasked with improving software performance, will produce a public database containing insights about the software in the enormous codebase. The program’s administrators hope it will help spur the development of tools to automatically identify and repair program errors, and even tools to create new software based only on a description of desired properties.