10 Bits: The Data News Hot List
This week’s list of data news highlights covers November 22-28 and includes articles about an ambitious 85-year climate model and an audio database that could help researchers identify when a person has been drinking from their speech alone.
Genomic research has revealed that there are hundreds of different genetic subtypes of autism. Ongoing studies are attempting to map the landscape of these subgroups using large amounts of genetic data in hopes of developing treatments that are geared toward an individual’s particular subtype. One such project, the Simons Variation in Individuals Project, is characterizing about 200 people with variations of a common chromosomal variation that often leads to autism. Because of the increasing ease of conducting large genetic studies, future efforts are targeting thousands of individuals.
Researchers at two universities in Munich have created the first audio database of drunk and sober speech. The researchers took 30 GB of recordings of inebriated individuals speaking to provide a foundational data set for other researchers attempting to predict whether a person had been drinking from nothing but the sound of their voice. The database’s creators hope it can be integrated into car ignition systems to monitor people who have been caught driving under the influence.
A climate modeling hobbyist in Montreal has created a model to predict where it will snow on Christmas in North America each year through 2100. Because of the inexact nature of climate modeling, the creator expects to have many dramatic misses over the next 85 years. The point of the project, however, was to mine Statistics Canada’s massive open data set of future atmospheric conditions to show long-term trends. The resulting data set shows the encroaching effects of climate change, with snowy areas receding northward each year, as well as a few spots that appear to carry snow long after surrounding areas have thawed.
Medical device company Aerocrine is using cloud-based data processing and the Internet of Things to help doctors diagnose and treat asthma sufferers more effectively. The company makes devices that measure the presence of a particular chemical in patients’ breath that can help determine whether or not the patients’ symptoms result from asthma. The company has equipped its devices with sensors that send data to a cloud-based data processing platform that can help rapidly determine when a hospital or doctor’s office needs a replacement device. In the future, the company hopes to do the same thing with patient-generated data.
CrocBITE is an effort by Australian researchers to catalog worldwide crocodile and alligator attacks in hopes of aiding conservation efforts and safety interventions. The database now contains around 2,700 attack records stretching back to 1864 and paints a unique picture of crocodile populations over time. For example, crocodiles’ average size is increasing around the world, and the number of attacks in Australia in particular has increased over time.
Increasing numbers of state and local government agencies are releasing information that has been frequently requested under freedom of information laws. For example, Gainesville, Florida offers an open data set of city commissioners’ emails, and Wisconsin offers granular tax expenditure data. The most frequently cited barriers to publishing more of this popular state and local data are data reliability and agencies’ unwillingness to release data that could negatively affect them if disclosed publicly.
The Cop Accountability Program, an initiative of the nonprofit Legal Aid Society, will collect information about New York City Police Department (NYPD) officers accused of wrongdoing and make it available to lawyers. The project’s organizers contend that the NYPD does a poor job of tracking information on officers who have received public complaints. The recently launched database already has 2,750 entries, spanning lawsuits, complaint histories, internal police review board information, and newspaper stories.
Researchers from the UK and Mexico have teamed up to create a database of people who have “disappeared” in Mexico’s ongoing gang and drug wars in recent years. Although over 100,000 people have been killed in Mexico since 2007, there is no centralized database for the many human remains that have been collected by local officials. The “Transformative Citizens-Led Forensics Project” will create DNA data profiles of a minimum of 1,500 living relatives of victims in order to help researchers identify up to 500 sets of remains and better keep track of the victims of the ongoing violence.
The U.S. Navy and mapping software company Esri are partnering to use geospatial data to improve the armed forces’ battlespace awareness. The Naval Meteorology and Oceanography Command is working to develop applications based on the company’s software that will offer environmental information and decision-making aids to troops in real time. The flagship project of this partnership is the Intelligent Decision Map, which will conduct geospatial and oceanographic analytics and make this analysis available to other naval commands.
10. Tweets Reveal People’s Movements During Disaster
Researchers at Virginia Tech are attempting to use a massive collection of Twitter data to predict how people move around in the immediate aftermath of a disaster such as a hurricane. The researchers mapped New York City individuals’ locations immediately before and after 2012’s Hurricane Sandy to show how people evacuated areas affected by the storm and moved to safety. The researchers hope their work can be expanded to help cities better anticipate how to provision services during disasters.
Photo: Flickr user Tambako