This week’s list of data news highlights covers July 18-24, 2015, and includes articles about how fishermen are lending their boats for citizen science and artificial intelligence software that can identify sketches faster than humans.
1. Mining Genetic Data to Extend Lifespans
AncestryDNA, the personal genetics branch of genealogical service Ancestry, has partnered with Google’s Calico, a biotechnology company researching human longevity. Calico will study AncestryDNA’s database of its customers’ genetic information to identify genes that influence longevity and spur the development of lifespan-extending drugs. AncestryDNA’s database holds over one million genetic samples from its customers, 90 percent of whom have agreed to contribute this data for research purposes.
2. Fishing for Environmental Data
Fishermen in the United Kingdom have converted their boats into data collection tools for the Secchi Disk Study, which aims to provide scientists with better data on the ocean plankton that support global food chains. Fishermen drop small white discs attached to a measuring tape into the ocean, record the depth at which the disc is no longer visible, and upload this data through a smartphone app. This project, which started in 2014, allows researchers to monitor the concentration of plankton based on water visibility at various points in the ocean—a task fishermen are particularly well suited for, since they regularly visit the same locations. The relationship is mutually beneficial, as better monitoring of plankton abundance can help support ocean health and ensure that fishermen have reliable yields.
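The disappearance depth of a Secchi disc is a classic proxy for water clarity: a shallower reading means murkier, and often more plankton-rich, water. As a rough sketch, the empirical Poole-Atkins relation (k ≈ 1.7 / Secchi depth) converts a reading into an estimated light attenuation coefficient; the constant 1.7 is a textbook average, not part of the study's published methodology.

```python
# Estimate diffuse light attenuation from a Secchi disk reading.
# Uses the empirical Poole-Atkins approximation k ~= 1.7 / Z_SD,
# where Z_SD is the depth (metres) at which the disc disappears.
# Higher attenuation generally signals more plankton and
# suspended matter in the water column.

def attenuation_from_secchi(secchi_depth_m: float) -> float:
    """Approximate diffuse attenuation coefficient, in 1/metres."""
    if secchi_depth_m <= 0:
        raise ValueError("Secchi depth must be positive")
    return 1.7 / secchi_depth_m

# A disc that vanishes at 20 m (clear open ocean) implies far lower
# attenuation than one that vanishes at 4 m (plankton-rich water).
clear_water = attenuation_from_secchi(20.0)   # 0.085 per metre
rich_water = attenuation_from_secchi(4.0)     # 0.425 per metre
```

Because the relation only needs one depth number per reading, it suits exactly the kind of low-effort, repeated sampling the fishermen provide.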
3. Supporting Scientific Research with Open Data Apps
The National Institute of Standards and Technology (NIST) has launched a competition challenging participants to develop mobile applications that make better use of the agency’s Standard Reference Data—quantitative information about physical science. The challenge focuses on improving the accessibility and navigability of six data sets related to chemistry and physics, and entrants are eligible for $45,000 in prize money. NIST makes over 100 types of Standard Reference Data publicly available and wants to encourage the development of applications that use this data to lower barriers to taxpayer-funded research.
4. Taking Self-Driving Cars for a Test Drive
The University of Michigan has opened a 32-acre lab called Mcity that simulates real world environments for researchers testing self-driving cars. The lab, designed as a miniature city, is networked with sensors to collect data on cars as they traverse the course and respond to potential obstacles such as faded lane markings or graffiti-covered street signs. The environment is also designed to test the reliability and accuracy of sensors on the cars themselves, with large metal structures, tree cover, and buildings of all sizes—all of which could affect the performance of the radar, GPS, and image sensors that self-driving cars rely on.
5. Artificial Intelligence Beats Humans at Pictionary
Researchers from Queen Mary University of London have developed artificial intelligence software called Sketch-a-Net that can correctly identify simple drawings at a rate higher than humans. Other, less successful drawing-recognition algorithms have relied on analyzing the finished work as a whole, but Sketch-a-Net also incorporates data on how the drawing is constructed, which can provide clues about what the final result will be. The researchers imagine their software could eventually help match police sketches to mug shots better than humans can.
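The key idea—using how a drawing is built up, not just the finished image—can be illustrated by rendering cumulative snapshots after each stroke and stacking them as input channels. The channel scheme below is an illustrative assumption for exposition, not Sketch-a-Net's published architecture.

```python
# Encode stroke order as stacked image channels: each channel shows
# the drawing as it looked after one more stroke was added, so a
# model sees the construction sequence, not just the final picture.

def rasterize(strokes, size=8):
    """Render a list of strokes (each a list of (x, y) points)
    onto a size x size binary grid."""
    grid = [[0] * size for _ in range(size)]
    for stroke in strokes:
        for x, y in stroke:
            grid[y][x] = 1
    return grid

def cumulative_channels(strokes, size=8):
    """One channel per stroke, each a snapshot of the drawing so far."""
    return [rasterize(strokes[:i + 1], size) for i in range(len(strokes))]

# Two strokes forming a simple "T": the bar drawn first, then the stem.
bar = [(x, 1) for x in range(2, 6)]
stem = [(3, y) for y in range(1, 6)]
channels = cumulative_channels([bar, stem])
# channels[0] contains only the bar; channels[1] adds the stem,
# preserving the order in which the figure was drawn.
```

Two drawings with identical final images but different stroke orders produce different channel stacks, which is exactly the extra signal stroke-aware recognition can exploit.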
6. Standardizing Open Permit Data
A group of real estate and civic technology companies has created the Building and Land Development Specification (BLDS), a standardized data format to help make open data on building and construction permits more usable. City and county governments that issue these permits can use the BLDS to better support citizens and developers that rely on permit data in the same way municipal governments use open standards for restaurant inspection or transit data. Boston, Seattle, and San Diego County are among the first to adopt the BLDS.
7. Genomic Data Can Determine a Patient’s Cancer Prognosis
Researchers at Stanford University’s School of Medicine have identified a link between the activity of specific genes in immune system cells and a patient’s likelihood of surviving cancer. The researchers examined data from nearly 18,000 cancer patients to identify this correlation, which they expect could guide the development of new cancer therapies and influence treatment decisions.
8. Building a Biometric Passport
The Customs and Border Protection (CBP) agency has launched a pilot testing biometric data collection on foreign travelers as they enter and leave the United States. CBP agents will collect fingerprints and passport information via handheld devices to help conduct law enforcement checks on people leaving the country. The pilot will run through June 2016 in several major U.S. airports as part of a CBP initiative to integrate innovative technology into national security and trade-promotion efforts.
9. Keeping Computers Safe with Machine Learning
Cybersecurity researchers at Cylance, an antivirus software company, have developed a machine learning tool that can identify malware code in 100 milliseconds or less. Traditional malware-detection methods involve testing suspicious code to see if it is malicious, but this gives the software an opportunity to act and potentially damage a computer or network. Cylance trained its new algorithms to recognize malware characteristics in code without testing it, before any such actions could occur. Given the amount of new malware generated daily, automated learning approaches are necessary for antivirus software to stay relevant and effective.
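Classifying code by its static characteristics rather than its runtime behavior can be sketched with a toy example: compare a file's byte-frequency profile against centroids learned from known-benign and known-malicious samples. The feature choice and training corpus here are illustrative stand-ins, not Cylance's actual model.

```python
# A minimal sketch of static malware classification: score a file by
# byte-frequency features, with no execution of the suspicious code.
# Plain text stands in for benign files; high-entropy byte runs stand
# in for packed or obfuscated malware.

from collections import Counter
import math

def byte_histogram(data: bytes) -> list:
    """Normalized 256-bin byte-frequency vector."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def centroid(vectors):
    """Per-bin mean of a list of histograms."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(256)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(sample: bytes, benign_c, malicious_c) -> str:
    """Label a sample by its nearest class centroid."""
    h = byte_histogram(sample)
    return ("malicious" if distance(h, malicious_c) < distance(h, benign_c)
            else "benign")

# Toy "training" corpus.
benign = [b"hello world, this is ordinary readable text" * 4,
          b"another plain ascii document full of words" * 4]
malicious = [bytes(range(256)) * 4, bytes(reversed(range(256))) * 4]

benign_c = centroid([byte_histogram(s) for s in benign])
malicious_c = centroid([byte_histogram(s) for s in malicious])

print(classify(b"some readable english sentence here" * 3,
               benign_c, malicious_c))
```

Since classification is just a histogram plus two distance computations, it runs in well under the 100-millisecond budget the article mentions; a production system would of course use far richer features and a trained model rather than two centroids.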
10. Building a Bigger Health Data Network
The Patient-Centered Outcomes Research Institute (PCORI), a non-governmental health care research organization, will invest $142.5 million to expand PCORnet, its large interconnected repository of health data available to researchers. The investment will add seven individual health networks to PCORnet, bringing the total number of participating networks to 34 and giving researchers a substantially expanded pool of minable data.
Image: Joachim Müllerchen.