This week’s list of data news highlights covers April 16–22, 2016, and includes articles about a major new European quantum computing initiative and a new global database for drug ingredients.
1. Making Progress on Police Data
The White House has announced a series of new commitments and partnerships to support the Police Data Initiative (PDI), launched in May 2015 to build community trust, increase accountability, and improve policing effectiveness with open data. The Department of Justice (DOJ) Office of Community Oriented Policing Services will develop technical-assistance and training programs for participating police departments so they can work more effectively with their own data. Meanwhile, the DOJ Office on Violence Against Women will develop resources to educate police departments about how to publish their data while protecting the personal information of victims. Several private-sector and nonprofit organizations, including the International Association of Chiefs of Police, the open data portal company Socrata, and the Sunlight Foundation, have also pledged to develop tools, technical assistance, and training resources to support the PDI.
2. Europe Gets a Quantum Computing Flagship
The European Commission has announced it will launch a quantum computing research initiative to support a wide variety of quantum technologies. The initiative will receive €1 billion ($1.13 billion) in funding over the next 10 years from the Commission and national funders, and will be the Commission’s third “flagship” technology research initiative. It will focus both on near-to-market quantum technologies, such as communication networks and highly sensitive cameras, and on more ambitious, futuristic endeavors, such as highly precise sensors small enough to fit in a smartphone.
3. Teaching Artificial Intelligence to Spot Cyberattacks
Researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory have developed an artificial intelligence system called AI2 that combines machine learning algorithms with human input to detect cyberattacks. The system has successfully detected 86 percent of attacks while ignoring many of the false positives that algorithm-only methods frequently flag. The researchers trained AI2 on 40 million log lines per day for three months, and as it analyzed this information the system learned to distinguish between legitimate threats and benign anomalies, such as system stress tests. Whenever AI2 comes across suspicious activity, it flags the situation for a human analyst to investigate further and feed additional information back to AI2 so it can continually refine its anomaly detection.
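The paragraph above describes a feedback loop rather than a specific codebase, so the sketch below only illustrates the general human-in-the-loop pattern: an unsupervised detector surfaces the most unusual events, an analyst labels them, and those labels train a supervised model that refines the next round of triage. The feature vectors, thresholds, and model choices here are placeholder assumptions, not AI2’s actual implementation.

```python
# A minimal human-in-the-loop anomaly detection sketch with placeholder data;
# this is not AI2's code, only an illustration of the pattern described above.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
events = rng.normal(size=(10_000, 20))          # stand-in for featurized log lines

# Step 1: unsupervised outlier scoring surfaces the most unusual events.
detector = IsolationForest(random_state=0).fit(events)
scores = -detector.score_samples(events)        # higher score = more anomalous
top_k = np.argsort(scores)[-200:]               # show only the top events to the analyst

# Step 2: a human analyst labels the flagged events (1 = real attack, 0 = benign).
analyst_labels = rng.integers(0, 2, size=top_k.size)   # placeholder for analyst feedback

# Step 3: the labels train a supervised model that refines the next round of triage,
# so recurring benign anomalies (e.g., stress tests) stop being flagged.
refiner = RandomForestClassifier(random_state=0).fit(events[top_k], analyst_labels)
attack_probability = refiner.predict_proba(events)[:, 1]
```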
4. Streamlining the Search for Organ Donors
Medical researchers from India’s National Center for Biological Sciences, along with several health research institutions, have developed a rapid genetic-sequencing technique to identify markers that could make it much easier to find suitable matches for organ transplants. The technique relies on sequencing human leukocyte antigen (HLA) genes, which help govern immune response, and then identifying six particular genetic markers that indicate how compatible a donor would be with a recipient, thereby lowering the risk of tissue rejection. As the researchers sequence more HLA genes, they plan to build a database of markers associated with different ethnic communities in India to help doctors find suitable donor matches faster.
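As a rough illustration of marker-based matching, the sketch below scores donor-recipient compatibility as the fraction of matching markers across six HLA loci. The specific loci and allele names are illustrative assumptions; the six markers the researchers identified and their actual typing pipeline are not reproduced here.

```python
# A minimal sketch of marker-based compatibility scoring with assumed loci and alleles.
HLA_LOCI = ["HLA-A", "HLA-B", "HLA-C", "HLA-DRB1", "HLA-DQB1", "HLA-DPB1"]  # illustrative

def compatibility(donor: dict, recipient: dict) -> float:
    """Return the fraction of the six loci on which donor and recipient match."""
    matches = sum(1 for locus in HLA_LOCI if donor.get(locus) == recipient.get(locus))
    return matches / len(HLA_LOCI)

donor = {"HLA-A": "A*02:01", "HLA-B": "B*07:02", "HLA-C": "C*07:02",
         "HLA-DRB1": "DRB1*15:01", "HLA-DQB1": "DQB1*06:02", "HLA-DPB1": "DPB1*04:01"}
recipient = {**donor, "HLA-B": "B*08:01"}        # same as the donor except at one locus

print(compatibility(donor, recipient))           # 5 of 6 loci match -> ~0.83
```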
5. Analyzing Customer Feedback from Unstructured Data
Computer manufacturer Lenovo is developing machine-learning software capable of analyzing unstructured social media data to understand customer complaints about its products. Lenovo already collects customer feedback through traditional channels such as call centers and surveys, but it wants to better understand customers’ concerns when they post video reviews on YouTube or upload pictures of their broken laptops to Instagram. Lenovo trained its system on a data set of examples annotated by human analysts so its algorithms can learn to identify common issues in customer feedback.
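The sketch below shows, in a highly simplified form, what learning from human-labeled feedback examples can look like: a tiny, made-up set of labeled posts trains a text classifier that can then route new, unlabeled comments to the right product team. It is an assumption-laden stand-in, not Lenovo’s system.

```python
# A minimal text-classification sketch with a tiny, hypothetical training set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human analysts label a sample of posts with the issue each one describes.
posts = [
    "hinge cracked after two weeks",
    "screen flickers when I open the lid",
    "battery drains overnight even when off",
    "keyboard keys stopped responding",
]
labels = ["build quality", "display", "battery", "keyboard"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# New, unlabeled social media text can then be routed to the relevant team.
print(model.predict(["my laptop screen keeps flickering"]))   # likely 'display'
```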
6. Making It Easy to Identify Drug Ingredients Around the World
Drug regulators from the United States, Europe, Canada, France, Germany, Switzerland, and the Netherlands are collaborating to improve global information sharing by developing a database of medicinal product ingredients, called the Global Ingredient Archival System (ginas). Because drug and chemical names can vary between regions, ginas uses a unique-identifier system to serve as a lingua franca for regulators and drug companies around the world. The database will also make it easier for drug companies and health authorities to monitor the global pharmaceutical supply chain more closely and to better prepare for outbreaks and other public health emergencies.
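Conceptually, a unique-identifier system is a synonym table keyed by one global identifier per substance. The sketch below illustrates that idea with a made-up identifier and synonym set; it does not use ginas data or its real interfaces.

```python
# A minimal sketch of the unique-identifier idea with a made-up record, not ginas data.
SUBSTANCE_INDEX = {
    # one record per substance, keyed by a single global identifier
    "UNII-0000XYZ": {
        "preferred_name": "paracetamol",
        "synonyms": {"acetaminophen", "paracetamol", "n-acetyl-p-aminophenol"},
    },
}

def resolve(name: str) -> str | None:
    """Map any regional or trade name to the substance's unique identifier."""
    name = name.strip().lower()
    for identifier, record in SUBSTANCE_INDEX.items():
        if name in record["synonyms"]:
            return identifier
    return None

print(resolve("Acetaminophen"))   # -> 'UNII-0000XYZ', same result as resolve('paracetamol')
```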
7. Piloting Mobile Census Data Collection
The U.S. Census Bureau has announced plans to pilot new methods of data collection in Puerto Rico to prepare for the 2020 Census, part of the bureau’s ongoing effort to reduce costs by leveraging technology. The pilot will test online self-reporting tools as well as a mobile app that tags the information census workers collect with geolocation data, which the bureau will then use to update and validate the addresses in its records. The pilot will begin on April 1, 2017.
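One simple way to think about geotag-based address validation is as a distance check between a record’s listed coordinates and the location where a worker actually collected the data. The sketch below uses the haversine formula with made-up coordinates and a made-up threshold; the bureau’s actual matching rules are not public at this level of detail, so everything here is illustrative.

```python
# A minimal sketch of checking a geotagged field visit against an address record.
from math import asin, cos, radians, sin, sqrt

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def validate(record_coords, visit_coords, threshold_m=50):
    """Confirm a record if the worker's geotag falls near the listed address."""
    return distance_m(*record_coords, *visit_coords) <= threshold_m

print(validate((18.4655, -66.1057), (18.4657, -66.1055)))   # True: within roughly 30 m
```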
8. Building a Smart City Corridor
Kansas City, Missouri, has partnered with Cisco to create a 2.2-mile smart corridor in the heart of the city by installing connected sensors, cameras, and other technology in municipal infrastructure. The plans include embedding sensors in streets to power apps that inform residents of nearby parking spaces; cameras that analyze traffic flows for smart traffic lights and monitor weather conditions; and smart street lighting that brightens whenever pedestrians are near but dims when the area is empty to conserve energy.
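As a small illustration of the presence-based lighting behavior described above, the sketch below dims a street light once no pedestrian has been detected for a set interval. The brightness levels and hold time are arbitrary assumptions, not the specification of the Kansas City deployment.

```python
# A minimal sketch of presence-based street-light dimming with made-up values.
FULL, DIM = 100, 30          # brightness as a percentage of maximum output
HOLD_SECONDS = 60            # stay bright for a minute after the last detection

def target_brightness(seconds_since_last_detection: float) -> int:
    """Brighten while someone was recently nearby; dim once the block has been empty."""
    return FULL if seconds_since_last_detection < HOLD_SECONDS else DIM

print(target_brightness(5))     # 100: a pedestrian just passed
print(target_brightness(300))   # 30: the area has been empty for five minutes
```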
9. Analyzing the Labor Market to Avoid Skills Mismatch
Analytics firm Burning Glass Technologies and the nonprofit Institute for Public Policy Research have created an online tool called “Where the Work Is” that analyzes labor market data for mid-skilled occupations throughout the United Kingdom to reveal skills mismatches: areas where the supply of skilled workers in a particular sector is substantially different from the number of jobs available. The analysis compares the level of employment opportunity with average salaries for jobs broken down by category, such as secretarial jobs or skilled trades, to illustrate which occupations in which areas offer the most promising opportunities. It could help people looking for a job in a particular sector identify where in the UK they have the best chances, as well as help businesses make more informed decisions about where to recruit prospective employees.
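At its simplest, a skills mismatch calculation compares the number of qualified workers with the number of advertised vacancies for each occupation in each region. The sketch below does exactly that with made-up figures; it does not reflect the data or methodology behind “Where the Work Is.”

```python
# A minimal supply-versus-demand sketch with invented figures, not IPPR/Burning Glass data.
occupations = {
    # (region, occupation category): (qualified job seekers, advertised vacancies)
    ("Greater Manchester", "skilled trades"): (12_000, 18_500),
    ("Greater Manchester", "secretarial"):    (9_400, 5_100),
}

for (region, category), (workers, vacancies) in occupations.items():
    ratio = workers / vacancies
    status = "shortage of workers" if ratio < 1 else "surplus of workers"
    print(f"{region} / {category}: {ratio:.2f} workers per vacancy ({status})")
```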
10. Pooling 2 Million Genomes for Disease Treatment
Pharmaceutical company AstraZeneca has partnered with biotechnology company Human Longevity and several research institutions to build a database of two million people’s genetic sequences and health records over the next 10 years. By creating such a large resource of genetic data, AstraZeneca hopes to identify rare genetic variations related to complicated diseases, such as diabetes and cardiovascular disease, which could yield new insights into how to develop personalized treatments. AstraZeneca’s database will place a particular focus on genetic data from Finland, which has a highly homogeneous population and consequently higher concentrations of particular genetic variations that are rare elsewhere in the world.
Image: Americasroof.