10 Bits: The Data News Hot List
This week’s list of data news highlights covers November 23-29 and includes articles on large-scale data analysis in health care and the obstacles to open government in Africa.
This week, the U.S. Food and Drug Administration (FDA) sent a letter ordering personal genomics company 23andMe to halt sales of its genetic tests. The FDA argues that 23andMe’s tests are designed to diagnose or prevent diseases, and thus fall under its regulatory framework for medical devices. 23andMe has not received regulatory approval for its tests, which the FDA says could produce false positives and lead patients to undertake unjustifiably risky procedures. A spokesperson from the National Institutes of Health said that while genetic information has great potential to improve health, testing and verification must come first.
Large-scale data analysis promises big changes for several areas within health care. On the insurance side, data analysis firm BeyondCore recently used big data mining to determine that young and healthy patients will likely cost insurance companies more than expected because of their enrollment in mental health care under the Affordable Care Act. In health care research, a "health cloud" initiative launched last week to make de-identified treatment data from the FDA and the Centers for Medicare and Medicaid Services available to universities, pharmacologists, and hospital systems.
The Global Open Data for Agriculture and Nutrition Initiative, which launched this fall, aims to improve the quality of global agricultural and nutritional data, with an eye toward improving food security in the developing world. The initiative demonstrates the growing interest in putting agriculture data to better use in the development sector. One example of an application using such data is the NextGen Cassava Project, which uses statistical modeling to predict cassava growth and identify the highest-yield varieties. Another recent example is the Plantwise Knowledge Bank, a repository of open-access information on pest mitigation and crop diseases.
Only a relatively small percentage of financial services companies have made extensive use of "big data" to detect and crack down on financial fraud. Smaller banks in particular, which often lack sophisticated analytics teams, have an opportunity to improve their fraud analytics capacity using new, easy-to-use analytics technologies. Even simple data analysis, such as tracking how fast users enter their passwords or mapping the locations from which a user usually signs into their bank account, can provide early warning of certain common types of fraud.
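To make the "simple data analysis" idea concrete, here is a minimal sketch of two such early-warning checks. It assumes a bank keeps, per user, a history of past password-entry times and a set of usual sign-in locations. The function name `login_warnings`, the z-score test, and the 3-standard-deviation threshold are illustrative assumptions, not a method described by any bank or vendor mentioned above.

```python
from statistics import mean, stdev


def login_warnings(history_secs, usual_locations, entry_secs, location, z_limit=3.0):
    """Return warning strings for one login attempt (hypothetical helper).

    history_secs: past password-entry durations for this user, in seconds
    usual_locations: set of locations the user normally signs in from
    entry_secs: how long this attempt took to type the password
    location: where this attempt came from
    z_limit: assumed threshold, in standard deviations, for "unusual" speed
    """
    warnings = []

    # Check 1: typing speed far from this user's historical average.
    if len(history_secs) >= 2:
        mu = mean(history_secs)
        sigma = stdev(history_secs)
        if sigma > 0:
            z = abs(entry_secs - mu) / sigma
            if z > z_limit:
                warnings.append(f"unusual typing speed (z={z:.1f})")

    # Check 2: sign-in from a location this user has not used before.
    if location not in usual_locations:
        warnings.append(f"unfamiliar location: {location}")

    return warnings


# A very fast entry from an unfamiliar city triggers both checks.
print(login_warnings([4.1, 3.9, 4.3, 4.0, 4.2], {"Boston"}, 0.8, "Lagos"))
```

Neither check proves fraud on its own; in this sketch they simply produce flags that a smaller bank could route to manual review or a second authentication step.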
The recent Open Government Partnership summit in London shone a light on a novel challenge for open government advocates: Africa. On a continent where few countries' governments even meet the criteria for joining the partnership, there is widespread skepticism that officials would participate in good faith rather than use their membership as a veil of legitimacy behind which to continue financial mismanagement and cronyism. Moreover, many citizens in poorer African countries do not have internet access, so open data advocates may need to turn to FM radio and SMS to communicate newly released government information.
With all the attention paid in the United States to the new health insurance exchanges cropping up under the Affordable Care Act, it is no wonder that hackers have tried to break into these systems and in some cases succeeded. One way health care organizations can defend their systems is through data-driven security measures, such as tracking individual user behavior and network traffic profiles to identify deviations in real time.
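As a rough illustration of "identifying deviations in real time," the sketch below keeps an exponentially weighted moving average (EWMA) and variance of one user's or host's traffic, for example requests per minute, and flags observations far above that baseline. The EWMA approach, the class name `TrafficBaseline`, and the defaults for `alpha`, `k`, and `warmup` are all assumptions chosen for illustration; the organizations discussed above may use entirely different techniques.

```python
import math


class TrafficBaseline:
    """Running EWMA baseline for one user's or host's traffic (hypothetical).

    alpha: weight given to the newest observation
    k: how many standard deviations above the baseline counts as a deviation
    warmup: number of observations absorbed before any alert can be raised
    """

    def __init__(self, alpha=0.1, k=4.0, warmup=10):
        self.alpha = alpha
        self.k = k
        self.warmup = warmup
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def observe(self, value):
        """Score one observation, then fold it into the baseline.

        Returns True if the value is anomalously high.
        """
        self.n += 1
        if self.n == 1:
            self.mean = float(value)
            return False

        deviation = value - self.mean
        std = math.sqrt(self.var)
        # Floor the spread at 1.0 so tiny fluctuations around a very steady
        # baseline do not trigger alerts.
        anomalous = self.n > self.warmup and deviation > self.k * max(std, 1.0)

        # Incremental EWMA update of mean and variance, applied after scoring
        # so the current observation is judged against the prior baseline.
        incr = self.alpha * deviation
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + deviation * incr)
        return anomalous


baseline = TrafficBaseline()
for count in [20, 22, 19, 21, 20, 23, 18, 21, 20, 22, 19, 21]:
    baseline.observe(count)   # normal traffic builds the baseline
print(baseline.observe(200))  # a sudden burst is flagged
```

An EWMA baseline adapts gradually as a user's normal behavior drifts, which suits real-time monitoring because each update needs only constant memory per user or host.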
The Large Hadron Collider at the European Organization for Nuclear Research (CERN) collects around 25 petabytes of particle physics data and other information each year. While CERN has built an impressive infrastructure to store and process that information, one remaining challenge is preserving knowledge about the data as the scientists who created it retire, move on, or forget details. One international group, Data Preservation in High Energy Physics, has set out to mitigate this problem by convening particle physics labs, including CERN, to develop and promote best practices for data preservation.
The British Broadcasting Corporation (BBC) signed an agreement this week with the Open Data Institute and four other nonprofits to promote open data standards and publish its own open data where possible. The BBC, which has already implemented some open data initiatives, such as its Olympics Data Service, hopes to expand its efforts and serve as an exemplar for other organizations in journalism and elsewhere. Among other projects, the news organization is developing a linked data system to extract concepts in articles and aggregate articles around similar concepts.
Despite advances in search technologies, finding a specific video on the internet remains relatively difficult. One startup, Santa Monica-based Structured Data Intelligence, hopes to address the problem by analyzing and categorizing video metadata. In a presentation last week, the company's CEO described the Video Genome Project, a massive web scraping effort to collect and structure data about films.
The imminent shortage of data science talent is well documented, but many programs proposed to close the gap may not intervene early enough in the education process. Fostering analytical thinking as early as middle school could help prepare the next generation of workers. Early training could put students on track to become not only data scientists but also data-savvy managers and communicators.