10 Bits: The Data News Hot List
This week’s list of data news highlights covers January 4-10 and includes articles on the presence of Internet of Things technologies at the Consumer Electronics Show and a $1 billion investment from IBM in its artificial intelligence technology Watson.
The annual Consumer Electronics Show in Las Vegas had a broad array of Internet of Things devices on display this year, including an Internet-enabled tennis racket and sensors to collect and upload washing machine data. Given the steadily declining cost of sensors, it has become easier than ever before to collect data on a range of personal and household activities, and industry analysts expect Internet of Things devices to grow dramatically in popularity in 2014.
IBM announced that it would invest more than $1 billion to develop applications and seek out new markets for its Watson artificial intelligence technology. Two fields where IBM is hoping to deploy the technology are financial services and healthcare: DBS, one of Asia’s largest banks, as well as several major hospitals, already use Watson. The company hopes its technology will also be able to help online shoppers and travelers get better recommendations.
San Francisco’s “House Facts” data set offers building owners’ names, housing code violation data, property assessments and other information that gives renters and housing inspectors alike insight into city housing conditions. The data set, created by the San Francisco Department of Public Health, has been used to visualize housing violation “hot spots” around the city, among other things. The city hopes opening up the data set will lead to better housing code compliance and reduced medical costs for preventable health problems associated with poor living conditions.
This week, Google revealed the methods it used to automatically identify house numbers from its enormous collection of Street View data. Using a neural network, Google attempts to identify the entire number all at once, without splitting it into individual digits; the creators then compare the number to a database of 200,000 other known numbers, achieving around 98 percent accuracy. The algorithm can transcribe all the house numbers in France in less than an hour using Google’s computing infrastructure.
Israeli startup Windward uses satellite imaging and analytics to identify potential illegal activity in the world’s oceans in real time. The algorithms Windward created were initially designed to detect patterns associated with illegal fishing, but they have been expanded to detect pirate activity around the Horn of Africa and other illegal activity. The company has set its sights on the oil and gas industry as its primary clients.
Startups are using open government data released by cities and states to create economic value. One such startup, Porch.com, uses data on work permits, professional licenses and other home construction information released by the City of Seattle to power a searchable database for industry professionals and homeowners to compare costs for projects in their neighborhood. Seattle has put more than 200 data sets online since 2010 and plans to post at least 75 more this year.
In December 2013, the U.S. Food and Drug Administration (FDA) announced that it would use data to assess prescription drug risks even after the drugs have completed clinical trials and gone to market. This is useful because some drugs may carry rare but serious side effects that may not occur frequently enough to be adequately catalogued during trials. To accomplish this analysis, the FDA hopes to develop a database of de-identified electronic health records on a minimum of 10 million patients to track side effects over time.
The Lawrence Livermore National Laboratory is collaborating with enterprise software firm Ayasdi Inc. to conduct research into cutting-edge data analysis techniques, including topological data analysis. Topological data analysis, which is generally deployed on data sets that are too large or complex for traditional analysis, treats data as if it occupied physical space and uses geometry and other mathematics to analyze the resulting structures. This kind of analysis, while not always intuitive for human operators, has been piloted for applications in the pharmaceutical, energy and financial sectors.
Gordon Moore, co-founder of Intel and creator of Moore’s Law, will offer $1.5 million grants to 15 interdisciplinary scientists to promote research into data analysis for scientific discovery. The Gordon and Betty Moore Foundation notes that the grant-funded research will be inherently multidisciplinary, combining the natural sciences with computational and statistical methodology.
Lending companies are heading to social media sites for data on loan applicants. A lender, for example, may consider whether a small business draws many negative eBay or Yelp reviews to determine its creditworthiness. The Federal Trade Commission has been looking into a regulatory framework for these practices. Under the Fair Credit Reporting Act, lending companies can be required to verify borrowers’ credit history, but those that use social media in lending decisions are not yet required to verify that information.
Photo: Flickr user David Berkowitz