10 Bits: The Data News Hot List
This week’s list of data news highlights covers January 18-24 and includes articles on an initiative to track animals’ movements from space and a mathematician who used data analysis to hack OkCupid and meet his future wife.
The Immunity Project, a medical research startup working on an HIV vaccine, is using machine learning to develop its product, scanning blood samples from rare individuals who are resistant to the virus and pinpointing the parts of the virus that their immune systems target. The founders hope to raise over $400,000 through crowdfunding to conduct more lab tests before going to clinical trials.
Mathematician Chris McKinlay devised an elaborate methodology to game the online dating site OkCupid. Using a variety of data science techniques, including cluster analysis and text mining, he was able to identify specific demographic profiles that interested him and tailor his account to their interests. Using the profile he created with this process, he met his future wife.
Pawn shops in Genesee County, Michigan, may soon have to enter their purchases into an online database to aid police in tracking stolen goods. The database, proposed by area police chiefs, is under consideration by the county. This data is already being collected, but it is not stored in a centralized location; the proposed system would be updated daily and available immediately to local police departments.
The one-way flow of data from research participants to scientists and hospitals must come to an end, argue researchers from Harvard Medical School and King’s College London in an editorial released this week. They contend that giving participants access to the data in return would reduce data fabrication and other research fraud, and ensure the just, reciprocal relationship between researcher and subject demanded by accepted medical ethics standards.
Ford, MIT and Stanford University are teaming up to research situational awareness systems for future generations of autonomous cars. One approach uses Lidar (laser-radar) devices, cameras and other sensors to produce a 3D representation of the car’s surroundings in real time. With a time series of this data, algorithms can predict the future locations of objects in the car’s environment, and the car can use those predictions to automatically brake for a pedestrian or steer around an obstruction, for example.
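To make the idea of predicting locations from a time series concrete, here is a minimal sketch in Python. It assumes the simplest possible motion model, constant velocity estimated from the last two observations; this is an illustration only, not the method used by Ford, MIT or Stanford. The function name `predict_position` and the sample track are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]          # (x, y) position in meters
Observation = Tuple[float, Point]    # (timestamp in seconds, position)


def predict_position(track: List[Observation], horizon: float) -> Point:
    """Extrapolate an object's position `horizon` seconds past its last observation.

    Assumption: the object keeps the velocity implied by its last two
    observations (a constant-velocity model, chosen here for simplicity).
    """
    if len(track) < 2:
        raise ValueError("need at least two observations to estimate velocity")
    (t0, (x0, y0)), (t1, (x1, y1)) = track[-2], track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    vx = (x1 - x0) / dt  # estimated velocity along x
    vy = (y1 - y0) / dt  # estimated velocity along y
    return (x1 + vx * horizon, y1 + vy * horizon)


# Hypothetical pedestrian track: two sightings one second apart.
pedestrian = [(0.0, (0.0, 0.0)), (1.0, (1.0, 2.0))]
predicted = predict_position(pedestrian, horizon=2.0)
```

A real system would likely need to account for sensor noise and changes in speed or direction, which this sketch ignores.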
Next year, the International Space Station will be fitted with the Icarus wildlife receiver, a radio device that will remotely track the movements of birds, bats, and insects that have been tagged with tiny transmitters. The Icarus project, funded by the German and Russian space agencies and 12 scientific groups, is expected to give scientists advance warning of volcanic eruptions and earthquakes, as well as aid in modeling the spread of animal-borne diseases like bird flu and West Nile virus.
The Internet contains an enormous amount of video on sites like YouTube, but it has been difficult to extract useful information from the unstructured video data. However, advances in automated object recognition and other computer vision technologies point the way to an array of new use cases for video data. For one, retail analytics can benefit hugely from video, figuring out when stores are busiest and tracking foot traffic.
A report released by the Defense Science Board last week recommends that the United States deploy large-scale data analysis to track nuclear weapons proliferation. Communications data collected by the intelligence community could be an indispensable resource for identifying the production of nuclear weapons. Moreover, commercial satellite image data could be used to cut remote monitoring costs.
Publishing company HarperCollins is ramping up its data initiatives, collecting more consumer data and conducting analytics in the areas of digital sales and pricing. In addition, the company uses data as a bargaining chip with its authors, citing statistics it has collected to persuade them to adopt marketing strategies it has identified as effective.
After Princeton researchers released an article purporting to show Facebook’s declining user base, data scientists at the company released a withering response debunking the researchers’ methods. In the tongue-in-cheek article, Facebook uses the same simplistic forecasting methods employed by the researchers, “predicting” that by 2021 the Ivy League institution will have no students at all. Using the popularity of keywords on Google is not necessarily a sound way of making predictions, the Facebook authors argue; declining search interest in the term “air,” they note, does not indicate that there will be no more air in the future.
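The pitfall of naive trend extrapolation can be sketched in a few lines of Python. The example below fits a straight line to a declining series and solves for the point where the line hits zero. It is an illustration only, not the method used by either the Princeton or the Facebook authors; the function names and the search-interest numbers are invented for this example.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def year_reaching_zero(years: List[float], values: List[float]) -> Optional[float]:
    """Return the year where the fitted line crosses zero, or None if it never falls."""
    slope, intercept = fit_line(years, values)
    if slope >= 0:
        return None  # a flat or rising trend never reaches zero
    return -intercept / slope


# Invented search-interest index (not real data) that falls 10 points a year.
years = [2010.0, 2011.0, 2012.0, 2013.0]
interest = [100.0, 90.0, 80.0, 70.0]
zero_year = year_reaching_zero(years, interest)
```

With these made-up numbers the fitted line reaches zero around 2020, even though nothing about the data says the decline will continue at the same rate, which is the kind of leap the Facebook authors poke fun at.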