The Center for Data Innovation spoke with Kenth Engø-Monsen, senior data scientist at Telenor Research, the research division of Norwegian telecommunications company Telenor, headquartered in Fornebu. Engø-Monsen discussed his recent research showing how mobile phone data could more accurately predict outbreaks of dengue fever than traditional methods.
This interview has been edited.
Joshua New: You made the news recently for your research showing that you could track and even predict outbreaks of dengue fever in Pakistan by using data from call records. How do you ascribe specific call records from such a large population to specific aspects of an outbreak? How do you determine what calls are relevant?
Kenth Engø-Monsen: We used deidentified voice call data records (CDRs) that were available during the period of June through December 2013. “Deidentified” in this case means all personal information is hashed to a meaningless code, so the customer ID becomes something like “HY&65TRFgr.” These records in Telenor Pakistan’s database typically contain information about who is calling, the date and time, and the base station that the caller is attached to when he or she is making the call. Using information about the base stations you connect to when you make calls, it is possible to find a rough estimate of your travel patterns. For example, person X called from Islamabad on Monday at eight in the morning, and called from Lahore that evening. So we know person X traveled from Islamabad to Lahore in that time period, without knowing who person X is. Sum this up over all the customers and you have an estimate of how people travel within all of Pakistan. Correlating these travel patterns with the actual outbreak of dengue in 2013 enables you to build a model for how the epidemic spread through the country.
The important thing is to use call records from people traveling and moving around the country, in order to assess the human travel patterns. So in the beginning all calls are relevant since they give us place and time. Then we extract human movement from these records, such as the total number of people traveling from place A to place B in a day. This is the information we need for our analysis of how the disease is spreading.
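The steps described in this answer (hash the customer ID, order each customer's calls by time, treat a change of location between consecutive calls as a trip, then count travelers per day for each pair of places) can be illustrated with a minimal Python sketch. This is not Telenor's actual pipeline: the function names, the record layout, the salted SHA-256 hashing, and the choice to assign a trip to the date of the later call are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from datetime import datetime


def pseudonymize(customer_id: str, salt: str) -> str:
    """Replace a customer ID with a meaningless code.

    Assumption: a salted SHA-256 digest, truncated to 12 hex characters.
    The interview does not say which hashing scheme was used.
    """
    digest = hashlib.sha256((salt + customer_id).encode("utf-8")).hexdigest()
    return digest[:12]


def daily_flows(records):
    """Count distinct travelers per (day, origin, destination).

    `records` is an iterable of (hashed_id, iso_timestamp, place) tuples,
    a hypothetical simplified stand-in for a call data record, where
    `place` is derived from the base station that handled the call.
    """
    # Group call events by (already hashed) customer.
    calls_by_customer = {}
    for hashed_id, timestamp, place in records:
        event = (datetime.fromisoformat(timestamp), place)
        calls_by_customer.setdefault(hashed_id, []).append(event)

    # A trip is a change of place between two consecutive calls.
    # Assumption: the trip is assigned to the date of the later call.
    travelers = set()
    for hashed_id, events in calls_by_customer.items():
        events.sort()
        for (_, origin), (arrival, destination) in zip(events, events[1:]):
            if origin != destination:
                day = arrival.date().isoformat()
                travelers.add((day, origin, destination, hashed_id))

    # Drop the customer code: only aggregate counts leave this function.
    return Counter((day, a, b) for day, a, b, _ in travelers)


if __name__ == "__main__":
    salt = "example-salt"  # hypothetical value
    person_x = pseudonymize("customer-0001", salt)
    calls = [
        (person_x, "2013-08-05T08:00:00", "Islamabad"),
        (person_x, "2013-08-05T19:00:00", "Lahore"),
    ]
    for (day, origin, destination), count in daily_flows(calls).items():
        print(day, origin, "->", destination, count)
```

The result is a table of daily origin-destination counts, which is the kind of aggregate that can then be compared against reported case data. A real analysis would also need to handle details this sketch ignores, such as mapping base stations to regions and filtering out customers with too few calls.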
New: Does this method only work for disease and epidemic scenarios? Or are there other useful applications, such as disaster response or foreign aid delivery?
Engø-Monsen: Information about human travel throughout a country can be used in many scenarios other than disease tracking. For example, in the case of flooding, earthquake, or cyclone events, it is important to know where people are traveling, because natural disasters can displace large numbers of people or damage infrastructure. Knowing where people are is important in order to make sure that relief efforts and help are actually sent to the people who need them.
New: How accurate was this method compared to traditional epidemic mapping efforts?
Engø-Monsen: Using human mobility patterns extracted from mobile data turned out to be very successful. The traditional state-of-the-art mapping approach completely failed to predict a major local outbreak of the disease in Mingora, in a region called the Swat Valley. But just by using mobile data, we were able to predict that this outbreak would occur at the time and place it actually did.
New: Were there any kinds of data that would have helped this modeling, but that you could not access?
Engø-Monsen: Getting access to all case data on dengue fever would have helped, of course. But one has to be pragmatic, in the sense that you use the available information and then run analyses to test that your results are robust and sound. Even without this data, we believe that the results from our study are robust and reproducible for different diseases and in other countries.
New: You also head Telenor’s Big Data for Social Good program. What are some of the other projects you are working on?
Engø-Monsen: The Big Data for Social Good program has three main focuses: diseases, disaster, and privacy. For diseases, we explore ways to use telecom data in fighting the spread of infectious diseases. For disaster, we investigate population movement and socioeconomic impacts after disaster events. As an example of this, we mapped population flows using CDRs before and after Cyclone Mahasen hit Bangladesh in May 2013. And finally, for privacy, we study the tension between personal privacy and the utilization of CDRs for social good. This is important because all involved parties need to better understand what the limits are, which is something nobody else seems to be addressing.