The Center for Data Innovation spoke with Dr. Philip Bourne, the associate director for data science at the National Institutes of Health. Dr. Bourne spoke about some of the agency’s recent initiatives and how the culture around sharing data among medical researchers is changing for the better.
Travis Korte: Please introduce yourself and some of your current work.
Philip Bourne: I am the associate director for data science at the National Institutes of Health (NIH), a position that cuts across all 27 institutes and centers at NIH. My role is essentially to facilitate movement in biomedical research that hopefully will have an impact on health care, based on the digitization of much of what we do in research. It’s about to become a digital enterprise and the question is: can we accelerate cures? Can we accelerate health and well-being by virtue of the digital media that we now find ourselves in? An ever-increasing amount of information is coming from many different avenues like mobile devices, electronic health records, and so on. So there are definitely new opportunities. The director of NIH, Francis Collins, recognized this as something that was important. About two and a half years ago, he convened a working group to make recommendations, and my position was one of those recommendations. Other recommendations that we have been following include workforce training in data science, developing new strategies to support software and data, and developing data policies.
Korte: What happens when you are successful? What does the NIH look like some years down the line once it makes better use of data and data sharing? What sorts of things are possible that weren’t possible before?
Bourne: Well, I think we’re heading towards the notion of precision medicine: having precise recommendations for an individual. Right now, the drugs that we prescribe and the treatments that we recommend are based on an en masse approach. In the majority of cases, people take the same (or similar) doses of the same drug for a particular condition. It’s very cookbook. But we are all very different in many ways: ethnicity, gender, environment, and all sorts of other key factors. Now, we’re heading into a situation where we’ll have a much better description of the individual, so we’ll be able to tailor treatment. What’s called “pharmacogenomics” is an example of this, where the drug dosage and type are very much dependent on your genetic disposition. Not everyone gets the same thing. This is already known to be effective in a number of situations. I think we’re just going to see the expansion of that kind of model. There is this idea that, for the first time in history, healthcare is becoming patient-centric, which I think is a major manifestation of what I’m talking about.
Korte: Tell me about the Big Data to Knowledge initiative. Tell me about some of its goals and where the project currently stands.
Bourne: We just had our first meeting in Pittsburgh yesterday, where we awarded $32 million. The program has just started. Folks wrote grants about nine months ago, and it’s the first round of awards. Twelve centers around the country received awards. Those centers are all working on new applications and research based on large quantities of data coming from a range of different sources. For example, there is a huge body of data that’s now being generated daily through various smart devices, from the mobility of people and all the measurements that go with motion. There are other projects using data from electronic health records in various institutions and genomic information. There is a lot of data! The idea is really to come up with key methodologies for discovery from these disparate types of large data sets. This involves integrating data of different types, ranging from genomic data to what’s observed from the patients themselves. Putting all that together is going to be a large challenge.
Also, we have mandates from the federal government to share data generated by public money. That is driving some policy developments and increasing the amount of data that’s available. There is a notion called “FAIR” that relates to this data, which is essentially defined as data that is findable, accessible, interoperable, and reusable. We’re definitely supporting this notion and that’s what we’re trying to do through this program. For example, there’s an award for developing a discovery index, so we can discover data on the internet.
The last part, which is also very important, has to do with training. There are awards for short-term training, which helps people learn just enough to make use of big data techniques, as well as training for graduate students to become experts by starting their careers handling big data. We are also trying to increase the diversity of the workforce in this area.
This is the first set of awards, and we’ll be making awards each year for the next several years.
Korte: Let’s talk a little more about the open data and data sharing work you’ve been doing. I know NIH just put out a new genomic data publication policy. Were you deeply involved in that? If so, what was the motivation behind it and what are some of the things you’re trying to accomplish with it?
Bourne: I wasn’t involved with it, but it’s really all part of the memo, essentially a mandate, from the President of the United States regarding data sharing. We were already compliant with much of the memo, in the sense of what we do and the awards we make. We’re looking to go even further.
With different types of data and particularly with genomic information, there’s the issue of patient privacy, which we must balance with accessibility. That’s really what our policy tried to improve upon. It’s pretty clear what can be done with that kind of data and what can’t be done. More generally, the idea is to increase data sharing.
Right now, investigators have to write a data management plan for NIH grants of over $500,000 a year in direct funding. Grants of less than that do not require a plan. What we foresee is that data sharing plans will be applicable to all grants going forward. We will monitor how data generated from the awards are being shared because there is now the opportunity to reuse that data.
Reuse of data points to a fundamental shift where we’re seeing more research being done on data that’s already been generated versus on data that’s generated by the investigators themselves. Going forward, I think this implies that data will play a more important role in what constitutes scholarship. Scholarship used to be measured by papers, the end product of research. But since data itself has value, there should be credit given to scholars providing well-formed, reusable datasets. I think we’re doing things in the direction of trying to support that idea. Data citation can become an important part of the scholarly entity.
Korte: Could you elaborate a little more on this idea? Tell me more about motivating researchers to prioritize or to think more highly about producing data, because I know that it hasn’t been seen, traditionally, as being as prestigious as writing papers. Now it’s becoming increasingly important. What is the NIH trying to do to change that culture?
Bourne: Because of competition, I think there is, undoubtedly, a culture of not wanting to share. In particular, competition is rife right now because money for research is so tight. That creates heightened competitiveness and it is working against the idea of sharing data. On the other hand, the grant policies are pretty prescriptive, so once they’re put into place and the grant is awarded, investigators will be contributing. There is a balance emerging. Another piece to consider is that the kinds of research we’re doing are becoming increasingly complex. As a result, people with different types of expertise are involved in studies together. Over the years, the number of authors on a paper has gone up quite significantly. That creates a sense of collaboration and sharing right off the bat.
It’s very much a cultural change and very much dependent on disciplines and subdisciplines. I think there’s always been a culture of sharing in genetics, by virtue of the Human Genome Project. In some other fields, not so much. But I think that slowly but surely they’re all moving in that direction.