The Center for Data Innovation spoke with Brendan Frey, president and chief executive officer of Deep Genomics, a machine learning and genomics research company based in Toronto. Frey discussed how to use machine learning to analyze genetic data, as well as the technical obstacles to advancing personalized medicine.
This interview has been lightly edited.
Joshua New: Genomics research produces such large amounts of data that it seems obvious machine learning would be a natural fit to process it all. Why do you think you were among the first to take this approach?
Brendan Frey: It’s not the case that machine learning plus big data equals good results. To get good results, you need to figure out what it is you want to learn and then you need to adapt machine learning algorithms so they work well for your task.
What’s unusual about our approach is that we use machine learning to construct a modular system that mimics how cells read DNA. One module tells us how DNA interacts with proteins. Another module tells us how these interactions cause changes in a crucial cellular process called splicing. Another module tells us if a change in splicing leads to a disease, such as autism or cancer. We have several modules like this and we are building more. Together, they form a sort of computational “engine,” where the combination of modules makes a high-functioning system.
The advantage of this modular, multi-layer engine is that it provides deep insights into why a mutation is problematic and it provides information about how therapies can potentially reverse the effects of mutations.
I think we were among the first to take this modular approach because our team has strong expertise in both biology and machine learning. Put another way, our goal has always been to develop an engine that can provide biological explanations for why a mutation is problematic, not just identify if a mutation is problematic.
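To illustrate the modular design Frey describes, here is a minimal Python sketch of a chain of modules in which each stage passes a score to the next and records a reason for it. It is not Deep Genomics' engine: the names `Finding`, `Report`, `run_engine`, and the three module functions are hypothetical, and the scoring rules are toy placeholders standing in for learned models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    """One module's output: a score plus a human-readable reason."""
    module: str
    score: float
    reason: str


@dataclass
class Report:
    """Collects the findings from every module for one mutation."""
    mutation: str
    findings: List[Finding] = field(default_factory=list)

    def explain(self) -> str:
        return "\n".join(
            f"{f.module}: {f.score:.2f} ({f.reason})" for f in self.findings
        )


# Each module takes the mutation and the previous module's score.
Module = Callable[[str, float], Finding]


def protein_binding(mutation: str, upstream: float) -> Finding:
    # Toy rule (an assumption, not biology): substitutions to G disrupt binding.
    score = 0.8 if mutation.endswith(">G") else 0.1
    return Finding("protein_binding", score, "toy rule on the substituted base")


def splicing(mutation: str, upstream: float) -> Finding:
    # Toy rule: the splicing change scales with the binding disruption.
    return Finding("splicing", 0.9 * upstream, "scaled from binding disruption")


def disease(mutation: str, upstream: float) -> Finding:
    # Toy rule: disease relevance scales with the splicing change.
    return Finding("disease", 0.5 * upstream, "scaled from splicing change")


def run_engine(mutation: str, modules: List[Module]) -> Report:
    """Run the modules in order, feeding each score into the next module."""
    report = Report(mutation)
    score = 0.0
    for module in modules:
        finding = module(mutation, score)
        report.findings.append(finding)
        score = finding.score
    return report


report = run_engine("chr1:12345A>G", [protein_binding, splicing, disease])
print(report.explain())
```

Because every stage leaves a `Finding` behind, the report shows not only the final score but which intermediate step produced it, which is the kind of explanation the modular approach is meant to offer.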
New: The first product Deep Genomics created is called SPIDEX. How does it work?
Frey: One of the modules in our engine mimics how splicing occurs within cells. We used this module to build a database called SPIDEX, which is a large index of mutations accompanied by information about how those mutations affect splicing. SPIDEX has information for over 300 million mutations; most of these are not problematic, but a large number are dangerous. Using experimental data and in collaboration with medical researchers, we’ve validated some of the most striking cases in SPIDEX. In particular, we’ve examined mutations that cause certain cancers, spinal muscular atrophy, and autism.
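As a rough illustration of what an index of mutations could look like, the Python sketch below maps each variant to a precomputed splicing-effect score and flags the entries with large effects. The schema, the `Variant` fields, the threshold, and all values are made up for this example and do not reflect the actual SPIDEX format or data.

```python
from typing import Dict, List, NamedTuple, Optional


class Variant(NamedTuple):
    """Hypothetical key for one single-nucleotide mutation."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Toy index: variant -> splicing-effect score. All values are invented.
TOY_INDEX: Dict[Variant, float] = {
    Variant("chr1", 1000, "A", "G"): -0.2,
    Variant("chr2", 2000, "C", "T"): -7.5,
    Variant("chr3", 3000, "G", "A"): 0.1,
}


def lookup(index: Dict[Variant, float], variant: Variant) -> Optional[float]:
    """Return the precomputed score, or None if the variant is not indexed."""
    return index.get(variant)


def flag_large_effects(
    index: Dict[Variant, float],
    variants: List[Variant],
    threshold: float = 5.0,  # arbitrary cutoff chosen for the example
) -> List[Variant]:
    """Keep the indexed variants whose absolute score meets the threshold."""
    return [v for v in variants if v in index and abs(index[v]) >= threshold]


patient_variants = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr9", 9000, "T", "C"),  # not present in the toy index
]
print(flag_large_effects(TOY_INDEX, patient_variants))
```

Precomputing scores once and looking them up later keeps queries cheap, which matters when an index covers hundreds of millions of mutations.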
New: It’s interesting that you decided to make SPIDEX free for non-commercial users. Is this simply to further scientific understanding, or are there other motives?
Frey: We could not have developed SPIDEX without the contributions of hundreds of scientists, engineers, and medical researchers. Some of those researchers have asked us to make SPIDEX available, and we owe it to them to give back to the scientific community and enable others to make discoveries. In the long term, these discoveries will help Deep Genomics, but also other researchers and companies.
We’re being optimistic in our motives. In some situations, it is rational to think that there is a finite pie and you’d better grab your slice and not share it with others. In other situations, it is rational to think that you and others each have something novel and valuable to contribute and that if you work together, you can make a huge pie. You’ll each get a much bigger slice. The second scenario is where biotechnology is today. By sharing information, we all benefit.
I should add that we put a lot of thought into our decision to make SPIDEX freely available for non-commercial purposes, and we are certainly being strategic about what we share. Our commercial-grade system makes use of proprietary modules and, most importantly, the proprietary engine that puts it all together.
New: Deep Genomics just launched in July, so it is a pretty young company. Can you tell us about your accomplishments so far?
Frey: We publicly launched in July, but we actually incorporated in October 2014. On the technology front, we have a comprehensive roadmap for developing the different modules that are needed in our engine. Our most recently developed module can determine when and where proteins will interact with DNA and RNA. This was described in the August 2015 issue of Nature Biotechnology and highlighted on the front cover. On the business front, our goal is to make our engine as valuable as possible for genetic testing, pharmaceutical development, and personalized medicine. We are working with leading companies and hospitals to test our hypotheses for how we can add value. Learning from potential clients about how we can add value is important for a vision-driven company like ours.
New: Ultimately, Deep Genomics is about advancing personalized medicine. What are some of the biggest obstacles for this?
Frey: I think the primary obstacle is technological. There are many computational systems out there, but for the most part, patient-facing experts and pharmaceutical researchers don’t trust them enough to rely on them. I believe this is because most computational systems don’t provide explanations for their answers, so experts can’t figure out whether the computational system had the right reasoning behind its answer. Our engine directly addresses this concern. It isn’t yet complete and it isn’t perfectly accurate, but it’s the first step in this important direction.
Image: TEDx Talks.