The Center for Data Innovation spoke with Szabolcs Nagy, co-founder and chief executive officer of Turbine, a Hungarian startup using AI to simulate experiments to test the effects of new cancer drugs. Nagy discussed how AI can predict the behavior of cancer cells in response to different drugs and how AI testing can help scientists select the best lab experiments to run.
Nick Wallace: Turbine models the effects of experimental drugs on cancer cells. How does that process work, and what are the advantages over laboratory experiments?
Szabolcs Nagy: The process itself has three parts. One is building a simulated model of human cells. This is one of the key activities that we have in-house. We have biologists and doctors looking through the literature to find and identify the known molecular effects of protein interactions. We have our own methodology for characterizing these in a mathematical format. In the end what you get is what we like to call the “wiring diagram” of human cells. We have a map of how thousands of proteins in your cells interact with each other to drive cellular behavior: for example, to decide whether a cell should divide or multiply, or whether a cell should die.
This general model obviously does not help us understand how specific cancer types will behave. So we add certain parameters. On the one hand we have information about the mutation profile of a particular type of cell and how its protein machinery looks, such as whether any proteins have mutated in a functional way. The other data that we load in is about the concentration levels of these proteins versus each other and versus a baseline of hundreds of different tissue types and cell types in humans. What we have in the end when we add all this to the general model is a simulated version of a particular type of cell. This is the first key pillar.
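To make the description above a little more concrete, here is a minimal sketch of how such a parameterized cell model could be represented in code. It is only an illustration: the class names, fields, and example values are assumptions chosen for readability, not Turbine’s actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    """One curated edge of the 'wiring diagram': source acts on target."""
    source: str
    target: str
    sign: int      # +1 for activation, -1 for inhibition
    weight: float  # relative strength, characterized from the literature

@dataclass
class WiringDiagram:
    """The general protein-interaction map shared by all simulated cell types."""
    proteins: list[str]
    interactions: list[Interaction]

@dataclass
class CellLineProfile:
    """Cell-type-specific parameters layered on top of the general model."""
    name: str
    # Functional mutations, e.g. {"KRAS": "constitutively_active"}
    mutations: dict[str, str] = field(default_factory=dict)
    # Protein abundance relative to a cross-tissue baseline, e.g. {"PTEN": 0.4}
    expression: dict[str, float] = field(default_factory=dict)

@dataclass
class SimulatedCell:
    """The general wiring diagram plus one particular cell line's parameters."""
    wiring: WiringDiagram
    profile: CellLineProfile
```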
The second is the software we’ve written, which allows us to model how this complex system behaves and how it responds to external stimuli. We do this by simulating the different interactions going on between the proteins directly. If we know that protein A acts on proteins B and C in some manner, which we defined when we built the model, then we know it will pass on its activity along those interactions. Thousands and thousands of these interactions happen at one time in a single simulation.
That allows us to read out a so-called “steady-state” of this cell. For example, we match a particular type of proteomic or expression profile to, say, aggressive cell division, which is obviously relevant in cancer, and we match another to, say, apoptotic activity, which is controlled cell death. And there are a couple of other activities or behaviors that we read out. This is what you get at the end of one simulation, but obviously we don’t run just one simulation; we run millions of these. In each, we either work with different cells, or we apply some kind of biological noise to the cell to make it more realistic, because obviously you never have hundreds of thousands of very similar cells in a tumor; you have different cells.
That gives us a pretty robust idea of how these cells would behave under native conditions, and then we can also map certain interventions onto this complex system. That gives us insight into how the cells’ behavior would change under therapy at a certain dose. So we might see how many of these cells die from the treatment, whether others stop proliferating, or whether a third group does not stop proliferating at all and even becomes more malignant and able to metastasize. So that’s the second pillar of the technology.
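As a rough illustration of this second pillar, the toy sketch below propagates activity along a small signed interaction network until it settles, treats a drug as partial inhibition of its target, adds per-replicate biological noise, and reads out proliferation versus apoptosis across many simulated cells. The network, node names, update rule, and thresholds are all invented for illustration; they are not Turbine’s model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy signed network: rows act on columns (+ activation, - inhibition).
nodes = ["GrowthSignal", "KinaseA", "KinaseB", "Proliferation", "Apoptosis"]
W = np.array([
    #  GS   KA    KB   Prol  Apop
    [0.0,  1.0,  0.0,  0.0,  0.0],   # GrowthSignal activates KinaseA
    [0.0,  0.0,  1.0,  0.8, -0.9],   # KinaseA drives proliferation, suppresses apoptosis
    [0.0,  0.0,  0.0,  0.6,  0.0],   # KinaseB also drives proliferation
    [0.0,  0.0,  0.0,  0.0,  0.0],   # Proliferation: readout node
    [0.0,  0.0,  0.0,  0.0,  0.0],   # Apoptosis: readout node
])
bias = np.array([2.0, 0.0, 0.0, -0.5, 0.5])  # cell-line-specific baseline activity

def simulate(drug_inhibition=0.0, target="KinaseA", steps=50, noise=0.2):
    """Propagate activity to a (noisy) steady state; return the two phenotype readouts."""
    x = rng.uniform(0.0, 1.0, len(nodes))
    t = nodes.index(target)
    for _ in range(steps):
        inp = W.T @ x + bias + rng.normal(0.0, noise, len(nodes))  # biological noise
        x = 1.0 / (1.0 + np.exp(-inp))       # squash to activity levels in [0, 1]
        x[t] *= 1.0 - drug_inhibition        # the drug partially inhibits its target
    return x[nodes.index("Proliferation")], x[nodes.index("Apoptosis")]

# Millions of replicates in the real system; a thousand here, with and without the drug.
for inhibition in (0.0, 0.9):
    runs = np.array([simulate(drug_inhibition=inhibition) for _ in range(1000)])
    killed = np.mean(runs[:, 1] > runs[:, 0])  # apoptosis outweighs proliferation
    print(f"target inhibition {inhibition:.0%}: fraction tipping toward apoptosis = {killed:.2f}")
```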
The third pillar is where most of the AI comes in. When you run millions of these experiments, you get a really large, potentially terabyte-sized database of what happened. So you need to understand why certain cells responded to the drug better than others, why other cells became more resistant to the drug, or how to predict how cells will respond to certain interventions. Or you can even try to answer questions like, “what were the most efficacious therapies or therapy combinations?”, that is, which were most effective at killing the cells. If you want to generate this level of insight you need to use learning algorithms on the large simulated dataset you’ve generated. And that’s the third pillar, where we apply AI to reason out the key drivers of sensitivity or resistance to certain drugs, or which interventions were the most effective and why.
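A hedged sketch of the kind of analysis described in this third pillar: train a learner on the simulated outcomes and ask which cell features drove sensitivity or resistance. The random forest, the feature names, and the synthetic data below are stand-ins chosen for illustration, not a description of Turbine’s actual algorithms.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Stand-in for the simulated results database: one row per simulated cell,
# with mutation/expression features plus whether the drug killed it.
n = 5000
df = pd.DataFrame({
    "KRAS_mutant": rng.integers(0, 2, n),
    "PTEN_expression": rng.normal(1.0, 0.3, n),
    "EGFR_expression": rng.normal(1.0, 0.3, n),
})
# Synthetic ground truth just for the sketch: resistance driven by KRAS status and low PTEN.
p_kill = 1 / (1 + np.exp(-(1.5 - 2.0 * df["KRAS_mutant"] + 1.0 * (df["PTEN_expression"] - 1.0))))
df["killed_by_drug"] = rng.uniform(size=n) < p_kill

X, y = df.drop(columns="killed_by_drug"), df["killed_by_drug"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
# Rank the features most associated with sensitivity vs. resistance in the simulations.
for name, importance in sorted(zip(X.columns, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {importance:.2f}")
```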
Since all of this is computational, we can basically run another round of experiments at the speed of cloud computing, which is potentially just seconds. Whereas in the lab you have to do things like generate the cell colony, apply the drugs, and wait for the response, and all of this can take weeks; in a large experiment, it might take months. These lab experiments might generate hundreds or thousands of data points, whereas we can generate hundreds of millions of data points in a day. So obviously, the difference is that in the computational experiments you can try whatever you want: you can do blind experiments and all the things that make sense. You can also try things that you would never try in a lab, because from where you stand it makes no sense, but with computational experiments you can try it anyway, very fast and very cheaply. In the end, you get a much better sense of the best ways to kill certain types of cells, the best patient populations to target your drug at, and the best combinations to try and in what types of patients.
Obviously, we are not replacing lab experiments. The point is that we run orders of magnitude more experiments than you would ever do in the lab, and provide you with the ones most likely to succeed, or uncover the hidden gems you never would have tried.
Wallace: Where did the idea of using AI in this way come from? What is your backstory, and how did you end up founding Turbine?
Nagy: We’re three founders. We have Kristóf Szalay, who built the AI and the software part; we have Daniel Veres, who is a medical doctor and researched protein interactions, so he’s the father of the cell network; and I came in to try to understand how to use this and how to productize it, how to bring it to market, and what that market and those use cases would be. I’m a person who’s really interested in technology and I like to learn new stuff, and that’s what I started doing when we started on this journey together three years ago. Kris started researching all of this back in 2010, Daniel joined him in 2013, and then we all got together in 2015. The initial impetus for this research was Kris’s personal experience with cancer in his family, and that gave him the drive to understand cancer. Kris and Daniel met in the same research team, which was focusing on network biology at Semmelweis University in Hungary.
I did two other things before this that were relevant. I helped grow a cryptography startup into the cloud security space. I was the fifth member of that team, which is now 100-strong, and we learned a lot just by building that up, even though it was in a very different field: we had to find a field, find a market, find a use case, and make a company out of it. Then I worked with the “Medical Futurist,” an expert called Bertalan Meskó who works on where healthcare and technology will meet and where all of this will go. That gave me the impression that healthcare is a really interesting market where there is a lot of really interesting technology you can build to solve large problems. I started looking around for potential projects and I met with the guys in a bar close to the university. We had a really long talk, some of which I did not understand at that point, but it was pretty clear that the technology had a lot of potential to really impact people’s lives. That’s how we got together.
Wallace: How do you know that a simulated experiment is comparable to a practical one? For example, how do you ensure that the drugs and the cancer cells behave in the virtual world as they would in the physical world?
Nagy: There are two key aspects to that. On the one hand, I mentioned that we scour the literature to build a cell model: what we do is look for known biological facts about protein A’s interaction with protein B. Molecular biology in this respect is actually very well mapped out, especially in the parts of the cellular mechanisms that are relevant in cancer. But nobody has really integrated all these thousands and thousands of data points into one model of cellular behavior. So what we build into the model is well-corroborated biological fact.
That said, we can even add the stuff that’s less well corroborated and see whether it makes sense and improves predictions. The other part is what we call the calibration of the cell model, which has us recreate experiments that you have already run in the lab. So you have a drug, you have a hundred different types of cancer, you’re working with cell line models, and you have already screened your drug on all of these cell lines. You have applied the drug in a certain dose range to each of the cell lines, and then you’ve read out a so-called IC50 measurement, which is the dose of the drug that killed 50 percent of the cells on a given plate. So you have a hundred of these IC50 measurements.
What we do when we’re calibrating the cell model or the drug is build the same cell line models. You might provide us with the relevant data on the cell lines, but most of the time these are well-known lab models, so you can find the data for that particular cell line in the public domain. So we build simulated models for all of those cell lines, and then we take the data you have about your drug’s effect, the target that it binds to, and how strongly, and in what dose, and we apply the drug in the same dose range you used in the actual experiments, and we calculate our own IC50 measurements and compare them with the actual lab measurements. When we do this, we get a pretty good measure of how well we’re modelling the drug. If we get wildly different IC50s compared to what you observed in the lab, then obviously we’re missing something, so we might add more proteins along the target profile of the drug, or recalibrate some of the network to make sure that we’re modelling the drug effectively. In the end we usually have a pretty close alignment between our predicted IC50s and your actual IC50s.
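A minimal sketch of that calibration step as described: fit a dose-response (Hill) curve to simulated viability readouts, extract an IC50, and compare it with the lab-measured IC50 for the same cell line. The curve form, the scipy-based fit, and the placeholder numbers are illustrative assumptions about how one might do this, not Turbine’s actual pipeline.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, ic50, slope):
    """Fraction of cells surviving at a given dose (two-parameter Hill curve)."""
    return 1.0 / (1.0 + (dose / ic50) ** slope)

# Doses used in the lab experiment (e.g., micromolar) and the survival fractions
# read out of the simulation at each dose; in practice these come from the simulation runs.
doses = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
simulated_survival = np.array([0.98, 0.95, 0.85, 0.62, 0.35, 0.15, 0.05])

(ic50_sim, slope), _ = curve_fit(hill, doses, simulated_survival, p0=[0.5, 1.0])

ic50_lab = 0.42  # the measured IC50 for this drug on this cell line (placeholder value)
fold_difference = max(ic50_sim, ic50_lab) / min(ic50_sim, ic50_lab)
print(f"simulated IC50 = {ic50_sim:.2f}, lab IC50 = {ic50_lab:.2f}, fold difference = {fold_difference:.1f}x")
# A large fold difference would flag the drug model for recalibration, for example by
# adding more proteins along the drug's target profile or adjusting interaction weights.
```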
We’ve done this for about a hundred different drugs now on almost 200 different cell types. So we know through this benchmarking work that we model a lot of different drugs across a lot of different cell types accurately.
Wallace: There are a lot of complex and costly stages between coming up with a new drug and treating patients with it. Where in that chain does Turbine sit, and what happens once a drug has cleared Turbine’s testing?
Nagy: If you take a look at the computational biology field or the AI drug discovery field, you’ll see that there are really two key use cases that most of the people in this space cover. One is taking huge datasets, say sequence data or therapy response data, and then trying to identify novel targets for drugs. The other use case is where you have a target, and you want to find a molecule that binds to it, and you will do some kind of AI-guided screen of the molecular space to find the compounds that you would use in the experiments. These are the first two steps of that really long process of bringing a drug to market.
Once you have an idea of what target to hit and you have a molecule, you start the more complex biological experiments. First you test your drugs in cells, then you test your drugs in animals, and then you have clinical trials with humans. That’s the longest and most expensive part of the drug discovery process. We ourselves sit in the middle of that process, before approval but at the point where you actually have a compound. We simulate these biological models directly and predict how your drug would act on them. The lab experiments are really long and very costly, so we help you find the right experiments to run with the most chance of success, which cuts a lot of the failed-experiment costs you would otherwise have. And as I mentioned, we can also find the hidden gems, the experiments you wouldn’t have run, to increase the potential of your drug. So this answers the first part of your question.
After our work’s done, the drug goes to the validation experiments. For example, in a recent collaboration, we did a screen of about 100,000 experiments. We were looking for combinations of a certain drug and other potential drugs on a particular type of patient. We ran that 100,000-experiment screen and then provided a list of 72 experiments that should be attempted, because those were the most promising combinations on the most responsive cells. So the pharma company screened 100. In that case, it turned out that in general, about 70 percent of our findings were validated in the lab, and we uncovered combinations that were hypothesized in the literature or had been tried before, but also several that had no corroborating evidence, and these novel findings were about twice as promising as those that would otherwise have been attempted.
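As a rough sketch of what shortlisting a screen like that could look like computationally, the snippet below ranks simulated combination results and keeps the top candidates for lab validation. The column names and the ranking criterion (each combination’s predicted kill fraction on its most responsive cell model) are illustrative assumptions, not the criteria used in the collaboration described.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Stand-in for ~100,000 simulated combination results: anchor drug + partner drug
# applied to a panel of simulated patient-derived cell models.
n = 100_000
screen = pd.DataFrame({
    "partner_drug": rng.integers(0, 2000, n),          # candidate partner compound id
    "cell_model": rng.integers(0, 50, n),               # simulated patient cell model id
    "predicted_kill_fraction": rng.beta(2, 5, n),        # simulated fraction of cells killed
})

# Rank each combination by its effect on the most responsive cell model and
# shortlist the top candidates for lab validation (72 in the collaboration described).
ranked = (screen.groupby("partner_drug")["predicted_kill_fraction"]
                .max()
                .sort_values(ascending=False))
shortlist = ranked.head(72)
print(shortlist.head())
```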
Wallace: How do you think AI is going to impact pharmaceuticals in the long term? For example, are there other stages that you think could be effectively automated in the future?
Nagy: There are really two distinct directions the industry could go in. One you can already see in firms like Berg and Recursion and a couple of other pharma players who have built really strong AI-driven discovery pipelines and discovery processes. You may have the large pharma companies not taking full advantage of these new technologies to speed up their discovery and development pipelines, whereas other players who are up and coming, and actually getting pretty well funded, use these more rational processes. Obviously, drug discovery is usually a decades-long process, so shifts are more tectonic, but eventually you may have these new and up-and-coming players replace some of these super-large pharma companies, and you may have much smaller, more nimble players taking the lead or contributing most of the novel drugs in general.
The other potential direction, and this is where big pharma is trying to go, is where the large incumbent players take these technologies on board, make the most of them, and bring AI to bear on a variety of different steps across their discovery and development pipelines. That will help them keep ahead of the game and keep abreast of these more nimble players, in terms of how efficiently they identify new targets using computational methods, how efficiently they find new molecules, how well they plan their lab, animal, and human experiments, how well they build their clinical trial selection protocols, and how well informed they are across each of these steps.