The Center for Data Innovation spoke with Anna Huyghues-Despointes, head of strategy and marketing of Owkin, a company headquartered in New York that coordinates a decentralized health research ecosystem. Huyghues-Despointes discussed how Owkin enables better collaboration to accelerate clinical research and improve patient outcomes.
Eline Chivot: What inspired you to join Owkin? What challenge does Owkin aim to address in the field of healthcare?
Anna Huyghues-Despointes: My background is in data science and I was previously developing trading models for investment funds. I wanted to stay in the field of big data and AI but do it for the greater good. I joined Owkin’s mission to apply AI to medical research to improve patients’ outcomes. And I am very excited about what we have achieved so far and what we are looking forward to.
As we saw with the recent COVID-19 pandemic, speed and collaboration in clinical research have never been more important. The challenge is how to empower researchers to draw insights from millions of patients’ multimodal data points without compromising data privacy and protection. At Owkin, we believe the solution is to bring academic and pharmaceutical industry researchers together in a federated research environment. Our proprietary infrastructure and AI technologies enable researchers to train machine learning models on distributed data at scale across multiple medical institutions without centralizing the data.
Chivot: Owkin uses a type of machine learning called federated learning. What does that mean, and what does this technique allow you to do that others cannot?
Huyghues-Despointes: Access to large-scale, heterogeneous, and curated medical data is a major challenge in healthcare. We have dedicated three years of R&D to developing Owkin Connect, our proprietary federated learning (FL) framework to “connect” multiple data sources. This means that we can train machine learning models on distributed data (i.e. data from multiple, global research organizations) instead of having to aggregate or collect it.
Collaborative research of this sort has the potential to massively accelerate the clinical research process. It offers protected data for patients, exhaustive traceability of computations for institutions, maximum collaboration for researchers, and predictive power for data scientists.
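To make the idea of training on distributed data concrete, here is a minimal sketch of federated averaging, one common federated learning algorithm. It is an illustration only: the interview does not describe how Owkin Connect works internally, and every name here (`local_update`, `federated_average`, `train_federated`, `make_site`) and the synthetic data are hypothetical. Each simulated site fits a small linear model on its own records and shares only the resulting weights, which a coordinator averages.

```python
import random


def local_update(weights, data, lr=0.5, epochs=5):
    """Run a few gradient-descent epochs on one site's private data.

    Only the updated (w, b) pair leaves the site; the raw records do not.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=50):
    """Coordinator loop: send out the global model, collect and average updates."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


def make_site(n, x_low, x_high, rng):
    """Hypothetical synthetic site data following y = 2x + 1 plus small noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


rng = random.Random(0)
# Three simulated institutions with different sizes and input ranges.
sites = [
    make_site(40, -1.0, 1.0, rng),
    make_site(60, -1.5, 0.5, rng),
    make_site(20, -0.5, 1.5, rng),
]
w, b = train_federated(sites)
print(f"learned w={w:.2f}, b={b:.2f}")
```

In this toy setup the averaged model ends up close to the underlying relationship even though no site ever sees another site's records. A production system would add much more, such as secure communication and the traceability of computations mentioned above.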
Chivot: Owkin is initiating consortia and research collaborations, including blockchain-based collaborations, with academic institutions, medical centers, data scientists, and industrial partners such as pharmaceutical companies. Why is that model particularly powerful? What are the concrete benefits of this type of collaboration for the research process, the work of practitioners, and patients’ lives?
Huyghues-Despointes: At Owkin we whole-heartedly believe that collaboration is the key to advancing medical research. Working in concert with academic centers and researchers, we deploy infrastructure, prepare data, train predictive models, validate results, and co-publish our collective findings in top scientific journals.
We coordinate collaborative projects by arranging them into consortia of institutions and stakeholders with a common goal. We have found this model to be very effective in building trust to bring former competitors together in a bid to advance medical research. Through these projects, we ensure each institution maintains governance of its own data and privacy for its research, while benefiting from training the predictive models on other members’ datasets. The outcome: Better and more powerful predictive models, trained on multicentric heterogeneous data, for better insights, and ultimately, more impactful medical discoveries.
We coordinate and power various federated learning consortia. One is MELLODDY, a pharma consortium that trains machine learning models on chemical libraries from ten major European pharmaceutical companies and that receives funding from the EU’s Innovative Medicines Initiative 2 Joint Undertaking (IMI 2 JU). We also lead three academic consortia. One is Healthchain in France (partners include the Institut Curie, Assistance Publique-Hôpitaux de Paris, Nantes University Hospital Center, and the Centre Léon Bérard), with first machine learning models focusing on melanoma and breast cancer. Another is AI4VBH in London (including King’s College London and Nvidia among other partners), with an early focus on cancer, heart failure, and stroke. A third one is the COVID-19 Open AI Consortium (COAI), which we’ve set up and are leading to bring breakthrough medical discoveries and actionable findings to the fight against the COVID-19 pandemic. COAI partners include Capacity Covid, Complejo Hospitalario Universitario de Santiago de Compostela, Bichat – Claude-Bernard Hospital, Centro Hospitalar e Universitário de Coimbra, and PRISM Research Group. In May 2020, Owkin also developed a machine learning model in partnership with Gustave Roussy, Hôpital Kremlin-Bicêtre, and INRIA to predict the severity of SARS-CoV-2 infection from initial CT-scans and clinical variables.
Chivot: Owkin is a French-American startup, with offices in Paris, London, and New York, which gives the organization experience in different markets and types of health infrastructure. Can you share some of the characteristics that make each of them specific, challenging, and strategic?
Huyghues-Despointes: At Owkin we aim to federate the largest research ecosystem, connecting medical research centers and pharmaceutical companies in Europe and the United States. Our goal is to capture fit-for-AI datasets, the most challenging scientific questions, and the most advanced drug treatments. We partner with the top-tier research centers in Europe and the United States, and with the world’s leading key opinion leaders (KOLs) in their field. We have gained access to their exceptional research-grade curated multimodal cohorts that reflect patient and treatment heterogeneity. In addition, we are expanding our network into strategic countries where drugs are priced and launched, or that represent big markets.
Based on this expansion strategy, we chose to be present in New York, London, and Paris, in order to have ready access to the world’s leading centers, KOLs, and research cohorts. Let me go into the details of each infrastructure.
First, the French market and health infrastructure: Paris is a true healthcare research hub in Europe. It is well connected and contains leading universities and high-tech research institutions such as the European Cancer Center Institut Gustave Roussy. The French healthcare system is heavily regulated by the government, with very strict data protection legislation. Enabling AI collaboration within these strict regulations is a major challenge in France and the rest of the EU. A second challenge to overcome is the segmentation of the French research system, which results in small datasets scattered across hospitals throughout France. It is essential to train predictive models on large and diverse amounts of data to ensure high performance and reproducibility. We have overcome both these challenges by implementing our secure and privacy-preserving FL framework to enable machine learning models to be trained on decentralized data from multiple French institutions. With two French founders, we have a fantastic network of connections and successful academic collaborations in France. We are leveraging this early traction to expand in other EU countries.
Second, the UK’s healthcare market is unique due to the National Health Service (NHS) and the National Institute for Health and Care Excellence (NICE). And London is home to some of the leading research universities in the world. UK hospitals are organized into groups of hospitals called NHS trusts. KOLs often work in a few different hospitals within their trusts. The leading KOLs are therefore well known to each other and communicate regularly. This close-knit community can be hard to penetrate as an outsider. In December last year, Owkin joined the innovative AI4VBH consortium led by King’s College London. We are confident that once we have the opportunity to publish initial results from this collaboration, we will replicate the market access and traction we have seen in France.
Third, the U.S. healthcare market is characterized by significantly higher levels of funding for research ($194 billion in 2018) and more lenient privacy regulations than the EU. The National Institutes of Health (NIH) is the primary agency of the U.S. government responsible for biomedical and public health research. Over 50 percent of medical AI today is in North America, and regulatory approval for the use of machine learning and AI in medicine is growing each year (for example, as seen with the use of real world data in clinical trials). In addition, New York is one of the biggest healthcare hubs in the world, with research institutions and hospitals such as Memorial Sloan Kettering, Mount Sinai, Columbia, New York University, Cornell, and leading pharmaceutical companies located in New York City or across the river, in New Jersey. Centers in the United States have larger volumes of patients as the clinical research space is less segmented. It is therefore possible to get a more representative clinical research cohort from a smaller number of centers, compared to Europe. U.S. hospitals have to adhere to rules on data sharing that are less strict than those for European hospitals, and as a result, they routinely monetize their data. This poses a challenge for us at Owkin, as we offer a different business model based on free collaboration and future revenue share. We therefore had to find a way to attract U.S. hospitals to partner with us.
Chivot: The most obvious application for federated learning may currently be in healthcare, but into which other areas do you see it expanding in the future?
Huyghues-Despointes: The potential use cases for federated learning are numerous. One can find strong use cases in any setting that has ubiquitous compute and data availability over distributed nodes, with a restriction (through regulation, convention, or consumer pressure) to keep data at its source.
The first area of application I can identify is industrial monitoring. For many large-scale industrial applications, there is a wealth of data produced by local imaging and sensor networks, but the data may be too massive to retain for any length of time, much less share across sites. Federated approaches can help train fault detection models on fresh data continuously by training them in a distributed manner across industrial sites.
A second area is banking and finance. Besides health data, our financial data is some of our most personal. The same is true not just for individuals, but also for corporations and businesses. Banks, regulators, and private individuals have a strong interest in protecting themselves against fraud, so fraud and identity theft detection models are becoming the norm. Federated techniques allow different financial institutions to collaborate in building such fraud detection models.
A third area is consumer mobile applications. The applications of federated learning to generic mobile application use cases are virtually limitless, and will become more and more numerous with the introduction of robust and flexible FL-as-a-Service solutions, which will be made increasingly available to mobile developers.
A fourth area is advertising. Through properly applied federated approaches, advertisers can obtain better and fresher information about engagement while simultaneously respecting consumer privacy and data autonomy. FL in advertising permits a fresh and more ethically accountable approach to targeted marketing.
Finally (though this list is not exhaustive), a fifth area is automotive. We are surrounded by millions of high-powered computers and sensing devices moving at high speeds. With the advent of self-driving and autopilot technologies, there is both a need for training and a capability to train better and more relevant road-centric computer vision models through federated techniques that line up extremely well with the technologies used for the mobile-device use case of federated learning.