The Center for Data Innovation spoke with Robin Röhm, the CEO and co-founder of Apheris, a company that uses federated data ecosystems to accelerate innovation while protecting data privacy and intellectual property. Röhm discussed the benefits of decentralized data for privacy and innovation as well as its AI and machine learning applications.
Kir Nuthi: What makes Apheris’ model of a federated data platform unique in the data economy and AI space?
Robin Röhm: If you look at most of the major machine learning platforms today, such as Databricks, Snowflake, and the hyperscalers’ offerings, they rely on the idea that data is centralized and accessible by the organization that owns it. However, this is not always the case. For example, different companies may own different complementary data sets, or a company may gather data owned by others, such as participants, customers, and patients. In these cases, it may not be feasible or appropriate to centralize the data due to privacy or security concerns. This is where Apheris comes in. Our solution allows organizations to unlock the value of such data by connecting the places where the data sets reside. We then enable machine learning workflows to be sent across such a federated architecture. This allows data to be jointly analyzed in individual locations while coordinating the sharing of computation results across the federated machine learning platform.
What sets Apheris apart is that our platform is built in a way that integrates seamlessly with the existing data and AI tech stack. An ML engineer or data scientist can use the languages and tools they are already familiar with, such as Databricks or open source tools, to build and operationalize machine learning applications. With Apheris, you can launch pipelines on the federated data network, receive results, and integrate them into existing pipelines without having to rebuild existing systems.
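To make the idea of sending a workflow to the data concrete, the following is a minimal sketch of federated averaging, a common technique in federated machine learning. It is an illustration only and not Apheris' implementation: the Site class, the toy hospital data, and all function names are hypothetical. Each site fits a small linear model on records that stay local, and the coordinator sees only model parameters and record counts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """A hypothetical data holder; its records are only read inside this class."""
    name: str
    records: List[Tuple[float, float]]  # (x, y) pairs that stay local

    def local_update(self, w: float, b: float,
                     lr: float = 0.1, steps: int = 5) -> Tuple[float, float, int]:
        """Run a few gradient steps on local data and return only parameters and a count."""
        n = len(self.records)
        for _ in range(steps):
            grad_w = sum((w * x + b - y) * x for x, y in self.records) / n
            grad_b = sum((w * x + b - y) for x, y in self.records) / n
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b, n


def federated_average(updates: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine site updates, weighting each by the number of local records."""
    total = sum(n for _, _, n in updates)
    w = sum(w_i * n for w_i, _, n in updates) / total
    b = sum(b_i * n for _, b_i, n in updates) / total
    return w, b


def train(sites: List[Site], rounds: int = 30) -> Tuple[float, float]:
    """Coordinator loop: send the current model out, average what comes back."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [site.local_update(w, b) for site in sites]
        w, b = federated_average(updates)
    return w, b


if __name__ == "__main__":
    # Toy data (an assumption for illustration) that follows y = 2x + 1.
    sites = [
        Site("hospital_a", [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]),
        Site("hospital_b", [(-2.0, -3.0), (2.0, 5.0)]),
    ]
    w, b = train(sites)
    print(f"w={w:.2f}, b={b:.2f}")  # should land close to w=2, b=1
```

In this sketch, only the values returned by local_update cross site boundaries. A production system would typically add further safeguards on those shared results, which are omitted here.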
Nuthi: Apheris has often discussed the power of harnessing complementary data. Could you explain how leveraging decentralized data can drive value and accelerate innovation while providing a privacy-friendly alternative to data centralization?
Röhm: Data is most valuable when it is extensive and detailed. For example, in supply chain analysis and customer or patient journeys, the best insights are derived when large datasets span the entire journey. In the health care industry, this could comprise a large number of patients and a detailed record of their history, such as lab results, clinical trial data, treatments received in the hospital, or data from their own recorded health-related activities. This broad overview can be very useful, but often, companies only have access to a specific piece of this data.
Complementary data occurs when you connect these data sets to get a more complete perspective. This is the fundamental idea behind complementary data and federated data networks. By combining many data sources, you can train a more comprehensive model that considers a wider range of factors. However, many companies may not consider third-party data or the data that is available to them beyond their own organizational boundaries.
Nuthi: Why did Apheris focus on the health care, pharmaceutical, and manufacturing industries for its data-based solutions?
Röhm: That has more to do with the culture and opportunities than Apheris’ tech and platform capabilities. The critical question was: where do we see the need for AI adoption coupled with highly sensitive data such that it cannot be shared? Pharma and health care were obvious choices because both industries deal exclusively with such sensitive data. There the decision was straightforward, but manufacturing was not as obvious. The manufacturing challenges along a production value chain are very high in some industries. For example, semiconductor manufacturing is an incredibly regulated industry with sensitive intellectual property. What we’re seeing in manufacturing is that the demand for high-quality production is growing exponentially in many sectors like semiconductors, general electronics, high-performance materials, and the intersection of bio and manufacturing. The manufacturing processes are so challenging that these industries need to innovate. We believe these sectors have the power to transform how we operate as a society and how we transition to a sustainable society.
Nuthi: What are some primary concerns and regulatory issues you have encountered regarding the use of AI and the federation of data, and how does Apheris’ use of federated data and its collaborative platform overcome otherwise present regulatory, technical, and commercial barriers?
Röhm: There are two main considerations when it comes to machine learning and analytics: data usage and model usage. In terms of data usage, there are regulations such as GDPR that protect individuals’ sensitive and identifiable data and give users the right to decide how their data is used. These regulations vary by region, with some places, such as Europe, having stricter privacy laws. A federated data platform allows businesses to compute with personally identifiable information. The results that are shared between parties are automatically constructed in a way that ensures security and data privacy so that they do not contain any personally identifiable information. However, it is still the responsibility of the company providing the data to ensure that they have the rights and consent to use the data for a specific purpose. For example, even if a company has the rights and consent from a patient to use their data for research purposes, they may not be allowed to use a federated model for commercial purposes.
In terms of AI adoption, there is growing concern about whether AI systems are trustworthy, explainable, and fair, along with questions about how they should be governed. These important questions will become increasingly relevant as AI adoption increases in the next decade, especially in highly regulated industries. At Apheris, we have designed our platform to navigate these concerns through quality assurance processes and related governance. This allows our users to ensure that their AI systems are reliable, transparent, and unbiased, and that they are used responsibly.
Nuthi: Now that Apheris has received €8.7 million in its seed extension round, what is the next series of goals for the company?
Röhm: Running a start-up, I tend to think in two-year cycles. Anything five years out or beyond is a visionary goal.
In five to 10 years, AI acceleration and the tooling and infrastructure around it will dramatically increase. We want to ensure that it’s not just large hyperscalers who benefit from the value created from large data pools but also that other companies can participate in that value creation, build their competitive edge, and protect their own data. This is Apheris’ central role in the industry and is our key mission.
We’re currently doubling down on our horizontal positioning as a machine learning infrastructure company. We’re most active in pharma, health care, and manufacturing, but we’re seeing technical uses that span industries. More technical audiences, such as the CTO of a scale-up or an IT buyer, are starting to request the capabilities of federated machine learning platforms, and they have far more targeted requirements for the technical capabilities they need. This is a positive sign of a market that is maturing.