The Center for Data Innovation spoke with Emre Kazim, COO and co-founder of Holistic AI, a London-based algorithm auditing company. Kazim discussed addressing public concerns about the trustworthiness of algorithms, the meaning of “ethical AI,” and his company’s framework for auditing AI systems.
Eva Behrens: What key challenge did you identify that led to the founding of Holistic AI, and how do you aim to address it?
Emre Kazim: So, it might be worth talking a bit about the genesis of the company. I co-founded Holistic AI with Dr. Adriano Koshiyama. Adriano was a Ph.D. student in the Computer Science department at UCL. I was a philosophy postdoc. By accident, I had written about digital ethics and ended up in the Computer Science department. I was tracking regulatory and policy discussions and broader issues such as consent and explainability of algorithms. Adriano was looking at questions like what it means to do technical assessments and robustness testing of algorithms. We experimented with the marriage of the two disciplines; it was learning by doing.
When I went to the Computer Science department in 2018, there was a lot of interest in digital ethics and AI ethics in society at large. That’s what we were calling it at the time. People started asking, hang on a minute, what are these algorithms doing? Are they manipulating us? We saw this with the Cambridge Analytica scandal and other high-profile cases, such as the use of algorithms in recidivism calculation in the United States. In another case, Amazon retired a CV-screening algorithm, which was shown to be biased against women. In the department at UCL, engineers were asking how we could create public trust and looking for ways to demonstrate whether algorithms are trustworthy.
The core problem that we were trying to solve was how to do a meaningful assessment of algorithms from a technical perspective and communicate it to non-engineers, including citizens, customers, and regulators, to make AI trustworthy. We saw a desire for people to deal with the ethics of algorithms and a demand for algorithm assessments, but there was no empirical evidence base for how to do it. So, Adriano developed an algorithm auditing framework called “Towards Algorithm Auditing” with his co-authors, and it became the reference piece for the field. We piloted that audit framework to see how it worked in practice. Once we had seen its value in the real world, we spun the company out to have maximum impact.
Behrens: Your company’s name suggests that you take a holistic approach to auditing AI systems and managing AI risks. What are the components of your auditing process, and what makes it holistic?
Kazim: What we meant by holistic is that we think about technical and non-technical assessments. We don’t just look at how an algorithm performs individually. It is also about the context in which that algorithm operates and about providing good governance for those questions.
On the technical side, we will look at the bias of a system, asking, for example, how the system performs against different demographics. Secondly, we look at explainability. Explainability means finding out why the system produces a certain recommendation or result. The third thing we look at in terms of technical assessment is robustness, which is an umbrella term that includes reproducibility, reliability, and security. And fourth, we look at privacy. Privacy in the AI sense is very different: traditional data governance provisions are not sufficient, because introducing machine learning and AI into the pipeline raises novel privacy concerns.
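As an illustration of the kind of bias check Kazim describes, one widely used metric compares each demographic group’s selection rate against the highest-performing group’s rate, flagging ratios below 0.8 under the common “four-fifths rule” heuristic. The function names and sample figures below are illustrative assumptions, not Holistic AI’s actual methodology:

```python
def selection_rates(outcomes):
    """Compute per-group selection rates from {group: (selected, total)}."""
    return {g: selected / total for g, (selected, total) in outcomes.items()}

def impact_ratios(outcomes):
    """Ratio of each group's selection rate to the highest group's rate.

    Under the "four-fifths rule" heuristic, a ratio below 0.8 flags
    potential adverse impact against that group.
    """
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Hypothetical hiring outcomes: (candidates selected, candidates screened)
outcomes = {"group_a": (45, 100), "group_b": (30, 100)}
ratios = impact_ratios(outcomes)
flagged = [g for g, r in ratios.items() if r < 0.8]
```

In this hypothetical data, group_b’s selection rate is two-thirds of group_a’s, so it would be flagged for further investigation. A real audit would go well beyond a single ratio, but this is the shape of a demographic performance comparison.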
On the non-technical side, we ask questions like what are we doing this assessment for? What does the documentation look like? How are companies preparing for impending regulatory interventions? What does the internal reporting of a company look like? Does the company have good control over its systems? So, we address governance questions on top of the technical assessment.
Behrens: How does Holistic AI define “ethical AI,” and which criteria and processes do you use to evaluate whether an AI system is “ethical”?
Kazim: If you want a definition, we understand AI ethics in a much broader sense, as the social, political, and psychological impacts of automation on society.
Initially, everyone used the phrase AI ethics. But we actually moved away from using it, and I think rightly so. Ethics is a democratic term, insofar as everyone will have their own ethical worldview and will advocate for their own position. There is a strong relationship between ethical advocacy and political and social advocacy. So, if we say this is ethical, or this isn’t, we enter into that domain of contest. Ethics is too contentious. Instead, we moved to terms like responsible AI, trustworthy AI, and more formal language with “auditing.” Auditing is a process that the industry knows and understands. AI risk management is another term that already has existing analogies in the industry.
The second element is that there are different ways of solving the problem of “AI ethics.” At a particular point, we found that people were creating AI ethics boards. And AI ethicists and consultancies were coming in and suggesting they had clear answers. But we believe that if we are going to be making decisions on the ethics of an algorithm, we shouldn’t leave it to tiny groups of people to do so. We were never interested in saying we are the experts and have all the answers. I understand why you would have an ethics board for edge cases that are legal but still morally dubious. But for your general algorithm performance assessments, we should have a systematic, scalable, rational system. By doing that, we also want to create transparency.
Behrens: How does algorithm auditing work, and how is it different from, for example, financial auditing?
Kazim: This goes back to the previous question. One of the reasons why we chose the term AI audit was to show that we are not doing ethical assessments but tangible assessments of algorithms. But the analogy is limited because financial auditing is usually just accounting for the state of play, and we are doing more than that.
So, what are we doing when we do an “audit”? First, we ask where the system is used and in what context. For example, if we look at a recommender algorithm, it could be used for a TV program or another product. Secondly, we look at who is using the algorithm and who is responsible for it in the business. Then, we identify where the risks lie. For each assessment, we check which risks are relevant to that kind of algorithm. For example, if we were assessing an HR system, we would want to know that it is private and explainable and whether it is biased. But if we are assessing a trading algorithm, there is no point in assessing its bias; bias has no relevance to it. We would focus on robustness and performance.
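The risk-relevance step described above can be sketched as a simple lookup from system type to the risk dimensions worth testing. The categories follow the four technical areas discussed earlier, but the mapping itself is an illustrative assumption, not Holistic AI’s taxonomy:

```python
# Illustrative mapping from system type to relevant risk dimensions.
# These assignments are hypothetical, for demonstration only.
RELEVANT_RISKS = {
    "hr_screening": ["bias", "explainability", "privacy", "robustness"],
    "trading": ["robustness", "explainability"],  # bias not relevant here
    "recommender": ["bias", "privacy", "robustness"],
}

def risks_to_test(system_type):
    """Return the risk dimensions to assess for a given system type.

    Unknown system types default to the full set, erring on the side
    of assessing everything.
    """
    default = ["bias", "explainability", "privacy", "robustness"]
    return RELEVANT_RISKS.get(system_type, default)
```

The point of the lookup is the one Kazim makes: bias testing appears for an HR screening system but not for a trading algorithm, where robustness and performance matter instead.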
Then we test those risks. Let’s say we find that an HR system has an issue of bias; then we investigate that. We dig down, we verify. And if we find problems, we provide mitigation strategies. And then, subject to our mitigation strategies being implemented, we will say this system is okay, this system is assured. Our process is more similar to a medical diagnosis than just an audit of the actual state of play. We also use the phrase “assurance roadmap”; assurance is a more accurate term for what we do.
Behrens: In several jurisdictions, including the EU, the UK, and some U.S. states, policymakers are working on laws to regulate AI systems. What role will private sector auditing companies like Holistic AI play in the future as governments move to enact and enforce regulation?
Kazim: Answering this question is a bit like looking into a crystal ball because I think we’re some years away from that. Algorithm auditing is such a new area that we still need to develop maturity in the space.
I think we will see a vibrant ecosystem, a vibrant community or economy of different companies, of independent third parties that can provide audit work. For a competitive third-party assessment ecosystem, we would probably need a generally accepted best-practice approach. The question is what best practices will look like. So, companies wouldn’t be competing on what it is to do an assessment. They would compete on how to deliver assessments. I know many law firms will try to get involved, and a lot of the big consultancies will be interested in this space eventually. It would be fantastic to have a really innovative ecosystem, but we are still some years away from that.
I think the EU AI Act will become the de facto global AI regulation, similar to what GDPR was for privacy. But the servicing of that is still an open question. One of the problems early on was that people thought we would just do the equivalent of a DPIA, a Data Protection Impact Assessment. But I think we have learned that DPIAs serve a limited purpose. Maybe data was always more complicated than we thought, but people are finding that AI, machine learning, is significantly more complicated. And that’s why for us, looking at the technical assessments along with the non-technical reporting is really at the heart of working on the problem.