The Center for Data Innovation spoke with Nikola Mrkšić, co-founder and chief executive officer of PolyAI, a London-based company that develops a machine learning platform for conversational artificial intelligence. Mrkšić discussed how automating customer service with voice-activated systems can improve dialogue interactions.
Eline Chivot: Why did you create PolyAI, and what was the main problem that you set out to solve?
Nikola Mrkšić: I’m originally Serbian, and I studied computer science and mathematics at the University of Cambridge. I became interested in machine learning and did a PhD researching how to build conversational agents. In parallel, I worked at VocalIQ, a spin-off of our research group in Cambridge, where I was the first technical hire; my PhD supervisor was one of its co-founders. VocalIQ was acquired by Apple in 2015, and I worked on Apple’s Siri for two years. I then started PolyAI with my co-founders Tsung-Hsien Wen and Pei-Hao Su, both Taiwanese, who also did their PhDs at Cambridge. They had worked at Google and Facebook AI respectively, on deep learning technologies for dialogue management and natural language generation, while I worked on applying deep learning to natural language understanding (NLU). Combining our expertise from these areas, we worked on building state-of-the-art data-driven dialogue systems.
When we started PolyAI, we knew we had a lot of great technology, but it was blue-sky territory in terms of how we would apply it. Early on, we worked with several large tech companies to pioneer new kinds of virtual assistant technologies. As we scaled up, we brought most of our colleagues from Cambridge University into the team, and thanks to that, we were able to successfully raise two rounds of funding to grow the business.
We decided to focus on customer service, where conversational AI can solve real-world problems today. Customer service is a huge market, but applying data-driven systems is really hard because companies do not want to mess with their customers, especially when it comes to bookings, reservations, and other processes that have to work flawlessly to maintain customer satisfaction. There is a fear of not being able to control the responses an automatic dialogue system comes up with. One barrier was therefore convincing businesses that the technology is ready and should be rolled out.
We’ve put a lot of thought into the impact this kind of technology will have on people working in call centers. Over the last few years, we’ve seen firsthand that few people really want to work there. Typically, people who work these jobs only stay for two or three months before moving on to other opportunities. The high turnover costs call centers a fortune in training and hiring, making it very difficult for them to be competitive in their offerings.
Conversational AI, as it exists today, can help these call centers scale up their operations, answering calls 24/7 and handling mundane, repetitive tasks while agents focus on high-value conversations. But the technology is nowhere near ready to completely supplant humans.
Chivot: What can explain the relatively high market penetration rate of voice assistants, despite the technologies not being perfect yet?
Mrkšić: That is what the whole field of conversational AI and PolyAI is about: Building fail-safe mechanisms to ensure that these technologies, albeit faulty, still create something that is helpful in a real-life situation.
Building conversational agents is hard. Right now, in a quiet room and for a user with a native accent, speech recognition works better than humans do, with an error rate of less than 5 percent. But if you have a foreign accent, like I do, the error rate will be higher. Because of this, we need to build fail-safe recovery mechanisms into these solutions, like airbags in cars. One way of doing this is prompting the user to verify an input, for instance asking “Did you mean…?”, “Did you want me to…?”, and other such questions.
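The airbag analogy above can be made concrete. Below is a minimal, hypothetical sketch of such a fallback: when the speech recognizer's confidence in its hypothesis drops below a threshold, the agent asks an explicit confirmation question instead of acting. The function names and the threshold value are invented for illustration, not PolyAI's actual implementation.

```python
# Hypothetical confidence-threshold fallback for a voice agent.
# Names and the 0.85 threshold are illustrative assumptions.

CONFIRM_THRESHOLD = 0.85  # below this, ask the user to confirm


def respond(transcript: str, confidence: float) -> str:
    """Return the agent's next utterance given an ASR hypothesis."""
    if confidence >= CONFIRM_THRESHOLD:
        # High confidence: act on the input directly.
        return f"Booking a table for '{transcript}'."
    # Low confidence: fall back to an explicit confirmation prompt,
    # the conversational equivalent of an airbag.
    return f"Did you mean '{transcript}'?"


print(respond("seven thirty pm", 0.95))  # high confidence: acts directly
print(respond("heaven dirty pm", 0.40))  # low confidence: asks to confirm
```

In a production system the threshold would be tuned on real traffic, trading off unnecessary confirmations against acting on misrecognized inputs.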
Chivot: The agents are based on PolyAI’s machine learning and natural language processing (NLP) technology. Can you explain why these are particularly well suited for conversational AI, what their technical challenges are, and how they are developing?
Mrkšić: These terms are often conflated or tossed around arbitrarily, and the current hype around AI is not helpful. Traditional AI in the 1990s was based on logic and symbolic mechanisms for describing the world, using a very different set of technologies. What started to become interesting over the last few decades is machine learning, which is all about statistical pattern recognition in many different areas. That means that if I (as a system) have seen and registered a similar dialogue between a customer service representative and a customer many times, I can learn to replicate that behavior. This is learning by imitation, and then hopefully, learning to generalize so the system can figure out how to react in different situations.
NLP is one subfield of both machine learning and computational linguistics. If you look at traditional professional communities for NLP, the main one, the Association for Computational Linguistics (ACL), grew based on the field of linguistics initially—not machine learning per se. But today if you go to these conferences, you’ll see that the dominant methodology people use to build systems and explain phenomena is machine learning—because it’s a convenient paradigm for solving these problems.
Also, we’ve gotten really good at creating tools that are easy for anyone to use. Ten years ago, to train a neural network, you needed to understand it in depth. Today, all you need to do is pick a few building blocks and select what you want to do. As things have gotten simpler and the right “equipment” has been put in place, the field has started to attract a much wider set of researchers from diverse areas, including linguists. That has propelled the use of deep learning, which is all about using artificial neural networks to solve different problems.
At PolyAI, we are heavy users of deep learning, and we’re all about creating data-driven models. That’s important to us because when you do things in this way, you can have meaningful evaluation. When you can evaluate accuracy or have a goal success rate, you can know how well your system works and how to improve it next as a result.
The challenges we face are similar to those faced by everyone working in conversational AI. This is a relatively new field. The label “conversational AI” wasn’t used five years ago—we talked about spoken dialogue systems. Many of the solutions out there right now are not a product of machine learning; they are complicated and heavily hand-crafted. People build conversational agents today much as they did twenty years ago, with very simple programming. So it’s going to take a lot of work, know-how, and academic development to get this technology to a point where it’s really well understood and readily available.
Today, the subproblem where machine learning has really had an impact—and which I worked on throughout my academic career—is natural language understanding (NLU). Extracting semantics from sentences is difficult, and NLU is the piece that allows conversational agents to understand what the user is trying to achieve. At PolyAI, we’ve built state-of-the-art systems that outperform our competitors across a number of world languages.
The bigger challenge, which hasn’t been cracked yet, and where I think PolyAI is ahead of the game, is the design of what is known as dialogue policies or agent behavior. For instance, how do you teach an AI agent to upsell flight tickets or negotiate a different time for a restaurant reservation if the required time is not available?
Right now, we are working with a big restaurant group in the UK that takes phone reservations using our system. I may ask for a reservation at 7:30pm for me and my girlfriend. A good NLU module can see that this is a booking for two people. If that time is not available, however, the system can try to come up with clever proactive suggestions to entice me to book a later time and wait (and spend money) at the bar.
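A dialogue policy of the kind described above can be sketched very simply: if the requested slot is taken, counter-offer the nearest later opening. The availability table, function name, and fixed 30-minute step below are invented for the example and bear no relation to PolyAI's real system.

```python
# Illustrative toy dialogue policy for a restaurant booking agent:
# if the requested time is full, propose the next available later slot.
# All names and data here are assumptions made for the sketch.

from datetime import datetime, timedelta

AVAILABLE = {"18:30", "20:00", "20:30"}  # toy availability table


def propose_time(requested: str) -> str:
    """Return a booking response, counter-offering if the slot is full."""
    if requested in AVAILABLE:
        return f"Confirmed for {requested}."
    # Policy: scan forward in 30-minute steps for the next open slot.
    t = datetime.strptime(requested, "%H:%M")
    for _ in range(8):
        t += timedelta(minutes=30)
        slot = t.strftime("%H:%M")
        if slot in AVAILABLE:
            return (f"{requested} is full. Could I offer you {slot} instead? "
                    "You're welcome to wait at the bar.")
    return f"Sorry, nothing is free after {requested}."


print(propose_time("19:30"))  # counter-offers the 20:00 slot
```

A learned policy would go further, for instance ranking counter-offers by how likely each caller is to accept them, rather than always proposing the nearest slot.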
The way consumers use and interact with conversational systems is not always clear or straightforward, but we’ve improved the product massively and managed to raise the number of bookings: AI agents pick up every call and deal with it smoothly and effectively.
But it’s still hard to put these technologies into production, and such data-driven systems are not widely available yet. We are very optimistic—more people will become familiar with them, and there will be more opportunities for their application. Few companies can build a real, good, and enjoyable voice experience, and we are managing to achieve that with just 30 people—compared to the army of 10,000 people working on Alexa!
Chivot: The UK remains Europe’s biggest tech hub. How do you expect the AI startup ecosystem will evolve as the country is leaving the European Union?
Mrkšić: Obviously the EU has made it easy for companies like ours to recruit and work with talented people from various other countries. PolyAI is a team of 30 people of 20 different nationalities. If you find someone really good in Slovakia, Bulgaria, or France and you make them an offer to join the team, they can basically move to London the next Monday and start working with you. We also have Indian and Russian colleagues; getting each of them a visa has taken two or three months and costs around £4,000. We do this routinely, but many startups in London are financially squeezed and just aren’t able to.
People want to stay with an employer for as long as they have something to offer each other, but they also want to be able to move freely and easily to other opportunities which may come up. The issue of hiring someone with a visa means that this person is tied to the employer, which isn’t great because you may have someone stuck who doesn’t want to be there. And for non-EU employees, it’s very unfair: If left without a job, in the UK they have only two weeks to find a new role. The UK staying in the European Economic Area post-Brexit, though unlikely, would certainly be very good and helpful for startups, especially the early stage ones.
However, in terms of the outlook for the British tech ecosystem, I’m not worried at all. Britain is very pragmatic and has always found ways to remove bureaucratic obstacles much faster than most European countries—after all, that’s the reason why many of us are here in the UK.
From a regulatory perspective, frameworks like the GDPR aren’t making Europe more competitive. We have made great efforts and paid substantial legal fees to make sure we are compliant, but this can be a roadblock when dealing with large enterprises that might have been burnt by such regulation before. And while we deal with such regulation, American and Chinese companies are racing further ahead of European ones.
The GDPR isn’t bad in its own right. But as companies are afraid of sanctions and other penalties, it has made big enterprises in Europe—which were already more conservative when it comes to working with startups—even more conservative. I don’t know how and whether that type of regulation will change for the UK with Brexit—that depends on current negotiations.
Chivot: One perspective that has dominated the debate around AI-induced automation is concern about its effect on human capital. Detractors see AI as a disruptive technology likely to lead to massive job losses and increased job polarization. What is your view?
Mrkšić: People tend to see AI as much more powerful, sophisticated, and developed than it actually is right now. We’re very far from an AI that can improve itself—and we’re not even sure we’ll ever get there. Historically, most new technology has led to automation, and with it created the need to upskill jobs. Inevitably, this wave of AI technology will cause some stir in the labor market; the questions are only how fast that will happen and how well our economies are set up to deal with it.
I suspect we’ll find universal basic income come up more often in discussions around the reform of our frameworks around work. I’m no expert on this, but it certainly will be necessary to focus on continuous education and on changing our mindsets when it comes to our views on careers. We have already accepted that our jobs and job descriptions will change regularly, that we won’t be working for one single organization our entire life, that we will have to opt for other types of employment contracts, and that in between jobs we may have to go back to studying so as to maintain and increase our knowledge and stay relevant and competitive. I think this is an opportunity to learn and have more fulfilling careers. The diversity of knowledge we can build by doing different things will be very valuable in each of our next roles, and to us as individuals.
I don’t think things will be as disruptive or change as fast as people fear. If you spend a few months creating machine learning models, you’ll see how difficult it still is to get a model to understand that “cheap” and “cheaper” have similar meanings. Another example: machine learning models can beat chess grandmasters, but we’re still not very good at getting robotic hands to move the pieces. Have no fear—we’re not there yet.