The Center for Data Innovation spoke with Russell D’Sa, co-founder and CEO of LiveKit, a California-based company that helps organizations innovate by providing the tools for real-time transfer of audio and video data at scale. D’Sa discussed how LiveKit simplifies scaling video and audio transfers, how the platform is powering multimodal AI applications like ChatGPT, and how LiveKit intends to build the “nervous system” for future AI innovation.
This interview has been edited.
Martin Makaryan: How does LiveKit help drive data innovation?
Russell D’Sa: LiveKit is the backbone of many data-driven applications. We enable innovative products and services by providing the tools for real-time computing, specifically for real-time voice and video applications. For instance, ChatGPT’s voice mode runs on LiveKit Cloud. When you open the ChatGPT app and tap the voice mode button, your phone connects to a LiveKit server. When you speak, LiveKit streams your audio to the AI system, which transcribes it to text, runs it through the language model, converts the response back to speech, and sends it back to your device over LiveKit’s network.
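The loop D’Sa describes can be sketched in a few lines. The TypeScript below is purely illustrative, not LiveKit’s or OpenAI’s actual code: the transcribe, generateReply, and synthesizeSpeech functions are hypothetical placeholders for the speech-to-text, language-model, and text-to-speech stages, which in production run server-side while LiveKit moves the audio to and from the device.

```typescript
// Illustrative sketch of one turn of a voice-mode conversation.
// The three stage functions are hypothetical placeholders, not real
// LiveKit or OpenAI APIs; a real pipeline would call STT, LLM, and TTS
// services and stream audio over LiveKit's network.

type Audio = ArrayBuffer;

async function transcribe(audio: Audio): Promise<string> {
  return "<user utterance as text>"; // placeholder speech-to-text result
}

async function generateReply(userText: string): Promise<string> {
  return `<model response to: ${userText}>`; // placeholder LLM response
}

async function synthesizeSpeech(replyText: string): Promise<Audio> {
  return new ArrayBuffer(0); // placeholder for synthesized speech audio
}

// One turn: speech in, speech out, with the result sent back over LiveKit.
async function handleUtterance(incoming: Audio): Promise<Audio> {
  const userText = await transcribe(incoming);     // audio -> text
  const replyText = await generateReply(userText); // text  -> model reply
  return synthesizeSpeech(replyText);              // reply -> audio
}
```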
Another example is how LiveKit will enable emergency dispatch centers to better respond to 911 calls. In iOS 18, LiveKit will be embedded directly into the FaceTime experience for emergency calls, allowing callers to stream audio, video, and GPS data to dispatch agents, who can then provide visual guidance in emergencies, such as coaching someone through CPR. In this way, LiveKit enables AI systems to see and hear by transmitting real-time audio and video data, allowing AI models to process and respond instantly.
Makaryan: What makes LiveKit so useful for developers?
D’Sa: LiveKit simplifies real-time communication for developers. LiveKit runs on WebRTC, a powerful but complex set of protocols, originally developed at Google, for transmitting high-bandwidth data with very low latency. WebRTC gives web browsers and mobile applications real-time communication through application programming interfaces (APIs), but it can be difficult for developers to use because it comprises several sets of protocols and does not scale well on its own. LiveKit simplifies its use through a network of servers around the world that communicate with each other to optimize audio and video transfer. The value we bring is connecting each user to the geographically closest server and measuring the network in real time to find the lowest-latency paths for their data.
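In practice, the developer experience D’Sa describes looks less like raw WebRTC and more like a few SDK calls. The snippet below is a minimal sketch using LiveKit’s open-source JavaScript client (livekit-client); the server URL and access token are placeholders, and in a real application the token would be minted by your own backend.

```typescript
// Minimal sketch with LiveKit's JavaScript client SDK (livekit-client).
// The URL and token are placeholders: the URL points at a LiveKit server
// or LiveKit Cloud project, and the token comes from your own backend.
import { Room, RoomEvent, RemoteTrack } from "livekit-client";

async function joinCall(url: string, token: string): Promise<void> {
  const room = new Room();

  // Play remote audio/video as soon as other participants publish it.
  room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
    document.body.appendChild(track.attach());
  });

  // LiveKit's network routes this connection to a nearby server.
  await room.connect(url, token);

  // Publish the local microphone; the SDK handles capture and encoding.
  await room.localParticipant.setMicrophoneEnabled(true);
}

joinCall("wss://<your-project>.livekit.cloud", "<access-token>").catch(console.error);
```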
We also give developers the tools to build fast, high-quality video and audio interfaces. With the launch of LiveKit Cloud, we are also enabling developers to integrate, deploy, and scale LiveKit in a short amount of time, allowing them to focus on their products. We launched our cloud service after researching how companies and individuals use the LiveKit platform and finding that many were struggling to scale the amount of data they transfer on the WebRTC architecture alone.
Makaryan: What was the inspiration behind LiveKit?
D’Sa: We started LiveKit in the summer of 2021, in the middle of the Covid-19 pandemic. The inspiration came from a side project I was working on in 2020: I found that there was nothing open source that made it easy to build the real-time audio and video applications many developers were creating during the pandemic to accommodate our transition to a more virtual life. The pandemic changed our everyday lives in many ways. Videoconferencing and virtual classes remain very popular even though the pandemic is no longer a huge concern, and people even attend weddings virtually nowadays. The challenge is creating the kind of fast, reliable, and optimized infrastructure to transfer video and audio in real time to power these applications, including the multimodal interfaces of generative AI tools such as ChatGPT.
Makaryan: How have AI advancements shaped your vision for LiveKit?
D’Sa: AI advancements in recent years and the spread of generative AI applications have created new opportunities for LiveKit. While companies like OpenAI, Anthropic, Google, and Apple are building the “brain” behind AI, LiveKit is building the “nervous system.” What I mean is that we give AI tools the ability to see, hear, and speak by connecting cameras, microphones, and speakers to these AI models, expanding the data available to them and enhancing experiences for consumers. This represents a shift in how we interact with computers. Instead of adapting to computers through keyboards and mice, we will interact with them more naturally through cameras and microphones. LiveKit has the opportunity to play a critical role in the infrastructure for multimodal AI.
Makaryan: What challenges have you faced as an innovator?
D’Sa: This is my fifth company, and I have faced different challenges throughout my career. In my early 20s, my motivation was fame: entrepreneurs like Steve Jobs, Larry Page, and Sergey Brin inspired me, and I wanted to imitate their career paths. My priorities changed in my 30s, and now, in my 40s, I have realized that I can have an impact and work on things I care about without fame or excessive wealth. This shift in motivation has been crucial. Another challenge was learning to start companies organically, based on ideas or movements I care about, rather than starting companies just for the sake of it. With LiveKit specifically, my biggest challenge recently has been maintaining focus. There are many exciting opportunities in our space, especially with AI, but it is crucial to nail the fundamentals in our current area of operation before expanding into new ones.