The Center for Data Innovation spoke with Nirmal Govind, director of streaming science and algorithms at Netflix in the San Francisco Bay Area. Govind discussed what the future of Netflix’s recommendation algorithms might look like, as well as the challenges with ensuring Netflix users get the best possible experience across a wide variety of different devices and Internet connections.
This interview has been lightly edited.
Joshua New: Netflix streaming famously accounts for over a third of all Internet traffic in the United States during peak hours. How does data science play a role in accommodating such incredibly high demand?
Nirmal Govind: I’ll answer this in three parts. First, data plays a very important role at Netflix. We half-joke that everyone at Netflix is a data scientist, but the essence of it is that we use data to drive a lot of what we do at Netflix. Data science models and algorithms are used directly in our product, in workflows and tools used internally to make decisions, and also to guide intuition. So, at a high level, data science directly or indirectly affects the various aspects involved in enabling streaming in general, and at such a high volume.
Second, let’s look at two of the many areas where data science comes in. On the server side, our content delivery network, called Open Connect, has to handle the high volume of playback requests, and this means determining the optimal strategy for serving traffic to Internet service providers, the best location to serve a particular play, and also longer-term capacity decisions. On the client side, our adaptive streaming algorithms are tuned to deliver the best experience for each user based on the user’s current bandwidth and other contextual considerations. We use data to build machine learning models and algorithms that inform and enable these types of tactical and strategic decisions.
Finally, experimentation is a rather important way in which data science impacts streaming. We rapidly innovate by running experiments and carefully analyzing the data to determine the algorithms or parameters that result in the optimal streaming experience. In addition, we also work on methods and models to improve the science behind experimentation. For example, in addition to A/B testing, are there other ways to run experiments so we can learn faster?
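Editor's illustration: as a minimal sketch of the kind of A/B analysis mentioned above, the following compares a success rate between two experiment cells with a two-proportion z-test. The metric (sessions that started without rebuffering), the counts, and the function name are hypothetical and chosen for illustration only; the interview does not say which statistical tests Netflix uses.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference in rates B - A."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that both cells share one rate.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: sessions without rebuffering in control (A) and test (B).
z, p = two_proportion_z_test(900, 1000, 950, 1000)
print(f"z = {z:.2f}, p = {p:.5f}")
```

A small p-value suggests the difference between the cells is unlikely to be noise. A real experiment would also need to consider sample-size planning and multiple comparisons, which this sketch ignores.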
New: One of your responsibilities is improving Netflix’s digital supply chain—the process of producing content at studios and delivering it to your customers, and all the processes therein. What are some of the challenges that arise in this process?
Govind: From a data science perspective, we’re more focused on what happens after the content is received at Netflix, though this is changing as we’re now producing original content as well.
We have several partners that deliver various types of content to us—movies, shows, documentaries, and so on. The assets received are encoded before they go live on Netflix. A challenging but super-interesting problem from a data science perspective is to determine where the quality issues are. We’ve been working on machine learning approaches to detect issues with video and audio quality, text such as subtitles, captions, and forced narratives, and encoding errors, both before and after the content goes live. There’s an abundance of data, with member feedback reports and social media, in addition to viewing behavior data and metadata about the content itself. But the challenge is extracting the right signals from this data. We’ve had good success over the past year in this area but there’s a lot more that remains to be done.
With originals—content produced exclusively for Netflix—we’re streaming content that has never been seen before, and we’re also involved with content further upstream in the production lifecycle. This is a great opportunity to take a look at some very interesting data around the content production supply chain and identify how we can use data science to improve it.
New: Netflix applies machine learning models and natural language processing algorithms to help improve the quality of content, such as ensuring subtitles are accurate. As Netflix’s audience is increasingly international, what are some of the challenges of applying these techniques to different languages?
Govind: Natural language processing is an area that we’re investing in as we expand to more countries around the world. Building language-based models that are truly global is hard, especially when you consider languages like Japanese with multiple character types and scripts. Most of the data is unstructured and understanding context is pretty important for what we’re trying to do. We’re actively looking into approaches to help tackle this problem so we’ll have more on this topic in the future.
New: Most people visibly encounter data science at work at Netflix through the recommendation algorithms. Beyond viewing history, what are some of the data sources that play into these recommendations? What other factors could be used in the future?
Govind: Recommendations is a fascinating area and is still very much a focus at Netflix, almost a decade after the Netflix Prize—an open competition we hosted to develop rating algorithms. With the limited amount of time in our users’ daily lives, it’s important to get the right content in front of them at the right time. There’s an amazing group of people at Netflix who are committed to improving this aspect of our product and are constantly trying out new models and algorithms. A great example is the new “Trending Now” section that uses more real-time data and context such as time of day to recommend content that might appeal more at a given time.
Viewing history is one of the data sources and we continue to uncover new signals within the viewing data. As we expand globally, we still have much to learn about tastes around the world and how those tastes can be leveraged to uncover new recommendations in regions we’re already familiar with. In addition to viewing data, the algorithms also look at searches, ratings, and movie and show metadata. The future is hard to predict but given we’re getting close to self-driving cars, maybe self-playing videos that you’re absolutely going to love isn’t out of the question!
New: And beyond personalizing content recommendations, you also focus on personalizing the overall streaming experience for users, such as ensuring their video is optimized for their Internet speeds or screen resolution. Could you walk me through that process?
Govind: Imagine a member with an ultra high-definition TV, sitting at home with a stable high-bandwidth connection. Now, let’s take a user on a mobile phone at a train station on a cellular network. It’s quite possible that the expectations of these two members are different when it comes to streaming quality. One may want a very high quality picture right from the start while the other may just want playback to start right away even if that means lower quality for a short duration. The goal with personalized streaming is to ensure that each member’s streaming experience is the best that they can get given their context and their expectations.
This problem is even more challenging as Netflix expands globally and has an even more diverse member base with different network types, device characteristics, and usage contexts. Models based on this wide variety of data are necessary to ensure that the streaming algorithms can understand the varied contexts and appropriately tune the experience for each member. This is an exciting space and the sheer amount of data that streaming generates makes it an endless source of fun problems to solve!
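Editor's illustration: the contrast Govind draws between the home TV viewer and the mobile viewer can be sketched as a simple rule for choosing a starting bitrate. Everything here is an assumption made for illustration, including the bitrate ladder, the `PlaybackContext` fields, and the safety factors; it is a toy heuristic, not a description of Netflix's adaptive streaming algorithms, which the interview says are driven by models built on much richer data.

```python
from dataclasses import dataclass

# Hypothetical bitrate ladder in kilobits per second (illustrative values only).
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800, 16000]


@dataclass
class PlaybackContext:
    estimated_bandwidth_kbps: float  # current throughput estimate
    max_device_bitrate_kbps: int     # highest rung the screen can benefit from
    prefers_fast_start: bool         # True when quick startup matters most


def pick_initial_bitrate(ctx: PlaybackContext) -> int:
    """Pick the highest ladder rung that fits the bandwidth budget and device."""
    # Fast-start sessions spend a smaller share of the estimated bandwidth,
    # so the first segments download quickly at the cost of initial quality.
    safety = 0.5 if ctx.prefers_fast_start else 0.8
    budget = ctx.estimated_bandwidth_kbps * safety
    candidates = [
        rate for rate in BITRATE_LADDER_KBPS
        if rate <= budget and rate <= ctx.max_device_bitrate_kbps
    ]
    # Fall back to the lowest rung when nothing fits the budget.
    return max(candidates) if candidates else BITRATE_LADDER_KBPS[0]


home_tv = PlaybackContext(25000, 16000, prefers_fast_start=False)
train_phone = PlaybackContext(2000, 3000, prefers_fast_start=True)
print(pick_initial_bitrate(home_tv), pick_initial_bitrate(train_phone))
```

With these made-up numbers, the TV session starts at the top rung while the phone session starts low and could step up once playback is under way.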