The Center for Data Innovation spoke with Tom Lee, policy lead at Mapbox, a geospatial data company based in Washington, D.C. Lee discussed how Mapbox is developing navigation for autonomous vehicles and the value of geospatial data for humanitarian aid.
Joshua New: Mapbox launched Mapbox Cities in 2016 to help teach cities how to use data to solve urban challenges. Can you explain how this works? Are there any interesting success stories?
Tom Lee: The Cities program is a chance for us to partner with municipal governments on specific problems. We offer data and tools, and in return we get to learn how cities are thinking about applying technology to the challenges they face. Having this dialogue is really important: local governments are the creators and maintainers of a lot of very important geodata, while companies like Mapbox have unique data offerings that can bring new solutions to cities’ problems. But it’s not as simple as emailing each other zip files: there are questions about data quality, sustainability, and security that immediately come up. These partnerships are a great opportunity for figuring out workable answers.
Our Cities team has more projects going on than I can keep track of, but as a D.C. cyclist whose body contains more titanium than I’d like, our work with the city on Vision Zero safety analysis [ed.’s note: Vision Zero is a nationwide effort to eliminate all traffic fatalities and serious injuries] is what immediately comes to mind. I think it’s a great example of bringing new capabilities to longstanding questions and of the learning that can happen on both sides when you do.
New: Can you describe how Mapbox is building maps for autonomous cars?
Lee: We’re building a bunch of different things to support the move to driverless cars. There are five official levels of automation [ed.’s note: SAE International’s J3016 standard defines six levels, numbered 0 through 5], ranging from 100 percent human-driven to 100 percent robot, and the system’s mapping needs change along the way. We’re creating technology that’s relevant to each of these steps, from the way the car’s center console renders maps for human drivers to network-efficient vector tile caching strategies for the high-def map data that robots crave.
The move toward full automation is changing how we think about maps, too. There is a smaller margin for error at the higher levels of automation; you can’t count on a human to pick up the slack. That means map data needs to be more precise and accurate than ever before. We think one big part of answering that challenge is moving away from the old idea of a map as something static. It used to be that atlas makers printed new maps once a year. That’s not going to cut it for autonomous vehicles. We think the only way to keep the map sufficiently up to date is to measure the world on a constant basis. So we’re building a living map that redraws itself in real time, based on the hundreds of millions of miles of anonymized GPS data that we collect every day.
But there will still be some things that GPS data can’t capture. One example is road closures: if a street is inaccessible because of a parade or a marathon or construction, that fact has to be recorded and made available to everyone running an autonomous vehicle ahead of time, not when traffic starts to pile up. Governments and industry haven’t figured out the best way to collect and distribute this information yet. But it’s critically important that we get this right—the ability to keep our public infrastructure open to everyone is at stake.
New: Earlier this year, you wrote an article explaining why open data is crucial for the geospatial industry. Why is that?
Lee: I think open data is basically magic. Most of the time when you spend money on something—food, or maybe someone’s time—you wind up with a finite amount of it to distribute as best as you can to the people who would find it useful. When you spend money creating open data you wind up with an infinite amount of it—you can make as many copies as would be useful. You never wind up with any less. It’s incredible.
And it’s incredibly useful—for lots of industries, but especially geospatial. All of us depend on our world being navigable, and so it makes sense to create shared resources that enable that. Having basic open data about our highways or rivers or terrain provides a kind of infrastructure to build on, just like our roads and electrical grid and sewer systems. It would be hugely wasteful if every business had to build its own pipes back to the water company—you’d wind up with tons of mostly empty pipes pointlessly running along the same routes and fewer businesses overall. This is a lesson we’re all still learning: if you walk around San Francisco for more than a couple of hours these days, you’ll probably see cars from a bunch of companies doing the exact same LIDAR mapping tasks.
The same thing might happen soon with the U.S. Department of Agriculture’s (USDA) National Agriculture Imagery Program (NAIP). NAIP provides aerial imagery of the continental United States, and though it was designed for agricultural monitoring, it’s used by a huge number of businesses, researchers, academics and local governments. USDA is considering making NAIP closed data. If that happens, you’ll see resources wasted on buying the same imagery over and over and, worse, fewer innovative uses of imagery overall.
It takes resources to create data, and of course we have to draw the line somewhere. But facts about the physical layout of our world are useful to a huge number of people, so finding ways to create them as open data can be a great investment.
New: Mapbox frequently supports humanitarian aid efforts. Can you explain why mapping is so valuable in this sector?
Lee: Much of disaster recovery comes down to logistics, and good maps are an essential part of that. It’s a question of where to send aid, and how. In a disaster it becomes incredibly hard to know what’s going on and where to send help, whether it’s because the cell network is down like it was in Puerto Rico or because the map has actually changed like it did after the Nepal earthquake, with roads and buildings disappearing. We help source fresh imagery of affected areas and get it into the hands of crisis mapping projects like the Humanitarian OpenStreetMap Team, whose volunteers trace new roads, tag damaged buildings or even identify potential helicopter landing sites.
What’s really exciting to me is that this field has moved beyond being reactive. I talked a little about data as infrastructure, and that idea is relevant here, too. Although they’re both horrible situations, the recovery from a hurricane in Houston is very different from the recovery from a hurricane in Haiti, and that comes down to infrastructure: physical systems and institutions that can respond. Places with weak infrastructure tend to have weak data, too. They’re just not mapped as well. The Missing Maps project identifies vulnerable parts of the world and works to improve their map data before disaster strikes, so that the response to a catastrophe can be more effective.
New: You have a long history of promoting civic-minded applications of data, leading Sunlight Labs for six years before you came to Mapbox. How have attitudes towards data changed in this space? Has there always been a strong desire to use data to address civic challenges, or is this a relatively new phenomenon?
Lee: The idea that collecting data and looking at it cleverly can improve things has been around a long time, at least since Frederick Taylor started measuring how quickly steelworkers could shovel coal. And it works! It’s just that it’s hard for us not to take it for granted after it does. We don’t think about the National Weather Service when we check our forecast apps, or the Census TIGER/Line dataset when we pull up driving directions, or the U.S. Geological Survey when we plan a hike. And that’s good: it means those data sources are getting out of the way, making it easy for people to benefit from them without thinking about it or jumping through hoops. But it does mean that we have to keep fighting to remind people why these systems are important and should be sustained.
I do think we’re at the cusp of something new, though. Millions of people carrying around networked sensors—we haven’t had that before, and it’s incredibly powerful. The applications this data can enable, like autonomous vehicles, are going to be transformative. We’re going to start relying on them quickly, and after that it will be difficult to change the systems that underpin them. So it’s vitally important that we get those systems set up correctly the first time. That means thinking carefully about the data we’re going to need, where it comes from and flows to, and how we can make it accessible to everyone.