The Center for Data Innovation spoke with Kevin Merritt, the founder and CEO of Socrata, a company headquartered in Seattle that provides cloud-based open data solutions.
Daniel Castro: Socrata recently launched the Open Data Network. Can you tell me a little bit about this initiative?
Kevin Merritt: This is a pretty exciting initiative to connect data publishers with the consumers of data and get their data into circulation where it can have the broadest impact on the everyday lives of individuals. The Open Data Network focuses a little bit more on what I’ll call the consumption side of government data publishing.
For the last five years, Socrata has worked with the publisher side almost exclusively. We’ve signed up a number of major cities, counties, states, and national agencies to use our open data platform to solve all of the hairy back-office problems related to getting data out of their enterprise data silos and into a useful form on the web.
Now, what we’re doing is focusing on the consumption side. Probably the easiest way to describe it is by example. If you are a young couple and your family is expanding, you’re now thinking about maybe moving out of the small apartment you live in. You want to buy a new home. What kinds of information will influence where you’re going to buy that home? Well, certainly it’s the value of the house. You’ve got a budget that you have to live within. I would add to that the safety of the neighborhood, the quality or performance of the schools in the neighborhood, and things like access to public transportation probably influence what neighborhood you choose to live in and ultimately what home you choose to buy.
Today, the state of the art is that the government publishes that data on government websites. If you’re really so inclined you can go and find that data on government websites and make some informed decisions. What we’re trying to do is actually take that to the next level and get data to where people are already going. Here in the United States, many people use Zillow as a great starting point to go and find a neighborhood and a home to buy. As part of the Open Data Network, we signed a partnership agreement with Zillow to provide them better, streamlined access to not only housing data and things like property value data, but also public safety, school performance, and public transportation data so that it can inform and update the consumer web service that Zillow offers.
Housing is the first vertical that we introduced into the Open Data Network. The first industry, if you will. We’ve got 20 or 25 other industries that are coming behind it. These are the kind of natural categories of data that governments publish. Housing is one, transportation would be another, public safety would be a third, financial data, environmental data, school performance data, and so on. Twenty or 25 of these different industries will all be part of the Open Data Network, and we’re signing partnership agreements largely with companies that want to use this data in a more efficient manner. We’re working towards helping governments publish their data in formats that adopt open data standards to make it easier for the data to be used.
Castro: We saw that San Francisco, Dallas, and Kansas City were three of the first governments to announce their participation in the Open Data Network. What are some of the interesting data sets that these cities have already released?
Merritt: Yes. Kansas City, Dallas, and San Francisco collectively have more than a thousand housing data sets already available on their open data sites that are powered by Socrata. Now those data sets are available through the Open Data Network as well. Some examples of the kinds of data that are in these housing data sets are building permits, property information maps, and a pipeline report of new construction going on in San Francisco. All of these data sets in San Francisco have been super important to their mayor, who’s responding to what I’ll call a housing availability and affordability crisis. It’s very difficult right now to find housing in San Francisco and the mayor is trying to address those needs by making this housing data readily available.
I mentioned Zillow already. Zillow and other Open Data Network partners have shown great interest in this highly cleansed data that we’re providing from Kansas City, Dallas, and San Francisco, as well as some safety data. All of this data conforms to a standard called House Facts. House Facts is an open standard. It’s not designed by Socrata, but it’s something that we have been able to take to our platform to make it super easy for those participating governments to publish their data in conformance with that standard.
As an example of who’s using the data that’s now published through the House Facts standard, there’s a company called Civic Insight, which came out of the Code for America incubator. They’re operating pretty prolifically in Dallas using that House Facts data.
Castro: You just announced a new partnership with the National League of Cities. What do you hope this partnership will achieve?
Merritt: Yes, I’m super excited to have partnered with the National League of Cities. This is an organization that has been doing an enormous amount of advocacy on behalf of, I think, 19,000 cities in the U.S. over the last couple of decades. Their work is super important. I’ve gotten to know them over the last two or three years. Now that we’ve launched the Open Data Network, it became really obvious how Socrata and the National League of Cities can work together in a couple of dimensions.
First, if you think about one of the core missions of the National League of Cities, it’s to be able to compare, share, and benchmark performance data across cities. If you want to understand why Salt Lake City spends 125% as much per person on parks and recreation as Kansas City (and I don’t know, I’m just making it up), those are the kinds of things that the National League of Cities wants to understand.
As a member of the Open Data Network and as part of this partnership, we will help promote standards mostly around financial performance data. Right now virtually every government puts their budget and expenditures online but it’s hard to make any kinds of comparisons. It’s hard to share the data. It’s hard to look at it across jurisdictions. The National League of Cities will help establish some standards around reporting of financial data so that we can compare and share and benchmark across those jurisdictions.
Then from the perspective of Socrata, if you think about the customers that Socrata works with today, they are really large, major cities: New York, San Francisco, Los Angeles, Seattle, Boston, Austin, Raleigh, and so forth. The National League of Cities can bring us into their 19,000-city membership in a much more efficient way. They’ve got great outreach, great communication mechanisms, and great industry events. This will allow us to bring our technology into these smaller and mid-size cities far more effectively than we could do on our own.
Castro: Kevin, many of the early adopters of open data were these big cities. As small cities adopt open data policies, what do you think are the new challenges they face and how should they approach those challenges?
Merritt: I’m often asked if open data is a concept that only applies to major cities. We’re seeing evidence that it’s not. The small- and mid-size cities are doing it as well. I’ve spent quite a bit of time in Europe this summer. We just signed three or four cities as customers in the Barcelona region, none of which has more than 75,000 inhabitants. One of them has 30,000 inhabitants, so these are some fairly small cities.
Transparency and accountability are important regardless of the size of city that you have. What we’re finding is local data has value regionally — not just within cities. Here in the greater Seattle area, the city of Seattle has jurisdiction over something and King County has jurisdiction over something else. I actually live in the city of Bellevue so if I want to figure out transit data, I’m going from Bellevue into King County into the city of Seattle. Open data has regional value and we need to make sure that we are meeting the needs of these smaller cities as they think about participating in a broader, regional open data initiative.
Some cities are doing it through policy and legislation. Some cities are enacting it. The good news is enacting legislation or policy in small cities is fairly efficient. It’s usually a couple of city council meetings for something to be approved and enacted. They can quickly adopt open data as a strategy and put the policy air cover in place in the course of 60 or 90 days. Then, what I find is the biggest challenge that they have is just getting the data out of the underlying systems. Many of these smaller cities don’t have a large IT staff so they need effective tools, which, very candidly, Socrata can provide to help them get data out of their underlying systems and into a web format that is useful to the audiences that want to use this data.
Castro: One challenge governments have is measuring their progress on open data. Counting the number of data sets is not necessarily the best indicator. How do you think government agencies should assess their progress in this area?
Merritt: I think that’s a great question and I would start by saying one data set online is infinitely more than zero data sets online. Every government has to start somewhere. We just had a new customer launch in England in the city of Bath. They literally launched with one data set. It happened to be a very important data set about air quality and they ran a little hackathon and generated a ton of interest in having developers in the community try to analyze that data set.
I guess the second thing that I would offer is that anything you do without measurement will have unpredictable results. If you’re committed to something, you need to find some way to measure your progress and your performance. Early on, the number of data sets seems to be an easy-to-measure attribute and something that you can track. I wouldn’t discourage it. I would just highly encourage governments to evolve their thinking about measuring the success of their open data initiative in terms of usage. What I mean by that is not so much the number of data sets or even the number of page views, but it’s more about measuring the actual usage of the data as it is disseminated and circulated through the ecosystem and through the ether, if you will. Kind of going back to the first question about the Open Data Network, if you’re a government and you’re publishing your data, and it’s getting into Zillow and it’s informing how people are buying their houses, and it’s even informing schools trying to improve their performance because they know it influences house values, you’re fulfilling your mission around putting open data online.
I would argue, in that context, the only real way to measure success is usage. One of the capabilities of the Socrata platform that our customers have access to is an analytics dashboard that shows you not only the number of data sets you have published, the number of page views, and all of the obvious things, but also things like the number of API calls, the number of developers who are building applications against your published open data sets, and other measures of usage that are far more meaningful than just the number of data sets.
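To make the usage-based view concrete, here is a minimal sketch. It is not Socrata's dashboard or API: the `DatasetUsage` record, its fields, the `usage_summary` helper, and every sample number are hypothetical, chosen only to show how a portal with a single data set can outrank a larger catalog once usage is counted.

```python
from dataclasses import dataclass


@dataclass
class DatasetUsage:
    """Hypothetical per-data-set usage record (not a real Socrata type)."""
    name: str
    page_views: int
    api_calls: int


def usage_summary(datasets):
    """Aggregate simple counts: catalog size next to actual usage."""
    return {
        "datasets_published": len(datasets),
        "page_views": sum(d.page_views for d in datasets),
        "api_calls": sum(d.api_calls for d in datasets),
    }


# Illustrative numbers only: one heavily used data set versus three idle ones.
portal_a = [DatasetUsage("air_quality", page_views=1200, api_calls=50000)]
portal_b = [
    DatasetUsage("budget_2012", page_views=40, api_calls=10),
    DatasetUsage("budget_2013", page_views=35, api_calls=12),
    DatasetUsage("street_trees", page_views=60, api_calls=8),
]

summary_a = usage_summary(portal_a)
summary_b = usage_summary(portal_b)
print(summary_a)
print(summary_b)
```

Ranked by `datasets_published`, the second portal looks stronger; ranked by `api_calls`, the first one does, which is the distinction drawn in the answer above.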
Castro: You talked earlier a bit about housing data and your success with Zillow. What other industries do you think offer the most low-hanging fruit for using open data?
Merritt: The two other ones that I would really highlight are transportation data and health data. In 2009, when Socrata decided to focus exclusively on open data, I got on the phone and I called a lot of cities to try to educate them a little bit about open data and tried to inform them about who Socrata is. At that time, pretty much every city in the United States was funding the development of a transportation app to basically tell you the bus schedule on your mobile phone. I cannot think of a city in the U.S. of, say, more than 250,000 inhabitants that is funding the development of a mobile application any longer.
Now what they’re doing is just putting their transportation data online and they know that Google and Bing are going to come by and ingest that data to have it inform Bing Maps and Google Maps. There are civic developers that are building great transit apps. Here in Seattle we have a great app called OneBusAway. Then there are companies emerging like Citymapper in the UK, and Embark, which was recently acquired by Apple, that are building businesses around creating great transportation apps for mobile devices.
In healthcare, I just think that there’s enormous opportunity to improve the delivery of healthcare, to reduce the cost of healthcare, and to improve the efficiency of healthcare by making this data readily available online. The good news is our biggest segment at the national level is health data and we’re fortunate to work with a number of state-level health agencies. They all have an enormous appetite for putting their health data online. There are some privacy and confidentiality concerns that do need to be mitigated and they’re working on those.
Think about Medicare: $600 billion a year in entitlements and some significant percentage of that is lost to fraud. When you can put all those claims and data records online while protecting the privacy of individuals, you can imagine that the algorithms that have been effective in the credit card industry will also be applicable to analyzing health data and we’ll be able to eliminate or certainly reduce some of that fraud. Health and transportation would be the next two low-hanging-fruit areas that I would identify for the open data movement.