The Center for Data Innovation spoke with Will Matthews, Head of Global Data at Bloomberg Government. Matthews discussed how combining disparate data sets can reveal valuable insights about the legislative process, and why pairing human expertise with automated analytical tools is so important.
This interview has been edited.
Joshua New: You’ve been analyzing and building tools with legislative and regulatory data for over five years. How has your work changed since the open data movement took off in the past few years?
Will Matthews: Before machine-readable legislative data existed, we had to find a way to parse it or collect it from THOMAS and other sources, and we’ve spent the last five years essentially building on top of that process. As more of this data is released in usable formats, and as new technology is developed, we can re-tool our processes to get the same results with less effort. However, our overall approach has not changed much. We have always focused on looking for the “next thing,” meaning new types of information that we can use to build or improve our products, and that data is hardly ever published in a clean or usable way. So we’re always going to be crawling websites, relying on some manual input, or having analysts review new information to add context, because we’re always trying to stay on the cutting edge.
New: A lot of arguments for making more legislative data publicly available focus on social benefits such as increased transparency or holding lawmakers accountable. But there is also a very strong economic argument for open legislative data. How does this data help you create value for your customers?
Matthews: A lot of what our customers pay us for is making publicly available data more useful to them, such as by adding context and finding connections between siloed government data sets, which can be difficult to pull together. For example, we have a whole workflow for lobbyists that lets them identify companies that have used lobbying services for particular issues and see which specific legislation those companies are interested in, based on information in their lobbying filings. We link this data to actions taken on a particular bill and to the public positions each company has taken on that bill, which paints a pretty good picture of the influence they may have had. Essentially, we focus on figuring out which data sets to group together to provide useful insight into the broader policymaking context.
New: Your team combines both structured and unstructured data from Congressional calendars and records, regulatory developments, news stories, and other sources. What kind of things can you learn by combining all of these different data sources?
Matthews: We can define the whole scope of influence that a member of Congress has on a piece of legislation as they try to move it through a subcommittee, then through full committee, and then on the floor. We reconstruct that whole timeline so you can see what they’re doing and when they do it. You can also see the whole scope of influence third parties have on policy issues. For example, if a company reports on a committee action, puts out a press release on pending legislation, or discloses lobbying activity, and they don’t get their desired outcome, you can see what the company’s next approach is. If a law was passed that they didn’t like, they might refocus their actions on the regulatory side of things, and we can see what they’re saying when they file comments. We can even draw a connection far down the line if they try to influence an election involving a member of Congress who voted against an issue they supported.
New: How does human expertise play a role in turning this data into useful insights?
Matthews: A really important part of this process is having people who understand the information they’re working with. We get data on bill information and actions reported by Congress, but we also have people watching what’s happening on the Senate floor, or watching committee markups and votes and collecting information that Congress isn’t required to disclose. To do all this, we need people who understand the policies they are observing to provide context to all this data, so that if, for example, a proposal comes up in a subcommittee and a slightly different proposal comes up in a different committee, they can make that connection and understand how specific legislative language changes over time.
Of course, anybody can watch what’s happening on the House or Senate floors on C-SPAN. People come to us because we can distill all of this into actionable information so that they can use their time more productively and focus on the issues important to them.
New: Since so much of the data you pull in is not readily available in usable formats, can you describe what’s involved in making this data useful internally, before you can actually work with it to provide those kinds of actionable insights to your customers?
Matthews: We work very closely with our engineering teams to help develop technical processes and algorithms that extract and normalize a lot of data before it ever reaches our desks, so we can use our time as productively as possible. Our analysts also help develop workflow tools that rely on machine learning by providing the “golden copy,” the best possible information about a particular issue, to ensure that these algorithms are trained on the best data available. This is really important because we want to make sure that we’re always pulling in clean, usable, and reliable data. Until the government starts structuring and publishing all of this legislative data in the cleanest and most usable ways, which I don’t think anybody expects it to do any time soon, we need to scrutinize it pretty heavily. In other cases, third parties create the challenges. For example, companies might not always describe themselves the same way when they disclose lobbying activities. If 100 employees at McDonald’s make campaign contributions, you can bet that McDonald’s will be spelled a couple of different ways or with the apostrophe missing. We need to make sure we can standardize all of this information. We rely on automation to do a lot of this work, but our human experts are invaluable for making sense of this data.
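To illustrate the kind of name standardization Matthews describes, here is a minimal Python sketch. It is not Bloomberg Government’s pipeline: it assumes a simple rule-based approach (fold accents, drop apostrophes and punctuation, lowercase, ignore common corporate suffixes), and the names `normalize_company_name`, `group_variants`, and `LEGAL_SUFFIXES` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of corporate suffixes ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "ltd"}


def normalize_company_name(raw):
    """Reduce a free-text company name to a rough comparison key (sketch only)."""
    # Decompose accented letters, then drop non-ASCII code points
    # (combining accents, curly apostrophes such as U+2019).
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove straight apostrophes so "McDonald's" == "McDonalds".
    text = text.lower().replace("'", "")
    # Turn any remaining punctuation or whitespace runs into single spaces.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # Drop corporate suffixes like "Corp" or "Inc".
    tokens = [t for t in text.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def group_variants(names):
    """Group raw spellings under their normalized key."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize_company_name(name)].add(name)
    return dict(groups)


if __name__ == "__main__":
    variants = [
        "McDonald's",
        "McDonalds",
        "MCDONALD\u2019S CORP",
        "McDonald's Corporation",
        "Mc Donalds",  # extra space: these rules will NOT merge this one
    ]
    for key, members in group_variants(variants).items():
        print(key, sorted(members))
```

Note that “Mc Donalds” still lands in its own group. Rules like these catch punctuation, capitalization, and suffix differences but not every variant, which is where fuzzy matching or the kind of analyst review Matthews describes would come in.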