The Center for Data Innovation spoke with Jay Ulfelder, a political scientist and international relations forecasting expert who runs the popular social science blog Dart Throwing Chimp. He worked for the Science Applications International Corporation as research director for the Political Instability Task Force from 2001 to 2010, after which he became an independent consultant. Ulfelder weighs in on how political scientists should approach complexity and discusses the intricacies of forecasting mass atrocities using sparse datasets.
This interview has been lightly edited.
Travis Korte: First, can you speak about some of the projects you’re currently working on?
Jay Ulfelder: At this point, I’m working more or less full time with the U.S. Holocaust Memorial Museum’s Center for the Prevention of Genocide to build a new public early-warning system for mass atrocities around the world. We expect to launch that system this spring. For the past year, I’ve also been working with the Good Judgment Project to help formulate and write questions on international affairs for thousands of participants to make forecasts about. I’ve also continued to do occasional piecework for my old research program, the Political Instability Task Force.
TK: Talk a little bit about your approach to the mass atrocity early-warning system. How much of an issue is data availability, and what are some of the other challenges mass atrocity forecasting presents? Are there certain types of events that are easier to predict than others?
JU: The problem of forecasting mass atrocities has two features that make it exceptionally hard to do well. First, the phenomenon itself is complex and rare, and second, we don’t have very good measures for many of the things theory tells us to consider, including the mass atrocities themselves.
The scarcity of data on relevant features—things like the presence and behavior of armed groups or the tenor of political discourse—gets very frustrating, but it’s also a very expensive problem to ameliorate, so it’s not something we’re in a position to fix.
The scarcity of historical examples on which we can train our models is frustrating in a statistical sense, but it’s also oddly liberating. With a small number of examples of a rare event, we can expect relatively simple models like logistic regression to perform about as well at forecasting as more complicated and difficult-to-implement techniques, so we don’t have to spend a lot of time worrying about optimizing at the margins.
As for approaches, I’m all about ensembles. For the statistical risk assessments, we’re averaging forecasts from two logistic regression models representing the two competing theoretical strains on the subject and then a Random Forest that uses all of the variables in those two logit models. For the larger early-warning system, we’re complementing the statistical forecasts with output from an opinion pool, which gives us the unweighted average of forecasts from a collection of people with relevant expertise. We’re also experimenting with wiki surveys as yet another way to elicit estimates of relative risk from an even larger crowd. Within each of these modules and then for the system as a whole, the principle is to collect, compare, and combine as many plausible forecasts as we can.
TK: You’ve written before about the limits of understanding inherent in large, complex systems, such as those you work with in an international relations context. If there are fundamental limits on how well we can understand and predict human social dynamics, how ambitious should we really be about forecasting them?
JU: I think we should be very ambitious about trying to forecast as well as we can on topics of social importance. Political science has eschewed forecasting for so long that even simple and only moderately accurate models often represent a big improvement over the status quo. And on things like war and atrocities and unrest and democratization, those marginal improvements can help produce better policy and advocacy.
At the same time, I think we also need to be realistic about the limits of what we can achieve and know when to set a problem aside for a while. Once you’ve worked the same data sets over and over with a variety of techniques, there’s not much point in running yet another regression or tweaking the parameters one more time. Better to focus your energy elsewhere until you get some new features to add to the mix or a significant new batch of examples.
TK: Relatedly, given these limits, what do you think the quantitative study of political development and instability will look like when the field is “mature”?
JU: Actually, I reject the premise of this question. I’m not sure the quantitative study of political development will ever mature, in no small part because the systems that field examines keep growing more complex at an accelerating rate. That never-ending evolution affects both the object of study and the tools we use to study it, and I can’t imagine it stopping or plateauing in a way that would allow us to characterize the field as having matured.
TK: Finally, the term “computational social science” and its associations have recently been a topic of considerable discussion online. Do you identify with the term? How would you describe the field, if such a thing exists, that you work in?
JU: I’ve only recently seen the term “computational social science” and have mixed feelings about it. I’m thrilled to be living and working in this field at a time when the analytical tools and data available to us are expanding so rapidly, and I think the term celebrates those developments. But I also trained as a qualitative comparativist (I did an area studies degree on the USSR and Eastern Europe as an undergrad and wrote a dissertation that relied on both qualitative and statistical methods), and I believe that kind of thinking is still an important part of theory-building; I don’t know how you get science without theory. So instead of adding “computational” to the rubric, I’d rather underline the “science” part and simply assume that using the best available information and methods is implicit in that.