The Center for Data Innovation spoke to Eric Vance, director of the Laboratory for Interdisciplinary Statistical Analysis (LISA) 2020 Program, which has created a network of statistical collaboration laboratories in developing countries. Vance discussed how the LISA network fosters collaborative education and research in data science to solve real-world problems.
Hodan Omaar: How does the ability to innovate with data differ around the world and how does the LISA network address any gaps?
Eric Vance: I think the ability to innovate is present everywhere in all regions and corners of the world. However, proficiency with mathematics, statistics, and computing varies within any country and between countries based on access to education and the relative importance given to each field. Regional differences in capability with mathematics, statistics, and computing can create differences in the skills needed to collect, model, and analyze data, which in turn manifest as differences in the ability to innovate with data.
The LISA 2020 Network addresses these gaps by supporting the creation and sustainable operations of statistics and data science collaboration labs in developing countries. The fundamental idea of LISA 2020 is that individual statisticians or data scientists can have an enormous positive impact by collaborating with researchers, businesses, and policymakers to develop data-driven innovations. A collection of these collaborative statisticians and data scientists can have even more impact because no single person always has the needed data expertise, but together they may. Our laboratories, which we call “stat labs,” support interdisciplinary research and innovation by bringing together local statisticians and domain experts, and providing them with the training and tools they need to solve problems for real-world impact.
Because stat labs can use the projects they work on to educate and train students and other staff, they can also build their own capacity for data-driven development. Not only do we multiply the impact of stat labs by creating a network and sharing best practices, we also improve statistical skills and data literacy in the larger community by teaching short courses and workshops.
Omaar: One of the interesting projects your network is pursuing is exploring ways to improve the electoral process for voters in Nigeria. Can you talk about the work you are doing and what lessons other countries can draw from it?
Vance: Voting participation in Nigeria has been on a downward trend for decades, imperiling the quest for good governance in a representative democracy. One of our LISA 2020 Network stat labs, the University of Ibadan’s Laboratory for Interdisciplinary Statistical Analysis (UI-LISA), collaborated with the Independent National Electoral Commission (INEC) of Nigeria to investigate the factors responsible for voters’ increasing apathy. They also answered questions about the quality of the voter register and the conduct of the registration, accreditation, and voting processes.
Answering these questions was a multi-step process: First, INEC produced data about voters and non-voters from their administrative records and collected new data through well-planned surveys that UI-LISA helped design. Second, UI-LISA modeled and analyzed these data to produce findings, conclusions, and recommendations for policy changes. Fortunately, INEC is also a policy decision-making body, so they are in a position to conduct the third step, which is to transform the statistical evidence they helped produce into action to improve policy for national development. This step is arguably the hardest and most important and is still ongoing.
Some of the recommendations that emerged from the analysis were technical and procedural in nature, such as reallocating registration areas and polling units using geostatistical algorithms to better reflect the current distribution of the voting-age population. Other recommendations are attempts to address the “messy” human problems of apathy and distrust through enlightenment and educational campaigns targeted at specific demographics. A general lesson for everyone is that we can best innovate for data-driven development when we collaborate at the intersection of data producers, data analyzers, and data decision-makers.
Omaar: From your perspective, what can be done to encourage the positive impacts of data-driven innovation while minimizing the risk of harm?
Vance: One lesson we have learned is that local data producers, statisticians, data scientists, and decision-makers can be very effective in solving local challenges because they better understand the context and ramifications of the work.
Another lesson is that everyone involved with a data-driven innovation should be conscious of both the positive impacts and the potential harm of their work. I teach data science to undergraduate students at the University of Colorado Boulder and they collaborate on multiple data science projects every semester. For every project, they are required to reflect on who might benefit from their analyses, who might be harmed, and why. Building ethical thinking throughout data science education will help, somewhat, to minimize risks of harm.
Omaar: What are the main challenges to improving equal access to data resources around the world?
Vance: I think the main challenge is equal access to quality education and training in mathematics, statistics, and computing. Countries lacking educators in these areas risk falling further behind as data-driven innovation accelerates.
I see the stat labs of the LISA 2020 Network as potential “leapfrog” innovations that can enable countries to catch up and raise the levels of data innovation globally. Projects that stat labs currently work on, sometimes with mentoring from across the network, can help the labs, and especially the students within the labs, learn to solve locally relevant problems. For example, several students worked on the UI-LISA electoral participation project. These students learned techniques for survey sampling, standard statistical analyses such as t-tests and chi-squared analyses for contingency tables, and data visualization. These skills can be useful for many future projects, and with each new project, the statisticians and data scientists of the stat labs learn more methods or innovate new ones. Working on projects with appropriate mentors can help them to quickly become very good at applying statistics and data science to innovate local solutions to local challenges.
Omaar: Looking forward, what do you hope to achieve in the next five years?
Vance: Our goal, set in 2012, was to create a network of 20 stat labs by 2020, hence the name LISA 2020. We surpassed that goal, with 28 full member stat labs from 10 developing countries in our network by the third UN World Statistics Day (October 20, 2020). Now we have 34 labs in our network, with 14 more in the process of becoming full members.
Our immediate goal is to help strengthen and sustain each stat lab by enhancing its overall quality. We are working towards improving the education and training we provide to students working in the stat labs, especially in the areas of statistical computing and how to better collaborate with domain experts. With my colleague Heather Smith, I have developed the ASCCR framework to better learn and teach interdisciplinary collaboration in statistics and data science. This framework has five components (Attitude, Structure, Content, Communication, and Relationship) and works well in the United States. I’d like to collaborate across the network to translate or adapt the framework as needed to be more appropriate for the local contexts of collaborative statisticians and data scientists around the world.
By the fourth World Statistics Day (October 20, 2025), I hope that our network of stat labs is vibrant and strong and that we will be in a good position to expand to all countries, with, for example, our stat labs in Nigeria serving as mentors for new labs in Europe or North America. It’s not just researchers, businesses, and policymakers in developing countries who could benefit from collaborating with expert statisticians and data scientists. It’s true everywhere.