Companies in virtually every industry are increasingly leveraging data to operate more efficiently and effectively. The health care industry, for example, stands to gain $300 billion in value every year from better use of data, according to the McKinsey Global Institute. One of the major benefits of data is its usefulness in predicting outcomes, as the New York Times statistics guru Nate Silver demonstrated in the 2012 U.S. elections. Companies now use data to predict everything from which movies a consumer will enjoy to when a machine is likely to need repairs.
As society increasingly relies on data to make predictions, some people fear being caught on the wrong side of an equation: a healthy individual might be denied medical coverage, a reformed criminal might be denied parole, or a financially stable consumer might be denied credit. These fears are misguided. Critics are correct that no algorithm will be right 100 percent of the time, but what many people do not realize is that we already live in a world of predictions; we are just not always very good at making them. Fortunately, better use of data analytics can help us make fewer errors.
Take the recent case of Sarah Murnaghan, a 10-year-old girl in Pennsylvania with cystic fibrosis who needed a lung transplant. For patients 12 and older, the United Network for Organ Sharing requires doctors to assign a lung allocation score based on the statistical likelihood of a successful transplant; the higher the score, the more likely the patient is to receive a lung. In contrast, children under 12 receive donated lungs on a first-come, first-served basis that takes into account only blood type and distance from the donor hospital. The catch is that this children's queue draws only on donors who are also under 12, yet the vast majority of donated lungs come from older donors. As a result, Sarah was unlikely to receive a lung.
The rule, while presumably grounded in scientific data about organ donations to young children, is an example of the kind of unsophisticated predictive analytics we use today: it relied on a single data point, the patient's age, to prioritize organ donations. Sarah's parents argued that this was unfair, and a federal judge agreed. Even after accounting for other factors, doctors might still have concluded that the odds of success were low. But by using more data, they could reduce the likelihood of excluding someone who would have been a viable recipient. (Sarah has since undergone two lung transplants.)
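To make the contrast concrete, here is a minimal sketch of the difference between a single-factor rule and a multi-factor score. It is purely illustrative: every field name, weight, and cutoff below is a hypothetical placeholder, not the actual lung allocation formula used by the United Network for Organ Sharing.

```python
# Hypothetical illustration only: these fields, weights, and thresholds are
# invented for this sketch and do not reflect the real lung allocation score.

def single_factor_rule(patient):
    """The old-style rule: one data point (age) decides which queue you join."""
    return "scored queue" if patient["age"] >= 12 else "first-come, first-served"

def multi_factor_score(patient):
    """A richer predictor: combine several clinical signals into one score.

    Higher values on each input indicate greater medical need.
    """
    weights = {"oxygen_need": 0.40, "lung_impairment": 0.35, "wait_urgency": 0.25}
    return sum(weights[key] * patient[key] for key in weights)

patient = {"age": 10, "oxygen_need": 0.9, "lung_impairment": 0.8, "wait_urgency": 0.8}
print(single_factor_rule(patient))            # age alone routes her out of the scored queue
print(round(multi_factor_score(patient), 2))  # 0.84: more data yields a usable estimate of need
```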
While it is easy to see the unfairness of a rule that relies on age alone, we should not assume that using more data eliminates bias. Better prediction means we will get the answer wrong less often, but it does not guarantee that we are asking the right question. For example, should we allocate organs to those who have waited the longest, those most likely to die while waiting for a transplant, or those with the highest chance of survival? And what happens if the rules systematically exclude certain individuals? Underserved populations who receive inadequate information about transplants may not be added to the waitlist until long after their diagnosis, putting them at a disadvantage when organs are allocated on a first-come, first-served basis.
As health care providers and others increasingly use advanced data analytics to make decisions, we should remember two things. First, prediction is not something to fear: we have always done it, it will always be imperfect, and technology can help us do it more accurately. Second, data analytics should not be a black box: we need to challenge its underlying assumptions and values. This will allow us to gain the benefits of data while avoiding both the tyranny of the bureaucrat and the tyranny of the algorithm.
This article originally appeared in IEEE Spectrum.