When researchers at the Advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) revealed in February 2016 that they had confirmed the existence of gravitational waves, the scientific community lauded this impressive achievement. But the paper the LIGO team published detailing their findings was a remarkable achievement in itself: 1,000 researchers from all around the world contributed their scientific expertise to produce it. Similarly, in May 2015, teams working on the Large Hadron Collider published a paper with a record-breaking 5,154 authors, thanks to collaboration tools that allowed the teams to share their data with each other to develop the most precise estimate of the mass of the Higgs boson to date. The rise of cloud computing and other data technologies has made it easier than ever for researchers to collaborate on such large scales, accelerate the pace of discovery, and solve increasingly challenging problems.
From particle physics to weather forecasting, scientists have historically been limited in what they could accomplish by their technical capacity to store and analyze data. But now, cloud computing has substantially reduced this technical barrier to scientific discovery by making low-cost, scalable storage and powerful analytics services available to scientists of all stripes, regardless of their location or their access to substantial technical and financial resources. And as Angel Pizarro, technical business development manager for the scientific and research computing team at Amazon Web Services (AWS), said at a recent panel discussion hosted by the Center for Data Innovation, as long as the scientific community desires these tools, cloud computing providers will be eager and able to scale up their infrastructure to meet this demand.
This proliferation of cloud services is particularly beneficial for data-intensive scientific disciplines, such as genomics and climate modeling, that rely on massive data sets to produce new insights. The technical demands of these fields, says Dr. Phil Bourne, associate director for data science at the National Institutes of Health, require researchers to “take computing to the data, not data to the computing,” if they intend to work with this data or collaborate in any meaningful capacity. And that is exactly what cloud computing can offer—AWS participates in the 1000 Genomes Project, for example, to allow researchers to access and analyze a massive trove of genetic data entirely in the cloud without the need to invest in powerful computing systems to work with the data locally.
Removing this major technical barrier to scientific discovery and collaboration has a democratizing effect and can reduce what Pizarro describes as “the time to science,” because researchers can increasingly work at a pace not dictated by technological constraints. It also allows scientists to tackle new and bigger problems. Jerry Sheehan, assistant director for scientific data and information at the White House Office of Science and Technology Policy, noted that increasing the scientific community’s access to data dramatically broadens the potential scope of its research. However, cultural and regulatory barriers still limit the extent to which the scientific community can take advantage of this potential.
As data becomes an ever more important factor in scientific discovery, many in the scientific community take steps to protect against “data parasites,” researchers who use another’s data for their own work without meaningful reward for the original owner of this data. Ben Shneiderman, professor of computer science at the University of Maryland and author of The New ABCs of Research: Achieving Breakthrough Collaborations, blamed this cultural challenge on an outdated attribution system in science, as even researchers who want to give credit to the data’s original creator have no effective method of doing so. Given the fierce competition for research funding, many scientists are understandably protective of the data they have devoted considerable effort to collecting. To help lower this cultural barrier and move toward a more open scientific environment, Shneiderman proposed a type of assist system for research attribution. Just as a basketball player is credited with an assist for helping another player score a basket, researchers should receive a type of credit when others use their data to make important discoveries.
The regulatory barriers to open science, which include rules limiting how certain data can be shared and with whom, are less clear-cut than technical or cultural barriers, but they remain an important challenge that governments should address. For example, the U.S. government holds large amounts of valuable data that could aid a wide variety of research, if only researchers could access it. Sheehan noted that while 75 percent of federal scientific agencies have open access plans, openness is still not the norm in sharing government data with researchers because of rules restricting access to sensitive data. And in Europe, Dr. Rene von Schomberg, team leader of science policy for the European Commission, pointed out that there simply is not enough political momentum to put open science on policymakers’ agendas. So while Europe may devote substantial funding to scientific research (€80 billion, or $87 billion, from the European Union for the period 2014 to 2020), lawmakers do not readily consider the benefits of sharing data when shaping policy. However, if the United States, European Union, and others can successfully align and standardize national and international policies to promote data sharing, said Sheehan, the resulting international collaboration would greatly benefit scientific discovery.
Private sector data technologies are helping the scientific community overcome major technical barriers to research, but governments should help address the cultural and regulatory barriers that remain. With continued support from policymakers and cloud providers for robust public-private partnerships, the scientific community will have the tools it needs to tackle some of the world’s most pressing challenges.
Image: Har Gobind Singh Khalsa.