When it comes to open data, the U.S. government’s scientific agencies are usually ahead of the curve. Usually. In July 2015, the National Institute of Standards and Technology (NIST) launched the Reference Data Challenge to spur the development of apps that could make its repositories of Standard Reference Data (SRD)—quantitative information about physical science—more accessible and to lower barriers to taxpayer-funded research. Although the competition went off without a hitch, it highlighted a strange contradiction: NIST wants to make SRD as accessible as possible, and although it makes much of its SRD catalog freely available online, it still restricts access to a substantial portion of this catalog by putting it behind a paywall. Though there may have once been a legitimate justification for restrictively licensing this data, failing to make it freely available online not only contradicts federal open data policy, but also goes against NIST’s stated objective of driving innovation by increasing the accessibility of taxpayer-funded research.
Reference data is immensely valuable because it serves as a common and authoritative resource for researchers in the physical sciences. It lets scientists and engineers who do not know how to generate such data, or how to evaluate the quality of other data sources, devote their time to research rather than to duplicating measurements that already exist. For example, if a pharmaceutical company or regulatory agency wants to analyze the chemical purity of a particular drug, it can use a technique called atomic spectroscopy to identify tell-tale physical properties of different elements. It would be ridiculous for Pfizer, GSK, Merck, and the Food and Drug Administration each to develop its own comprehensive compendium of atomic spectra to interpret what it finds. Instead, they can simply turn to NIST’s Atomic Spectra Database to help analyze their findings.
Congress established the SRD program through the Standard Reference Data Act of 1968 after recognizing the value of making scientific reference data available to the scientific community and the public. To fund the program, the law allows the Commerce Department (which oversees NIST) to sell standard reference data, provided that “to the extent practicable and appropriate, the prices established for such data may reflect the cost of collection, compilation, evaluation, publication, and dissemination of the data.” In practice, this means that the Commerce Department can copyright NIST’s reference data and sell licenses to it, either as a one-time purchase or as a subscription.
Over time, NIST’s philosophy shifted to emphasize data as a public resource instead of a revenue stream. At some point, NIST began to make large portions of its SRD catalog freely available for the same reason it launched its Reference Data Challenge: to fulfill its mission to “promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology.” Making the government’s valuable reference data freely available is an effective way to pursue that mission because it lowers barriers to scientific research. But NIST’s lukewarm half-measures raise an obvious question: Why has the agency not committed fully to this goal and made its entire SRD catalog freely available?
Beyond the tension between NIST’s mission and its copyrighting of SRD, the agency’s paywalls also contradict the spirit, if not the letter, of federal open data policy, which requires agencies to apply open licenses to their data by default. However, the Commerce Department’s ability to copyright reference data is enshrined in the U.S. Code, while federal open data policy does not carry the weight of law. Because Secretary of Commerce Wilbur Ross has discretion over whether to apply restrictive copyrights to this information, he could simply decide to make this data freely available in accordance with NIST’s mission and the spirit of open data.
It is strange that the Commerce Department has not already done so, especially considering the more innovation-friendly open data practices of other agencies. While cost-recovery concerns might explain bureaucratic reluctance to tear down NIST’s SRD paywalls, other scientific agencies such as the National Oceanic and Atmospheric Administration, which invests dramatically larger amounts of money in data collection, have gone to great lengths to ensure that the public can freely access their data. Simply put, the benefits to science, innovation, and the public from making such data freely available far outweigh the costs NIST incurs in developing its SRD catalog. Secretary Ross should apply an open license to NIST’s entire SRD catalog to maximize the benefits this crucial scientific resource can offer. American taxpayers have already funded the creation of SRD; forcing them to pay twice goes against the ideals of open data, NIST’s mission, and the public interest.