Data scraping, or the automated extraction of data from webpages or other human-readable documents, helps people gather publicly available information quickly and efficiently. Researchers and other groups use web scraping tools to pull data from public websites as an alternative to using more laborious manual methods. Unfortunately, the South Carolina Court Administration—the head of the state’s judicial system—bans data scraping on its website of public case records (known as the Public Index). In response, the American Civil Liberties Union (ACLU) and the South Carolina chapter of the National Association for the Advancement of Colored People (NAACP) filed a federal lawsuit on March 30 arguing that these restrictions inhibit access to public information. Regardless of whether the lawsuit succeeds, the state should eliminate these unnecessary restrictions on access to public sector data from the judiciary.
Some organizations have challenged web scraping under the Computer Fraud and Abuse Act (CFAA), a federal law on cybercrime. The CFAA states that anyone who “intentionally accesses a computer without authorization” or “exceeds authorized access” is in violation of the statute. Recent court decisions have narrowed the interpretation of the CFAA by clarifying that merely breaching terms of service agreements or workplace policies does not violate the statute. This interpretation suggests that using web scraping to collect public data is not a violation of the law because the information is already publicly accessible. The ACLU and NAACP’s lawsuit moves the debate about web scraping beyond the CFAA.
South Carolina’s Public Index serves as the primary database for all legal documents in the state. Members of the public can search and find information on court records and filings pertaining to individuals and organizations from all 46 counties in the state, covering all case categories from family law to housing issues to criminal matters. Before accessing the database, individuals must accept the terms of service. These terms state that:
“Access to the South Carolina Judicial Department Public Index websites by a site data scraper or any similar software intended to discover and extract data from a website through automated, repetitive querying for the purpose of collecting such data is expressly prohibited.”
This legal restriction is dubious at best and harmful at worst. In its lawsuit, the NAACP claims that the ban on data scraping limits its ability to perform research on racial disparities in eviction cases in the state. Without streamlined access to these public documents, the NAACP faces the much more difficult task of manually downloading each record individually. Moreover, open access to this type of large legal dataset can help others, such as researchers or journalists, understand legal trends, analyze policy outcomes, and identify critical areas where legal needs are going unmet.
But it is not enough for South Carolina to merely allow web scraping in its terms of service. It should also allow users to more easily download its data, either through application programming interfaces (APIs) that allow users to directly query certain data or through bulk downloads. Facilitating access to this government data is an important part of encouraging open data and government transparency. While occasional technical restrictions on data scraping may be reasonable, such as those that prevent denial-of-service attacks, the state should always make options available for legitimate users. In this case, South Carolina continues to restrict access to public data despite clear interest from its constituents. Web scraping is an important tool for accessing publicly available information, and no branch of government, at the federal or state level, should impose unnecessary technical or legal restrictions on this activity. South Carolina should update its policies on access to its court records and work to ensure open access to these records.
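As an illustration of what a reasonable technical restriction might look like, the sketch below shows a token-bucket rate limiter in Python. It caps how quickly any one client can send requests, which guards against denial-of-service traffic, while still letting automated tools retrieve records at a moderate pace. This is a hypothetical example, not a description of how the Public Index works: the class name `TokenBucket`, its parameters, and the chosen limits are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical per-client rate limiter.

    Allows short bursts up to `capacity` requests and a sustained
    rate of `rate_per_sec` requests per second after that.
    """

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added back per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable clock, useful for testing
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be throttled."""
        now = self.clock()
        # Refill tokens for the time elapsed since the last call, up to capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Under this kind of policy, a server could answer throttled requests with a "try again later" response instead of banning automated access outright, so a researcher's script slows down but still completes.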