Web scraping—the use of automated tools to extract data from websites—helps businesses, researchers, and others quickly and efficiently gather publicly available information from the Internet, such as consumer product reviews or social media posts, that would otherwise require significant labor to collect. Unfortunately, uncertainty about the legality of scraping under the Computer Fraud and Abuse Act (CFAA) has limited how and when organizations gather publicly available data. However, a recent Supreme Court decision has provided more clarity on the CFAA, opening the doors for more legal web scraping.
Enacted in 1986, the CFAA states that anyone who “intentionally accesses a computer without authorization” or “exceeds authorized access” is in violation of the statute. However, the CFAA does not clearly define what this means in practice. For decades, a circuit split meant that different parts of the country operated under different interpretations—criminal penalties could apply to those who scraped websites in some states but not in others.
In 2017, LinkedIn sent a cease-and-desist letter to hiQ Labs, a start-up that provides analytics to employers about their workers, such as predictions about employee attrition, demanding that it stop scraping publicly available data in user profiles, and hiQ sued to keep its access. LinkedIn argued that scraping data amounts to hacking under the CFAA since the company prohibits this practice, while hiQ argued that the CFAA does not apply to publicly available data. The Ninth Circuit Court of Appeals ruled in hiQ’s favor, but LinkedIn appealed. The Supreme Court announced on June 14 that it would send hiQ Labs v. LinkedIn back to the Ninth Circuit in light of a recent ruling in a similar case that clarified a historically controversial section of the CFAA.
The relevant case, Van Buren v. United States, involved a police officer who accessed information in a law enforcement database for a third party in exchange for money. He argued that, while his actions were improper, he had lawful access to the part of the computer system he used, and therefore his actions should not be prosecuted under the CFAA as hacking. On June 3, the Supreme Court agreed that his actions did not fall under the category of “exceeds authorized access.” Through the majority opinion, the Court narrowed the meaning of the CFAA, rejecting the view that violating terms of service agreements or workplace policies constitutes hacking.
This is an important ruling because it clarifies that simply violating computer-use policies, such as using a work computer for personal purposes in violation of company policy, may be grounds for termination but is not a violation of the CFAA. The decision also suggests that using web scraping to collect public data is not a violation of the CFAA, although the courts will still need to clarify whether circumventing technical measures triggers a violation. For example, if a website blocks the IP addresses of web scrapers or implements CAPTCHAs to prevent bots from accessing its pages, can web scrapers lawfully use tools like web proxies to evade these measures? If these websites are otherwise available to the general public, then the CFAA should not penalize web scrapers for using these tools.
It is important to note that allowing web scraping does not force companies to turn over proprietary information or violate consumer privacy. Companies and individuals can still choose what information to make publicly available and what information to keep private, but if they make data publicly available, they cannot use the CFAA to impose restrictions on how others use it.
Web scraping has many uses in a variety of fields, so much so that learning this skill is a core requirement in many data science programs. For example, researchers can use data collected through web scraping to perform content analyses on politicians’ social media, compare pricing trends, or track business developments. Investors may use web scraping to identify important trends and opportunities, and cybersecurity firms may use web scraping to spot.
Neither the courts nor Congress should make it illegal to collect publicly available data. The Supreme Court was correct in its interpretation of the CFAA, and now it is up to the Ninth Circuit to use the new interpretation to settle the web scraping debate once and for all.