
How Rules for Publicly Available Data Are Shaping the Future of AI

by Daniel Castro

Artificial intelligence (AI) systems learn by analyzing vast quantities of digital information. As governments debate how to regulate AI, a central question has emerged: Should developers be allowed to train models on information that is publicly available on the Internet, even when that information contains personal data?

The answer will shape not only privacy protections but also the future trajectory of AI development. Publicly accessible websites, open databases, government records, and other online resources form a critical pool of knowledge that AI systems rely on to understand language, reason about the world, and verify information. At the same time, the ability of AI systems to analyze this information at scale raises legitimate questions about how they should use and protect personal data.

Different jurisdictions have begun to approach these questions in different ways. The United States generally treats publicly accessible web data as available for automated collection unless site owners impose technical barriers. The European Union, by contrast, places broader restrictions on how organizations may process personal data, even when that information appears on public websites. As AI capabilities advance and agentic systems begin interacting with information and services across the Internet, these policy differences will increasingly influence where AI development occurs and which countries capture the economic benefits of AI adoption.

Policymakers can protect individuals while preserving the open information ecosystem that supports innovation. This approach can be grounded in three key principles:

  1. Focus on outputs rather than training inputs. Address harmful uses of AI systems—such as revealing sensitive personal information—instead of restricting the collection of publicly available data for model training.
  2. Encourage transparency norms for autonomous AI agents. Promote voluntary industry practices for AI developers to help people understand when they are interacting with automated systems, while allowing flexibility for evolving uses of agentic AI.
  3. Create a safe harbor for responsible use of publicly available data. Provide legal certainty for developers that respect machine-readable opt-out signals from websites and that use automated tools to filter sensitive personal information during data preparation.

The Internet has long served as a shared source of public knowledge. In the AI era, it has become a foundational input for building systems that can reason, retrieve information, and interact effectively with the world. Policymakers in the United States, the European Union, and everywhere else that aspires to be at the forefront of AI development and adoption should ensure developers can continue using it that way.

Read the report.
