AI-generated voices offer numerous benefits, from speeding up audiobook production to enhancing voiceovers for marketing and video games. However, the technology also poses significant risks, particularly when used for malicious impersonation in scams or political disinformation.
To address these concerns, some countries have introduced regulations requiring AI systems to label all AI-generated content, including audio, with watermarks: hidden signals indicating the content was created by AI. Unfortunately, this approach has limitations. Bad actors can remove watermarks from AI-generated audio, and watermarks alone won’t prevent impersonation scams or disinformation.
Audio watermarking for AI-generated content involves embedding an imperceptible signal into an AI-generated audio file that only computers can detect. These watermarks are unnoticeable to listeners, preserving audio quality. However, common alterations, such as compressing a file, can remove the watermark. Developers have tried to make watermarks more resilient by embedding them in every section of an audio track, so they remain detectable even if the file is cropped or edited. Despite these efforts, even the most advanced watermarking techniques cannot prevent skilled and motivated attackers from removing them.
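To make the idea concrete, here is a minimal, purely illustrative sketch of that repeat-the-signal approach in Python. It adds a key-derived, very low-amplitude pseudorandom pattern to every frame of an audio signal and later checks for it by correlation. The function names, frame length, and strength value are hypothetical, and real watermarking systems use far more sophisticated, perceptually shaped and often learned signals; the sketch only shows why such a mark can survive cropping yet be degraded by ordinary processing such as lossy compression.

```python
import numpy as np

FRAME_LEN = 2048   # samples per frame; the same mark repeats in every frame
STRENGTH = 0.002   # tiny amplitude so the pattern stays inaudible (hypothetical value)

def _pattern(key: int) -> np.ndarray:
    # Key-derived pseudorandom +/-1 sequence; only someone with the key can test for it.
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=FRAME_LEN)

def embed(audio: np.ndarray, key: int) -> np.ndarray:
    """Add the low-level pattern to every full frame so the mark survives cropping."""
    pattern = _pattern(key)
    marked = audio.astype(np.float64).copy()
    for start in range(0, len(marked) - FRAME_LEN + 1, FRAME_LEN):
        marked[start:start + FRAME_LEN] += STRENGTH * pattern
    return marked

def detect(audio: np.ndarray, key: int) -> bool:
    """Correlate each frame with the expected pattern and average the scores."""
    pattern = _pattern(key)
    scores = [
        float(np.dot(audio[start:start + FRAME_LEN], pattern)) / FRAME_LEN
        for start in range(0, len(audio) - FRAME_LEN + 1, FRAME_LEN)
    ]
    # Unmarked audio correlates with the random pattern near zero; marked audio
    # averages close to STRENGTH. Compression or filtering erodes this margin.
    return bool(np.mean(scores) > STRENGTH / 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech_stand_in = 0.1 * rng.standard_normal(16000 * 5)  # 5 s of noise standing in for speech
    marked = embed(speech_stand_in, key=42)
    print(detect(speech_stand_in, key=42), detect(marked, key=42))  # expected: False True
```

Because the same pattern repeats in every frame, any cropped segment still carries it, but anything that perturbs the samples enough, from MP3 compression to deliberate filtering, weakens the correlation that detection relies on.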
Imperfect protection offers little value, and in urgent situations audio watermarking provides no real defense at all. For example, if a scammer impersonates a trusted voice, like a family member, victims won’t pause to verify whether there is a hidden watermark; they’ll likely fall for the scam. Even if they do check, the absence of a watermark doesn’t prove that the content is authentic. The fake audio could have been created by an AI tool in a country without watermarking regulations, or by a human impersonator. To effectively combat AI-generated audio scams, policymakers should focus on public awareness campaigns rather than relying solely on technical solutions.
Similarly, in a political context, audio watermarking won’t stop the spread of misinformation. Earlier this year, a robocall impersonating President Joe Biden urged recipients not to vote. Even if this AI-generated audio had included a watermark, recipients still wouldn’t have known the audio was fake unless their phones were recording and checking every call for AI-generated content, a surveillance nightmare for many and likely illegal in states that require two-party consent for recordings. Even if experts later detected the watermark and alerted the public, the damage would have already been done.
Audio watermarking won’t mitigate the risks associated with AI-generated voice cloning. The challenge isn’t only technical but also social: how people consume and trust media. Until individuals can critically assess the media they consume, bad actors will continue to exploit AI, with or without watermarks.