AI-generated voices offer numerous benefits, from speeding up audiobook production to enhancing voiceovers for marketing and video games. However, the technology also poses significant risks, particularly when used for malicious impersonation in scams or political disinformation.
To address these concerns, some countries have introduced regulations requiring AI systems to label all AI-generated content, including audio, with watermarks: hidden signals indicating the content was created by AI. Unfortunately, this approach has limitations. Bad actors can remove watermarks from AI-generated audio, and watermarks alone won’t prevent impersonation scams or disinformation.
Audio watermarking for AI-generated content involves embedding an imperceptible signal into an AI-generated audio file that only computers can detect. These watermarks are unnoticeable to listeners, preserving audio quality. However, common alterations, such as compressing a file, can remove the watermark. Developers have tried to make watermarks more resilient by embedding them in every section of an audio track, so they remain detectable even if the file is cropped or edited. Despite these efforts, even the most advanced watermarking techniques cannot prevent skilled and motivated attackers from removing them.
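To make the idea concrete, here is a minimal, purely illustrative sketch of that repeat-the-signal approach in Python. It adds a key-derived, very low-amplitude pseudorandom pattern to every frame of an audio signal and later checks for it by correlation. The function names, frame length, and strength value are hypothetical, and real watermarking systems use far more sophisticated, perceptually shaped and often learned signals; the sketch only shows why such a mark can survive cropping yet be degraded by ordinary processing such as lossy compression.

```python
import numpy as np

FRAME_LEN = 2048   # samples per frame; the same mark repeats in every frame
STRENGTH = 0.002   # tiny amplitude so the pattern stays inaudible (hypothetical value)

def _pattern(key: int) -> np.ndarray:
    # Key-derived pseudorandom +/-1 sequence; only someone with the key can test for it.
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=FRAME_LEN)

def embed(audio: np.ndarray, key: int) -> np.ndarray:
    """Add the low-level pattern to every full frame so the mark survives cropping."""
    pattern = _pattern(key)
    marked = audio.astype(np.float64).copy()
    for start in range(0, len(marked) - FRAME_LEN + 1, FRAME_LEN):
        marked[start:start + FRAME_LEN] += STRENGTH * pattern
    return marked

def detect(audio: np.ndarray, key: int) -> bool:
    """Correlate each frame with the expected pattern and average the scores."""
    pattern = _pattern(key)
    scores = [
        float(np.dot(audio[start:start + FRAME_LEN], pattern)) / FRAME_LEN
        for start in range(0, len(audio) - FRAME_LEN + 1, FRAME_LEN)
    ]
    # Unmarked audio correlates with the random pattern near zero; marked audio
    # averages close to STRENGTH. Compression or filtering erodes this margin.
    return bool(np.mean(scores) > STRENGTH / 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech_stand_in = 0.1 * rng.standard_normal(16000 * 5)  # 5 s of noise standing in for speech
    marked = embed(speech_stand_in, key=42)
    print(detect(speech_stand_in, key=42), detect(marked, key=42))  # expected: False True
```

Because the same pattern repeats in every frame, any cropped segment still carries it, but anything that perturbs the samples enough, from MP3 compression to deliberate filtering, weakens the correlation that detection relies on.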
Imperfect protection offers little value, and in urgent situations audio watermarking provides no real defense at all. For example, if a scammer impersonates a trusted voice, like a family member, victims won’t pause to verify whether there is a hidden watermark; they’ll likely fall for the scam. Even if they do check, the absence of a watermark doesn’t prove that the content is authentic. The fake audio could have been created by an AI tool in a country without watermarking regulations, or by a human impersonator. To effectively combat AI-generated audio scams, policymakers should focus on public awareness campaigns rather than relying solely on technical solutions.
Similarly, in a political context, audio watermarking won’t stop the spread of misinformation. Earlier this year, a robocall impersonating President Joe Biden urged recipients not to vote. Even if this AI-generated audio had included a watermark, recipients still wouldn’t have known the audio was fake unless their phones were recording and checking every call for AI-generated content, a surveillance nightmare for many and likely illegal in states that require two-party consent for recordings. Even if experts later detected the watermark and alerted the public, the damage would have already been done.
Audio watermarking won’t mitigate the risks associated with AI-generated voice cloning. The challenge isn’t only technical but also social: how people consume and trust media. Until individuals can critically assess the media they consume, bad actors will continue to exploit AI, with or without watermarks.