Thursday, January 11, 2018

"AI learns how to fool text-to-speech. That’s bad news for voice assistants"

From The Next Web:
A pair of computer scientists at the University of California, Berkeley, developed an AI-based attack that targets speech-to-text systems. With their method, no matter what an audio file sounds like, the text output will be whatever the attacker wants it to be.

This one is pretty cool, but it’s also another entry for the “terrifying uses of AI” category.

The team, Nicholas Carlini and Professor David Wagner, was able to trick Mozilla’s popular open-source DeepSpeech speech-to-text system by, essentially, turning it on itself. In a paper published last week the researchers state:
Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second) … Our attack works with 100% success, regardless of the desired transcription, or initial source phrase being spoken. By starting with an arbitrary waveform instead of speech (such as music), we can embed speech into audio that should not be recognized as speech; and by choosing silence as the target, we can hide audio from a speech-to-text system.
This means they can, hypothetically, take any audio file and convince a speech-to-text converter – like the ones Google Assistant, Siri, and Alexa use to figure out what you’re saying – that it’s something else. That’s pretty heavy in a world full of smart speakers and voice assistants....MUCH MORE
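If you're curious what "turning it on itself" means in practice: because DeepSpeech is differentiable end-to-end, an attacker can run gradient descent on the audio waveform itself instead of on the network's weights. The sketch below is my own illustration of that general recipe (a targeted adversarial perturbation optimized against the CTC loss), not the authors' code; the model, tensor shapes, and hyperparameters are placeholders.

import torch

# A toy sketch of the idea: nudge the waveform, sample by sample, until the
# network transcribes it as the attacker's chosen phrase. `model` is a
# placeholder for any differentiable speech-to-text network (DeepSpeech-like)
# returning per-frame character logits of shape (time, batch, classes).

def adversarial_audio(model, waveform, target_ids, target_len,
                      eps=0.05, steps=1000, lr=1e-3):
    """Find a quiet perturbation delta so that model(waveform + delta)
    transcribes as the attacker-chosen target phrase."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ctc = torch.nn.CTCLoss(blank=0)
    for _ in range(steps):
        logits = model(waveform + delta)           # (time, 1, classes)
        log_probs = logits.log_softmax(dim=-1)
        input_len = torch.tensor([logits.shape[0]])
        # Push the transcription toward the target phrase.
        loss = ctc(log_probs, target_ids, input_len, target_len)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the distortion inaudibly small. The paper bounds the
        # perturbation in decibels; a plain clamp stands in for that here.
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return (waveform + delta).detach()

That's the whole trick: the same gradients that trained the network can be used to steer its mistakes, which is why the researchers report 100% success regardless of the target transcription.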
Relatedly, Lyrebird needs only one minute of someone's audio to generate speech they never actually said.
This video from last year still sounds a bit off, but it points to a rather scary future world where your skeptical audience can say "Video, or it didn't happen" and you can simply create the video. The event didn't happen.
But it appears to have.