The Washington Post

An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft

September 4, 2019 at 6:27 p.m. EDT
A fake video featuring former president Barack Obama. A new worry: fake voice recordings that can be used to persuade people that they're being asked to do something by an authority. (AP)

Thieves used voice-mimicking software to imitate a company executive’s speech and dupe his subordinate into sending hundreds of thousands of dollars to a secret account, the company’s insurer said, in a remarkable case that some researchers are calling one of the world’s first publicly reported artificial-intelligence heists.

The managing director of a British energy company, believing his boss was on the phone, followed orders one Friday afternoon in March to wire more than $240,000 to an account in Hungary, said representatives from the French insurance giant Euler Hermes, which declined to name the company.

The request was “rather strange,” the director noted later in an email, but the voice was so lifelike that he felt he had no choice but to comply. The insurer, whose case was first reported by the Wall Street Journal, provided new details on the theft to The Washington Post on Wednesday, including an email from the employee tricked by what the insurer is referring to internally as “the false Johannes.”

Now being developed by a wide range of Silicon Valley titans and AI start-ups, such voice-synthesis software can copy the rhythms and intonations of a person’s voice and be used to produce convincing speech. Tech giants such as Google and smaller firms such as the “ultrarealistic voice cloning” start-up Lyrebird have helped refine the realism of the resulting fakes and made the tools widely available, free and without limits on use.

But the synthetic audio and AI-generated videos, known as “deepfakes,” have fueled growing anxieties over how the new technologies can erode public trust, empower criminals and make traditional communication — business deals, family phone calls, presidential campaigns — that much more vulnerable to computerized manipulation.

“Criminals are going to use whatever tools enable them to achieve their objectives cheapest,” said Andrew Grotto, a fellow at Stanford University’s Cyber Policy Center and a senior director for cybersecurity policy at the White House during the Obama and Trump administrations.

“This is a technology that would have sounded exotic in the extreme 10 years ago, now being well within the range of any lay criminal who's got creativity to spare,” Grotto added.

Developers of the technology have pointed to its positive uses, saying it can help humanize automated phone systems and help mute people speak again. But its unregulated growth has also sparked concern over its potential for fraud, targeted hacks and cybercrime.

Researchers at the cybersecurity firm Symantec said they have found at least three cases of executives’ voices being mimicked to swindle companies. Symantec declined to name the victim companies or say whether the Euler Hermes case was one of them, but it noted that the losses in one of the cases totaled millions of dollars.

The systems work by processing a person’s voice and breaking it down into components, like sounds or syllables, that can then be rearranged to form new phrases with similar speech patterns, pitch and tone. The insurer did not know which software was used, but a number of the systems are freely offered on the Web and require little sophistication, speech data or computing power.
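As a rough illustration of the workflow described above, the sketch below clones a voice from a short reference clip and then speaks a sentence the target never said. It assumes the open-source Coqui TTS package and its multilingual XTTS voice-cloning model; the model name, file paths and sample sentence are illustrative assumptions, not the (unidentified) software used in the theft.

```python
# Illustrative sketch only: not the tool used in the fraud, which the insurer
# said it could not identify. Assumes the open-source Coqui TTS package
# (pip install TTS) and its multilingual XTTS voice-cloning model.
from TTS.api import TTS

# Load a pretrained multi-speaker model; weights are downloaded on first use.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

# A short recording of the target speaker -- roughly a minute of clean speech
# is enough for the model to pick up pitch, rhythm and accent.
reference_clip = "executive_sample.wav"  # hypothetical reference recording

# Synthesize a brand-new sentence in the cloned voice and write it to disk.
tts.tts_to_file(
    text="Please wire the payment to the supplier before the end of the day.",
    speaker_wav=reference_clip,
    language="en",
    file_path="cloned_request.wav",
)
```

The point of the sketch is how little input such an attack requires: one audio file of the target and one line of text.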

Lyrebird, for instance, advertises the “most realistic artificial voices in the world” and allows anyone to create a voice-mimicking “vocal avatar” by uploading at least a minute of real-world speech.

The company, which did not respond to requests for comment, has defended releasing the software widely, saying it will help acclimate people to the new reality of a fast-improving and “inevitable” technology “so that society can adapt.” In an ethics statement, the company wrote: “Imagine that we had decided not to release this technology at all. Others would develop it and who knows if their intentions would be as sincere as ours.”

Saurabh Shintre, a senior researcher who studies such “adversarial attacks” in Symantec’s California-based research lab, said the audio-generating technology has in recent years made “transformative” progress because of breakthroughs in how the algorithms process data and compute results. The amount of recorded speech needed to train the voice-impersonating tools to produce compelling mimicries, he said, is also shrinking rapidly.

The technology is imperfect, and some of the faked voices wouldn’t fool a listener in a “calm, collected environment,” Shintre said. But in some cases, thieves have employed methods to explain the quirks away, saying the fake audio’s background noises, glitchy sounds or delayed responses are the result of the speaker’s being in an elevator or car or in a rush to catch a flight.

Beyond the technology’s capabilities, the thieves have also depended on age-old scam tactics to boost their effectiveness, using time pressure, such as an impending deadline, or social pressure, such as a desire to appease the boss, to make the listener move past any doubts. In some cases, criminals have targeted the financial gatekeepers in company accounting or budget departments, knowing they may have the capability to send money instantly.

“When you create a stressful situation like this for the victim, their ability to question themselves for a second — ‘Wait, what the hell is going on? Why is the CEO calling me?’ — goes away, and that lets them get away with it,” Shintre said.

Euler Hermes representatives said the company, a German energy firm’s subsidiary in Britain, contacted law enforcement but has yet to name any potential suspects. The insurer, which sells policies to businesses covering fraud and cybercrime, said it is covering the company’s full claim.

The victim director was first called late one Friday afternoon in March, and the voice demanded he urgently wire money to a supplier in Hungary to help the company avoid late-payment fines. The fake executive referred to the director by name and sent the financial details by email.

The director and his boss had spoken directly a number of times, said Euler Hermes spokeswoman Antje Wolters, who noted that the call was not recorded. “The software was able to imitate the voice, and not only the voice: the tonality, the punctuation, the German accent,” she said.

After the thieves made a second request, the director grew suspicious and called his boss directly. Then the thieves called back, unraveling the ruse: The fake “ ‘Johannes’ was demanding to speak to me whilst I was still on the phone to the real Johannes!” the director wrote in an email the insurer shared with The Post.

The money, totaling 220,000 euros, was funneled through accounts in Hungary and Mexico before being scattered elsewhere, Euler Hermes representatives said. No suspects have been named, the insurer said, and the money has disappeared.

AI developers are working to build systems that can detect and combat fake audio, but the voice-mimicking technology is evolving rapidly. Google, for instance, has invested in research and has funded challenges to automatically recognize “spoofed” speech. But the company has also developed some of the world’s most persuasive voice AI, including its Duplex service, which can call restaurants to book a table using a lifelike, computer-generated voice.

“There’s a tension in the commercial space between wanting to make the best product and considering the bad applications that product could have,” said Charlotte Stanton, the director of the Silicon Valley office of the Carnegie Endowment for International Peace. “Researchers need to be more cautious as they release technology as powerful as voice-synthesis technology, because clearly it’s at a point where it can be misused.”
