Deepfake voice cloning

Use of AI-generated synthetic voice — and increasingly video — to impersonate a known executive or colleague during a fraud attempt.

Deepfake voice cloning is the use of generative-AI models to synthesize a convincing copy of a specific person’s voice — typically a CEO, CFO, or close colleague — and then use that synthesized voice in a phone call, voicemail, or live video meeting to authorize a fraudulent transfer, an IBAN change, or a credential reset. With a few minutes of public audio (a conference talk, a podcast, a quarterly earnings call), commodity tools can now produce real-time voice clones good enough to fool most listeners.

The canonical 2024 case is the Arup deepfake in Hong Kong: an employee in the finance function joined what appeared to be a Microsoft Teams call with the CFO and other senior colleagues, all of whom were deepfaked. During the call the employee was instructed to authorize a series of transfers reportedly totaling around USD 25 million. The fraud was discovered only after the employee checked with headquarters. ENISA’s Threat Landscape reports flag generative-AI-enabled social engineering as one of the fastest-rising categories.

Defining properties:

Builds on classic BEC. This is Business Email Compromise with a credible voice or face attached. The financial-fraud playbook is the same.
Multi-channel pretext. Often an email request plus a phone confirmation in the cloned voice, designed to defeat the “call them back to confirm” reflex: if the victim calls back on a number the attacker supplied, the call-back lands on the attacker.
Targets finance, treasury, payroll, HR. Any role that can move money or change account details.
Real-time is now feasible. Live video deepfakes in Teams or Zoom are no longer research demos.
Hard to detect technically. No reliable consumer-grade detector exists; defense is procedural, not algorithmic.

Mitigations are behavioral, not technical. ANSSI and the FBI both publish guidance recommending dual-channel verification for any wire or account change — call back on a number from the company directory, not the number that just called. The reflex that protects an organization here is not “spot the deepfake” (humans cannot, reliably) but “verify out of band, every time, even when the CEO is on the line.”
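
The call-back rule above can be expressed as a simple policy check. The following is a minimal sketch, not a reference implementation: the DIRECTORY mapping, the PaymentRequest fields, and the function names are all hypothetical, and a real organization would look numbers up in its authoritative contact system rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical company directory (assumption for illustration only).
DIRECTORY = {"cfo@example.com": "+33 1 23 45 67 89"}


@dataclass
class PaymentRequest:
    requester: str       # identity the caller or email claims to be
    inbound_number: str  # number the request arrived from (never trusted)
    action: str          # e.g. "wire", "iban_change", "credential_reset"


def callback_number(request: PaymentRequest) -> Optional[str]:
    """Return the number to call back, taken only from the directory."""
    return DIRECTORY.get(request.requester)


def may_execute(request: PaymentRequest, confirmed_via: Optional[str]) -> bool:
    """Allow the action only if it was confirmed on the directory number.

    A confirmation received on the inbound number, or no confirmation
    at all, is rejected, however convincing the voice sounded.
    """
    expected = callback_number(request)
    return expected is not None and confirmed_via == expected
```

In this sketch a request that looks like it comes from the CFO is blocked until someone has confirmed it by dialing the directory number; a confirmation given on the number that just called does not count, and a requester who is not in the directory can never be confirmed.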
