Path & Payload

The Psychology Behind Vishing — and Why Most Defenses Miss It

On August 27, 2023, an engineer at Retool received a phone call from someone who sounded like a colleague on the IT team. The caller knew the office floor plan, used the names of coworkers, and was fluent in internal processes. Retool's own post-mortem is clear that the employee became "more and more suspicious" as the conversation continued. And then, despite that suspicion, the employee gave the attacker one additional MFA code. That single code let the attacker add their own device to the engineer's Okta account, which gave them a source of valid MFA from that point forward. Twenty-seven of Retool's cloud customers, all in the cryptocurrency sector, were subsequently compromised.

In January 2024, a finance worker at Arup's Hong Kong office received an email purportedly from the company's UK-based CFO requesting a confidential wire transfer. The employee suspected a phishing attempt. Then the attackers made a second move: a video conference call populated with deepfake recreations of the CFO and several colleagues the employee recognized, their faces and voices reconstructed from publicly available footage. The employee's skepticism was defeated. He authorized fifteen transfers totaling $25.6 million before checking with UK headquarters and realizing no one there knew anything about it. CNN reported the incident in May 2024, after Arup confirmed it was the company involved.

These attacks had different delivery mechanisms, but in both cases the attacker's decisive advantage was live voice/video interaction. And in both cases, the conversation worked even against employees who were trying to be careful.

CrowdStrike's Threat Hunting Report noted that vishing volume in the first six months of 2025 had already exceeded the total recorded for all of 2024. Voice converts at rates that email simply doesn't match, and the reason has less to do with technology than with psychology.

We Designed the Defenses for a Different Channel

The security controls organizations rely on to stop phishing attacks are almost entirely built around written content. Secure email gateways scan attachments and URLs. Browser warnings flag suspicious domains. Phishing awareness training teaches employees to hover over links before clicking, inspect sender addresses, and treat urgency as a red flag. None of those controls have any relevance to a phone call.

Caller ID can be spoofed to show a familiar number, but even when it isn't, most corporate environments receive enough inbound calls that an unfamiliar number doesn't automatically raise alarm. Vishing routes around the entire email security stack by operating on a channel that most security tooling does not monitor in real time.

The callback phishing variant — known as telephone-oriented attack delivery, or TOAD — makes this bypass even more deliberate. The attack initiates via email, but the email contains no link, no attachment, and no malicious payload. It instructs the target to call a number to resolve an urgent issue. Because the email contains nothing technically malicious, it passes cleanly through URL scanners and attachment sandboxes. When the victim dials the number, they reach an attacker-controlled operator — and at that point they've already taken the first step, in more than one sense.
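To make the bypass concrete, here is a toy Python model of two checks, under loudly stated assumptions: `Email`, `payload_scan`, and `callback_heuristic` are hypothetical names, the regular expressions are rough illustrations, and no real gateway is this simple. A control that only looks for URLs and attachments has nothing to act on in a TOAD message; pairing a phone number with urgency language is one possible, and noisy, way to catch the pattern.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified model of an inbound email; real gateways parse MIME.
@dataclass
class Email:
    subject: str
    body: str
    attachments: list[str] = field(default_factory=list)

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Rough North American phone number pattern, for illustration only.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
URGENCY_WORDS = ("urgent", "immediately", "within 24 hours", "suspended", "cancel")

def payload_scan(msg: Email) -> str:
    """What a link/attachment-focused control inspects: URLs and files."""
    if URL_RE.search(msg.body) or msg.attachments:
        return "inspect"
    return "clean"  # nothing to detonate or rewrite, so the message passes

def callback_heuristic(msg: Email) -> bool:
    """Flag the TOAD shape: a phone number plus urgency language, no payload."""
    text = f"{msg.subject} {msg.body}".lower()
    has_phone = PHONE_RE.search(msg.body) is not None
    urgent = any(word in text for word in URGENCY_WORDS)
    return has_phone and urgent and payload_scan(msg) == "clean"

toad = Email(
    subject="Your subscription renews today",
    body="You will be charged $499 unless you cancel within 24 hours. Call (800) 555-0199.",
)
print(payload_scan(toad))        # clean
print(callback_heuristic(toad))  # True
```

A heuristic like this will also flag legitimate billing reminders, which is part of why callback phishing is hard to filter at the gateway.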

Freedman and Fraser's foundational foot-in-the-door research demonstrated in 1966 that a person who complies with a small initial request becomes significantly more likely to comply with a larger one — not because of external pressure, but because the first act of compliance shifts how they see themselves. Dialing a number and explaining their problem to a friendly operator is a small, easy action. By the time the attacker makes a real ask, the victim has already cast themselves in the role of someone cooperating to resolve an issue. Sales professionals have used this same escalation structure for decades.

Phishing awareness training also teaches people that an unsolicited message is itself a warning sign. Callback phishing sidesteps this by getting the victim to place the call, so the victim believes they've verified legitimacy by initiating contact.

Trustwave researchers detected a 140% surge in TOAD campaigns in Q3 2024, and VIPRE Security Group reported that callback phishing rose from 3% to 18% of phishing incidents in Q4 2025. According to Proofpoint's 2024 State of the Phish report, upward of 10 million TOAD attacks are executed monthly.

Voice Compresses the Evaluation Window

Email is asynchronous. You can stop reading, re-examine a suspicious phrase, forward it to IT, or simply close the window. The attack pauses while you do any of those things. A phone call unfolds in real time, at a pace the attacker controls, and ending the call feels socially costly in a way that closing an email doesn't.

Analytical evaluation of a potential threat takes time. The psychology literature on urgency and decision-making is consistent: urgency signals suppress goal-directed cognitive control and push behavior toward automatic, heuristic responses. Research published in eLife in 2021 found that urgency opens a brief window in which the brain's capacity to override automatic responses breaks down — even when the person is actively trying to think carefully. An attacker who creates urgency on a phone call is neurologically narrowing the victim's capacity to stop and evaluate what's happening.

The IT help desk context compounds this further. Help desk personnel exist specifically to resolve problems quickly. Their role is structured around moving from problem to resolution, and a caller in distress — frustrated, pressuring, escalating — maps directly onto the scenarios those workers are trained to address. The social and professional gravity of the situation pulls toward compliance. Skepticism is not just cognitively difficult under these conditions — it's structurally penalized, because the alternative is failing at the job.

The Retool case is instructive precisely because the employee's critical faculties were engaged. This wasn't someone on autopilot; they were actively evaluating the call and flagging it as suspicious. The voice pressure, the insider knowledge, and the social weight of keeping an internal interaction moving toward resolution pushed the employee across the line even while their threat evaluation was running.

Authority Transfers Differently Through Voice

Phishing emails impersonate authority too, but the impersonation is static. A spoofed sender address can be inspected. The writing style can be compared against prior communications. The claimed urgency can be evaluated against whether the organization actually sends emails like this.

Voice carries signals that written text doesn't. Tone, pacing, confidence, the ease with which a caller navigates interruption, the way they handle pushback — all of these are paralinguistic cues that listeners use, largely automatically, to assess whether the person on the other end is who they say they are and whether the request is legitimate.

An attacker who is fluent, composed, and authoritative on a call is providing real-time evidence that seems to corroborate their claimed identity. Scattered Spider's documented success across multiple targets has been widely attributed to their operators' English fluency and their ability to improvise: to pivot, respond naturally to questions, and hold a conversation without following a rigid script. In the Retool incident, the voice on the call was a deepfake of a specific colleague. It was not a generic authority figure but someone the engineer recognized; the attacker borrowed credibility from a trusted identity.

This is why help desks are the canonical vishing target. The help desk worker's job creates an orientation toward trust and assistance that is genuinely difficult to override. When someone calls with a plausible reason for needing access, the natural posture of the role is to help them. The attacker doesn't need to break through skepticism so much as they need to provide just enough plausibility that the normal helping behavior can proceed.

The Arup case illustrates this dynamic at its most extreme. The employee's skepticism was active — he recognized the email as suspicious and resisted it. The attackers simply escalated to a channel where authority and social proof are harder to dismiss: a video call populated with multiple familiar faces making the same request. The employee was not foolish. His threat evaluation process worked correctly on the first stimulus and was then overwhelmed on the second. The attack succeeded by exploiting the limits of human authentication under adversarial conditions.

AI Has Removed the Remaining Tells

Historically, vishing had detectable weaknesses: unfamiliar accents, scripted responses that broke down under unexpected questions, background noise characteristic of call centers. AI has systematically eliminated each of these. Voice cloning tools can now reproduce a target's voice from as little as three seconds of audio, the kind of clip found in a recorded earnings call, a LinkedIn video, or a public conference presentation. The Arup attackers built convincing deepfakes of specific, named individuals using nothing but publicly available material, and the quality was sufficient to overcome a skeptical employee's active resistance.

The remaining operational friction in vishing (staffing call centers, training operators, covering multiple languages and time zones) is also being commoditized. Underground markets now offer TOAD-as-a-service platforms with multilingual operators, auto-dialers, and spoofed caller ID ranges available by subscription. Intel 471 documented more than 60 distinct threat actors offering underground call center services between January 2023 and August 2024. This industrialization means the barrier to running a sophisticated vishing campaign has dropped to roughly the same level as running a phishing campaign.

What the Defenses Are Actually Missing

The conventional response to vishing risk is user awareness training: teach employees to be skeptical of unsolicited calls, to verify caller identity, and to escalate before acting on sensitive requests. That is a reasonable first step, but it is not sufficient. Awareness training assumes that a well-prepared employee can recognize a vishing attempt in real time and decline to comply. The Retool and Arup employees were not naive, and both were actively suspicious; they were operating within normal professional parameters, in roles where acting on colleagues' requests is part of the job.

The American Hospital Association documented a pattern in early 2024 of attackers impersonating hospital employees to IT help desks, correctly answering security questions, and then requesting password changes and new MFA device registrations. The lesson is that knowledge-based verification is itself the attack surface: an answer an attacker can research or buy verifies nothing.

The controls that actually address this are process-level, not awareness-level. Out-of-band verification — requiring that any sensitive action requested by phone be confirmed through a separate channel using a number the organization controls, not the number provided by the caller — breaks the attack chain regardless of how convincing the caller is. Call authentication procedures that treat any inbound request for credential reset or sensitive access as unverified, regardless of how much the caller appears to know, remove the help desk's discretion from the equation entirely.
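A minimal Python sketch of such a policy, assuming a hypothetical `DIRECTORY` lookup as the only trusted source of callback numbers (in practice this might be an HR system or the identity provider). All names here are illustrative. The point of the design is what it ignores: for sensitive actions, neither the number the caller offers nor their answers to security questions has any effect on the outcome.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical directory of record: the ONLY trusted source of callback
# numbers. In practice this might be HR or the identity provider.
DIRECTORY = {"jdoe": "+14155550123"}

SENSITIVE_ACTIONS = {"password_reset", "mfa_device_registration"}

@dataclass
class InboundRequest:
    claimed_user: str
    action: str
    caller_supplied_number: str        # logged for the ticket, never dialed
    answered_security_questions: bool  # deliberately not used below

def callback_number(req: InboundRequest) -> Optional[str]:
    """Number the agent must call back: from the directory, never the caller."""
    return DIRECTORY.get(req.claimed_user)

def may_proceed(req: InboundRequest, callback_confirmed: bool) -> bool:
    """Gate sensitive actions on a confirmed out-of-band callback.

    `callback_confirmed` means the agent dialed callback_number() and the
    account owner confirmed the request. Caller knowledge and credibility
    do not appear in the decision, so there is nothing to talk past.
    """
    if req.action not in SENSITIVE_ACTIONS:
        return True
    if callback_number(req) is None:
        return False  # no trusted number on file: escalate instead
    return callback_confirmed
```

In the AHA scenario, a caller who answers every question correctly still gets `False` until the real employee, reached at the directory number, confirms the reset.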

The Retool breach points to another control that gets less attention than it deserves: phishing-resistant MFA. The attacker's decisive move in that incident was getting an employee to read out a one-time code on a live call, a technique that works precisely because OTP-based MFA requires a human to carry a code from one channel to another. FIDO2/WebAuthn removes that step. The credential is bound to the legitimate origin, there is no code for the employee to read aloud over the phone, and an authentication attempt relayed through the wrong origin fails. The control makes the attack technically unsuccessful even when the employee is deceived. Of course, deploying phishing-resistant authentication at scale creates its own support burden, and a poorly designed account recovery process can reintroduce the same attack surface.
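The difference can be sketched in Python. The first half is standard RFC 6238 TOTP; note that the verifier takes no input describing where a code came from, so a code read aloud to a caller verifies like any other. The second half checks three fields of a WebAuthn `clientDataJSON` (`type`, `challenge`, `origin`), which the browser fills in rather than the page. It is deliberately incomplete: a real relying party must also verify the authenticator data, the RP ID hash, and the signature, and should use a maintained WebAuthn library rather than code like this. The function names are illustrative.

```python
import base64
import hashlib
import hmac
import json
import struct

# --- One-time codes (RFC 6238 TOTP with HMAC-SHA1) ---

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """The code depends only on the shared secret and the clock."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

def verify_totp(secret: bytes, submitted: str, unix_time: int) -> bool:
    # Nothing here describes where the code came from. A code read aloud to
    # a caller and typed in by the attacker verifies like the user's own.
    return hmac.compare_digest(totp(secret, unix_time), submitted)

# --- WebAuthn origin binding (partial check of clientDataJSON only) ---

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def check_client_data(client_data_json: bytes, expected_challenge: bytes,
                      expected_origin: str) -> bool:
    """Check the type, challenge, and origin recorded by the browser.

    Incomplete by design: a relying party must also verify authenticator
    data, the RP ID hash, and the signature over both.
    """
    data = json.loads(client_data_json)
    return (
        data.get("type") == "webauthn.get"
        and hmac.compare_digest(b64url_decode(data.get("challenge", "")),
                                expected_challenge)
        and data.get("origin") == expected_origin
    )
```

In practice the browser will not offer a credential to a site outside its registered RP ID at all, so the server-side origin check is a second layer rather than the only one.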

At the network layer, voice firewalls are underdeployed relative to their email equivalents. Mutare's 2024 Voice Threat Survey found that while 94% of respondents believed voice attack defenses should be part of their cyber strategy, only 59% were aware that technical solutions to block calls existed. Voice firewalls filter inbound calls at the network edge using phone number reputation, STIR/SHAKEN attestation data, and anomaly detection before a call reaches an employee. STIR/SHAKEN, the FCC-mandated framework that carriers use to attest that a caller is authorized to use the number it presents, is worth requesting from your carrier. The caveat is that it authenticates a number, not a voice, and it works only over IP networks, so calls that cross non-IP segments often arrive without attestation, which limits its reliability as a standalone control.
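As a rough illustration of how attestation data might feed a screening decision, the Python sketch below decodes the claims of a SHAKEN PASSporT (the token carried in a SIP `Identity` header, with its `;info=` parameters already stripped) and combines the `attest` level (A, B, or C) with a reputation score. The decoder does not verify the signature, which a real deployment must do against the certificate named in the header's `x5u`. The 0 to 1 reputation score, the 0.2 threshold, and the allow/flag/block policy are all hypothetical.

```python
import base64
import json
from typing import Optional

def b64url_json(segment: str) -> dict:
    """Decode one unpadded base64url JWT segment into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def shaken_claims(identity_token: str) -> dict:
    """Return the payload claims of a SHAKEN PASSporT WITHOUT verifying it.

    A real deployment must verify the ES256 signature using the certificate
    referenced by the header's "x5u" before trusting any claim.
    """
    header_b64, payload_b64, _signature = identity_token.split(".")
    header = b64url_json(header_b64)
    if header.get("ppt") != "shaken":
        raise ValueError("not a SHAKEN PASSporT")
    return b64url_json(payload_b64)

def screen_call(claims: Optional[dict], reputation: float) -> str:
    """Hypothetical edge policy: combine attestation with a 0-1 reputation score."""
    if reputation < 0.2:
        return "block"  # known-bad number, whatever its attestation
    attest = claims.get("attest") if claims else None
    if attest == "A":
        # Full attestation: the originating carrier says the caller may use
        # this number. It says nothing about who is actually speaking.
        return "allow"
    # "B"/"C" attestation, or no PASSporT at all (e.g. the call crossed a
    # non-IP leg): let it ring, but mark the number as unverified.
    return "flag"
```

Even an `allow` here only means a carrier vouched for the number; a deepfaked voice calling from a legitimately attested line passes straight through.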

The failure in the Arup case was at the authorization layer. High-value financial transactions authorized over voice or video channels are an obvious target for this class of attack, and the control is straightforward even if it requires process change: mandatory multi-party approval, time delays that create a window for verification, and a standing policy that no wire transfer instruction received by phone or video is treated as authorized until confirmed through a separate written channel. The same logic applies to any action that is difficult or impossible to reverse.
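Those three controls can be expressed as a release check. The Python sketch below is illustrative only: `WireRequest`, `release_blockers`, the two-approver rule, and the 24-hour hold are hypothetical values a finance team would set for itself. It returns the reasons a transfer cannot yet be released rather than a bare yes/no, so whoever is handling it can see which safeguard is still outstanding.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical policy values; real ones belong to finance and audit.
REQUIRED_APPROVERS = 2
HOLD_PERIOD = timedelta(hours=24)
VOICE_CHANNELS = {"phone", "video"}

@dataclass
class WireRequest:
    requester: str
    origin_channel: str  # how the instruction arrived, e.g. "video"
    created_at: datetime
    # Account IDs that approved inside the payment system, not people on a call.
    approvers: set[str] = field(default_factory=set)
    # Confirmed in writing through contact details already on file.
    written_confirmation: bool = False

def release_blockers(req: WireRequest, now: datetime) -> list[str]:
    """Return the reasons the transfer cannot be released; empty means release."""
    blockers = []
    if len(req.approvers - {req.requester}) < REQUIRED_APPROVERS:
        blockers.append("needs independent approvers")
    if now - req.created_at < HOLD_PERIOD:
        blockers.append("hold period not elapsed")
    if req.origin_channel in VOICE_CHANNELS and not req.written_confirmation:
        blockers.append("voice/video instruction not confirmed in writing")
    return blockers
```

Because approvals are actions taken by authenticated accounts in the payment system, a convincing face on a video call contributes nothing toward them.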

Most social engineering controls weren't designed for a threat model that includes voice-based attacks. Vishing operates on a different psychological substrate, bypasses the technical stack, and is becoming more convincing faster than awareness training cycles can keep up. The question for security teams is whether the processes surrounding sensitive actions are designed to be safe even when the employee is deceived.

