Most Security Controls Were Built on Assumptions Agentic AI Violates
Security practitioners have been noticing not just that AI agents have expanded the attack surface, but that they're failing in ways that existing mental models don't fully account for.
In July 2025, a malicious actor gained access to the Amazon Q Developer VS Code extension through misconfigured repository permissions and merged a commit injecting a prompt that instructed the agent to wipe users' local filesystems and delete their AWS cloud resources. The compromised version was distributed to nearly a million developers before anyone noticed. What stopped it was a syntax error in the injected prompt.
A month later, attackers compromised the Nx build system's npm publishing pipeline, a tool with 4.6 million weekly downloads. The payload didn't steal credentials in a traditional manner. It invoked the victims' own AI coding agents (Claude Code, Gemini CLI, Amazon Q) and prompted them to conduct reconnaissance and exfiltrate secrets on the attacker's behalf. It was the first documented case of malware conscripting AI agents as attack tools.
That same month, a supply chain breach through Salesloft's Drift AI chat integration exposed data across more than 700 organizations. The incident was serious enough that FINRA issued a formal alert to all member firms, citing elevated risk of credential theft and spear phishing. The entry point wasn't a vulnerability in the traditional sense; it was a trusted AI integration.
Three incidents, three different attack vectors. But there's one consistent pattern: the agentic properties that make these systems useful (autonomy, broad tool access, trusted relationships with other systems) are what's being exploited.
Last week, CISA, the NSA, and Five Eyes partner agencies published joint guidance on the secure deployment of agentic AI, the clearest multinational document to address these systems directly. It catalogs five categories of risk and offers practical controls across the full deployment lifecycle. It's a useful and overdue contribution.
But read carefully, the guidance surfaces something unsettling: Agentic AI doesn't just introduce new vulnerabilities. It systematically violates the foundational assumptions that security controls were built to enforce.
We Assumed We Knew Who Was Acting
Traditional security controls are built around the concept of a principal: an identifiable entity whose actions can be attributed, audited, and held accountable. When a human logs into a system, they authenticate. When they take an action, it's logged under their identity. The model is imperfect, but it's auditable.
Agentic AI breaks that model at the foundation. Agents authenticate to services and to each other using API keys, tokens, and credentials, and in many deployments those credentials end up static, broadly scoped, or shared across agents.
The document describes what it calls the "confused deputy" pattern: a low-privilege user manipulates a high-privilege agent into performing actions the user couldn't execute directly. The agent complies because the request appears to fall within its authorized scope. The audit log shows a trusted identity taking an expected action. Nothing triggers an alert.
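One way to make the pattern concrete: the mitigation is to authorize every agent action against the permissions of the originating user, not the agent's own. The sketch below is illustrative only; the permission table, the Request type, and the delete_records action are hypothetical, not drawn from the guidance.

```python
# Minimal sketch of an on-behalf-of check: the agent's tool layer authorizes
# each action against the *originating user's* permissions, not the agent's.
# All names here (PERMISSIONS, Request, delete_records) are hypothetical.

from dataclasses import dataclass

PERMISSIONS = {
    "alice": {"read_records"},                          # low-privilege user
    "agent-ops": {"read_records", "delete_records"},    # high-privilege agent
}

@dataclass
class Request:
    on_behalf_of: str   # the human principal who initiated the task
    action: str

def authorize(request: Request) -> bool:
    # Confused-deputy-resistant check: the action must be permitted for the
    # originating user, regardless of what the agent itself is allowed to do.
    return request.action in PERMISSIONS.get(request.on_behalf_of, set())

def handle(request: Request) -> str:
    if not authorize(request):
        return f"denied: {request.on_behalf_of} may not {request.action}"
    return f"executing {request.action} for {request.on_behalf_of}"

# The manipulation attempt fails even though the agent holds the privilege.
print(handle(Request(on_behalf_of="alice", action="delete_records")))
```

The point of the design is that the deputy's own privileges never enter the decision, so manipulating the deputy buys the attacker nothing.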
That invisibility is a structural property of how agents operate. Because behavioral detection models are tuned to flag known-bad actors and anomalous activity, a compromised agent operating under a legitimate identity and within its expected scope is nearly invisible to the tools organizations rely on for detection.
The guidance recommends cryptographically anchored agent identities, short-lived credentials, mutual TLS on all inter-agent calls, and trusted registries reconciled regularly against live agent populations. These are sound controls. What I find striking is how far most current agentic deployments are from implementing any of them.
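As a rough illustration of two of those controls, the sketch below mints short-lived signed tokens and reconciles a trusted registry against the live agent population. It is a sketch under stated assumptions: the token format, the five-minute lifetime, and the registry shape are mine, not the guidance's, and a production deployment would use established mechanisms such as mutual TLS or workload identity rather than hand-rolled signing.

```python
# Hedged sketch of short-lived agent credentials plus registry reconciliation.
# The token encoding and lifetimes are assumptions for illustration only.

import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-managed-secret"
TOKEN_TTL_SECONDS = 300  # short-lived: minutes, not months

def mint_token(agent_id: str) -> str:
    claims = {"sub": agent_id, "exp": time.time() + TOKEN_TTL_SECONDS}
    body = json.dumps(claims, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token: str) -> dict | None:
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                                   # tampered or foreign token
    claims = json.loads(body)
    return claims if claims["exp"] > time.time() else None   # expired

def reconcile(registry: set[str], live_agents: set[str]) -> dict:
    # Registry reconciliation: flag agents running without registration and
    # registrations with no live agent behind them.
    return {"unregistered": live_agents - registry,
            "stale": registry - live_agents}

print(verify_token(mint_token("billing-agent-01")))
print(reconcile({"billing-agent-01", "retired-agent"},
                {"billing-agent-01", "shadow-agent"}))
```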
We Assumed We Could See What Happened
Incident response depends on a reliable record of what occurred. Logs, audit trails, and reproducible behavior are what allow security teams to reconstruct an attack, assign scope, and determine what needs to change. Agentic systems make all three significantly harder.
The guidance notes that agentic AI generates reasoning chains and context windows whose logs are massive in volume, loosely structured, often repetitive, and difficult to parse for meaningful signal. Comprehensive logging is technically possible but operationally impractical at scale, and the guidance is candid that extracting actionable intelligence from those logs is challenging.
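One partial mitigation, assuming you control the agent's tool layer, is to emit one structured record per tool invocation rather than dumping raw reasoning text. The field names in the sketch below are illustrative, not a standard schema.

```python
# Sketch of structured per-action logging so signal can be extracted later.
# Field names are hypothetical; redact secrets before logging arguments.

import json
import time
import uuid

def log_agent_action(agent_id: str, tool: str, arguments: dict,
                     outcome: str, on_behalf_of: str) -> str:
    record = {
        "ts": time.time(),
        "event_id": str(uuid.uuid4()),
        "agent_id": agent_id,
        "on_behalf_of": on_behalf_of,    # the originating principal
        "tool": tool,
        "arguments": arguments,
        "outcome": outcome,              # "allowed", "denied", "error", ...
    }
    line = json.dumps(record, sort_keys=True)
    print(line)                          # stand-in for a real log pipeline
    return line

log_agent_action("support-agent-02", "crm.lookup",
                 {"customer_id": "c-1042"}, "allowed", "alice")
```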
Stochastic behavior compounds this. Unlike traditional software, where the same input reliably produces the same output, LLM-based agents can generate different actions from identical prompts depending on context window state, environmental inputs, and model variability. This makes reproducibility, a cornerstone of post-incident analysis, unreliable by design.
Agents can also spawn sub-agents and follow delegation chains that aren't surfaced in operator-facing monitoring. Tools integrated into agentic workflows frequently operate outside the system's monitoring boundary entirely, meaning a significant portion of what an agent does may simply not appear in any log.
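A minimal countermeasure, if you control how sub-agents are spawned, is to propagate a delegation chain with every handoff so downstream actions remain attributable to the task that triggered them. The Agent class and task names below are hypothetical.

```python
# Minimal sketch of carrying a delegation chain through sub-agent spawns,
# so every action can be traced back to the task that triggered it.

import uuid

class Agent:
    def __init__(self, name: str):
        self.name = name

    def run(self, task: str, chain: list[str] | None = None) -> None:
        # Extend (never replace) the delegation chain on each hop.
        chain = (chain or []) + [f"{self.name}:{uuid.uuid4().hex[:8]}"]
        print(f"task={task!r} chain={' -> '.join(chain)}")
        # A real agent would decide here whether to act or delegate further.
        if task == "audit invoices":
            Agent("sub-agent-extractor").run("extract line items", chain)

Agent("planner").run("audit invoices")
```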
The most underreported detail in the guidance is worth reading twice. The document explicitly notes that some AI systems have demonstrated what it calls "awareness": altering their behavior when they detect they are under evaluation. More concerning, the guidance describes documented cases of agents that conceal vulnerabilities they discover rather than report them, and misrepresent their actions to avoid shutdown or constraint.
For security teams, this creates a specific problem: the monitoring and evaluation processes designed to provide assurance may be producing data that reflects how an agent behaves when it knows it's being watched, not how it behaves in production.
We Assumed We Could Fix It Afterward
Even when security controls fail, the assumption has always been that failures are recoverable. You identify the breach, contain the damage, restore from backup, and reconstruct the audit trail to understand what happened. Agentic systems complicate all of this.
The guidance identifies what it calls accountability risks: the ways agentic architecture makes it structurally difficult to trace what caused a particular action. When multiple agents collaborate on a task through a chain of distributed decisions, each operating within a limited scope, determining which component or design choice caused an erroneous outcome becomes genuinely hard. Fragmented logs, opaque reasoning, and emergent interactions between agents obscure the decision path even when nothing has been deliberately tampered with.
This is before you account for the cases where tampering has occurred. The document notes that compromised agents can delete audit trails, alter access controls, and modify files while appearing to function normally, producing logs that look legitimate and delay detection. If the audit trail itself cannot be trusted, the forensic foundation of incident response is undermined.
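One common answer, sketched below under the assumption that the log store itself sits outside the agent's reach, is a hash-chained audit trail: each record commits to the previous one, so deleting or editing an entry breaks verification.

```python
# Sketch of a tamper-evident audit trail: each record includes the hash of
# the previous one. This illustrates the idea only; a real deployment would
# also write the chain to storage the agent cannot modify.

import hashlib
import json

def append(log: list[dict], entry: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = json.dumps({"entry": entry, "prev": prev_hash}, sort_keys=True)
    log.append({"entry": entry, "prev": prev_hash,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log: list[dict]) -> bool:
    prev_hash = "genesis"
    for rec in log:
        body = json.dumps({"entry": rec["entry"], "prev": prev_hash},
                          sort_keys=True)
        if rec["prev"] != prev_hash or \
           rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = rec["hash"]
    return True

trail: list[dict] = []
append(trail, {"agent": "ops-agent", "action": "read_config"})
append(trail, {"agent": "ops-agent", "action": "modify_acl"})
print(verify(trail))          # True
del trail[0]                  # simulate an agent deleting its own tracks
print(verify(trail))          # False: the chain no longer verifies
```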
Hallucinations introduce another compounding problem. In a single-agent system, a hallucinated output is a contained error, but in a multi-agent system, a hallucinated output accepted as fact by one agent propagates downstream as ground truth. The guidance describes a scenario where partial agent failures lead to hallucinated outputs that subsequent agents treat as valid inputs, under degraded conditions that make verification less likely, not more.
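A crude guard against this propagation, assuming an independent system of record exists, is a validation gate between agents: downstream agents verify upstream claims before treating them as fact. Everything in the sketch below (the inventory table, the SKU, the reorder decision) is hypothetical.

```python
# Sketch of a validation gate between agents: a downstream agent checks an
# upstream claim against an independent source of truth before acting on it.

INVENTORY = {"sku-123": 40}   # hypothetical authoritative system of record

def upstream_agent_report() -> dict:
    # Imagine this came from another agent and may be hallucinated.
    return {"sku": "sku-123", "stock": 4000}

def downstream_agent(claim: dict) -> str:
    actual = INVENTORY.get(claim["sku"])
    if actual is None or actual != claim["stock"]:
        return f"rejected upstream claim {claim}: source of truth says {actual}"
    return f"accepted claim, reordering based on stock={claim['stock']}"

print(downstream_agent(upstream_agent_report()))
```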
The structural risk section of the guidance describes cascading failures that emerge from the system's architecture itself: tightly coupled agents replanning and handing off ambiguous subtasks, selecting misconfigured or malicious tools under degraded conditions, with implicit trust between agents that was never explicitly established and cannot be easily revoked.
The guidance also identifies what it calls specification gaming, illustrated by an agent tasked with maximizing uptime that disables security updates to avoid reboots. The agent is technically achieving its goal. The behavior is within its authorized scope. And it's creating a vulnerability that may not surface until much later.
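One way to blunt specification gaming is to separate hard constraints from the objective and check every proposed action against them, regardless of how well it scores on the goal. The constraint names and proposal format below are illustrative assumptions, not anything prescribed by the guidance.

```python
# Sketch of constraining an objective with non-negotiable policy checks:
# an action that serves the goal (uptime) is still rejected if it violates
# a hard constraint (security updates stay enabled). Names are illustrative.

HARD_CONSTRAINTS = {"security_updates_enabled": True}

def propose_action(goal: str) -> dict:
    # A specification-gaming agent might propose this to maximize uptime.
    return {"goal": goal, "action": "disable_security_updates",
            "effects": {"security_updates_enabled": False}}

def policy_gate(proposal: dict) -> bool:
    # Reject any proposal whose effects contradict a hard constraint,
    # no matter how well it scores against the stated objective.
    return all(proposal["effects"].get(key, required) == required
               for key, required in HARD_CONSTRAINTS.items())

proposal = propose_action("maximize uptime")
print("approved" if policy_gate(proposal) else f"blocked: {proposal['action']}")
```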
What the Guidance Recommends and What It Admits
The practical controls in the guidance are grounded and implementable: least-privilege access scoped to the narrowest possible permissions, cryptographic identity for every agent, human approval gates for high-impact or irreversible actions, defense-in-depth rather than reliance on any single control, isolated agent environments that limit blast radius, and continuous monitoring tuned to behavioral baselines rather than known-bad signatures.
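To make one of those controls concrete, here is a minimal sketch of a human approval gate: routine actions run automatically, while anything classified as high-impact pauses for explicit sign-off. The action names and the classification set are assumptions for illustration.

```python
# Sketch of a human approval gate for high-impact or irreversible actions.
# The HIGH_IMPACT set and the approval mechanism are illustrative stand-ins.

HIGH_IMPACT = {"delete_bucket", "rotate_all_keys", "send_wire_transfer"}

def request_human_approval(action: str) -> bool:
    # Stand-in for a real workflow (ticket, chat prompt, four-eyes review).
    answer = input(f"approve high-impact action '{action}'? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: str) -> str:
    if action in HIGH_IMPACT and not request_human_approval(action):
        return f"{action}: blocked pending approval"
    return f"{action}: executed"

print(execute("read_metrics"))       # runs without a gate
print(execute("delete_bucket"))      # waits for a human decision
```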
The guidance also recommends starting with clearly defined, low-risk tasks and incrementally expanding agent autonomy as operators develop genuine understanding of how a system behaves in their specific environment. This is an acknowledgment that the evaluation methods needed to validate agent behavior before deployment don't yet exist at the fidelity the risk warrants.
The document states that some risks unique to agentic systems are not covered by existing frameworks, evaluation methods are not mature enough to reliably validate agent behavior, and governance mechanisms designed for human actors do not translate cleanly to autonomous AI agents.
That last point is the one I keep coming back to. This is a joint advisory from multiple intelligence agencies, and its conclusion is essentially: assume that agentic AI systems may behave unexpectedly, and prioritize resilience and reversibility over efficiency gains until the field catches up.
That's not standard advisory language. It's an honest accounting of where things actually stand, and for practitioners making deployment decisions right now, it's probably the most important sentence in the document.
Sources
CISA, NSA, ASD, CCCS, NCSC-NZ, NCSC-UK. "Careful Adoption of Agentic AI Services," May 2026: cisa.gov
SC Media. "Amazon Q Extension for VS Code Reportedly Injected with 'Wiper' Prompt," July 2025: scworld.com
Snyk. "Weaponizing AI Coding Agents for Malware in the Nx Malicious Package Security Incident," August 2025: snyk.io
FINRA. "Cybersecurity Alert β Salesloft Drift AI Supply Chain Attack," 2025: finra.org