ATLAS · Lab #1 · Security research

Your Agents Work as Expected. That’s the Problem.

Prompt-Injection Cascades in a Multi-Agent Enterprise: A Forensic Study

Senthex Research·June 2026·19 pages

17 / 18

cascades to exfiltration

behind a shadow-mode firewall

0 / 17

with enforcing

the same firewall, one flag

∅

jailbreaks used

no model manipulated past entry

Abstract

We deployed four large language model agents in a simulated small enterprise — a support agent, a sales agent, a CEO, and a finance agent — drawn from three different production model families and connected by a shared message bus. The agents had the tools and autonomy of a routine back office: read tickets, look up invoices, request approval, move money. We delivered a single poisoned support ticket from outside the company and observed what the agents did with it.

Across 18 runs, 17 reproduced the same outcome — a fraudulent €48,500 transfer to an attacker-controlled account. No model was jailbroken, no agent was altered, and no guardrail failed in the usual sense; each agent behaved as its instructions intended. The compromise spread because a trusted internal agent, doing its job, rewrote the attacker’s instruction into an ordinary business request — laundering the injection so that it was detectable only at the point of entry.

The experiment ran behind an AI firewall in shadow mode, which observed every call without intervening. It recorded the attack in full and judged all 17 sessions compromised; a counterfactual replay indicates that blocking the first interaction would have stopped every cascade, with no estimated false positives. We argue this failure is topological — a property of how autonomous agents are wired together — rather than a defect of any one model, and we discuss what that implies for defending multi-agent systems.

Key findings

A single externally-supplied message — an ordinary support ticket — drove four well-behaved AI agents to a fraudulent payment in 17 of 18 trials.
Nothing broke. No jailbreak, no rogue agent, no failed guardrail — the agents followed their instructions. The weakness is in how they are connected, not in the models.
Detection is not prevention. In shadow mode the firewall saw and logged every step but did not act; in enforcing mode the same firewall would have stopped each cascade at the first step — 100% → 0%, with zero estimated false positives.
The attack is visible only at the entry point. Once a trusted agent relays it, it reads as normal business traffic (injection score 0.90 → 0.00) — so inspecting later hops does not catch it.
To defend multi-agent systems, control the topology and the entry points with an external control plane that can block — not better prompts alone. Observation is cheap: 295 ms median added latency.

Your Agents Work as Expected. That’s the Problem.

Abstract

Key findings

The full paper

Cite this