Senthex
Blog
May 5, 2026 · 11 min read · Yohann Sidot

Agents make the proxy load-bearing

Karpathy says it's a decade. Krieger said a year. Willison drew the lethal trifecta. Here is what their disagreement implies for runtime LLM security in Europe over the next eighteen months.

The cleanest way to disagree with three thoughtful people is to read all of them carefully.

In December 2024, Mike Krieger — Anthropic's CPO and one of the people responsible for shipping Claude into production — said autonomous agents were "still at least a year away from being able to work autonomously" (Axios, AI+ Summit). A year later, on Dwarkesh Patel's podcast, Andrej Karpathy rejected the "year of agents" framing, calling this "the decade of agents, not the year of agents" and the current generation "slop" — not because the research is bad, but because the agents lack memory, multimodality, continual learning, and reliable computer use (Dwarkesh Patel, October 2025). At Davos, Yann LeCun went further: "Basing agentic systems on LLMs is a recipe for disaster" — his argument being that agents need a world model that current LLMs structurally do not have (Fortune, Davos 2026 retrospective).

Stack those three quotes and you might conclude the entire agentic space is overhyped and any product strategy that bets on it is ill-advised. The conclusion is wrong — and the way it is wrong is the only interesting question for anyone building runtime infrastructure today.

The actual disagreement

Krieger, Karpathy and LeCun are not making the same argument.

Krieger is making a deployment argument: today's agents need supervision; the autonomy gradient is steeper than the demos suggest. That argument has aged well — see the publicly documented Klarna pivot, where the AI-only customer-service rollout was partly reversed in mid-2025 (Customer Experience Dive) — and it cuts against the procurement tempo of every enterprise that mistook a launch for a finished capability.

Karpathy is making a capability argument: even with a decade of effort, the gap between an agent that solves a task in a benchmark and an agent that you would trust as an intern is structural. His timeline (a decade) is meant to push back against people promising AGI by 2027.

LeCun is making an architecture argument: LLMs are the wrong substrate for autonomous decision-making, period. He is funding an alternative (Advanced Machine Intelligence Labs) and his disagreement is with the substrate, not the timeline.

Now overlay deployment data. Stanford's AI Index 2026 puts Terminal-Bench at 20 % → 77.3 % in twelve months — nearly a four-fold gain on a benchmark that was supposed to take longer (Stanford HAI). MIT Sloan and BCG report 35 % of organisations using agentic AI, fewer than 10 % scaled across a function (BCG). Eurostat: 20 % AI adoption in EU enterprises at end-2025 (Eurostat). Lleverage's State of European AI: 12 % agent adoption in Europe.

This is what the data actually shows: capability is improving fast on narrow benchmarks; deployment in production at scale is not. The two metrics live on different curves. The proxy question — what you put between an LLM and the world — does not depend on which curve you bet on. It depends on the surface area of the calls those agents are making, regardless of whether they "work" in any particular benchmark sense.

What changes structurally with MCP

For most of 2024, an LLM feature was a chat-completions call with one user message and one system message. Auditing was simple: two strings in, one string out, the trace in your app logs.

Then Anthropic published the Model Context Protocol in November 2024, donated it to the Linux Foundation in late 2025, and the industry built on top. OpenAI shipped AgentKit at DevDay 2025 (AgentKit). Mistral released its Agents API (Mistral). Google pushed A2A v1.0 GA alongside the Gemini Enterprise Agent Platform. Microsoft pushed Copilot Studio agents to GA at $349/month per agent. None of this is hypothetical: it ships.

What ships with it is a different surface area. A single user request now expands into a multi-step session: the agent reads documents from a vector store, calls a tool, receives the tool's output, decides whether to call another tool, eventually composes an answer. The LLM provider you talk to is not the only thing in the loop. The number of tokens crossing the boundary multiplies. The number of distinct prompts crossing the boundary multiplies. The number of untrusted inputs crossing the boundary, in the form of tool outputs and document content, multiplies in a different way again.
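The multiplication is easy to make concrete. A minimal counting sketch (no real MCP or provider SDK; all names are illustrative) comparing boundary crossings for a classic single chat call against a session with three tool round-trips:

```python
# Sketch: count what crosses the trust boundary in each workload shape.
# Illustrative only; no real MCP client or provider SDK is involved.

def chat_call_crossings() -> dict:
    # 2024-style feature: one system + one user message out, one answer back.
    return {"prompts_out": 1, "untrusted_inputs_in": 0, "responses_in": 1}

def agent_session_crossings(tool_calls: int) -> dict:
    # Each agent step re-sends the grown context to the model, and every
    # tool or document output is untrusted content re-entering that context.
    return {
        "prompts_out": 1 + tool_calls,       # initial request + one per tool round-trip
        "untrusted_inputs_in": tool_calls,   # every tool/document output crosses inward
        "responses_in": 1 + tool_calls,      # a model reply at every step
    }

print(chat_call_crossings())
print(agent_session_crossings(tool_calls=3))
```

The counts are toy numbers, but the shape of the growth is the point: every extra tool round-trip adds a prompt, a response, and an untrusted input to the boundary, which is why per-call assumptions stop describing the workload.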

Five European platforms made this concrete in 2025 by shipping production MCP integrations:

  • Mollie (NL) shipped an MCP server for merchants and an Agentic Commerce Protocol path in late 2025 (fintech.global, November 2025).
  • Bitmovin (AT) shipped its Agentic AI Hub on MCP in November 2025.
  • Supermetrics (FI) shipped its Insights Agent + MCP server.
  • AMBOSS (DE) shipped AMBOSS MCP for medical reference search.
  • Quantexa (UK) shipped its Agent Gateway with MCP support across OpenAI, Claude, Mistral and Gemini in November 2025.

These are not pilots — paying customers, production paths, regulated sectors. Each operates a system where the surface area for prompt injection, exfiltration, tool-call abuse and budget exhaustion is structurally larger than it was eighteen months ago.

The lethal trifecta is the load-bearing argument

Simon Willison's clearest contribution to this conversation is the lethal trifecta: an AI agent becomes structurally unsafe when it combines (a) access to private data, (b) exposure to untrusted content, and (c) the ability to communicate externally (Simon Willison, June 2025).

Read the three legs. The first is the entire point of an enterprise agent — without private-data access it has no value. The second is unavoidable the moment your agent reads any document, email or tool output from outside your trust boundary. The third is unavoidable the moment your agent does anything beyond text — call an API, send an email, write to a database, talk to another agent.

If all three are present, an attacker can sometimes — and in the documented cases, has — exfiltrate private data simply by inserting instructions into content the agent reads (HiddenLayer research). The OWASP GenAI working group put prompt injection at #1 of the LLM Top 10 in 2025 (OWASP GenAI). The UK National Cyber Security Centre's position is unambiguous: prompt injection is a structural property of LLMs, not a bug to be patched (NCSC via CyberScoop).

Willison's own conclusion is a structural one: the only reliable defence is to remove one of the three legs by design. That is exactly what a runtime proxy does, well or badly, depending on its sophistication. It is the place in the architecture where you can:

  • enforce that the agent's external communication path goes through a single, monitored chokepoint;
  • inspect every document and tool output the agent reads before the LLM sees it;
  • decide, per-call, whether the agent should be allowed to escalate to the next tool;
  • attribute every action to a session, a user, a budget, and a policy.

Notice what this does not require. The argument works whether or not Karpathy is right about the decade, Krieger about the year, or LeCun about the substrate. It only requires that someone is shipping agents into production right now — which they demonstrably are, in Europe, in regulated sectors.

What this rules out

Pure LLM-API observability gateways — proxies whose value proposition is logging, caching and cost control on chat-completions endpoints. They serve a real and current need. But their data model assumes one prompt in, one response out, single-call attribution. As soon as the workload becomes a multi-step MCP session, that data model produces logs that are technically correct and operationally useless. The compliance question "what did this agent see and do during this session?" cannot be reconstructed from per-call rows.
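The schema gap can be shown in a few lines. Both row shapes below are invented for illustration; the point is structural, not a claim about any vendor's actual log format:

```python
# Two illustrative log schemas. The per-call schema is fine for billing and
# latency, but nothing links its rows, so a session cannot be rebuilt from it.

per_call_rows = [
    {"ts": 1, "endpoint": "chat/completions", "tokens_in": 812,  "tokens_out": 96},
    {"ts": 2, "endpoint": "chat/completions", "tokens_in": 2048, "tokens_out": 310},
]

session_rows = [
    {"session_id": "s-42", "step": 2, "kind": "tool_call",  "target": "crm.lookup"},
    {"session_id": "s-42", "step": 1, "kind": "doc_read",   "target": "inbox/msg-9"},
    {"session_id": "s-42", "step": 3, "kind": "model_call", "target": "model-x"},
]

def reconstruct(rows: list[dict], session_id: str) -> list[dict]:
    """Answer 'what did this agent see and do during this session?' if the schema allows."""
    return sorted((r for r in rows if r.get("session_id") == session_id),
                  key=lambda r: r["step"])

print(reconstruct(per_call_rows, "s-42"))                       # [] -- no session key exists
print([r["kind"] for r in reconstruct(session_rows, "s-42")])   # ordered session timeline
```

The per-call query returns nothing not because the data is wrong but because the join key was never captured, which is exactly the "technically correct, operationally useless" failure mode.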

Pure agent-orchestration platforms that treat security as a side-effect of orchestration. The lethal trifecta is a property of the system, not of any component. An orchestrator that runs the agent and the security checks in the same process inherits the trust assumptions of the agent — exactly what Willison warns against (allardewinter blog). Security observability has to live outside the orchestrator's trust boundary; otherwise compromising the agent compromises the watcher.

Neither of these positions is wrong about the present. Both are increasingly costly to defend over the next eighteen months in Europe specifically, where Article 15 of the EU AI Act, ANSSI-PA-102 (April 2024), and the upcoming CRA reporting obligations make the evidence you keep about agentic sessions a regulatory artefact, not just an engineering one.

Roadmap as argument, not feature list

The reason this is being written now, rather than after the work is finished, is that the EU AI Act enforcement window opens on 2 August 2026 — three months from this post — and the design partners we want to work with on the agent-aware path are the ones already shipping MCP today.

What is in production at Senthex on the LLM-API path is the basis. Twenty-six shields running concurrently at the call boundary, EU-only inference, an audit log designed to support Article 15 evidence requirements (per-call shield verdicts, scores, timestamps, retention controls). That work is not getting deprecated by what comes next; it is the substrate that makes what comes next defensible.

What ships next is session-aware tracing for MCP-style agentic loops: a single audit object joining every tool call, document read and model invocation in a session, with shield verdicts at each step. Per-session budget caps. Per-tool policy gates. Publishing the timeline now is not advertising features that don't exist — it is inviting the few European teams already shipping MCP in production to define what session-aware audit should look like before we ship it.
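To make the invitation concrete, here is one possible shape for such an audit object. Every field name is an assumption for discussion, not the shipped Senthex schema:

```python
# Sketch of a session-level audit object with a per-session budget cap.
# Field names are assumptions for discussion, not a shipped schema.

from dataclasses import dataclass, field

@dataclass
class AuditStep:
    step: int
    kind: str              # "tool_call" | "doc_read" | "model_call"
    target: str            # tool name, document id, or model id
    shield_verdicts: dict  # shield name -> {"verdict": ..., "score": ...}
    tokens: int = 0

@dataclass
class SessionAudit:
    session_id: str
    user: str
    budget_tokens: int
    steps: list[AuditStep] = field(default_factory=list)

    def record(self, step: AuditStep) -> bool:
        # Enforce the per-session budget cap at record time, not after the fact.
        spent = sum(s.tokens for s in self.steps)
        if spent + step.tokens > self.budget_tokens:
            return False
        self.steps.append(step)
        return True

audit = SessionAudit("s-42", user="u-7", budget_tokens=1000)
audit.record(AuditStep(1, "doc_read", "contract.pdf",
                       {"injection": {"verdict": "pass", "score": 0.02}}, tokens=600))
ok = audit.record(AuditStep(2, "model_call", "model-x", {}, tokens=700))
print(ok)   # False -- 600 + 700 exceeds the 1000-token session budget
```

Whether verdicts belong on the step, on the session, or on both, and whether the budget is tokens, calls, or currency, is precisely the kind of question design partners should settle before the schema hardens.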

If your team is one of the five mentioned earlier — or one of the others we have not yet found — we are taking on a small number of design partners ahead of the Q3/Q4 2026 ship of session-aware tracing. The bar is concrete: an MCP integration in production today, a buyer asking compliance questions you cannot fully answer with per-call logs, and an opinion on what an Article 15 audit should look like for agentic sessions.

What this is not predicting

A manifesto that ends with predictions is usually wrong. So instead, here are the conditions under which the argument above invalidates itself.

The argument fails if MCP is replaced within twelve months by a fundamentally different protocol that absorbs the trifecta into the protocol layer. No candidate today; A2A is a peer, not a successor. If we are wrong, we will say so.

The argument fails if provider-side guardrails (Bedrock, Azure Content Safety, Vertex Safety, Anthropic's filtering) absorb agent-level controls before buyers notice. They are moving that way, but the in-process, in-trust-boundary nature of provider-side controls is precisely what the trifecta literature pushes back on.

The argument fails if EU enterprises decide agentic AI is a 2027 problem. The MIT Sloan/BCG figure of fewer than 10 % with scaled deployment leaves room for that scenario. Our hedge is explicit: the LLM-API proxy is the load-bearing path for the foreseeable EU mid-market; the agent path is built on top, not in place of it.

We will check this in eighteen months. The question is not whether agents are a year or a decade away. The question is what the surface looks like for systems shipping right now — and whether the runtime infrastructure under them was designed for that surface, or for the simpler one we used to have.


Want to discuss the design partner programme? security@senthex.com. The full Article 15 / ANSSI-PA-102 / GDPR mapping for the proxy layer is on the EU AI Act compliance page.