MCP Security: Why Guardrails Aren't Enough
A new paper proposes defense-in-depth for MCP security. The diagnosis is right, but policy enforcement can't solve what structural isolation must.
A recent paper from researchers at Vanta, MintMCP, and Darktrace—Securing the Model Context Protocol: Risks, Controls, and Governance—offers the most comprehensive treatment of MCP security threats to date. It’s worth reading. The threat model is solid, the attack taxonomy is useful, and the framework mappings (NIST AI RMF, ISO 42001) will help security teams translate agent risks into compliance language.
But the proposed solution—defense-in-depth through gateway-based policy enforcement—treats symptoms rather than causes. Here’s why.
What the Paper Gets Right
The Problem Diagnosis
The authors identify three adversary types that map cleanly to real-world incidents:
- Content Injection Adversaries: External attackers who embed malicious instructions in data sources agents process (support tickets, emails, calendar events)
- Supply Chain Adversaries: Malicious or compromised MCP servers (rugpull attacks, response injection)
- Inadvertent Agent Adversaries: Agents whose goal-seeking behavior creates security harms through emergent tool chaining
They also name the “lethal trifecta” (credit to Simon Willison): an agent with (a) access to private data, (b) exposure to untrusted content, and (c) external communication capability becomes a data exfiltration vector. Inject instructions into any connected system, exfiltrate from all of them.
This is the confused deputy restated for MCP. The paper doesn’t use that term, but the shape is identical: a trusted intermediary (the agent) can be manipulated into misusing its authority. (The same pattern emerges in NHI security—identity governance without authority-flow semantics.)
The Attack Taxonomy
The paper documents concrete attack vectors with proof-of-concept implementations:
| Attack | Mechanism | Real-World Example |
|---|---|---|
| Response injection | Malicious instructions in tool responses | PoC in Appendix A |
| Rugpull | Trusted server turns malicious | Postmark MCP server BCC’ing emails |
| Context poisoning | Malicious tool descriptions | Instructions to read ~/.aws/credentials |
| Cross-system exfiltration | Agent bridges network boundaries | Asana MCP data exposure |
These aren’t theoretical. The mcp-remote RCE (CVE-2025-6514), Asana’s cross-tenant leak, and the Postmark email exfiltration all happened in production.
The Framework Mappings
For security teams navigating compliance, the paper’s mappings to NIST AI RMF, ISO/IEC 27001, and ISO/IEC 42001 are genuinely useful. Table 2 provides a reference for integrating MCP security into existing audit programs.
Where the Solution Falls Short
The paper proposes five control categories:
- Authentication & Authorization (per-user OAuth, RBAC)
- Provenance Tracking (audit trails, SIEM integration)
- Context Isolation & Sandboxing (containerization, I/O filtering)
- Inline Policy Enforcement (DLP, secrets scanning, anomaly detection)
- Centralized Governance (private registries, tool allowlists)
All operationalized through an MCP Gateway—a proxy between agents and servers that enforces policy at every interaction.
This is defense-in-depth. It’s the right instinct. But it has structural limitations that no amount of layering can fix.
Problem 1: Policies Are Evaluated, Not Enforced
The gateway evaluates whether an action should be allowed. If the policy check passes, the action executes. If it fails, the action is blocked.
But what if:
- The policy is misconfigured?
- A code path bypasses the policy check?
- The gateway has a bug?
- The policy can’t express the constraint you need?
Policy-based systems fail open. The Copilot audit log vulnerability is a perfect example: Copilot could access files without generating audit entries because the logging hook was separable from the authorization path. The action was “authorized” but unlogged.
In capability-based systems, authorization is the proof of action. You can’t act without presenting a capability, and the capability is the audit record. There’s no separate hook to bypass.
Problem 2: Ambient Authority Remains
The paper’s controls restrict what agents can do. They don’t change what agents have access to. This is the missing layer in identity-based approaches.
With gateway-based RBAC, an agent still holds credentials for all tools its role permits. When processing Transaction A, it has access to tools for Transaction B. The confused deputy can still be confused—you’re just hoping the policy catches it.
Capability-based systems eliminate ambient authority. The agent doesn’t “have access” to anything. It receives a capability scoped to this transaction, this resource, these constraints. Authority travels with the request, not ambient to identity.
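This contrast can be sketched in a few lines of Python (the `Capability` type and `read_record` helper are our illustration, not anything from the paper): authority is a value passed with the request, not a property of the agent's identity.

```python
from dataclasses import dataclass

# Illustrative sketch: the agent cannot "have access" in general.
# Authority arrives as an argument, scoped to one transaction.

@dataclass(frozen=True)
class Capability:
    transaction_id: str   # which transaction this authority belongs to
    resource: str         # the one resource it designates
    action: str           # the one action it permits

def read_record(cap: Capability, transaction_id: str, resource: str) -> str:
    # No role lookup, no policy evaluation: the request either presents
    # a capability designating exactly this authority, or it cannot act.
    if cap.transaction_id != transaction_id or cap.resource != resource:
        raise PermissionError("capability does not designate this authority")
    if cap.action != "read":
        raise PermissionError("capability does not permit this action")
    return f"contents of {resource}"

# Processing Transaction A with A's capability works:
cap_a = Capability("txn-A", "crm/ticket/42", "read")
print(read_record(cap_a, "txn-A", "crm/ticket/42"))

# Reaching for Transaction B's resource fails structurally -- there is
# no ambient credential to misuse, only a capability that doesn't apply:
try:
    read_record(cap_a, "txn-B", "crm/ticket/99")
except PermissionError:
    pass
```

The point of the sketch is the shape of the check: nothing is consulted about who the agent is, only about what the presented capability designates.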
Policy model: Agent has role → role has permissions → hope policy catches misuse
Capability model: Transaction has capability → capability designates authority → misuse is unaddressable
Problem 3: No Delegation Semantics
The paper briefly mentions “multi-agent collusion” but doesn’t address the fundamental problem: multi-agent delegation. When Agent A delegates to Agent B, and B delegates to Agent C, what happens to authorization?
This isn’t edge-case territory. Multi-agent orchestration is how production systems work. Salesforce Agentforce routes across model providers. AWS Bedrock AgentCore chains supervisor agents to specialists. ServiceNow’s AI Control Tower coordinates across services. The paper’s Section 3.4.4 acknowledges this but offers only “monitoring inter-agent interaction patterns”—forensics, not authorization.
In the gateway model, each agent authenticates independently. There’s no cryptographic link between the original user’s consent and the third agent’s action. By the third hop, you’re trusting that each agent correctly propagated intent—a chain of hopes, not proofs. We call this the third-hop problem.
Capability chains solve this structurally:
- Each delegation appends a block to the capability token
- Each block can only attenuate (narrow) permissions, never expand
- The final agent presents the entire chain
- Verification proves: this action is the legitimate continuation of that transaction, with these accumulated constraints
No trust required at each hop. The chain is self-verifying. See Capabilities 101 for the full model.
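A minimal macaroon-style sketch of such a chain, using Python's standard `hmac` module (the `mint`/`attenuate`/`verify` helpers and caveat fields are our illustration, not a proposed wire format): each hop's MAC is keyed by the previous one, so blocks can only be appended, never rewritten or removed.

```python
import hashlib
import hmac
import json

def _mac(key: bytes, caveat: dict) -> bytes:
    # Deterministic serialization so verifier and issuer agree byte-for-byte.
    return hmac.new(key, json.dumps(caveat, sort_keys=True).encode(),
                    hashlib.sha256).digest()

def mint(root_key: bytes, caveat: dict) -> tuple[list, bytes]:
    """Issuer creates the root block of the chain."""
    return [caveat], _mac(root_key, caveat)

def attenuate(blocks: list, sig: bytes, caveat: dict) -> tuple[list, bytes]:
    """A delegating agent appends a narrowing caveat. The new MAC is keyed
    by the old one, making the chain append-only."""
    return blocks + [caveat], _mac(sig, caveat)

def verify(root_key: bytes, blocks: list, sig: bytes) -> bool:
    """The verifier replays the whole chain from the root key. Only the
    final agent's presentation is needed -- no per-hop trust."""
    cur = root_key
    for caveat in blocks:
        cur = _mac(cur, caveat)
    return hmac.compare_digest(cur, sig)

ROOT = b"issuer-secret"
# User -> Agent A: full transaction scope
blocks, sig = mint(ROOT, {"txn": "txn-A", "tools": ["crm", "email"]})
# Agent A -> Agent B: drop email
blocks, sig = attenuate(blocks, sig, {"tools": ["crm"]})
# Agent B -> Agent C: read-only
blocks, sig = attenuate(blocks, sig, {"action": "read"})

assert verify(ROOT, blocks, sig)            # legitimate third-hop chain
assert not verify(ROOT, blocks[:-1], sig)   # can't silently drop a constraint
tampered = blocks[:-1] + [{"action": "admin"}]
assert not verify(ROOT, tampered, sig)      # can't rewrite a hop's caveat
```

A real system would also evaluate the accumulated caveats (effective authority is their intersection) and would likely use asymmetric signatures; the sketch shows only the append-only, attenuate-only structure that makes the chain self-verifying.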
Problem 4: Anomaly Detection Is Forensics, Not Prevention
The paper proposes behavioral baselines and anomaly detection:
> Deviations from baselines should trigger alerts, for example: unusual data volume spikes, after-hours activity, or repeated authentication failures.
This detects attacks after they begin. It doesn’t prevent the confused deputy from acting in the first place.
Anomaly detection is valuable for forensics. But for authorization, you want constraints that make unauthorized actions impossible to express, not just detectable after the fact.
Problem 5: Gateway as Single Point of Failure
The authors acknowledge this:
> The gateway can become a single point of failure if it is not designed for high availability.
But it’s worse than availability. The gateway becomes the single point of trust. Compromise the gateway, compromise all agent authorization. Every tool call, every response, every credential—visible to whoever controls that layer.
Capability tokens are bearer credentials, but they’re also self-contained proofs. Verification requires only the public key. A compromised intermediary can drop or delay tokens, but it can’t forge them, expand their scope, or hide their usage. The cryptographic properties survive gateway compromise in ways policy databases don’t.
The Structural Alternative
Operating systems solved this problem fifty years ago. Multics didn’t ask processes to behave well—it made misbehavior impossible by construction. Each process got its own address space. Other processes’ memory wasn’t hidden; it was unaddressable.
Agents need the same treatment: not guardrails that hope for compliance, but structural isolation that makes “bad” impossible to express.
| Property | Gateway + Policy | Capability Chains |
|---|---|---|
| Authorization model | Evaluated at gateway | Carried with request |
| Ambient authority | Yes (role-based) | No (transaction-scoped) |
| Delegation | Implicit trust chain | Cryptographic attenuation |
| Audit | Separate logging layer | Authorization = audit |
| Failure mode | Fail open (policy miss) | Fail closed (no capability) |
| Gateway compromise | Full access | Can’t forge capabilities |
What We’d Add to the Paper
The paper’s controls aren’t wrong—they’re incomplete. Here’s what’s missing:
1. Transaction-Bound Capabilities
Every agent action should require presenting a capability scoped to that transaction. No ambient authority. The capability encodes:
- What resource
- What action
- What constraints
- When it expires
- Who it’s designated for
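A toy encoding of these five fields as a MAC-signed token (all names here are illustrative assumptions; a production design would use asymmetric signatures and a canonical serialization format):

```python
import hashlib
import hmac
import json
import time

ISSUER_KEY = b"issuer-secret"  # placeholder; real issuers use a keypair

def issue(resource: str, action: str, constraints: dict,
          ttl_s: int, audience: str) -> dict:
    body = {
        "resource": resource,             # what resource
        "action": action,                 # what action
        "constraints": constraints,       # what constraints
        "exp": int(time.time()) + ttl_s,  # when it expires
        "aud": audience,                  # who it's designated for
    }
    mac = hmac.new(ISSUER_KEY, json.dumps(body, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}

def accept(token: dict, presenter: str) -> bool:
    # All five fields are checked on every presentation; there is no
    # ambient fallback if any check fails.
    body = token["body"]
    mac = hmac.new(ISSUER_KEY, json.dumps(body, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return (hmac.compare_digest(mac, token["mac"])
            and body["aud"] == presenter
            and body["exp"] > time.time())

tok = issue("crm/ticket/42", "read", {"max_rows": 100},
            ttl_s=60, audience="agent-A")
assert accept(tok, "agent-A")
assert not accept(tok, "agent-B")  # not the designated presenter
```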
2. Cryptographic Attenuation
When Agent A delegates to Agent B, B receives an attenuated capability. B can only narrow permissions, never expand them. The capability chain is append-only, cryptographically signed at each hop.
3. Authorization Equals Audit
The capability token required to act is the proof that you acted. No separate logging hook to bypass. If you don’t have the token, you can’t act. If you acted, the token exists.
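A toy illustration of the invariant, assuming a hypothetical `act` entry point: the only path to the tool is the one that leaves the token behind, so the set of presented tokens *is* the audit log.

```python
# Sketch: presenting a capability and executing are one indivisible step.
# There is no separate logging hook for a code path to skip.

audit_log: list[dict] = []  # grows only inside act(); nothing else writes it

def act(capability: dict, request: str) -> str:
    # The presented token is recorded as a precondition of execution:
    # if you acted, the token is here; if the token isn't here, you didn't act.
    audit_log.append({"capability": capability, "request": request})
    return f"executed {request}"

act({"txn": "txn-A", "resource": "crm", "action": "read"}, "read ticket 42")

assert len(audit_log) == 1
assert audit_log[0]["capability"]["txn"] == "txn-A"
```

Contrast this with the Copilot case above: there, logging was a hook bolted onto an already-authorized path, so the two could be separated. Here the record is the authorization artifact itself.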
4. Structural Memory Isolation
The paper mentions sandboxing but focuses on container boundaries. For agents, you also need memory isolation per transaction—each invocation gets its own context partition. Transaction B can’t access Transaction A’s memory, even for the same user.
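A sketch of the idea, with hypothetical `ContextPartition` and `AgentRuntime` types: each invocation receives a fresh partition and holds no reference through which it could name another transaction's memory.

```python
# Per-transaction context partitions: Transaction B's handler has no
# name for Transaction A's memory, even for the same user.

class ContextPartition:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> str:
        return self._store[key]  # raises KeyError if absent

class AgentRuntime:
    """Hands each invocation its own partition; no shared context exists."""

    def invoke(self, handler, user_input: str) -> None:
        handler(ContextPartition(), user_input)  # fresh partition per call

def handler(ctx: ContextPartition, user_input: str) -> None:
    ctx.put("input", user_input)
    # ctx is the ONLY memory this invocation can address. Like a process
    # address space, other partitions aren't hidden -- they're unreachable.

runtime = AgentRuntime()
runtime.invoke(handler, "transaction A data")
runtime.invoke(handler, "transaction B data")  # cannot see A's partition
```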
Credit Where Due
This paper advances the conversation. The threat model is the most complete public treatment of MCP security risks. The framework mappings will help security teams. The proof-of-concept attacks are useful for red teams.
But defense-in-depth through policy layers treats the symptom (agents doing bad things) rather than the cause (agents having the authority to do bad things in the first place).
The confused deputy was named in 1988 and solved in operating systems. MCP security doesn’t need more guardrails. It needs structural enforcement where unauthorized actions are impossible to express—not just detectable, not just logged, but unaddressable.
That’s what capability chains provide.
This post is part of our series on agent authorization. See also: The Confused Deputy Problem · Agents Need Kernels, Not Guardrails · Why NHI Can’t Secure Agentic AI · Proof of Continuity · Capabilities 101
References
- Errico, Ngiam, Sojan (2025): Securing the Model Context Protocol: Risks, Controls, and Governance
- Hardy (1988): The Confused Deputy
- Willison (2025): The Lethal Trifecta