MCP Security: Why Guardrails Aren't Enough
A new paper proposes defense-in-depth for MCP security. The diagnosis is right, but policy enforcement can't solve what structural isolation must.
A recent paper from researchers at Vanta, MintMCP, and Darktrace—Securing the Model Context Protocol: Risks, Controls, and Governance—offers the most comprehensive treatment of MCP security threats to date. It’s worth reading. The threat model is solid, the attack taxonomy is useful, and the framework mappings (NIST AI RMF, ISO 42001) will help security teams translate agent risks into compliance language.
But the proposed solution—defense-in-depth through gateway-based policy enforcement—treats symptoms rather than causes. Here’s why.
What the Paper Gets Right
The Problem Diagnosis
The authors identify three adversary types that map cleanly to real-world incidents:
- Content Injection Adversaries: External attackers who embed malicious instructions in data sources agents process (support tickets, emails, calendar events)
- Supply Chain Adversaries: Malicious or compromised MCP servers (rugpull attacks, response injection)
- Inadvertent Agent Adversaries: Agents whose goal-seeking behavior creates security harms through emergent tool chaining
They also name the “lethal trifecta” (credit to Simon Willison): an agent with (a) access to private data, (b) exposure to untrusted content, and (c) external communication capability becomes a data exfiltration vector. Inject instructions into any connected system, exfiltrate from all of them.
This is the confused deputy restated for MCP. The paper doesn’t use that term, but the shape is identical: a trusted intermediary (the agent) can be manipulated into misusing its authority. (The same pattern emerges in NHI security—identity governance without authority-flow semantics.)
The Attack Taxonomy
The paper documents concrete attack vectors with proof-of-concept implementations:
| Attack | Mechanism | Real-World Example |
|---|---|---|
| Response injection | Malicious instructions in tool responses | PoC in Appendix A |
| Rugpull | Trusted server turns malicious | Postmark MCP server BCC’ing emails |
| Context poisoning | Malicious tool descriptions | Instructions to read ~/.aws/credentials |
| Cross-system exfiltration | Agent bridges network boundaries | Asana MCP data exposure |
These aren’t theoretical. The mcp-remote RCE (CVE-2025-6514), Asana’s cross-tenant leak, and the Postmark email exfiltration all happened in production.
The Framework Mappings
For security teams navigating compliance, the paper’s mappings to NIST AI RMF, ISO/IEC 27001, and ISO/IEC 42001 are genuinely useful. Table 2 provides a reference for integrating MCP security into existing audit programs.
Where the Solution Falls Short
The paper proposes five control categories:
- Authentication & Authorization (per-user OAuth, RBAC)
- Provenance Tracking (audit trails, SIEM integration)
- Context Isolation & Sandboxing (containerization, I/O filtering)
- Inline Policy Enforcement (DLP, secrets scanning, anomaly detection)
- Centralized Governance (private registries, tool allowlists)
All operationalized through an MCP Gateway—a proxy between agents and servers that enforces policy at every interaction.
This is defense-in-depth. It’s the right instinct. But it has structural limitations that no amount of layering can fix.
Problem 1: Policies Are Evaluated, Not Enforced
The gateway evaluates whether an action should be allowed. If the policy check passes, the action executes. If it fails, the action is blocked.
But what if:
- The policy is misconfigured?
- A code path bypasses the policy check?
- The gateway has a bug?
- The policy can’t express the constraint you need?
Policy-based systems fail open. The Copilot audit log vulnerability is a perfect example: Copilot could access files without generating audit entries because the logging hook was separable from the authorization path. The action was “authorized” but unlogged.
In capability-based systems, authorization is the proof of action. You can’t act without presenting a capability, and the capability is the audit record. There’s no separate hook to bypass.
Problem 2: Ambient Authority Remains
The paper’s controls restrict what agents can do. They don’t change what agents have access to. This is the missing layer in identity-based approaches.
With gateway-based RBAC, an agent still holds credentials for all tools its role permits. When processing Transaction A, it has access to tools for Transaction B. The confused deputy can still be confused—you’re just hoping the policy catches it.
Capability-based systems eliminate ambient authority. The agent doesn’t “have access” to anything. It receives a capability scoped to this transaction, this resource, these constraints. Authority travels with the request, not ambient to identity.
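This contrast can be sketched in a few lines of Python (the `Capability` type and `read_record` helper are our illustration, not anything from the paper): authority is a value passed with the request, not a property of the agent's identity.

```python
from dataclasses import dataclass

# Illustrative sketch: the agent cannot "have access" in general.
# Authority arrives as an argument, scoped to one transaction.

@dataclass(frozen=True)
class Capability:
    transaction_id: str   # which transaction this authority belongs to
    resource: str         # the one resource it designates
    action: str           # the one action it permits

def read_record(cap: Capability, transaction_id: str, resource: str) -> str:
    # No role lookup, no policy evaluation: the request either presents
    # a capability designating exactly this authority, or it cannot act.
    if cap.transaction_id != transaction_id or cap.resource != resource:
        raise PermissionError("capability does not designate this authority")
    if cap.action != "read":
        raise PermissionError("capability does not permit this action")
    return f"contents of {resource}"

# Processing Transaction A with A's capability works:
cap_a = Capability("txn-A", "crm/ticket/42", "read")
print(read_record(cap_a, "txn-A", "crm/ticket/42"))

# Reaching for Transaction B's resource fails structurally -- there is
# no ambient credential to misuse, only a capability that doesn't apply:
try:
    read_record(cap_a, "txn-B", "crm/ticket/99")
except PermissionError:
    pass
```

The point of the sketch is the shape of the check: nothing is consulted about who the agent is, only about what the presented capability designates.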
Policy model: Agent has role → role has permissions → hope policy catches misuse
Capability model: Transaction has capability → capability designates authority → misuse is unaddressable
Problem 3: No Delegation Semantics
The paper briefly mentions “multi-agent collusion” but doesn’t address the fundamental problem: multi-agent delegation. When Agent A delegates to Agent B, and B delegates to Agent C, what happens to authorization?
This isn’t edge-case territory. Multi-agent orchestration is how production systems work. Salesforce Agentforce routes across model providers. AWS Bedrock AgentCore chains supervisor agents to specialists. ServiceNow’s AI Control Tower coordinates across services. The paper’s Section 3.4.4 acknowledges this but offers only “monitoring inter-agent interaction patterns”—forensics, not authorization.
In the gateway model, each agent authenticates independently. There’s no cryptographic link between the original user’s consent and the third agent’s action. By the third hop, you’re trusting that each agent correctly propagated intent—a chain of hopes, not proofs. We call this the third-hop problem.
Capability chains solve this structurally:
- Each delegation appends a block to the capability token
- Each block can only attenuate (narrow) permissions, never expand
- The final agent presents the entire chain
- Verification proves: this action is the legitimate continuation of that transaction, with these accumulated constraints
No trust required at each hop. The chain is self-verifying. See Capabilities 101 for the full model.
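A minimal macaroon-style sketch of such a chain, using Python's standard `hmac` module (the `mint`/`attenuate`/`verify` helpers and caveat fields are our illustration, not a proposed wire format): each hop's MAC is keyed by the previous one, so blocks can only be appended, never rewritten or removed.

```python
import hashlib
import hmac
import json

def _mac(key: bytes, caveat: dict) -> bytes:
    # Deterministic serialization so verifier and issuer agree byte-for-byte.
    return hmac.new(key, json.dumps(caveat, sort_keys=True).encode(),
                    hashlib.sha256).digest()

def mint(root_key: bytes, caveat: dict) -> tuple[list, bytes]:
    """Issuer creates the root block of the chain."""
    return [caveat], _mac(root_key, caveat)

def attenuate(blocks: list, sig: bytes, caveat: dict) -> tuple[list, bytes]:
    """A delegating agent appends a narrowing caveat. The new MAC is keyed
    by the old one, making the chain append-only."""
    return blocks + [caveat], _mac(sig, caveat)

def verify(root_key: bytes, blocks: list, sig: bytes) -> bool:
    """The verifier replays the whole chain from the root key. Only the
    final agent's presentation is needed -- no per-hop trust."""
    cur = root_key
    for caveat in blocks:
        cur = _mac(cur, caveat)
    return hmac.compare_digest(cur, sig)

ROOT = b"issuer-secret"
# User -> Agent A: full transaction scope
blocks, sig = mint(ROOT, {"txn": "txn-A", "tools": ["crm", "email"]})
# Agent A -> Agent B: drop email
blocks, sig = attenuate(blocks, sig, {"tools": ["crm"]})
# Agent B -> Agent C: read-only
blocks, sig = attenuate(blocks, sig, {"action": "read"})

assert verify(ROOT, blocks, sig)            # legitimate third-hop chain
assert not verify(ROOT, blocks[:-1], sig)   # can't silently drop a constraint
tampered = blocks[:-1] + [{"action": "admin"}]
assert not verify(ROOT, tampered, sig)      # can't rewrite a hop's caveat
```

A real system would also evaluate the accumulated caveats (effective authority is their intersection) and would likely use asymmetric signatures; the sketch shows only the append-only, attenuate-only structure that makes the chain self-verifying.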
Problem 4: Anomaly Detection Is Forensics, Not Prevention
The paper proposes behavioral baselines and anomaly detection:
> Deviations from baselines should trigger alerts, for example: unusual data volume spikes, after-hours activity, or repeated authentication failures.
This detects attacks after they begin. It doesn’t prevent the confused deputy from acting in the first place.
Anomaly detection is valuable for forensics. But for authorization, you want constraints that make unauthorized actions impossible to express, not just detectable after the fact.
Problem 5: Gateway as Single Point of Failure
The authors acknowledge this:
> The gateway can become a single point of failure if it is not designed for high availability.
But it’s worse than availability. The gateway becomes the single point of trust. Compromise the gateway, compromise all agent authorization. Every tool call, every response, every credential—visible to whoever controls that layer.
Capability tokens are bearer credentials, but they’re also self-contained proofs. Verification requires only the public key. A compromised intermediary can drop or delay tokens, but it can’t forge them, expand their scope, or hide their usage. The cryptographic properties survive gateway compromise in ways policy databases don’t.
The Structural Alternative
Operating systems solved this problem fifty years ago. Multics didn’t ask processes to behave well—it made misbehavior impossible by construction. Each process got its own address space. Other processes’ memory wasn’t hidden; it was unaddressable.
Agents need the same treatment: not guardrails that hope for compliance, but structural isolation that makes “bad” impossible to express.
| Property | Gateway + Policy | Capability Chains |
|---|---|---|
| Authorization model | Evaluated at gateway | Carried with request |
| Ambient authority | Yes (role-based) | No (transaction-scoped) |
| Delegation | Implicit trust chain | Cryptographic attenuation |
| Audit | Separate logging layer | Authorization = audit |
| Failure mode | Fail open (policy miss) | Fail closed (no capability) |
| Gateway compromise | Full access | Can’t forge capabilities |
What We’d Add to the Paper
The paper’s controls aren’t wrong—they’re incomplete. Here’s what’s missing:
1. Transaction-Bound Capabilities
Every agent action should require presenting a capability scoped to that transaction. No ambient authority. The capability encodes:
- What resource
- What action
- What constraints
- When it expires
- Who it’s designated for
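A toy encoding of these five fields as a MAC-signed token (all names here are illustrative assumptions; a production design would use asymmetric signatures and a canonical serialization format):

```python
import hashlib
import hmac
import json
import time

ISSUER_KEY = b"issuer-secret"  # placeholder; real issuers use a keypair

def issue(resource: str, action: str, constraints: dict,
          ttl_s: int, audience: str) -> dict:
    body = {
        "resource": resource,             # what resource
        "action": action,                 # what action
        "constraints": constraints,       # what constraints
        "exp": int(time.time()) + ttl_s,  # when it expires
        "aud": audience,                  # who it's designated for
    }
    mac = hmac.new(ISSUER_KEY, json.dumps(body, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}

def accept(token: dict, presenter: str) -> bool:
    # All five fields are checked on every presentation; there is no
    # ambient fallback if any check fails.
    body = token["body"]
    mac = hmac.new(ISSUER_KEY, json.dumps(body, sort_keys=True).encode(),
                   hashlib.sha256).hexdigest()
    return (hmac.compare_digest(mac, token["mac"])
            and body["aud"] == presenter
            and body["exp"] > time.time())

tok = issue("crm/ticket/42", "read", {"max_rows": 100},
            ttl_s=60, audience="agent-A")
assert accept(tok, "agent-A")
assert not accept(tok, "agent-B")  # not the designated presenter
```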
2. Cryptographic Attenuation
When Agent A delegates to Agent B, B receives an attenuated capability. B can only narrow permissions, never expand them. The capability chain is append-only, cryptographically signed at each hop.
3. Authorization Equals Audit
The capability token required to act is the proof that you acted. No separate logging hook to bypass. If you don’t have the token, you can’t act. If you acted, the token exists.
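A toy illustration of the invariant, assuming a hypothetical `act` entry point: the only path to the tool is the one that leaves the token behind, so the set of presented tokens *is* the audit log.

```python
# Sketch: presenting a capability and executing are one indivisible step.
# There is no separate logging hook for a code path to skip.

audit_log: list[dict] = []  # grows only inside act(); nothing else writes it

def act(capability: dict, request: str) -> str:
    # The presented token is recorded as a precondition of execution:
    # if you acted, the token is here; if the token isn't here, you didn't act.
    audit_log.append({"capability": capability, "request": request})
    return f"executed {request}"

act({"txn": "txn-A", "resource": "crm", "action": "read"}, "read ticket 42")

assert len(audit_log) == 1
assert audit_log[0]["capability"]["txn"] == "txn-A"
```

Contrast this with the Copilot case above: there, logging was a hook bolted onto an already-authorized path, so the two could be separated. Here the record is the authorization artifact itself.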
4. Structural Memory Isolation
The paper mentions sandboxing but focuses on container boundaries. For agents, you also need memory isolation per transaction—each invocation gets its own context partition. Transaction B can’t access Transaction A’s memory, even for the same user.
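A sketch of the idea, with hypothetical `ContextPartition` and `AgentRuntime` types: each invocation receives a fresh partition and holds no reference through which it could name another transaction's memory.

```python
# Per-transaction context partitions: Transaction B's handler has no
# name for Transaction A's memory, even for the same user.

class ContextPartition:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> str:
        return self._store[key]  # raises KeyError if absent

class AgentRuntime:
    """Hands each invocation its own partition; no shared context exists."""

    def invoke(self, handler, user_input: str) -> None:
        handler(ContextPartition(), user_input)  # fresh partition per call

def handler(ctx: ContextPartition, user_input: str) -> None:
    ctx.put("input", user_input)
    # ctx is the ONLY memory this invocation can address. Like a process
    # address space, other partitions aren't hidden -- they're unreachable.

runtime = AgentRuntime()
runtime.invoke(handler, "transaction A data")
runtime.invoke(handler, "transaction B data")  # cannot see A's partition
```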
Credit Where Due
This paper advances the conversation. The threat model is the most complete public treatment of MCP security risks. The framework mappings will help security teams. The proof-of-concept attacks are useful for red teams.
But defense-in-depth through policy layers treats the symptom (agents doing bad things) rather than the cause (agents having the authority to do bad things in the first place).
The confused deputy was named in 1988 and solved in operating systems. MCP security doesn’t need more guardrails. It needs structural enforcement where unauthorized actions are impossible to express—not just detectable, not just logged, but unaddressable.
That’s what capability chains provide.
This post is part of our series on agent authorization. See also: The Confused Deputy Problem · Agents Need Kernels, Not Guardrails · Why NHI Can’t Secure Agentic AI · Proof of Continuity · Capabilities 101
References
- Errico, Ngiam, Sojan (2025): Securing the Model Context Protocol: Risks, Controls, and Governance
- Hardy (1988): The Confused Deputy
- Willison (2025): The Lethal Trifecta