Skip to main content
← back to blog

Your Agent's Memory Is a Security Hole

Default agent memory patterns leak unless you enforce scoping at the runtime boundary. The problem isn't implementation bugs—it's architectural.

memory security architecture enterprise

Every major AI assistant now has memory: saved preferences, long-term “facts,” cross-session context. It makes agents more useful—and fundamentally more dangerous.

The problem isn’t implementation bugs. It’s architectural: default agent memory patterns leak unless you enforce scoping at the runtime boundary.

Most security guidance focuses on multi-tenant isolation—separate indexes per customer, metadata filters, row-level security. That’s SaaS 101. It solves gross cross-tenant leaks but leaves subtler problems unaddressed.

The real boundary isn’t the user. It’s the transaction.

Even for a single user, conversations accumulate. A sensitive query in Session A (health data, salary information) can semantically bleed into Session B via shared memory or RAG—if nothing enforces transaction-level isolation. Per-tenant stores solve the obvious multi-tenant leak. Per-transaction mediation solves the architectural one.

The Memory Problem in One Sentence

Agent memory is shared state that accumulates over time. In multi-tenant or multi-workflow environments, that shared state becomes a data leakage vector—not because of missing guardrails, but because the system can name data it shouldn’t.

Here’s the pattern in a typical B2B SaaS deployment:

Customer support agent serving multiple enterprise clients
Shared vector store for "helpful context"
Shared memory for "learned preferences"

Tenant A's support ticket: contains salary data, org chart
Agent writes to shared memory/RAG index

Tenant B's user asks an innocuous question
Semantic search returns Tenant A's chunks (similar embeddings)
Model paraphrases the retrieved content
Tenant B sees Tenant A's salary data

This isn’t a bug in any specific framework. It’s what happens when retrieval is semantic and the pipeline is wired as:

Take whatever retrieval returns → stuff into prompt → the model might print (or paraphrase) it.

That “stuffing into prompt” step is the amplification. The moment a wrong chunk enters context, the leak is just summarization.

Two Confused Deputies

The confused deputy problem (Hardy, 1988) describes a pattern where a deputy with authority the requestor lacks exercises that authority on the requestor’s behalf—but for the wrong principal or purpose. Hardy’s formalization is the standard reference, and it captures a broader class of designation/authority mismatches that include both write-side and read-side failures.

What’s less discussed: agents are confused deputies in two distinct ways.

LayerConfused Deputy TypeWhat’s MisappliedThe Failure
EffectsAuthorityWrong principal’s capabilityAgent uses Alice’s credentials while processing Bob’s transaction
InputsInformationWrong principal’s dataAgent retrieves Alice’s data while assembling Bob’s context

Both share the same structural shape: an untrusted executor sits at a confluence of streams (authorities or data). If it can choose which stream to consult, it can misapply it—accidentally or under prompt injection.

The Authority Confused Deputy (Effects)

Agent receives Transaction A (Alice): gets Cap_A ($1000 limit)
Agent receives Transaction B (Bob): gets Cap_B ($100 limit)

Agent, processing Bob's transaction, invokes Cap_A.
Bob's transaction executes with Alice's authority.

The deputy accumulated multiple capabilities and picked the wrong one.

The Information Confused Deputy (Inputs)

Agent processes Transaction A (Alice): writes Alice's salary to memory
Agent processes Transaction B (Bob): retrieves "salary" context

Bob's response includes Alice's salary data.

The deputy accumulated access to a shared memory surface and pulled the wrong tenant’s data into context.

Same Fix: The Deputy Shouldn’t Choose

For authority: bind capabilities to transactions, not agents. The agent doesn’t select which credential to use; the transaction designates it.

For information: bind context assembly to transactions, not agents. The agent doesn’t search “all memories”; the context router assembles only what’s designated for this transaction.

In both cases, a trusted layer outside the deputy’s control enforces the binding.

Why Per-Tenant Isolation Isn’t Enough

The standard advice—“use tenant-specific indexes,” “apply metadata filters,” “enable row-level security”—solves the first-order problem. But it misses three failure modes that per-transaction scoping addresses:

1. Intra-user cross-session leakage

A user asks about their medical condition in Session A. That context gets written to “helpful memory.” In Session B, they’re drafting a work email. The model helpfully incorporates health context into a professional communication.

Same user. Same tenant. Still a leak.

2. Blast radius accumulation

If one transaction gets poisoned (prompt injection via document, calendar invite, email subject), the damage spreads to every subsequent transaction in that session. Per-transaction scoping contains the blast radius: one poisoned transaction can’t corrupt the next.

3. Persistent poisoning attacks

SpAIware, delayed tool invocation, and similar attacks rely on malicious content persisting across turns or transactions. Ephemeral per-transaction contexts—where memory writes happen only at transaction end, after verification—make long-lived manipulation structurally harder.

The OS analogy holds: processes aren’t isolated just per-user—they’re isolated per-process. Even processes with the same uid get fresh address spaces. It’s the structural guarantee that makes “bad” inexpressible.

Two Problems, One Root Cause

Memory vulnerabilities manifest as two distinct failure modes:

Poisoning is an untrusted write problem. Attackers inject malicious content or instructions into long-lived stores (memories, RAG indexes, conversation history). The system later trusts and replays it.

Boundary failure is an unauthorized read problem. Benign data bleeds across tenants or sessions because the system can “name” it. Retrieval returns data the current principal shouldn’t access.

Guardrails mostly target poisoning (detect malicious writes, filter suspicious outputs). They can’t solve boundary failure—because the issue isn’t malicious content, it’s missing isolation boundaries that prevent cross-context reads.

The solution in both cases is designation, not detection: a trusted layer binds each transaction to its authorized capabilities and its authorized context before the model runs.

Real Attacks, Real Impact

SpAIware: Persistent Spyware in ChatGPT

On September 20, 2024, Johann Rehberger demonstrated “SpAIware”: persistent prompt injection stored in ChatGPT memory that influenced future sessions and enabled data exfiltration.

OpenAI mitigated the disclosed exfiltration path in ChatGPT v1.2024.247, but Rehberger notes that untrusted content can still invoke the memory tool to store arbitrary memories—the underlying write-to-memory surface remains open.

Gemini Memory Poisoning

In February 2025, Rehberger demonstrated a similar pattern against Gemini using “delayed tool invocation,” where memory writes are triggered on a later turn after the model believes it’s responding to the user.

OWASP explicitly cites the Gemini Memory Attack as a real-world example of ASI06 (Memory & Context Poisoning) in their agentic threat taxonomy.

The Gemini Trifecta

On September 30, 2025, Tenable disclosed three Gemini vulnerabilities spanning Cloud Assist, Search Personalization, and the Browsing tool. Their key takeaway:

AI itself can be turned into the attack vehicle, not just the target.

Targeted Promptware: Calendar Invites as Attack Vectors

SafeBreach’s Black Hat USA 2025 research shows how malicious prompts embedded in calendar invites, email subjects, or document names can compromise Gemini. Of 14 scenarios tested, 73% achieved high-critical impact.

The key insight: anything that enters context can carry instructions.

Windsurf: SpAIware in Coding Agents

On August 22, 2025, Rehberger reported the SpAIware persistence pattern in Windsurf Cascade: a memory tool invoked automatically, enabling prompt injection to persist across future coding sessions.

Cross-Session Leak: The Boundary Failure Pattern

Giskard formalizes “Cross Session Leak” as a distinct vulnerability class: sensitive info from one session bleeds into another in multi-tenant systems due to shared caches, shared memory, or poorly scoped context.

Their red-team methodology is simple:

  1. Inject canary data in Session/Tenant 1
  2. Attempt retrieval from Session/Tenant 2

If it crosses the boundary, your architecture is leaking.

This is the information confused deputy in its purest form: the agent can “name” data it shouldn’t, because nothing in the addressing scheme prevents cross-context references.

Why RAG Makes It Worse

RAG turns “memory” into infrastructure: vector stores, embedding search, semantic caches. Two consequences matter:

  1. Semantic retrieval doesn’t respect authorization boundaries by default.
  2. Once a wrong chunk enters the prompt, the model can paraphrase it. Output filters catch strings; they don’t catch summaries.

IronCore Labs notes that vector embeddings derived from private data can be inverted to approximate the original content, and that vector DB security maturity is uneven.

Microsoft’s multitenant RAG guidance makes the tradeoff explicit:

  • Store-per-tenant: expensive, strong boundaries
  • Shared store: requires strict, query-time authorization encapsulated in an API layer

Most deployments pick shared stores and hope metadata filters are enough.

ConfusedPilot: Confused Deputy Risks in RAG

In ConfusedPilot (Aug 9, 2024), researchers apply the confused deputy label directly to RAG: malicious documents (the “attack”) trick the model (the “deputy”) into integrity failures and confidentiality leaks, including leaks that exploit retrieval caching.

This is the same failure mode as enterprise “prompt stuffing”: once attacker-controlled or out-of-scope content enters the modified prompt, the model becomes the deputy that amplifies it.

Academic Validation: Topology Matters

A December 2025 paper, Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs (arXiv v1: Dec 4, 2025; v2: Dec 8, 2025), introduces MAMA to measure how architecture drives leakage.

Key findings:

  • Fully-connected topologies leak the most; chains leak the least
  • Shorter attacker-target distance increases vulnerability
  • Leakage spikes early, then plateaus
  • Temporal/locational PII leaks more readily than identity credentials

The core point: leakage is structural and predictable—not random “model weirdness.”

OWASP Codifies the Threat

On December 9, 2025, OWASP published its Top 10 for Agentic Applications, including ASI06: Memory & Context Poisoning—persistent manipulation of agent memory, shared context, or retrieval pipelines that reshapes behavior long after initial interaction.

OWASP’s agentic framing matters because it highlights what’s new: agents are stateful, tool-using, and context-accumulating—so failures persist and compound.

Why Guardrails Don’t Solve Either Confused Deputy

Behavioral controls address poisoning better than boundary failure:

ApproachWhat It AddressesWhat It Misses
Prompt rulesMalicious instructionsBenign data in wrong context
Output filteringSensitive strings in responsesLeak already happened at retrieval
Anomaly detectionWeird access patternsNormal queries returning wrong data
Metadata filtersExplicit tenant tagsSemantic similarity ignoring tags

Once the wrong chunk enters context, “don’t leak” becomes a behavioral hope, not a security property.

The Structural Solution: Mediating Both Deputies

Operating systems solved process isolation structurally: a process can’t read another process’s memory because the addressing scheme and MMU won’t allow it.

Agents need the same idea for both layers—but scoped to the transaction, not the user:

LayerWhat’s MediatedEnforcement Mechanism
EffectsTool invocationsCapability-bound execution: only transaction-designated capabilities can be exercised
InputsContext assemblyContext-scoped retrieval: only transaction-designated data can enter the prompt

A Context Kernel enforces both by ensuring the agent never chooses which stream to consult. Think “page tables for context”: the model only sees what’s mapped into this transaction.

Three Invariants

1. The model never receives a handle that can name out-of-scope memory.

Every transaction gets a cryptographic context handle. Both capabilities and memory are scoped to it—not to the user, not to the session:

Transaction T:
  context_id: ctx_abc123
  designated_capabilities: [Cap_A with $100 limit]
  designated_memory_scope: [tenant_acme/user_bob/transaction_T/*]

Even within Bob’s own history, Transaction T can only access what’s explicitly designated for T. Previous conversations are unaddressable unless specifically mapped in.

2. Retrieval returns (chunk, proof-of-scope), or nothing.

The runtime has no ambient access to tools or memory—only transaction-scoped capabilities:

memory_capability: {
  namespace: "tenant_acme/user_bob/transaction_T",
  context: "ctx_abc123",
  operations: ["read"],
  ttl: "transaction_duration"
}

3. Memory writes happen at transaction end, after verification.

The agent can propose writes. The kernel verifies they’re within scope before persisting. Poisoned content can’t accumulate across transactions because each write is gated.

This is the difference between “the agent might leak data across sessions” and “the agent can’t name data from other sessions to leak it.”

When This Isn’t an “Agent Problem”

To be precise: if you have hard tenant partitioning (separate indexes, namespaces, encryption keys), retrieval-time authorization (ABAC/RLS filters applied before chunks return), and no reuse of conversation buffers across transactions—then “memory leakage” becomes the same problem as any multi-tenant application.

Standard access control. Standard data governance.

The issue is that default agent memory patterns don’t have these properties. Shared vector stores, shared semantic caches, shared “helpful memories,” retrieval that operates on similarity rather than authorization—these are the common deployment shapes, and they leak.

Enterprises Are Already Feeling This

This isn’t theoretical. LangChain’s State of AI Agents survey (November-December 2025, 1,340 respondents) found that security is the second-largest blocker to production deployment for large enterprises—cited by 24.9% of companies with 2,000+ employees, surpassing latency.

SailPoint’s 2025 research puts a number on the pain: 80% of companies report their AI agents have taken unintended actions, including 39% accessing unauthorized systems and 33% accessing inappropriate or sensitive data.

UiPath’s 2025 Agentic AI Report (surveying 252 U.S. IT executives) confirms the pattern: 56% identify IT security issues as their top agentic AI concern—ahead of integration and implementation costs.

The infrastructure is catching up. Google’s Agent Sandbox (announced at KubeCon NA 2025) addresses isolation at the infrastructure layer:

Agentic code execution and computer use require an isolated sandbox to be provisioned for each task.

Google’s “task” maps to our “transaction”—both represent the per-invocation isolation boundary. Agent Sandbox provides the infrastructure layer (process isolation, network restrictions). The authorization layer that binds capabilities and memory to that boundary is what’s still missing in most stacks.

What Enterprises Should Do Now

Ordered by highest ROI / lowest regret:

  1. Disable cross-tenant sharing by default. No shared vector indexes. No shared semantic caches. No “global memories.”
  2. Enforce authorization at query time. Every chunk, every retrieval, based on current user’s permissions.
  3. Treat all retrieved text as untrusted input. Document titles, calendar entries, log lines—all attack surfaces.
  4. Implement continuous canary testing. Cross-session leak probes, Giskard-style.
  5. Audit memory features before enabling. Disable long-term memory for sensitive workflows.
  6. Review stored memories as a fallback. Manual inspection doesn’t scale, but it catches obvious failures.

The Memory Problem Won’t Solve Itself

Agents are confused deputies twice over: once for authority (what they can do), once for information (what they can see). Solving one without the other leaves half the attack surface exposed.

The solution isn’t more policies. It’s structural mediation: a kernel that controls both what the model can see and what it can do, with transaction-level bindings the agent can’t override.


References


Related: The Confused Deputy Problem, Explained · MCP Standardizes Tools. It Doesn’t Secure Them. · Proof of Continuity