Capabilities 101: The Security Primitive That Changes Everything
What capabilities are, how they differ from ACLs, and why they matter for AI agent security—but also why capabilities alone aren't enough.
Everyone building AI agents eventually hits the same wall: how do you give an agent access to do its job without giving it access to everything?
The answer, developed over 60 years of operating systems research, is capabilities. Not the vague “AI capabilities” people debate on Twitter/X, but actual capability tokens: unforgeable references that combine what you can access with proof that you’re allowed to access it.
This post explains what capabilities are, how they work, and why they’re essential for agent security. But it also explains why capabilities alone aren’t sufficient—and what’s missing.
The Problem Capabilities Solve
Traditional access control asks: “Who are you?”
You present credentials. The system looks you up in an access control list (ACL). If your identity has the right permissions, access is granted.
This works fine when humans make requests at human speed. It breaks down when:
- Agents delegate to other agents. By the third hop, which identity should the ACL check?
- Permissions need to narrow. Can Agent B have less access than Agent A gave it?
- Decisions happen at machine speed. Can we really query a policy server for every action?
Capabilities flip the question: “What do you possess?”
Instead of checking identity against a list, you present a token that is your authorization. The token designates the resource and encodes your permissions. No identity lookup required.
What Is a Capability?
A capability is an unforgeable token that combines two things:
- Designation: Which resource does this refer to?
- Authority: What operations are permitted?
Alan Karp’s definition captures it precisely: “An unforgeable, transferable permission to use the resource it designates.”
The key insight is that designation and authority travel together. You can’t separate “I’m pointing at this file” from “I can write to it.” The reference is the permission.
Traditional (ACL):
Subject → "I am Alice"
System → [lookup Alice in ACL for Resource X]
System → "Alice has write permission, access granted"
Capability:
Subject → [presents capability token for Resource X with write permission]
System → [verifies token is valid and unforged]
System → "Token grants write permission, access granted"
The difference seems subtle but has profound security implications.
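To make the contrast concrete, here is a minimal Python sketch of the two flows. The `SERVER_KEY`, the `ACL` table, and every function name are invented for illustration, and the HMAC-signed tuple is just one simple way to get an unforgeable token, not a production format.

```python
import hashlib
import hmac

# Hypothetical server-side secret used to sign capability tokens.
SERVER_KEY = b"demo-secret-key"

# ACL model: permissions live in a central table keyed by identity.
ACL = {"resource-x": {"alice": {"read", "write"}}}


def acl_check(identity: str, resource: str, op: str) -> bool:
    """Look the caller's identity up in the resource's access list."""
    return op in ACL.get(resource, {}).get(identity, set())


# Capability model: the token itself names the resource and the rights.
def mint_capability(resource: str, rights: str) -> tuple[str, str, str]:
    """Issue a (resource, rights, signature) token."""
    payload = f"{resource}|{rights}".encode()
    sig = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
    return resource, rights, sig


def capability_check(token: tuple[str, str, str], op: str) -> bool:
    """Verify the signature, then read the rights straight from the token."""
    resource, rights, sig = token
    payload = f"{resource}|{rights}".encode()
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and op in rights.split(",")
```

Note that `capability_check` never asks who is calling. Anyone holding a validly signed token is served, and editing the rights string invalidates the signature.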
Three Properties That Matter
Unforgeable. You cannot create a capability to something you don’t have access to. In memory-safe languages, object references satisfy this—you can’t conjure a pointer to an object you’ve never received. In distributed systems, cryptographic signatures provide unforgeability.
Transferable. Capabilities can be passed from one entity to another. Alice can give Bob her capability to read a file. Bob now has that access without any central authority needing to update an ACL.
Attenuatable. When transferring a capability, you can narrow its permissions but never expand them. Alice can give Bob read-only access derived from her read-write capability. Bob cannot escalate back to write access.
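One well-known way to get all three properties in a distributed setting is a chained-HMAC construction in the style of macaroons. The sketch below is a simplified illustration of that idea, not any particular library's format: `ROOT_KEY`, the dictionary layout, and the `ops=...` caveat syntax are all assumptions made up for this example.

```python
import hashlib
import hmac

ROOT_KEY = b"issuer-root-key"  # hypothetical; known only to the issuer/verifier


def _mac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def mint(resource: str) -> dict:
    """Issuer creates a full-authority token for a resource."""
    return {"resource": resource, "caveats": [], "sig": _mac(ROOT_KEY, resource)}


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a restriction; the new signature chains off the old one."""
    return {
        "resource": token["resource"],
        "caveats": token["caveats"] + [caveat],
        "sig": _mac(token["sig"], caveat),
    }


def verify(token: dict, op: str) -> bool:
    """Verifier replays the chain from the root key and applies every caveat."""
    sig = _mac(ROOT_KEY, token["resource"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    # The "ops=read,write" caveat format is an assumption of this sketch.
    return all(op in c.removeprefix("ops=").split(",") for c in token["caveats"])
```

Alice can call `attenuate(token, "ops=read")` and hand the result to Bob without contacting the issuer. Bob cannot strip the caveat, because recovering the earlier signature would mean inverting the HMAC.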
ACLs vs Capabilities: A Comparison
| Aspect | ACL Model | Capability Model |
|---|---|---|
| Core Question | “Who are you?” | “What do you possess?” |
| Where Permissions Live | Centralized list per resource | Distributed in tokens |
| Delegation | Requires admin to update ACL | Pass the token |
| Attenuation | Requires new ACL entry | Derive restricted token |
| Revocation | Update the list | Invalidate the token |
| Confused Deputy | Vulnerable | Reduced (but not eliminated) |
The confused deputy vulnerability is instructive. In ACL systems, a program might accidentally use its own elevated permissions to act on behalf of a malicious user. The program has permissions A and permissions B; nothing distinguishes which should apply to which transaction.
Capabilities reduce this surface because you must possess the specific capability to exercise it. But as we’ll see, possession alone doesn’t solve everything.
Classic Capabilities vs Object Capabilities (OCAP)
Here’s where terminology gets confusing. There are two distinct capability traditions, and conflating them causes problems.
Classic (Kernel-Level) Capability Systems
In systems like Hydra (1970s), a capability is a token stored in a kernel-managed table called a C-List (Capability List). (POSIX “capabilities”, despite the name, are per-process privilege flags rather than capabilities in this sense.)
Process A's C-List:
┌───────┬────────────────┬─────────┐
│ Index │ Object │ Rights │
├───────┼────────────────┼─────────┤
│ 0 │ File /data/x │ R │
│ 1 │ File /data/y │ RW │
│ 2 │ Device /dev/z │ X │
└───────┴────────────────┴─────────┘
To access a resource, the process provides an index to its C-List. The kernel looks up the capability, verifies the rights, and performs the operation.
Key characteristic: The capability (index + rights) is separate from the object reference. The kernel mediates every access.
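The lookup described above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not how any real kernel is written: the `Kernel` and `Capability` classes and the `grant`/`invoke` names are hypothetical, and the "kernel memory" is just a dictionary the process never touches directly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    obj: str            # the kernel object this entry designates
    rights: frozenset   # e.g. frozenset({"R", "W"})


@dataclass
class Kernel:
    # pid -> list of capabilities; stands in for kernel-managed memory
    clists: dict = field(default_factory=dict)

    def grant(self, pid: int, obj: str, rights: str) -> int:
        """Append a capability to a process's C-List and return its index."""
        clist = self.clists.setdefault(pid, [])
        clist.append(Capability(obj, frozenset(rights)))
        return len(clist) - 1

    def invoke(self, pid: int, index: int, op: str) -> str:
        """Mediate an access: the process names a slot, the kernel checks the rights."""
        clist = self.clists.get(pid, [])
        if not 0 <= index < len(clist):
            raise PermissionError("no capability in that slot")
        cap = clist[index]
        if op not in cap.rights:
            raise PermissionError(f"{op} not permitted on {cap.obj}")
        return f"{op} on {cap.obj}"
```

The process only ever holds small integers. An index is meaningless outside the C-List it refers to, which is what makes the capability unforgeable in this model.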
Object-Capability (OCAP) Systems
In OCAP systems like the E language, seL4, or Pony, the object reference itself is the capability. If you hold a reference to an object, you can invoke its methods. There’s no separate “rights” check.
// OCAP style (pseudocode)
function transferMoney(fromAccount, toAccount, amount) {
  // If you have the reference, you have the capability
  fromAccount.debit(amount);
  toAccount.credit(amount);
}
Key characteristic: Authority is embedded in the reference. No separate lookup. “Only connectivity begets connectivity”: you can only reach objects you’ve been given references to.
The Critical Differences
| Feature | Classic Capability | Object Capability (OCAP) |
|---|---|---|
| Authority Primitive | Index + Rights bits | Unforgeable object reference |
| Enforcement | Kernel checks a table | Object interface / language runtime |
| Delegation | Copy entry in C-List | Pass object reference in message |
| Attenuation | Modify rights bits | Create wrapper/proxy object |
| Ambient Authority | Often present | Forbidden by design |
The OCAP model is generally considered “purer” because it eliminates ambient authority entirely. In classic systems, a process might still access resources through paths outside its C-List (environment variables, global state). In strict OCAP, the only way to access anything is through references you’ve been explicitly given.
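The “create wrapper/proxy object” row of the table is easiest to see in code. Here is a small Python sketch of attenuation by facet; the `Account` class and `read_only` helper are hypothetical names for this example. One loud caveat: Python is not a strict OCAP language (closures and private attributes can be introspected), so this shows the pattern rather than an enforced security boundary.

```python
class Account:
    """A full-authority object: holding a reference lets you read and debit."""

    def __init__(self, balance: int) -> None:
        self._balance = balance

    def balance(self) -> int:
        return self._balance

    def debit(self, amount: int) -> None:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


def read_only(account: Account):
    """Return an attenuated facet that exposes only the balance query."""

    class ReadOnlyAccount:
        def balance(self) -> int:
            return account.balance()

    return ReadOnlyAccount()
```

Handing someone `read_only(acct)` instead of `acct` is the whole delegation step. There is no rights table to update; the narrower reference simply has fewer methods.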
seL4: Capabilities in Practice
seL4, the formally verified microkernel, implements capabilities at the OS level:
- CNodes (Capability Nodes) are arrays that hold capabilities
- CSlots are individual positions in a CNode
- CSpace is a thread’s complete capability namespace
When a thread wants to use a capability, it provides the CSlot index. The kernel verifies the capability exists and the operation is permitted. Authority derives entirely from what capabilities a thread possesses in its CSpace.
Thread A's CSpace:
┌─────────────────────────────────────┐
│ CNode (root) │
├─────────────────────────────────────┤
│ Slot 0: TCB cap (Thread Control) │
│ Slot 1: Endpoint cap (IPC) │
│ Slot 2: Untyped cap (Memory) │
│ Slot 3: CNode cap (to another CNode)│
└─────────────────────────────────────┘
seL4 proves a critical property: if a thread doesn’t hold a capability, it cannot access the corresponding resource. Period. The kernel formally guarantees this.
Why Capabilities Alone Aren’t Enough
Capabilities elegantly solve the “what can you access?” problem. They don’t solve the “which of your capabilities should you use for this transaction?” problem.
This is the confused deputy problem, and it persists even in capability systems.
Consider: An agent legitimately receives Capability A for Customer X’s records and Capability B (a superuser key) for a compliance investigation. Both are valid. The agent possesses both legitimately.
Now the agent processes a transaction from Customer X. Nothing in the capability model prevents the agent from using Capability B instead of Capability A.
flowchart TB
subgraph Agent["Agent's Capability Space"]
capA["Capability A<br/>(Customer X only)"]
capB["Capability B<br/>(Superuser access)"]
end
subgraph Transaction["Customer X's Transaction"]
req["'Get my records'"]
end
req --> Agent
capA -.->|"Correct choice"| result1["Customer X records"]
capB -.->|"Wrong choice<br/>(but valid capability)"| result2["Any customer's records"]
style capA stroke:#090
style capB stroke:#f90
The capability model guarantees:
- You can only exercise authority you possess
- You can only possess authority explicitly granted
It does not guarantee:
- You will exercise the right authority for this transaction
Capabilities shrink the attack surface (the agent can only access what it has capabilities for), but within that space, confusion remains possible.
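The scenario above fits in a short Python sketch. Everything here is illustrative: the `RECORDS` store, the two capability constructors, and the `use_superuser` flag (which stands in for whatever buggy or manipulated logic leads the agent to pick the wrong capability) are assumptions for the example.

```python
RECORDS = {"customer-x": "X's records", "customer-y": "Y's records"}


def make_scoped_cap(customer: str):
    """Capability A: can read exactly one customer's records."""

    def read(requested: str) -> str:
        if requested != customer:
            raise PermissionError("outside this capability's scope")
        return RECORDS[requested]

    return read


def make_superuser_cap():
    """Capability B: can read any customer's records."""
    return lambda requested: RECORDS[requested]


class Agent:
    def __init__(self, cap_a, cap_b) -> None:
        self.cap_a, self.cap_b = cap_a, cap_b

    def handle(self, requested: str, use_superuser: bool = False) -> str:
        # Both capabilities are valid; the choice is left to the agent's own logic.
        cap = self.cap_b if use_superuser else self.cap_a
        return cap(requested)
```

With Capability A, a request for Customer Y's records fails as it should. With Capability B the same request succeeds, and no check in the capability layer objects, because the agent really does hold B.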
What’s Missing: Provenance
The gap is provenance—knowing not just what capabilities exist, but why they exist and which should apply to a specific transaction.
When Customer X initiates a transaction, the system should bind:
- This transaction → This specific capability → This execution context
That binding should travel with the transaction through every delegation hop. When the agent (or sub-agent, or sub-sub-agent) finally acts, the system verifies: “Are you using the capability designated for this transaction?”
This is what we call Proof of Continuity. Not just proof that you possess a capability, but proof that you’re the designated executor of a specific transaction, using the specific capability that was bound when the transaction began.
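As a rough illustration of the binding idea only, and explicitly not a description of how any real system implements Proof of Continuity, the sketch below ties a transaction ID to one capability ID and one executor with an HMAC tag. `ORIGIN_KEY`, `bind`, and `verify_use` are hypothetical names, and the executor is just a string here; a real design would need the executor to prove its identity, for example with its own signature.

```python
import hashlib
import hmac

# Hypothetical key shared by the transaction origin and the verifier.
ORIGIN_KEY = b"origin-signing-key"


def bind(transaction_id: str, capability_id: str, executor: str) -> dict:
    """At origin, tie one transaction to one capability and one executor."""
    msg = f"{transaction_id}|{capability_id}|{executor}".encode()
    tag = hmac.new(ORIGIN_KEY, msg, hashlib.sha256).hexdigest()
    return {"tx": transaction_id, "cap": capability_id, "executor": executor, "tag": tag}


def verify_use(binding: dict, transaction_id: str, capability_id: str, executor: str) -> bool:
    """At use, accept only the capability and executor bound to this transaction."""
    msg = f"{binding['tx']}|{binding['cap']}|{binding['executor']}".encode()
    expected = hmac.new(ORIGIN_KEY, msg, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(binding["tag"], expected)
        and (binding["tx"], binding["cap"], binding["executor"])
        == (transaction_id, capability_id, executor)
    )
```

In this toy version, an agent that holds a perfectly valid superuser capability still fails `verify_use` if that capability is not the one bound when the transaction began.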
Capability alone:
"Do you have a valid capability?" → Yes (but which one?)
Capability + Provenance:
"Are you the designated continuation of Transaction T,
operating with Capability C that was bound at origin?" → Verifiable
What Amla Labs Builds
We build on decades of capability research—Dennis and Van Horn (1966), seL4, Biscuit tokens—and extend it for AI agent workflows.
Our capability chains are append-only sequences of cryptographically signed blocks. Each block:
- Designates the resource and permissions (standard capability)
- Binds to a specific transaction context
- Designates the next authorized executor
- Can only attenuate (narrow), never expand
When an agent receives a capability chain, it’s not just “here’s what you can do.” It’s “here’s what you can do, for this specific transaction, and you must prove you’re the designated executor.”
Token theft becomes useless—the attacker can’t sign as the designated executor. Confused deputy attacks are structurally prevented—the transaction binding, not the agent’s choice, determines which capability applies.
Capabilities provide the foundation. Provenance provides the missing piece. Together, they enable authorization that actually works for autonomous systems.
References
- Alan Karp: “Capabilities 101” — Clear introduction to capability concepts
- seL4 Foundation: “Capabilities Tutorial” — How capabilities work in a formally verified OS
- Chip Morningstar: “What Are Capabilities?” — Historical context and conceptual foundations
- Wikipedia: “Object-capability model” — Overview of OCAP systems and implementations
- TerseSystems: “Introduction to Object Capabilities” — OCAP principles and advantages over ACLs
- Dennis and Van Horn (1966): “Programming Semantics for Multiprogrammed Computations” — The foundational paper introducing capability concepts
- Mark Miller (2006): “Robust Composition” — PhD thesis on object-capability security
Related Posts
- The Confused Deputy Problem: Why capabilities alone don’t prevent all authorization failures
- Proof of Continuity: How provenance tracking completes the capability model
- The Missing Layer: Where capability-based authorization fits in the agent stack