IAM Locked Cloudflare Out. It'll Lock Your Agents Out Too.

Cloudflare's December 2025 resilience report reveals what every zero-trust org learns: your security stack becomes the outage when the platform is on fire.

security iam availability cloudflare incidents

On December 19, 2025, Cloudflare published its resilience plan after two major outages. Buried in it is a confession every “zero trust” org eventually makes: its own security controls slowed the response while the platform was on fire.

From Cloudflare’s post:

During the incidents, it took us too long to resolve the problem. In both cases, this was worsened by our security systems preventing team members from accessing the tools they needed to fix the problem.

And:

Turnstile became unavailable. As we use Turnstile on the login flow to the Cloudflare dashboard, customers who did not have active sessions were not able to log in to Cloudflare in the moment of most need.

Cloudflare is world-class at security. When things broke, their security stack became part of the outage.

That isn’t a Cloudflare-specific failure.

It’s what happens when your authorization system is a dependency graph.

IAM Is a Dependency Graph You Hope Never Fails

IAM answers: “Is this identity allowed to do this action right now?”

To answer that at runtime, you usually need a chain of services:

  • An identity provider (who is this?)
  • A session mechanism (are they logged in?)
  • A policy engine (what are they allowed to do?)
  • Context/risk signals (what’s the situation?)
  • Control-plane connectivity to reach all of the above

Each of these can fail. Each has dependencies. And in many real systems, parts of auth run on—or depend on—the same platform you’re trying to recover.
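To make the chain concrete, here is a back-of-the-envelope sketch in Python. The service names and the 99.9% figures are illustrative assumptions, not measurements from Cloudflare or anyone else, and the math assumes failures are independent. That assumption is generous: when auth shares a platform with the thing that is down, failures are correlated and the real picture is worse.

```python
from math import prod

# Hypothetical per-service availabilities (illustrative numbers only).
AUTH_CHAIN = {
    "identity_provider": 0.999,
    "session_store": 0.999,
    "policy_engine": 0.999,
    "risk_signals": 0.999,
    "control_plane_network": 0.999,
}


def chain_availability(services: dict) -> float:
    """Availability of a request that needs every service in the chain.

    Assumes independent failures, which is the optimistic case.
    """
    return prod(services.values())


def downtime_hours_per_year(availability: float) -> float:
    """Convert an availability fraction into expected hours of downtime per year."""
    return (1.0 - availability) * 365 * 24


combined = chain_availability(AUTH_CHAIN)
single = AUTH_CHAIN["identity_provider"]

print(f"one service:  {single:.3%} available, "
      f"{downtime_hours_per_year(single):.1f} h/year down")
print(f"whole chain:  {combined:.3%} available, "
      f"{downtime_hours_per_year(combined):.1f} h/year down")
```

Every service you add to the runtime path multiplies in another factor below one, so the chain is always less available than its weakest link.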

Cloudflare says it directly: security systems slowed mitigation, and circular dependencies made internal tooling unavailable when they needed it most.

This is the central IAM flaw: availability is not a “nice-to-have.” It’s a prerequisite. And IAM stacks love putting it at risk.

Turnstile Is the Perfect Failure Mode

During the November 18 incident, Turnstile became unavailable. Turnstile sits on the dashboard login flow. Result: customers without active sessions (or API service tokens) couldn’t log in at the moment they most needed access.

That’s the nightmare in one sentence:

Your control plane is gated by your data plane.

When the platform is degraded, the path to fixing it is the first thing to break.

“Break Glass” Is a Confession

Cloudflare’s plan includes improving “break glass” procedures—ways for engineers to bypass normal auth in emergencies.

Break glass exists because normal IAM routinely fails under incident conditions.

And it creates its own problems:

  • Who gets the bypass?
  • How is it audited?
  • How often is it tested?
  • What if the bypass path depends on the same systems that are down?

You’re building a second authorization system because the first one can’t be trusted when availability matters.
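The last question in that list is one you can check mechanically. The sketch below walks a dependency map and reports whether a bypass path transitively depends on a failing system. The graph and every service name in it are hypothetical, made up for illustration. A real inventory would come from your own service catalog.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the services in its list.
DEPENDS_ON = {
    "break_glass_login": ["sso", "vpn"],
    "sso": ["bot_check", "policy_engine"],
    "bot_check": ["edge_platform"],
    "policy_engine": ["config_store"],
    "vpn": [],
    "config_store": [],
    "edge_platform": [],
}


def transitive_dependencies(graph: dict, start: str) -> set:
    """Breadth-first walk returning everything `start` depends on,
    directly or indirectly. Safe on graphs that contain cycles."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def bypass_is_compromised(graph: dict, bypass: str, failing: str) -> bool:
    """True if the bypass path is, or depends on, the failing system."""
    return failing == bypass or failing in transitive_dependencies(graph, bypass)


if bypass_is_compromised(DEPENDS_ON, "break_glass_login", "edge_platform"):
    print("break-glass path depends on the platform it is meant to recover")
```

In this made-up graph the bypass fails along with the platform because a bot check on the login flow sits on the edge, the same shape as the Turnstile problem described above.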

The Alternative: Capability-Based Authorization

Capabilities don’t ask “who are you?” at runtime.

They ask: “Is this token valid for this operation?”

A capability token carries its own authority: scope, constraints, expiry. Verification can be local at the enforcement point: check the signature, evaluate the embedded constraints. Issuance and revocation still require online services, but execution can remain local to the gateway.

No live policy call. No session store dependency. Verification happens at the gateway—not across a chain of services.

So the comparison isn’t “login vs no login.” It’s:

  • IAM: authorization depends on live services you might have just taken down
  • Capabilities: authorization can remain verifiable even when those services are degraded

Break glass becomes boring: pre-issued, tightly scoped emergency tokens that continue to verify even if the rest of your auth stack is on fire.

Capabilities aren’t magic—you still need issuance, rotation, and revocation strategies—but they remove the worst failure mode: your authorization system becoming the availability choke point.

This Gets Worse With Agents

Humans can wait. Humans can retry. Humans can call someone.

Agents can’t.

If your agent’s authorization path depends on interactive login flows or a fragile runtime service chain, the agent doesn’t “degrade.” It stops.

And the whole point of agents is to keep operating when things are broken—that’s when you want automation the most.

If auth is inside the blast radius, your agent dies exactly when it’s supposed to be useful.

The Point

IAM was built for humans logging into web apps. It assumes:

  • Interactive flows are acceptable
  • The auth stack is more available than the app
  • Circular dependencies can be managed with procedures

Those assumptions don’t hold for agents. And increasingly, they don’t hold for humans either.

Cloudflare can harden its processes. But the fragility here isn’t just operational. It’s architectural.

When your security infrastructure locks you out during an emergency, the problem isn’t the emergency.

It’s the infrastructure.


Amla Labs is building capability-based authorization for AI agents. For background, see Capabilities 101 and Proof of Continuity.