GitHub's Agentic Security: Right Problem, Incomplete Solution

GitHub just published their agentic security principles for Copilot. Their threat model is correct. Their design philosophy reveals the tradeoff everyone is making.

We’ve built all of our hosted agents to maximize interpretability, minimize autonomy, and reduce anomalous behavior.

Minimize autonomy. That’s the current industry answer to agent security.

What GitHub Gets Right

Their threat model identifies three core risks:

Data exfiltration — agents with internet access leaking sensitive context
Impersonation and attribution — unclear responsibility when agents act
Prompt injection — hidden directives manipulating agent behavior

All correct. These are the same attack vectors we’ve been documenting.

Their mitigations are sensible:

Visible context — strip invisible Unicode and HTML before passing to agents
Firewalling — limit external network access
Minimal information — don’t give agents secrets they don’t need
Human-in-the-loop — require approval for irreversible actions
Clear attribution — co-commit with the initiating user
Permission-based context — only gather from authorized users

This is defense-in-depth done well. For a hosted coding agent, these controls are appropriate.

The Tradeoff They’re Making

The limiting factor is the fourth principle: “preventing irreversible state changes” through mandatory human review.

The Copilot coding agent is only able to create pull requests; it is not able to commit directly to a default branch. Pull requests created by Copilot do not run CI automatically; a human user must validate the code and manually run GitHub Actions.

This works for code review. It doesn’t work for:

Payment processing agents that need to execute transactions
Customer service agents that need to issue refunds
Healthcare agents that need to update records
Supply chain agents that need to place orders

These workflows require agents to take consequential actions autonomously. The “human-in-the-loop for everything” pattern becomes the bottleneck it was meant to eliminate.

The Alternative: Bounded Autonomy

What if instead of minimizing autonomy, you constrained it cryptographically?

GitHub's Model:
Agent proposes → Human approves → Action executes

Capability Chain Model:
Human grants scoped capability → Agent executes within bounds

The difference: GitHub’s approach requires human judgment on every consequential action. Capability chains encode that judgment upfront—this agent can issue refunds up to $200 for this customer’s order—and then the agent operates autonomously within those constraints. The challenge shifts from reviewing every action to correctly scoping the initial capability grant—a tractable problem with better tooling.

Capability chains also address GitHub’s other threat vectors. Data exfiltration? The capability specifies which data the agent can access—even if prompt-injected, it can’t reach data outside its granted scope. Prompt injection? The attacker controls the agent’s intent, but not its authority. The gateway enforces constraints regardless of what the agent tries to do.

Attribution Without Bottlenecks

GitHub’s attribution model is instructive:

Pull requests created by the Copilot coding agent are co-committed by the user who initiated the action.

This creates an audit trail: action → agent → initiator. Good for accountability.

Capability chains provide the same audit trail without requiring human approval at execution time:

Capability Chain:
├─ Root: Gateway issues refund capability to Agent A
│   └─ constraints: max_amount=$500, customer=cus_123
├─ Block 2: Agent A delegates to Agent B
│   └─ constraints: max_amount=$200 (attenuated)
└─ Execution: Agent B issues $150 refund
   └─ Signed proof of entire chain

The chain proves: who granted the original authority, what constraints accumulated, which agent executed, all cryptographically verified. This is the “traceability” that Article 14 of the EU AI Act requires.

Complementary, Not Competing

GitHub’s principles aren’t wrong—they’re appropriate for their use case. Coding agents can afford human-in-the-loop because code review is already a human process.

But the broader agent ecosystem needs a different answer. When agents cross organizational boundaries, when they process transactions, when they operate in regulated industries—“minimize autonomy” isn’t viable.

The question becomes: how do you give agents real autonomy while maintaining real security?

That’s what Proof of Continuity answers. Not “was a human watching?” but “was this action authorized by the cryptographic chain that preceded it?”

For the technical architecture, see Proof of Continuity. For attack vectors these controls address, see 5 Ways Your AI Agents Will Get Hacked.