Coming Soon

Give agents a scratchpad

Code Mode for AI agents. Process data locally, return only what matters.

  • 13MB binary¹
  • <10ms cold start²
  • Zero infrastructure
  • Full audit trail

¹WASM binary size. ²Measured on Ryzen 9 9900X, browser WASM instantiation.

See it in action

How agents use the sandbox

Agent Workflow
User: Summarize this 50MB sales CSV and tell me the top 3 products by revenue.
Agent → Sandbox: # Process 50MB locally
Agent → User: returns only the top 3 products by revenue.
Input: 50MB. Context consumed: 89 bytes.
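
A rough sketch of what that sandbox step could look like from Python. The write_file/run_shell helpers, the CSV column layout, and the sort flags are illustrative assumptions, not the published API:

# Illustrative only: write_file()/run_shell() are assumed helper names,
# and the column layout (product in field 2, revenue in field 5) is made up.
sandbox.write_file("/workspace/sales.csv", open("sales.csv", "rb").read())

top3 = sandbox.run_shell(
    "sort -t, -k5 -rn /workspace/sales.csv | head -3 | cut -d, -f2,5"
)
print(top3)  # a few dozen bytes reach the model's context, not 50MB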

Try the sandbox now

Live in your browser

Live Shell Demo

Run shell commands and JavaScript, and explore the virtual filesystem—all in your browser.

~13MB download • Cached for future visits

Animation: WebAssembly memory during deterministic replay. External API calls are captured during live execution and injected during replay to produce identical results.

Debug any workflow, anytime

External calls are captured during live execution. Replay injects them to produce identical results.
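
A sketch of how capture and replay might be driven from Python; the record()/replay() entry points and the trace object are assumed names, not the published API:

# Illustrative only: record()/replay() are assumed entry points.
trace = sandbox.record("jq '.total' /workspace/report.json")  # live run: external inputs captured

# Later, while debugging offline:
replayed = sandbox.replay(trace)          # captured inputs are injected verbatim
assert replayed.output == trace.output    # bit-for-bit identical result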

The context bloat problem

Anthropic reduced internal tool definitions from 150K to 2K tokens (a 98.7% reduction) by using code execution.

Without sandbox
Agent: stripe.list(...)
→ 47KB JSON to context
Agent: stripe.get(...)
→ 3KB more to context

50KB+ per query

Context fills up fast. Costs skyrocket.

With sandbox
Agent: writes JS to sandbox
→ Results stored in /workspace/
→ jq '.txn_id' → "txn_456"

~100 bytes in context

Process data locally. Return only what matters.
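
Sketched in code, with the call()/write_file()/run_shell() helpers and the response layout as illustrative assumptions rather than the published API:

# Illustrative only: helper names and response layout are assumptions.
resp = sandbox.call("stripe/charges/list", limit=100)       # ~47KB JSON, never shown to the model
sandbox.write_file("/workspace/txns.json", resp)             # parked in the scratchpad

txn_id = sandbox.run_shell("jq -r '.txn_id' /workspace/txns.json")
# ~100 bytes ("txn_456") go back into the context instead of 50KB+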

Core features

  • File Scratchpad

    Store results, process locally, return only what matters.

  • Shell Applets

grep, cut, sort, jq. Pipes, redirects, and heredocs just work.

  • Capability Tokens

    Constrain parameters, limit calls, enforce patterns.

  • Deterministic Replay

    Coroutine protocol. Step, yield, fully reproducible.

Enterprise ready, zero config

No VMs. No containers. No cloud dependencies.

  • pip install

    One command. Works in CI, notebooks, and production. No infrastructure to provision.

  • $0 marginal cost

    WASM runs in your process. No cloud API calls. Scale to millions at CPU cost only.

  • Full audit trail

    Every tool call logged with timestamps. Deterministic replay for debugging.
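
One way the audit trail might be consumed; the audit_log() accessor and its record fields are illustrative assumptions:

# Illustrative only: audit_log() and its field names are assumptions.
for entry in sandbox.audit_log():
    print(entry.timestamp, entry.method, entry.params, entry.allowed)
# e.g. <timestamp>  stripe/charges/list  {'limit': 100}  True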

Capabilities as code

Define what agents can do. Every tool call is validated.

capabilities.py
from amla_sandbox import Sandbox, MethodCapability, Param  # import path assumed from the package name

sandbox = Sandbox(
    capabilities=[
        MethodCapability(
            method_pattern="stripe/charges/*",
            constraints=[
                Param("amount").lte(10000),  # 10000 cents = $100 max
                Param("currency").is_in(["USD", "EUR"]),
            ],
            max_calls=100,
        ),
    ],
)
  • Pattern matching: stripe/**, */create
  • Constraint DSL: comparisons, sets
  • Call budgets: prevent runaway loops
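
Building on the snippet above, a sketch of how an out-of-policy call might be rejected; the call() method and CapabilityError are assumed names:

# Illustrative only: call() and CapabilityError are assumed names.
sandbox.call("stripe/charges/create", amount=4200, currency="USD")   # within constraints: allowed

try:
    sandbox.call("stripe/charges/create", amount=50000, currency="USD")  # $500 > the $100 cap
except CapabilityError as err:
    print(err)  # amount violates lte(10000); the call never reaches Stripe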

How it compares

amla-sandbox
  • Setup: pip install
  • Isolation: WASM sandbox
  • Cold start: <10ms
  • Authorization: Capability tokens
  • Replay: Deterministic
  • Context: File scratchpad

eval()
  • No isolation, full code injection risk

Local Shell
  • No isolation, full host access

E2B
  • Remote API, 200–500ms cold start

Docker/VM
  • Heavy infra, 1–10s cold start, ops overhead

Other sandboxes focus on isolation. amla adds authorization, deterministic replay, and context budget control—managing what agents can do and see, not just where they run.

Frequently Asked Questions

For Engineers

How does the scratchpad work?
A POSIX-like filesystem in WASM memory. Files persist for the session. Store API responses, intermediate computations, or scratch data—then extract only what you need.
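For example (write_file()/run_shell() are assumed helper names, not the published API):

# Illustrative only: helper names are assumptions.
sandbox.write_file("/workspace/response.json", api_response)  # api_response: any large payload
count = sandbox.run_shell("jq '.items | length' /workspace/response.json")
# only the count returns to the model; the full payload stays in the scratchpad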
What shell commands are available?
grep, cut, sort, uniq, head, tail, wc, cat, jq, tr, find, and more. WebAssembly applets, not system calls. Pipes work.
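For example, a pipeline combining several of these applets (run_shell() is an assumed entry point):

# Illustrative only: run_shell() is an assumed entry point.
sandbox.run_shell(
    "grep -v '^#' /workspace/events.log | cut -d' ' -f3 | sort | uniq -c | sort -rn | head -5"
)
# the five most frequent event types, computed entirely inside the WASM applets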
Which LLM frameworks are supported?
LangGraph and CrewAI adapters ship out of the box. The core Sandbox class works with any framework.
Why is authorization built into the sandbox?
The sandbox already intercepts every tool call—they yield to the host, not execute directly. That's the natural chokepoint for authz. External authorization would add a hop, lose execution context, and break deterministic replay. Plus, capability tokens enable secure agent-to-agent delegation: when your agent spawns sub-agents, authority automatically attenuates.

For Platform Teams

Does data leave my infrastructure?
Never. amla-sandbox runs entirely in your process—no cloud calls, no data exfiltration. Your code, your data, your control.
What does deployment look like?
pip install amla-sandbox. That's it. No containers, no VMs, no cloud accounts. Empower developers to move fast without waiting on IT.
What's the security model?
WASM memory isolation + capability tokens. No direct network or syscalls from inside the sandbox—only host-mediated tool calls through a single chokepoint. Enterprise-friendly: zero attack surface expansion.
How do capability tokens work?
Unforgeable tokens specify which methods the sandbox can call and with what constraints. Every tool call is validated. Full audit trail included.

Capabilities scale naturally to multi-agent architectures—when Agent A delegates to Agent B, it can only grant a subset of its own authority. Attenuation is cryptographically enforced, not configured.

Technical Architecture

The Sandbox Binary

A 13MB statically-linked binary containing a WebAssembly runtime, virtual filesystem, and capability interpreter. Ships with no external dependencies.

Execution Model

Every external API call is intercepted and validated against the capability chain. Reads and writes go to a copy-on-write overlay. The agent never touches the real filesystem.
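
A conceptual sketch of that loop, using a Python generator to stand in for the guest; none of the names or structure below is the actual implementation:

# Conceptual sketch only, not the actual implementation.
def guest_program():
    """Stand-in for sandboxed agent code: every external effect is yielded to the host."""
    data = yield ("api", "stripe/charges/list", {"limit": 3})
    yield ("fs_write", "/workspace/charges.json", data)
    return "done"

def run(guest, check_capability, overlay_fs, call_api):
    gen = guest()
    result = None
    while True:
        try:
            request = gen.send(result)              # guest yields a tool-call request
        except StopIteration as finished:
            return finished.value
        check_capability(request)                   # validated against the capability chain
        if request[0] == "fs_write":
            overlay_fs[request[1]] = request[2]     # copy-on-write overlay, never the real filesystem
            result = None
        else:
            result = call_api(request)              # host-mediated external call, recorded for replay

# Usage with trivial stubs:
run(guest_program, check_capability=lambda r: None,
    overlay_fs={}, call_api=lambda r: '{"data": []}')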

Capture Format

The WASM runtime is constrained so all external effects flow through host-mediated calls under the host's full control. We record inputs (API responses, file reads, timestamps) in a compact binary format. Replay substitutes these values exactly, making execution deterministic.
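
A sketch of what one captured record could contain; the real format is a compact binary encoding, so these field names are assumptions used only to show what gets recorded:

# Illustrative only: field names are assumptions; the real format is compact binary.
from dataclasses import dataclass, field

@dataclass
class CapturedEffect:
    seq: int         # position in the execution, so replay substitutes values in order
    kind: str        # "api", "fs_read", "clock", ...
    request: bytes   # the outbound call as the guest issued it
    response: bytes  # the recorded input that replay injects verbatim

@dataclass
class Trace:
    effects: list[CapturedEffect] = field(default_factory=list)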

Capability Attenuation

When an agent delegates to a sub-agent, it can only grant a subset of its own capabilities. The sandbox enforces this at the API boundary—no configuration required.
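
A sketch of delegation with attenuation, reusing the Sandbox and MethodCapability types from the earlier snippet; the attenuate() helper is an assumed name:

# Illustrative only: attenuate() is an assumed name, not the published API.
parent = Sandbox(capabilities=[
    MethodCapability(method_pattern="stripe/**", max_calls=100),
])

# The sub-agent gets a strict subset: narrower pattern, smaller budget.
child = Sandbox(capabilities=parent.capabilities.attenuate(
    method_pattern="stripe/charges/list",
    max_calls=10,
))
# Trying to widen authority (e.g. back to "stripe/**") is rejected at the API boundary.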

Give your agents a scratchpad

13MB binary. Zero infrastructure. Runs anywhere Python runs.

pip install amla-sandbox (coming soon)