
A VM Snapshot Is a Quiescence Protocol

Why zygote-style VM spawning is less about copying memory and more about establishing that every device, backend, and guest channel is quiet enough to freeze.


Kalahari’s zygote API is deliberately small:

import { KalahariClient } from '@amlalabs/kalahari';

const client = new KalahariClient({ image: 'node:22-alpine' });
const sandbox = await client.createSandbox();

await sandbox.writeFile('/workspace/state.txt', 'base\n');

const zygote = await sandbox.zygote();
const first = await zygote.spawn();
const second = await zygote.spawn();

That shape is the product goal. Boot a sandbox once, prepare the filesystem and runtime state once, then branch it into isolated children without making users manage a VM lifecycle.

The implementation has a harder job. A zygote is not a file copy, and it is not just guest RAM. It is the point where Kalahari has to establish that guest memory, vCPU state, interrupt state, virtio queues, backend workers, command channels, and host-owned buffers all describe the same moment.

If that evidence is missing, the API is simple in the wrong way. Children can inherit a descriptor that was consumed but never completed, an interrupt that was meant for the parent, output that already escaped into a host receiver, or a backend task that still thinks it owns a VM that has become a template.

Freeze Is Not Pause

A paused VM can still have active host state.

The guest may have stopped executing instructions, but the VMM can still be finishing device work on its behalf. A virtio queue may have pending synchronous work. A filesystem backend may be about to publish a reply. The guest-control transport may have host-to-guest frames staged in transient host memory instead of durable VM state. A PTY pump may have stdout in a callback flow instead of in durable guest-visible state.

Freezing at that point would capture a lie. The child would resume from guest state that says “I am waiting” while the host-side operation that was supposed to answer it was discarded, duplicated, or delivered to the wrong sandbox.

That is why Kalahari treats zygote creation as a quiescence protocol. Before the parent can become a template, every transient owner has to either commit into snapshot state or disappear from the snapshot boundary.

Quiescence Has to Fail Closed

When Kalahari stops a VM, the VMM drains device work before returning a parked VM state that can later be frozen or resumed. That drain is intentionally bounded. Devices and backends can wake more work while shutdown is in progress, and a broken or hostile guest should not be able to force the host into an unbounded “almost quiet” loop.

So Kalahari distinguishes between “we tried to drain” and “the VM is snapshot-quiescent.” If device work still remains after the final drain budget, the VMM returns a hard error instead of producing a questionable zygote.
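The bounded drain can be sketched as a loop with a finite pass budget. This is an illustration of the rule, not Kalahari's VMM code; the `Device` shape, `drainToQuiescence`, and the budget semantics are all invented for the example.

```typescript
// Each pass lets devices finish wakeable work, but the budget is finite,
// so a broken or hostile guest cannot hold the host in an unbounded
// "almost quiet" loop.
type Device = { pendingWork: () => number; drainOnce: () => void };

function drainToQuiescence(devices: Device[], budget: number): void {
  for (let pass = 0; pass < budget; pass++) {
    if (devices.every((d) => d.pendingWork() === 0)) return; // snapshot-quiescent
    devices.forEach((d) => d.drainOnce());
  }
  if (devices.some((d) => d.pendingWork() > 0)) {
    // "We tried to drain" is not "the VM is snapshot-quiescent":
    // fail closed with a hard error instead of a questionable zygote.
    throw new Error('device work remained after final drain budget');
  }
}
```

The important property is the last branch: exhausting the budget with work remaining produces an error, never a template.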

The same rule applies above virtio. The IPC transport has to establish that command progress is not stuck in transient host memory. It rejects snapshot quiescence if any of the following hold:

  • host-to-guest frames have not reached durable VM state
  • command channels are still open
  • process requests are queued outside the snapshot
  • sessions are still attached to active host observers
  • paused sessions are waiting on output backpressure
  • reattachable sessions still have host-buffered output
  • host-side control is pending

Those checks are not defensive decoration. They are the difference between a template that can spawn N children and a template that only usually works.
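The refusal can be pictured as an aggregate check over the transport's transient state. A minimal sketch, covering a few of the rejection reasons above; the `TransportState` shape is invented for illustration and is not Kalahari's real API.

```typescript
// Every field counts state that lives only in the host. Any nonzero
// count means the snapshot would capture a lie, so the check fails closed.
interface TransportState {
  stagedHostToGuestFrames: number;
  openCommandChannels: number;
  queuedProcessRequests: number;
  attachedObservers: number;
  hostBufferedOutputBytes: number;
}

function rejectIfNotQuiescent(s: TransportState): void {
  const violations: string[] = [];
  if (s.stagedHostToGuestFrames > 0) violations.push('frames not in durable VM state');
  if (s.openCommandChannels > 0) violations.push('command channels still open');
  if (s.queuedProcessRequests > 0) violations.push('process requests outside snapshot');
  if (s.attachedObservers > 0) violations.push('sessions attached to host observers');
  if (s.hostBufferedOutputBytes > 0) violations.push('host-buffered session output');
  if (violations.length > 0) {
    throw new Error(`snapshot quiescence rejected: ${violations.join('; ')}`);
  }
}
```

Reporting every violation at once, rather than the first one found, is what makes the error actionable when several layers are busy at the same time.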

Backend Close Comes Before Freeze

Kalahari’s VM lifecycle makes the ordering explicit: live backend resources belong only to a running VM, and a zygote is exposed only after backend teardown has completed.

After Kalahari has reached a stopped, saved VM state, the final freeze step consumes a VM with backend resources attached, releases interrupt resources that own backend file descriptors, awaits backend close, drops backend-owned memory mappings, and only then exposes a zygote. In other words, backend close precedes freeze.

That sequencing matters because backend-owned writable mappings and file descriptors are not passive metadata. They are capabilities to mutate or signal the VM. If a zygote were visible before backend close completed, a late backend action could mutate the parent snapshot after children had already been derived from it.

Kalahari avoids that class of bug by making the lifecycle boring and strict:

running VM
stop vCPUs
drain device and IPC work
save VM state
close backend resources
expose zygote template

The important part is not the number of steps. It is that there is no public zygote until the host has given up every live claim that could change the captured state.
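One way to see why the ordering cannot be skipped is as a typestate chain, where each step consumes the previous state and the only way to reach a zygote is through backend close. This is a sketch of the ordering rule only; the type names are illustrative, not Kalahari's real types, and the real backend close is awaited asynchronously.

```typescript
// There is no constructor for ZygoteTemplate other than completing the
// full chain, so "zygote before backend close" cannot be expressed.
type RunningVm = { brand: 'running' };
type ParkedVm = { brand: 'parked' };   // vCPUs stopped, device/IPC work drained
type SavedVm = { brand: 'saved' };     // VM state captured
type ZygoteTemplate = { brand: 'zygote' };

function stopAndDrain(vm: RunningVm): ParkedVm {
  return { brand: 'parked' };
}
function saveState(vm: ParkedVm): SavedVm {
  return { brand: 'saved' };
}
function closeBackends(vm: SavedVm): ZygoteTemplate {
  // releases interrupt resources, closes backend fds, drops backend
  // mappings; only then does a template become visible
  return { brand: 'zygote' };
}

const template = closeBackends(saveState(stopAndDrain({ brand: 'running' })));
```

The point of encoding it this way is that a late backend action has nothing to act on: by the time a `ZygoteTemplate` value exists, every live claim has been given up.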

User-Visible Processes Cannot Cross the Boundary

The user-facing process lifecycle has a similar rule. A process handle is not just a guest PID. It includes host-side output collection, stdin/control channels, resize state, callbacks, and lifecycle promises.

For a normal running sandbox, that is exactly what users want:

const shell = await sandbox.startShell('cat');
await shell.sendStdin('hello\n');

For a zygote boundary, it is unsafe. If stdout has already been delivered to a callback, that byte stream is no longer wholly inside the VM. If stdin or resize control is queued in a host channel, one child could inherit a command that another child cannot observe the same way.

Kalahari therefore rejects zygote creation while active user-visible process handles exist:

const shell = await sandbox.startShell('cat');

await sandbox.zygote(); // rejects while processes are active

await shell.kill();
await shell.wait(); // rejects because the process was terminated

That check exists at the public Kalahari API layer for normal process handles, and the VM runtime also rejects active PTY sessions. The lower layer matters because compatibility wrappers and direct PTY users can create sessions that are not represented as ordinary high-level process handles.

The container init process is the exception because Kalahari owns it. Before freeze, the VM runtime closes the init process’s visible stdout and stderr streams, then converts init into reattachable runtime state. That conversion succeeds only when no stdout, stderr, exit, or control progress is trapped in transient host channels.

Each spawned child can then attach to that stored init state during its first run. User processes do not get the same treatment because Kalahari cannot turn arbitrary observed host I/O back into durable template state.
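The init conversion can be sketched as a check that nothing about init is still in flight on the host. The `InitSession` fields and `convertInit` are hypothetical names for illustration; the real conversion happens inside the VM runtime.

```typescript
// Init may cross the boundary only because Kalahari owns it end to end.
// Conversion succeeds only when no stdout, stderr, exit, or control
// progress is trapped in transient host channels.
interface InitSession {
  transientStdoutBytes: number; // output still in host callback flow
  transientStderrBytes: number;
  pendingControlMessages: number;
}

type ReattachableInit = { kind: 'reattachable' };

function convertInit(init: InitSession): ReattachableInit {
  if (
    init.transientStdoutBytes > 0 ||
    init.transientStderrBytes > 0 ||
    init.pendingControlMessages > 0
  ) {
    throw new Error('init progress trapped in transient host channels');
  }
  return { kind: 'reattachable' }; // children attach to this on first run
}
```

User processes cannot take this path because their observed host I/O has already left the VM and cannot be turned back into durable template state.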

Reattachable Is Stronger Than Drained

Snapshotting forces a useful distinction:

  • Drained means the host consumed the state.
  • Reattachable means the state remains durable enough for a future VM run to observe coherently.

For normal command execution, draining output into a callback or a wait() result is useful. For a zygote, it is a problem if that output belonged to a process that still appears live in the template.

The zygote boundary prefers reattachable state over drained state. If a process must survive the boundary, its future observations need to be reconstructible from the captured VM and runtime state. If the only copy of some observation is already sitting in a host callback queue, the boundary has failed.

This is the same idea as device quiescence, applied to process lifecycle. The implementation asks the same question at every layer: after this snapshot, can a child explain every visible event from state it actually inherited?

The Parent Is Consumed

sandbox.zygote() consumes the parent sandbox:

const zygote = await sandbox.zygote();

await sandbox.run('node', {
  args: ['-e', "console.log('still running?')"],
}); // throws: the sandbox has become a zygote

That is not an arbitrary API restriction. Once a sandbox becomes the base of a branching tree, the base must stop being a mutable runtime.

Copy-on-write memory can protect pages, but Kalahari’s observable state is larger than pages. It includes command IDs, process ownership, device queues, timers, backend capabilities, and host-visible lifecycle handles. Allowing the parent to resume would make the template mutable again and would force every layer to answer whether an event belongs to the parent, the template, or one of the children.

Kalahari keeps the ownership model linear:

sandbox -> zygote -> child sandbox(s)

A child can later become a zygote too. That gives nested branching without making any one VM both an immutable template and an actively mutating sandbox.
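The linear ownership model can be sketched as a handle that permanently invalidates itself on branching. The classes here illustrate the ownership rule only and are not Kalahari's implementation.

```typescript
// Once zygote() succeeds, the parent handle is dead; each spawned child
// is a fresh mutable sandbox that can itself become a zygote later.
class SandboxHandle {
  private consumed = false;

  zygote(): ZygoteHandle {
    if (this.consumed) throw new Error('sandbox already consumed');
    this.consumed = true; // the base stops being a mutable runtime
    return new ZygoteHandle();
  }

  run(_cmd: string): void {
    if (this.consumed) throw new Error('the sandbox has become a zygote');
    // ... would execute in the guest
  }
}

class ZygoteHandle {
  spawn(): SandboxHandle {
    return new SandboxHandle();
  }
}
```

Because a child starts as an ordinary `SandboxHandle`, nested branching falls out for free, and no handle is ever both an immutable template and a mutable runtime.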

The Product Feature Is the Boundary

The useful product behavior is straightforward:

await sandbox.writeFile('/workspace/app.js', 'console.log("ready")\n');

const zygote = await sandbox.zygote();
const children = await Promise.all([zygote.spawn(), zygote.spawn(), zygote.spawn()]);

Users should experience that as “prepare once, spawn many.” They should not need to know about virtio drain budgets, backend lifetimes, IPC transport state, or reattachable init state.

But the API can stay that small only because the boundary underneath is strict. Kalahari rejects active user-visible processes. It converts owned init state into reattachable runtime state. The VMM requires device and IPC quiescence before a run returns. Backend resources are closed before a zygote is exposed. On success, the parent is consumed.

That is the real shape of zygote spawning. The speedup is not “copy memory faster.” The speedup is reaching a durable boundary where copying is finally safe.