Isolate the whole fabric.
Not just a CPU.
Agentic and AI work runs across a fabric — CPU, GPU, NPU, TPU, photonic, quantum. Sandbox makes the unit of isolation a heterogeneous unit of work: the accelerator is a capability you grant, a resource you meter in joules, and a line item you attest — a signed per-run energy receipt across the whole fabric, never an opaque passthrough.
The fabric
The accelerator is first-class.
Sandbox enumerates the fabric and makes each device class a subject of the capability model, the receipt, and the attestation: the accelerator is granted explicitly, metered in joules, and named in a signed receipt — not folded into an undifferentiated CPU figure. A host that runs accelerated work but reports one combined CPU number is not conformant.
A unit may dispatch only to the classes it was granted. The host must be able to both isolate a granted class at the tier in force and meter its energy — a class it can run but cannot meter is not grantable. Metering is not optional.
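The grantability rule above can be sketched in a few lines of Rust. This is a minimal illustration, not the reference crate's API: `DeviceClass`, `HostSupport`, `GrantError`, `try_grant`, and `may_dispatch` are hypothetical names, and the enum lists only the six classes named in the introduction.

```rust
use std::collections::BTreeSet;

/// Device classes named in the text above. Illustrative only; this is not
/// the reference crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DeviceClass { Cpu, Gpu, Npu, Tpu, Photonic, Quantum }

/// Hypothetical summary of what the host can do with one class at the
/// tier in force.
#[derive(Debug, Clone, Copy)]
pub struct HostSupport {
    pub can_isolate: bool,
    pub can_meter: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrantError { CannotIsolate, CannotMeter }

/// Add `class` to the grant set only if the host can both isolate it and
/// meter its energy. A class that can run but cannot be metered is refused.
pub fn try_grant(
    granted: &mut BTreeSet<DeviceClass>,
    class: DeviceClass,
    support: HostSupport,
) -> Result<(), GrantError> {
    if !support.can_isolate {
        return Err(GrantError::CannotIsolate);
    }
    if !support.can_meter {
        return Err(GrantError::CannotMeter);
    }
    granted.insert(class);
    Ok(())
}

/// A unit may dispatch only to classes present in its grant set.
pub fn may_dispatch(granted: &BTreeSet<DeviceClass>, target: DeviceClass) -> bool {
    granted.contains(&target)
}
```

Refusing the grant up front, rather than failing at dispatch time, keeps an unmeterable device out of the grant set entirely, so it can never appear in a receipt without a figure.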
The ladder
Eight tiers. Lowest that fits.
A unit runs at exactly one tier — the lowest whose guarantees satisfy its capability set. A host MUST NOT escalate beyond what the grants require. Each tier strictly contains the guarantees of the tier below it; the guarantees, not the named technology, define the rung.
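One way to read the minimization rule: because each tier contains the guarantees of the tiers below it, the lowest adequate tier is the highest of the per-capability floors. The sketch below assumes each granted capability can be mapped to a minimum tier; that mapping (the `floors` slice) is a hypothetical input, as are the names `Tier`, `TierError`, `minimal_tier`, and `check_tier`.

```rust
/// The S0..S7 ladder. Derived ordering follows declaration order, so
/// `Tier::S2 < Tier::S5` holds.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier { S0, S1, S2, S3, S4, S5, S6, S7 }

#[derive(Debug, PartialEq, Eq)]
pub enum TierError {
    /// The chosen tier does not satisfy the grants.
    TooLow { needed: Tier },
    /// The chosen tier escalates beyond what the grants require.
    Escalated { needed: Tier },
}

/// Lowest tier that satisfies every floor. `floors` holds, per granted
/// capability, the minimum tier that capability demands (an assumed input).
/// An empty capability set needs only the bottom rung.
pub fn minimal_tier(floors: &[Tier]) -> Tier {
    floors.iter().copied().max().unwrap_or(Tier::S0)
}

/// Accept a tier only if it is exactly the minimal one.
pub fn check_tier(chosen: Tier, floors: &[Tier]) -> Result<(), TierError> {
    let needed = minimal_tier(floors);
    if chosen < needed {
        Err(TierError::TooLow { needed })
    } else if chosen > needed {
        Err(TierError::Escalated { needed })
    } else {
        Ok(())
    }
}
```

For example, a unit whose grants demand S1, S2, and S5 runs at S5; running it at S6 would be rejected as an escalation.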
The reference runs real executors at S2 (wasmtime) and S5 (Firecracker), selects the lower tiers by the same minimization rule, and carries the full S6/S7 confidential and sovereign path. Verifying a TEE report is hardware-independent and complete; generating a genuine vendor-signed report requires the CPU's security processor — surfaced honestly, never faked.
Four guarantees
Isolate. Capability. Meter. Attest.
"Run it safely" and "don't waste energy" are the same optimization. An over-isolated run burns joules it did not need; an unmetered accelerator is exactly where the energy hides. Every run carries four commitments.
Isolate
Run at the lowest rung of the S0–S7 ladder the work requires — and not a rung higher. Escalation is the escape hatch, not the entry point. An over-isolated run burns joules it never needed.
Capability
Every resource a unit touches — network, storage, key-value, and each compute device class — is explicitly granted. Access to anything ungranted is denied. Deny-by-default, enforced through real execution.
Meter
Every run emits a signed energy receipt with joules attributed per device class — measured + estimated, integer microjoules — each reading tagged HwShunt, ModelBased, or Estimator. The accelerator is never hidden.
Attest
A signed statement of what image ran, at what tier, under what grants, on what hardware, for what cost — with a replay-journal root, and at S6/S7 a verified TEE report and an audit-chain link.
Capabilities
Deny by default. Grant on purpose.
The default grant set is empty: no devices beyond the CPU, no storage, no network, no key-value, no host services. Every capability is opt-in, the grant set is content-addressed before the run, and a mutation to it mid-run terminates the run and invalidates the attestation.
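A deny-by-default decision and the mid-run mutation check might look like the sketch below. All names (`Capability`, `GrantSet`, `Decision`, `Run`) are hypothetical. The fingerprint uses the standard library's `DefaultHasher` purely as a stand-in so the example runs without dependencies; it is not cryptographic, and the reference describes BLAKE3 content addresses instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Capability {
    Network,
    Storage,
    KeyValue,
    /// A compute device class beyond the CPU, named by a label such as "gpu".
    Device(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision { Allow, Deny }

/// The default value is the empty set: nothing is granted until asked for.
#[derive(Debug, Default, Clone)]
pub struct GrantSet(BTreeSet<Capability>);

impl GrantSet {
    pub fn grant(mut self, cap: Capability) -> Self {
        self.0.insert(cap);
        self
    }

    /// Stand-in for a content address. NOT cryptographic; illustration only.
    pub fn fingerprint(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.0.hash(&mut h);
        h.finish()
    }

    /// Pure decision: allow only what was explicitly granted.
    pub fn decide(&self, query: &Capability) -> Decision {
        if self.0.contains(query) { Decision::Allow } else { Decision::Deny }
    }
}

/// A run pins the grant set's fingerprint before it starts.
pub struct Run { pinned: u64 }

impl Run {
    pub fn start(grants: &GrantSet) -> Self {
        Run { pinned: grants.fingerprint() }
    }

    /// False once the grant set differs from the one pinned at start; the
    /// host would then terminate the run and invalidate the attestation.
    pub fn still_valid(&self, grants: &GrantSet) -> bool {
        self.pinned == grants.fingerprint()
    }
}
```

Using an ordered set keeps the fingerprint independent of the order in which capabilities were granted.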
The receipt
Joules, resolved to the silicon that spent them.
Every completed run emits a signed energy receipt with one record per device class the unit actually used: a two-part measured + estimated figure in integer microjoules, each carrying a provenance tag. The records sum to the total; a measured total exists only if every used class carries one. Accelerator energy is never folded into the CPU record. Receipts seal as JCR-1 envelopes, the signed-receipt format shared across the family.
HwShunt
Read from a hardware energy interface for that device class: RAPL / powercap for CPU, NVML for NVIDIA, ROCm SMI for AMD, Level Zero for Intel. A measured figure must come from here.
At S5+, where the interface exists, it must be used.
ModelBased
Computed from a calibrated per-operation model for that class. Honest where no per-unit hardware reading is available; the tag makes the limitation explicit.
Not a measurement; an estimate the auditor can discount.
Estimator
A coarse constant estimator, used only where neither a hardware reading nor a calibrated model exists. Carries no measured figure, ever.
Wide tolerance by construction.
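The receipt arithmetic can be sketched as follows. The field layout and the names `DeviceRecord`, `Totals`, `ReceiptError`, and `totals` are assumptions for illustration; only the tag names HwShunt, ModelBased, and Estimator come from the text. The coupling enforced here is one reading of the rules above: a measured figure is accepted only under the HwShunt tag.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provenance { HwShunt, ModelBased, Estimator }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceClass { Cpu, Gpu, Npu }

/// One record per device class the unit used, in integer microjoules.
#[derive(Debug, Clone, Copy)]
pub struct DeviceRecord {
    pub class: DeviceClass,
    /// `None` when no hardware reading exists for this class.
    pub measured_uj: Option<u64>,
    pub estimated_uj: u64,
    pub provenance: Provenance,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Totals {
    /// Present only if every record carries a measured figure.
    pub measured_uj: Option<u64>,
    pub estimated_uj: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReceiptError {
    /// A measured figure was attached to a model-based or estimator record.
    MeasuredWithoutHardware(DeviceClass),
}

/// Sum the per-device records into receipt totals.
pub fn totals(records: &[DeviceRecord]) -> Result<Totals, ReceiptError> {
    for r in records {
        if r.measured_uj.is_some() && r.provenance != Provenance::HwShunt {
            return Err(ReceiptError::MeasuredWithoutHardware(r.class));
        }
    }
    // Summing `Option<u64>` values yields `None` as soon as one is missing.
    let measured_uj = records.iter().map(|r| r.measured_uj).sum::<Option<u64>>();
    let estimated_uj = records.iter().map(|r| r.estimated_uj).sum::<u64>();
    Ok(Totals { measured_uj, estimated_uj })
}
```

With a CPU record measured at 1,500,000 µJ and a GPU record measured at 9,000,000 µJ, the measured total is 10,500,000 µJ; adding a model-based NPU record with no hardware reading drops the measured total to `None` while its estimate still counts toward the estimated total.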
Attestation
Proof of what ran.
The attestation is a signed statement of what a run was: which image, at which tier, under which grants, on which hardware, with which energy receipt. It is signed with Ed25519 over JCS-canonicalized JSON, the same primitive the ARL Sandbox uses, so the two interoperate. The metered silicon and the named silicon must match; the receipt is bound by content address, not duplicated.
Replay journal
Every nondeterministic input a unit receives — a kv or storage read, the clock, a random draw, an outbound call — is recorded in a hash-chained journal whose root is signed in. Replay is for audit, not re-execution.
S6 — Confidential
The host obtains a TEE report bound to the run, verifies its ECDSA-P384 signature and its measurement, TCB, debug state, and report-data against policy, and seals the evidence. Verification is implemented and vectored.
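The policy half of that check (everything after the signature has been verified) might be sketched as below. This is an assumption-heavy illustration: `ReportClaims`, `Policy`, `PolicyError`, and `check_policy` are hypothetical names, the field sizes follow the SEV-SNP report layout (48-byte measurement, 64-byte report data), and the TCB is flattened to a single integer where a real verifier would compare it component by component. Signature verification is assumed to have happened before this function is called.

```rust
/// Fields extracted from a report whose signature was already verified.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReportClaims {
    pub measurement: [u8; 48],
    /// Simplified: a single comparable number instead of per-component TCB.
    pub tcb_version: u64,
    pub debug_enabled: bool,
    pub report_data: [u8; 64],
}

pub struct Policy {
    pub allowed_measurements: Vec<[u8; 48]>,
    pub min_tcb_version: u64,
    pub allow_debug: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    UnknownMeasurement,
    TcbTooOld,
    DebugEnabled,
    ReportDataMismatch,
}

/// Check measurement, TCB, debug state, and report-data against policy.
/// `expected_report_data` is whatever the host bound to this run.
pub fn check_policy(
    claims: &ReportClaims,
    policy: &Policy,
    expected_report_data: &[u8; 64],
) -> Result<(), PolicyError> {
    if !policy.allowed_measurements.contains(&claims.measurement) {
        return Err(PolicyError::UnknownMeasurement);
    }
    if claims.tcb_version < policy.min_tcb_version {
        return Err(PolicyError::TcbTooOld);
    }
    if claims.debug_enabled && !policy.allow_debug {
        return Err(PolicyError::DebugEnabled);
    }
    if &claims.report_data != expected_report_data {
        return Err(PolicyError::ReportDataMismatch);
    }
    Ok(())
}
```

Comparing report-data against a value derived from the run is what binds the report to this run rather than to any run on the same machine.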
S7 — Sovereign
The run is bound to a region and jurisdiction and folded into an append-only, hash-chained audit log. Reordering or removing any entry breaks the chain, so the sequence of sovereign runs is itself attestable.
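The tamper-evidence property of a hash-chained log can be shown with a small sketch. The names `Entry`, `AuditLog`, and `verify` are hypothetical, and the link function uses the standard library's `DefaultHasher` only so the example runs without dependencies; it is not cryptographic, and a real chain would use a cryptographic hash such as the BLAKE3 the reference names for content addressing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct Entry {
    pub payload: String,
    /// Link of the previous entry, or 0 for the first entry.
    pub prev: u64,
    /// Hash over (prev, payload).
    pub link: u64,
}

/// Stand-in link function. NOT cryptographic; illustration only.
fn link(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct AuditLog {
    pub entries: Vec<Entry>,
}

impl AuditLog {
    /// Append-only: each new entry commits to the one before it.
    pub fn append(&mut self, payload: &str) {
        let prev = self.head();
        self.entries.push(Entry {
            payload: payload.to_string(),
            prev,
            link: link(prev, payload),
        });
    }

    /// The value a host would sign: the link of the latest entry.
    pub fn head(&self) -> u64 {
        self.entries.last().map_or(0, |e| e.link)
    }
}

/// Recompute the chain and compare its end with the attested head.
/// Checking the head is what catches truncation of the tail.
pub fn verify(entries: &[Entry], expected_head: u64) -> bool {
    let mut prev = 0u64;
    for e in entries {
        if e.prev != prev || e.link != link(prev, &e.payload) {
            return false;
        }
        prev = e.link;
    }
    prev == expected_head
}
```

Removing or reordering an entry breaks a `prev` link inside the chain, and dropping the last entry leaves a valid but shorter chain whose end no longer matches the attested head.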
Reference
A lean Rust library. Bring the best, leave the bloat.
The reference is a small set of focused crates a third party implements against — not a platform to adopt. Apache-2.0 for the code, CC-BY-4.0 for the spec text. The wire format and the right to fork are public.
Content addressing (BLAKE3 → b3:…), canonical det-CBOR, the S0–S7 tier ladder, the eight device classes.
The grant set and the deny-by-default Enforcer — a pure, deterministic decision over every access query.
The per-device energy receipt with provenance coupling, sealed as a JCR-1 envelope (det-CBOR + COSE_Sign1, EdDSA) shared across the family.
Ed25519-over-JCS attestation (interoperable with the ARL Sandbox), the deterministic replay journal, the hash-chained sovereign audit log, and a RATS EAT (CWT/COSE_Sign1) view for interop.
The reference host: tier minimization plus honest sealed-run assembly — the metered silicon must equal the named silicon.
Confidential evidence verification: SEV-SNP reports (ECDSA-P384), Intel TDX DCAP quotes (ECDSA-P256), and NVIDIA GPU EAT attestation (ES384) — the accelerator attested, not assumed.
A real S2 executor on wasmtime — runs untrusted WASM under fuel + memory caps, brokers every capability, journals nondeterministic inputs.
A real S5 executor driving Firecracker, metering CPU and an assigned accelerator separately through pluggable power meters.
An MCP server: agents run a unit and get back the attested, energy-metered receipt — a signed joule receipt as the tool result, not just output.
An implementation is conformant if it round-trips the public vectors: canonicality, capability enforcement, per-device receipt integrity, attestation, tier minimization, an adversarial negatives pack, and SEV-SNP report verification. See the conformance contract.
The doctrine
Isolation per joule, across the whole fabric.
The durable layer of a sandbox is not fast container boot or a managed control plane; both are commoditized. It is the boundary itself, made honest: the lowest isolation tier that satisfies the work, every resource gated by an explicit capability, and a signed receipt that attributes energy to each device class the run touched. The protocol carries no token; a conforming host performs its core function without contacting any single steward's servers.
Transaction Science is one steward — it publishes the spec, ships the reference, and runs the optional services. The protocol is owned by no one.