Content-Addressed Bytecode
Shape’s bytecode format is designed from the ground up for distribution. Every function, every type, every value is content-addressed via SHA-256 — making execution state portable across nodes, program versions, and time.
This chapter covers the architecture, the primitives it unlocks, and how to build distributed systems on top of it using nothing but Shape annotations.
The Core Idea
Traditional VMs use a flat instruction array with absolute offsets. Transfer state to another node and the instruction pointer is meaningless unless both sides have byte-identical programs. Update a single function and every offset shifts.
Shape breaks this by giving every function its own self-contained blob with a content hash as its identity:
    // Every function compiles to a FunctionBlob:
    // - Its own instructions (not shared with other functions)
    // - Its own constant pool
    // - Its own string pool
    // - A list of dependencies (other functions it calls, by hash)
    // - A SHA-256 content hash of all of the above
    //
    // A "program" is just an entry hash + a store of blobs.

Two functions with the same bytecode, constants, and dependencies produce the same hash, regardless of which program they appear in, which node compiled them, or when they were compiled.
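To make the hashing idea concrete, here is a minimal Python sketch (Python is used only for illustration; it is not Shape's implementation). The FunctionBlob class, its field names, and the canonical-JSON encoding are all assumptions; the real serialized format is not described in this chapter.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionBlob:
    """Hypothetical stand-in for a FunctionBlob; field names are illustrative."""
    instructions: tuple       # this function's opcodes only
    constants: tuple          # this function's constant pool
    strings: tuple            # this function's string pool
    dependencies: tuple = ()  # content hashes of called functions

    def content_hash(self) -> str:
        # Canonical JSON is an assumed encoding, chosen only so the
        # same contents always serialize to the same bytes.
        payload = json.dumps(
            [self.instructions, self.constants, self.strings, self.dependencies],
            separators=(",", ":"),
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# The same helper compiled independently by two different "programs".
helper_a = FunctionBlob(("LOAD_CONST 0", "RETURN"), (42,), ())
helper_b = FunctionBlob(("LOAD_CONST 0", "RETURN"), (42,), ())
# Changing one constant yields a different identity.
changed = FunctionBlob(("LOAD_CONST 0", "RETURN"), (43,), ())
```

In this sketch helper_a and helper_b share one hash, so a store keyed by hash would hold a single copy, while changed gets its own entry.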
Two Representations, One Runtime
Shape maintains two representations of a program:
| Format | Purpose | IP Model | Used For |
|---|---|---|---|
| Program (content-addressed) | Storage, transfer, caching | (FunctionHash, local_ip) | Disk, wire, state snapshots |
| LinkedProgram (flat) | Fast execution | Absolute usize | VM dispatch loop |
At load time, a linking pass flattens the content-addressed blobs into a single instruction array with absolute offsets — identical to a traditional VM. The dispatch loop runs at full speed with zero overhead. The blob hashes are preserved alongside each function so that state capture can record content-addressed frames.
    Compile → Program (content-addressed blobs)
                  ↓ link()
            → LinkedProgram (flat, fast)
                  ↓
              VM dispatch loop (unchanged performance)

FunctionBlob
A FunctionBlob is a self-contained unit of execution:
    // Conceptual structure (actual Rust struct):
    type FunctionBlob {
      content_hash: string,                 // SHA-256 identity

      // Metadata
      name: string,
      arity: int,
      param_names: Array<string>,
      locals_count: int,
      is_closure: bool,
      is_async: bool,

      // Self-contained bytecode
      instructions: Array<Instruction>,     // THIS function only
      constants: Array<Constant>,           // THIS function only
      strings: Array<string>,               // THIS function only

      // Dependencies
      dependencies: Array<string>,          // Content hashes of called functions
      foreign_dependencies: Array<string>,  // Content hashes of foreign (polyglot) functions
      type_schemas: Array<TypeSchema>,      // Types this function constructs
    }

Key properties:
- Self-contained: no shared pools. A blob carries everything it needs.
- Content-addressed: the hash is derived from the serialized blob contents, including both Shape dependencies and foreign function dependencies. Same function → same hash, always.
- Cross-language identity: if a function calls polyglot code (e.g., fn python ...), the content hashes of those foreign functions are recorded in foreign_dependencies and included in the blob hash. Two Shape functions with identical bytecode but different foreign implementations produce different content hashes.
- Independently transferable: send just the blobs you need, not the whole program.
- Cacheable forever: a hash uniquely identifies a blob. Cache globally, permanently.
Content-Addressed Types
Every TypeSchema also has a content hash derived from its structural
definition (the type name, sorted field names and types, and enum variants):

    type Trade {
      symbol: string,
      price: number,
      volume: int,
    }
    // → SHA-256("Trade" + sorted [("price", "number"), ("symbol", "string"), ("volume", "int")])

Two types with the same name and fields produce the same hash. This means:
- Type identity comes from the definition itself (name plus structure), not from which program or node declared it
- Remote nodes can verify type compatibility by comparing hashes
- Type schemas serve as a content-addressed IDL (interface definition language)
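A small Python sketch of the schema hash shows why field order does not matter. The byte-level encoding here (UTF-8 with NUL separators) is an assumption made for illustration, and enum variants are left out for brevity.

```python
import hashlib


def schema_hash(name: str, fields: dict[str, str]) -> str:
    """Hash a type name plus its fields in sorted order (encoding is assumed)."""
    h = hashlib.sha256()
    h.update(name.encode("utf-8"))
    for field_name, field_type in sorted(fields.items()):
        # A separator keeps ("ab", "c") distinct from ("a", "bc").
        h.update(f"\x00{field_name}\x00{field_type}".encode("utf-8"))
    return h.hexdigest()


# Two nodes declare the same type with fields in a different order.
node_a = schema_hash("Trade", {"symbol": "string", "price": "number", "volume": "int"})
node_b = schema_hash("Trade", {"volume": "int", "price": "number", "symbol": "string"})
# A node whose price field has a different type is incompatible.
other = schema_hash("Trade", {"symbol": "string", "price": "int", "volume": "int"})
```

Comparing node_a with node_b is enough for two nodes to agree that they mean the same Trade; the mismatch with other is detected without exchanging the full schema.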
How the Linking Pass Works
The linker takes a content-addressed Program and produces a flat
LinkedProgram:
- Topological sort: order blobs by their dependency graph
- Flatten: concatenate all instruction arrays into one
- Remap constants: merge per-blob constant pools, adjust Const operands
- Remap strings: merge per-blob string pools, adjust Property operands
- Resolve functions: replace hash-based function references with flat indices
After linking, the VM runs the exact same dispatch loop as always. Absolute IP, flat instruction array, global constant pool. No performance regression.
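The linking steps can be sketched in Python as follows. The Blob class, the (opcode, operand) instruction shape, and the CONST and CALL opcode names are hypothetical. String-pool remapping is omitted because it mirrors the constant case, and calls are resolved to absolute offsets here where the real linker may use function-table indices.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    """Hypothetical blob: instructions are (opcode, operand) pairs."""
    hash: str
    instructions: list
    constants: list
    dependencies: list  # hashes of callees


def link(store: dict, entry: str):
    """Flatten all blobs reachable from `entry` into one instruction array."""
    order, seen = [], set()

    def visit(h):  # depth-first post-order: dependencies come first
        if h in seen:
            return
        seen.add(h)
        for dep in store[h].dependencies:
            visit(dep)
        order.append(h)

    visit(entry)

    code, constants, offsets = [], [], {}
    for h in order:
        offsets[h] = len(code)        # absolute start of this function
        const_base = len(constants)
        constants.extend(store[h].constants)
        for op, arg in store[h].instructions:
            if op == "CONST":
                arg += const_base     # local pool index -> global index
            elif op == "CALL":
                arg = offsets[arg]    # callee hash -> absolute offset
            code.append((op, arg))
    return code, constants, offsets


store = {
    "h_sq": Blob("h_sq", [("CONST", 0), ("RET", None)], [2], []),
    "h_main": Blob("h_main", [("CONST", 0), ("CALL", "h_sq"), ("RET", None)],
                   [10], ["h_sq"]),
}
code, constants, offsets = link(store, "h_main")
```

Because a blob's hash covers the hashes of its dependencies, the dependency graph is assumed to be acyclic, which is what lets a single post-order pass resolve every call.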
Portable Execution State
With content-addressed functions, the call stack becomes a chain of
(function_hash, local_ip) pairs instead of absolute instruction pointers.
This state is meaningful on any node that has the referenced function blobs.
State Capture
    from std::core::state use { capture, capture_all }

    // Capture current function frame
    let frame = state::capture()
    // frame.function.hash is the content hash of the current function

    // Capture full VM state (all frames)
    let vm = state::capture_all()
    // vm.frames is an array of (function_hash, local_ip, locals) tuples

State Resume

    // Resume from captured state: execution continues from the captured point
    state::resume(vm)  // does not return

Cross-Node State Transfer
Because state is expressed in terms of content hashes rather than absolute offsets, it can be transferred to any node:
    Node A captures state:
      frames = [
        { fn: "a1b2c3...", local_ip: 42, locals: [v1, v2] },
        { fn: "d4e5f6...", local_ip: 17, locals: [v3] },
      ]

    Node A sends state to Node B (just hashes + small local_ip values).

    Node B resolves function hashes:
      - "a1b2c3..." → found in local cache ✓
      - "d4e5f6..." → not found → fetches blob from A

    Node B reconstructs VM and resumes from exact point.

The key insight: Node B doesn’t need Node A’s “program”. It just needs the function blobs, which could come from A, from C, from a global cache, from anywhere.
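The resolution step on Node B might look like the Python sketch below. The frame dictionaries and the fetch_blob callback are hypothetical. Verifying each fetched blob against its hash is an assumption about what a receiver would reasonably do, not something this chapter states.

```python
import hashlib


def blob_hash(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def resolve_frames(frames, local_cache, fetch_blob):
    """Ensure every function referenced by a captured stack is available.

    `fetch_blob` is a hypothetical callback that asks some peer for a blob.
    Returns the hashes that had to be fetched.
    """
    fetched = []
    for frame in frames:
        h = frame["fn"]
        if h in local_cache:
            continue
        blob = fetch_blob(h)
        if blob_hash(blob) != h:  # the source does not need to be trusted
            raise ValueError(f"blob does not match hash {h}")
        local_cache[h] = blob
        fetched.append(h)
    return fetched


# Node A's blob store and the state it captured.
blob_f, blob_g = b"bytecode of f", b"bytecode of g"
node_a_store = {blob_hash(blob_f): blob_f, blob_hash(blob_g): blob_g}
frames = [
    {"fn": blob_hash(blob_f), "local_ip": 42, "locals": ["v1", "v2"]},
    {"fn": blob_hash(blob_g), "local_ip": 17, "locals": ["v3"]},
]

# Node B already caches one blob and fetches the other from A.
node_b_cache = {blob_hash(blob_f): blob_f}
fetched = resolve_frames(frames, node_b_cache, node_a_store.__getitem__)
```

Any source could stand in for node_a_store here, since the hash check is independent of where the bytes came from.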
Content Hashing
Shape provides built-in content hashing for any value:

    from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

    // Hash any value
    let h = state::hash(42)           // SHA-256 of the number
    let h2 = state::hash("hello")     // SHA-256 of the string
    let h3 = state::hash(my_object)   // SHA-256 of type hash + field hashes

    // Hash a function (returns its FunctionBlob content hash)
    let fh = state::fn_hash(my_function)

    // Hash a type schema
    let th = state::schema_hash("Trade")

Content hashing is structural: for objects, the hash is derived from the type schema hash plus the recursive hashes of each field value. For arrays, each element is hashed. This creates a hash tree, the foundation for efficient diffing.
State Diffing
Given two values (or two states), state::diff computes a minimal delta by
walking their hash trees and only descending into branches that differ:

    from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

    type Portfolio {
      name: string,
      positions: Array<Position>,
      cash: number,
    }

    let before = portfolio
    // ... mutations happen ...
    let after = portfolio

    let delta = state::diff(before, after)
    // delta.changed: Map of field name → new value (only changed fields)
    // delta.removed: Array of removed keys

    // Apply delta to reconstruct
    let reconstructed = state::patch(before, delta)

For large objects where only a few fields changed, the delta is tiny. This is the foundation for efficient state synchronization: transfer only what changed.
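A one-level version of this diff and patch round trip can be sketched in Python. The repr-based hash is a simplified stand-in for a structural content hash, and a real implementation would recurse into nested fields where this sketch compares top-level fields only.

```python
import hashlib


def h(value) -> str:
    """Stand-in for a structural content hash (repr-based, for illustration)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def diff(before: dict, after: dict) -> dict:
    """Return changed fields and removed keys, comparing field hashes only."""
    changed = {k: v for k, v in after.items()
               if k not in before or h(before[k]) != h(v)}
    removed = [k for k in before if k not in after]
    return {"changed": changed, "removed": removed}


def patch(before: dict, delta: dict) -> dict:
    """Apply a delta produced by diff() to reconstruct the newer value."""
    result = {k: v for k, v in before.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result


before = {"name": "main", "positions": ["AAPL", "MSFT"], "cash": 1000.0}
after = {"name": "main", "positions": ["AAPL", "MSFT"], "cash": 750.0}
delta = diff(before, after)
reconstructed = patch(before, delta)
```

Only the cash field ends up in the delta, so a peer holding before needs a single field to catch up.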
Serialization
Shape uses MessagePack for wire serialization:

    from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

    let bytes = state::serialize(my_value)   // Value → Array<int> (MessagePack)
    let value = state::deserialize(bytes)    // Array<int> → Value

Combined with content hashing, this enables hash-addressed storage:

    let key = state::hash(my_value)
    store.put(key, state::serialize(my_value))

    // Later, on any node:
    let bytes = store.get(key)
    let value = state::deserialize(bytes)

__original__ in replace body
When an annotation uses replace body, the compiler creates a shadow
function containing the original body. This shadow function:
- Has its own FunctionBlob with its own content hash
- Is a normal function in the function store: callable, transferable
- Is accessible as __original__ in the replacement body
    annotation remote(transport) {
      targets: [function]

      comptime post(target, ctx) {
        replace body {
          // __original__ references the shadow function (original body)
          if should_run_locally() {
            return __original__(args)
          } else {
            let payload = state::capture_call(__original__, args)
            return transport::call(state::serialize(payload))
          }
        }
      }
    }

    @remote(my_transport)
    fn train(data: Array<Sample>, epochs: int) -> Model {
      // This body becomes __original__
      // The replacement wraps it with remote dispatch
      let model = Model.new()
      for epoch in 0..epochs {
        model.fit(data)
      }
      model
    }

This is true aspect-oriented programming: the original function is wrapped, not discarded. The shadow function can be transferred to a remote node (just send its blob) and executed there.
Building Distributed Systems
The combination of content-addressed functions, portable state, and
__original__ means you can build sophisticated distributed systems with just
annotations.
FaaS in 15 Lines
    annotation faas(cluster) {
      targets: [function]
      comptime pre(target, ctx) {
        for p in target.params {
          if !serializable(p.type) {
            error(f"@faas: '{p.name}' not serializable")
          }
        }
      }
      comptime post(target, ctx) {
        replace body {
          let node = cluster.schedule()
          let payload = state::capture_call(__original__, args)
          state::deserialize(cluster.transport::call(node, state::serialize(payload)))
        }
      }
    }

    @faas(my_cluster)
    fn train(data: Array<Sample>) -> Model {
      // Automatically dispatched to a cluster node
      heavy_computation(data)
    }

Content-Addressed Memoization in 10 Lines

    annotation memoized(store) {
      targets: [function]
      comptime post(target, ctx) {
        replace body {
          let key = state::hash([state::fn_hash(__original__), ...args])
          match store.get(key) {
            Some(cached) => state::deserialize(cached),
            None => {
              let result = __original__(args)
              store.put(key, state::serialize(result))
              result
            }
          }
        }
      }
    }

The cache key is derived from the function’s content hash plus the argument hashes. Same function + same args = same key, forever, across any node.
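For readers who want to see the key derivation outside Shape, here is a rough Python equivalent written as a decorator. The fn_hash value is a placeholder string supplied by hand, and JSON stands in for the real serialization; both are assumptions.

```python
import hashlib
import json

calls = []  # records real executions so cache hits are observable


def memoized(store: dict, fn_hash: str):
    """Cache results under SHA-256(function hash + arguments)."""
    def wrap(fn):
        def wrapper(*args):
            material = json.dumps([fn_hash, list(args)], sort_keys=True)
            key = hashlib.sha256(material.encode("utf-8")).hexdigest()
            if key in store:
                return json.loads(store[key])
            result = fn(*args)
            store[key] = json.dumps(result)
            return result
        return wrapper
    return wrap


shared_store = {}  # could be any key-value store shared between nodes


@memoized(shared_store, fn_hash="a1b2c3")  # placeholder content hash
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(12)
second = slow_square(12)  # served from the store; the body does not run again
```

Because the function hash is part of the key, a new version of the function would miss the cache automatically and never return results computed by the old body.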
Distributed State Sync in 12 Lines
    annotation synced(peers) {
      targets: [function]
      comptime post(target, ctx) {
        replace body {
          let before = state::capture_module()
          let result = __original__(args)
          let after = state::capture_module()
          let delta = state::diff(before, after)
          if delta.changed.len() > 0 {
            for peer in peers {
              peer.send(state::serialize(delta))
            }
          }
          result
        }
      }
    }

After any annotated function modifies module state, only the changed fields are sent to peers.
Live Migration in 20 Lines
    from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }
    from std::core::transport use { tcp, send, connect, connection_send, connection_recv, connection_close }

    annotation migratable(scheduler) {
      targets: [function]
      comptime post(target, ctx) {
        replace body {
          scheduler.register(state::fn_hash(__original__))
          let result = __original__(args)
          scheduler.unregister(state::fn_hash(__original__))
          result
        }
      }
    }

    // The scheduler, running in a separate coroutine:
    fn migration_loop(scheduler) {
      let tcp = transport::tcp()
      loop {
        let target = scheduler.check_migration_needed()
        if target != None {
          let vm = state::capture_all()
          let node = scheduler.pick_node(target)
          transport::send(tcp, node, state::serialize(vm))
        }
        yield
      }
    }

How Distribution Works (Step by Step)
When Node A wants to call function F on Node B:
1. A has F’s FunctionHash (from compilation)
2. A builds a CallPayload: { fn: hash_F, args: [hash_v1, hash_v2] }
3. A sends payload to B (tiny: just hashes)
4. B checks: do I have hash_F?
   - Yes → proceed to step 5
   - No → A sends the FunctionBlob for F. B checks F’s dependencies recursively, fetching any missing blobs.
5. B checks argument value hashes and fetches any it doesn’t have
6. B executes F(v1, v2), returns result hash
7. A fetches result value if not already cached
- First call: transfers function blobs + argument values (one-time cost).
- Same function, new args: transfers only new argument values.
- Same function, same args: transfers nothing, because the result is already cached.
Blob Negotiation on Persistent Connections
On a persistent connection, the wire protocol optimizes step 4 via blob negotiation. Before sending a call, the caller offers the content hashes of all function blobs it would include. The remote checks its blob cache and replies with which hashes it already has. The caller then strips known blobs from the request:

    Caller                                  Remote
      │── BlobNegotiation({offered}) ───────>│ check cache
      │<── BlobNegotiationReply({known}) ────│
      │ strip known blobs                    │
      │── Call(only missing blobs) ─────────>│ cache + execute
      │<── CallResponse ─────────────────────│

This uses simple hash lists (not bloom filters) because typical calls reference 10-200 blobs. Each remote connection maintains an LRU blob cache (default 4096 entries); no invalidation is needed because content hashes make stale entries harmless. See Wire Protocol & Optimization for full protocol details.
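The negotiation exchange and the LRU cache can be approximated in a few lines of Python. The BlobCache class and the strip_known helper are hypothetical names, and the sketch models the message exchange as plain function calls with no real wire format.

```python
from collections import OrderedDict


class BlobCache:
    """Small LRU cache keyed by content hash (default mirrors the 4096 above)."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self.entries = OrderedDict()

    def known(self, offered):
        """Negotiation reply: which of the offered hashes are already cached?"""
        hits = [h for h in offered if h in self.entries]
        for h in hits:
            self.entries.move_to_end(h)       # mark as recently used
        return set(hits)

    def insert(self, h, blob):
        self.entries[h] = blob
        self.entries.move_to_end(h)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def strip_known(blobs: dict, known: set) -> dict:
    """Caller side: keep only the blobs the remote said it lacks."""
    return {h: b for h, b in blobs.items() if h not in known}


remote = BlobCache(capacity=2)  # tiny capacity to make eviction visible
remote.insert("h1", b"blob-1")

call_blobs = {"h1": b"blob-1", "h2": b"blob-2", "h3": b"blob-3"}
to_send = strip_known(call_blobs, remote.known(call_blobs))
for h, blob in to_send.items():
    remote.insert(h, blob)
```

Evicting h1 here is harmless: if a later call needs it, the next negotiation simply reports it as unknown and the caller sends it again.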
Closure Transfer
Closures follow the same minimal-blob protocol. When a closure is dispatched remotely, the call request includes only the blobs that the closure and its captured function need — not the entire program. The system looks up the closure’s function by name, builds the minimal blob set from its dependency graph, and sends a stub program with just metadata. If the remote node already has the blobs cached (common for repeated calls), the transfer is just a hash check.
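Building the minimal blob set amounts to a reachability walk over the dependency graph. A Python sketch follows, with a hypothetical deps mapping from each function hash to the hashes it calls (readable names are used in place of real hashes).

```python
def minimal_blob_set(deps: dict, root: str) -> set:
    """Collect `root` and everything it transitively depends on."""
    needed, stack = set(), [root]
    while stack:
        h = stack.pop()
        if h in needed:
            continue
        needed.add(h)
        stack.extend(deps.get(h, ()))
    return needed


deps = {
    "closure_fn": ["helper"],
    "helper": ["math_util"],
    "math_util": [],
    "unrelated_report": ["math_util"],  # in the program, but not reachable
}
blobs = minimal_blob_set(deps, "closure_fn")
```

Functions that merely share a dependency with the closure, such as unrelated_report in this example, never enter the set.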
How Hot-Patching Works
When a function F is updated to a new version F’:
- In-flight calls still reference hash_old: valid, untouched
- New calls resolve F’s name to hash_new
- hash_old stays in the function store until no frames reference it
- Smooth transition: no downtime, no IP corruption, no coordination
Both versions coexist in the function store. The content-addressed model makes this natural — there’s no “overwriting”, just adding a new blob with a new hash.
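One way to picture this bookkeeping is the Python sketch below. The FunctionStore class and its reference counting are assumptions made for illustration; the chapter does not say how Shape tracks which frames still use an old blob.

```python
class FunctionStore:
    """Sketch of a store where names point at hashes and old blobs linger."""

    def __init__(self):
        self.blobs = {}       # hash -> blob
        self.names = {}       # function name -> current hash
        self.frame_refs = {}  # hash -> number of live frames using it

    def publish(self, name, h, blob):
        self.blobs[h] = blob  # add a new entry, never overwrite
        self.names[name] = h

    def enter(self, name):
        h = self.names[name]  # new calls bind to the current version
        self.frame_refs[h] = self.frame_refs.get(h, 0) + 1
        return h

    def leave(self, h):
        self.frame_refs[h] -= 1
        if self.frame_refs[h] == 0 and h not in self.names.values():
            del self.blobs[h], self.frame_refs[h]  # no frame or name uses it


store = FunctionStore()
store.publish("F", "hash_old", "v1 bytecode")
in_flight = store.enter("F")                   # frame pinned to hash_old
store.publish("F", "hash_new", "v2 bytecode")  # hot patch
new_call = store.enter("F")
both_present = set(store.blobs) == {"hash_old", "hash_new"}
store.leave(in_flight)                         # last old frame finishes
```

The in-flight frame keeps running the old body to completion, and the old blob disappears only after that frame exits.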
Scoped JIT
With per-function blobs, JIT compilation is naturally scoped:
- Each function is independently assessed for JIT compatibility
- JIT-compatible functions get native code; others stay interpreted
- The function table is mixed: JIT pointers and VM markers coexist
- JIT output is cached by FunctionHash: compile once, reuse forever

    Function Table (after linking + selective JIT):
      [0] train_model     → JIT native ptr   (numeric, hot, JIT-compatible)
      [1] parse_config    → VM interpreter   (complex object ops)
      [2] compute_signal  → JIT native ptr   (inner loop, numeric)
      [3] format_output   → VM interpreter   (strings, objects)

Same blob hash → same JIT output. If the same utility function appears in 10 different programs, it’s JIT-compiled once and reused everywhere.
Transport Interface
Shape provides a transport abstraction for inter-node communication:

    from std::core::transport use { tcp, send, connect, connection_send, connection_recv, connection_close }

    // Create a TCP transport
    let tcp = transport::tcp()

    // One-shot: send payload and wait for response
    let response = transport::send(tcp, "10.0.0.5:9000", payload)?

    // Persistent connection
    let conn = transport::connect(tcp, "10.0.0.5:9000")?
    transport::connection_send(conn, data)?
    let received = transport::connection_recv(conn, 5000)?  // 5s timeout
    transport::connection_close(conn)?

The transport layer uses length-prefixed framing with transparent zstd compression. Function blobs (highly regular bytecode) typically compress 3-8x, significantly reducing transfer cost for first-time blob delivery. See Wire Protocol & Optimization for compression, blob negotiation, and sidecar splitting details.
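Length-prefixed framing with compression can be illustrated with Python's standard library. This sketch substitutes zlib for zstd (which is not in every Python standard library) and assumes a 4-byte big-endian length prefix; the actual frame layout is documented in Wire Protocol & Optimization and may differ.

```python
import struct
import zlib


def encode_frame(payload: bytes) -> bytes:
    """Compress, then prefix with a 4-byte big-endian length."""
    body = zlib.compress(payload)  # zlib stands in for zstd here
    return struct.pack(">I", len(body)) + body


def decode_frames(buffer: bytes):
    """Split a byte stream into payloads; returns (payloads, leftover bytes)."""
    payloads, pos = [], 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break                  # frame not fully received yet
        start = pos + 4
        payloads.append(zlib.decompress(buffer[start:start + length]))
        pos = start + length
    return payloads, buffer[pos:]


blob = b"LOAD_CONST 0; ADD; " * 200  # repetitive, bytecode-like data
stream = encode_frame(blob) + encode_frame(b"second")
# Two trailing bytes simulate a partially received header of a third frame.
payloads, rest = decode_frames(stream + b"\x00\x00")
```

The leftover bytes are returned to the caller so they can be prepended to the next chunk read from the socket.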
Introspection
Shape provides runtime introspection into the current execution state:

    from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

    fn my_function(x: int, y: string) {
      // Who called me?
      let c = state::caller()   // FunctionRef? (None if top-level)

      // What are my arguments?
      let a = state::args()     // [x, y] as Array<Any>

      // What are my local variables?
      let l = state::locals()   // Map<string, Any> of current scope
    }

Architecture Summary
    ┌─────────────────────────────────────────────────┐
    │ Types:     SHA-256(name + sorted fields)        │ Global identity
    │ Functions: SHA-256(bytecode + deps + types)     │ Portable, cacheable
    │ Values:    SHA-256(type_hash + field_data)      │ Deduplicable
    │ Frames:    (function_hash, local_ip, locals)    │ Portable
    │ State:     ordered list of frames               │ Resumable anywhere
    └─────────────────────────────────────────────────┘

    Transfer = exchanging hashes + lazily fetching what you don't have
    Cache    = hash → blob (trivial, global, permanent)
    Equality = compare hashes (O(1))
    Diffing  = walk hash trees, only descend into branches that differ

See Also
- Standard Library: State: full API reference for std::core::state
- Wire Protocol & Optimization: compression, blob negotiation, binary serialization, sidecar splitting
- Security & Permissions: three-tier security model (compile-time, runtime, sandboxing)
- JIT Compilation: scoped JIT, tiered compilation, cross-function optimization
- Module Distribution & Signatures: manifests, blob stores, Ed25519 signatures
- Transport Layer: TCP, QUIC, and memoized transports
- Developer Tools: hot-reload, time-travel debugging, prefetch, code search
- Resumability & Distributed Computing: snapshot/resume basics
- Comptime System: replace body, annotations, __original__
- Comptime Annotations Cookbook: practical annotation patterns