Content-Addressed Bytecode

Shape’s bytecode format is designed from the ground up for distribution. Every function, every type, every value is content-addressed via SHA-256 — making execution state portable across nodes, program versions, and time.

This chapter covers the architecture, the primitives it unlocks, and how to build distributed systems on top of it using nothing but Shape annotations.

The Core Idea

Traditional VMs use a flat instruction array with absolute offsets. Transfer state to another node and the instruction pointer is meaningless unless both sides have byte-identical programs. Update a single function and every offset shifts.

Shape breaks this by giving every function its own self-contained blob with a content hash as its identity:

// Every function compiles to a FunctionBlob:
//   - Its own instructions (not shared with other functions)
//   - Its own constant pool
//   - Its own string pool
//   - A list of dependencies (other functions it calls, by hash)
//   - A SHA-256 content hash of all of the above
//
// A "program" is just an entry hash + a store of blobs.

Two functions with the same bytecode, constants, and dependencies produce the same hash — regardless of which program they appear in, which node compiled them, or when they were compiled.
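To make the identity rule concrete, here is a minimal Python sketch of hashing a function's contents. It is an illustration, not Shape's actual encoder: the JSON canonicalization, the `blob_hash` helper, and the opcode names are all assumptions made for the example.

```python
import hashlib
import json

def blob_hash(instructions, constants, strings, dependencies):
    """Hash a canonical encoding of one function's contents.

    Assumption: JSON with sorted keys stands in for the real serialization.
    """
    canonical = json.dumps(
        {
            "instructions": instructions,
            "constants": constants,
            "strings": strings,
            "dependencies": dependencies,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same function compiled as part of two different programs
# ("PushConst" and "Return" are hypothetical opcode names).
in_program_a = blob_hash([["PushConst", 0], ["Return"]], [42], [], [])
in_program_b = blob_hash([["PushConst", 0], ["Return"]], [42], [], [])

# Changing a single constant yields a different identity.
changed = blob_hash([["PushConst", 0], ["Return"]], [43], [], [])
```

Nothing about the surrounding program, the compiling node, or the compile time enters the hash, which is why `in_program_a` and `in_program_b` are equal.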

Two Representations, One Runtime

Shape maintains two representations of a program:

| Format | Purpose | IP Model | Used For |
|---|---|---|---|
| `Program` (content-addressed) | Storage, transfer, caching | `(FunctionHash, local_ip)` | Disk, wire, state snapshots |
| `LinkedProgram` (flat) | Fast execution | Absolute `usize` | VM dispatch loop |

At load time, a linking pass flattens the content-addressed blobs into a single instruction array with absolute offsets, the same layout a traditional VM uses. The dispatch loop itself is unchanged, so content addressing adds no per-instruction cost at run time. The blob hashes are preserved alongside each function so that state capture can record content-addressed frames.

Compile → Program (content-addressed blobs)
                ↓
        link() → LinkedProgram (flat, fast)
                ↓
        VM dispatch loop (unchanged performance)

FunctionBlob

A FunctionBlob is a self-contained unit of execution:

// Conceptual structure, shown in Shape syntax (the actual definition is a Rust struct):
type FunctionBlob {
    content_hash: string,        // SHA-256 identity

    // Metadata
    name: string,
    arity: int,
    param_names: Array<string>,
    locals_count: int,
    is_closure: bool,
    is_async: bool,

    // Self-contained bytecode
    instructions: Array<Instruction>,  // THIS function only
    constants: Array<Constant>,        // THIS function only
    strings: Array<string>,            // THIS function only

    // Dependencies
    dependencies: Array<string>,       // Content hashes of called functions
    foreign_dependencies: Array<string>, // Content hashes of foreign (polyglot) functions
    type_schemas: Array<TypeSchema>,   // Types this function constructs
}

Key properties:

Self-contained: no shared pools. A blob carries everything it needs.
Content-addressed: the hash is derived from the serialized blob contents, including both Shape dependencies and foreign function dependencies. Same function → same hash, always.
Cross-language identity: if a function calls polyglot code (e.g., fn python ...), the content hashes of those foreign functions are recorded in foreign_dependencies and included in the blob hash. Two Shape functions with identical bytecode but different foreign implementations produce different content hashes.
Independently transferable: send just the blobs you need, not the whole program.
Cacheable forever: a hash uniquely identifies a blob. Cache globally, permanently.

Content-Addressed Types

Every TypeSchema also has a content hash derived from its structural definition — the type name, sorted field names and types, and enum variants:

type Trade {
    symbol: string,
    price: number,
    volume: int,
}
// → SHA-256("Trade" + sorted [("price", "number"), ("symbol", "string"), ("volume", "int")])

Two types with the same name and fields produce the same hash. This means:

Type identity is determined by the type's name and structure, not by where it was declared
Remote nodes can verify type compatibility by comparing hashes
Type schemas serve as a content-addressed IDL (interface definition language)
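A small Python sketch of the schema hash described above. The exact byte layout is an assumption (here, newline-joined `name:type` pairs), and `schema_hash` is a hypothetical helper rather than Shape's implementation.

```python
import hashlib

def schema_hash(name, fields):
    """Hash a type name plus its fields, sorted by field name."""
    parts = [name] + [f"{fname}:{ftype}" for fname, ftype in sorted(fields.items())]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

declared = schema_hash("Trade", {"symbol": "string", "price": "number", "volume": "int"})

# Declaration order does not matter because fields are sorted before hashing.
reordered = schema_hash("Trade", {"volume": "int", "symbol": "string", "price": "number"})

# A different name or a different field type produces a different hash.
renamed = schema_hash("Fill", {"symbol": "string", "price": "number", "volume": "int"})
widened = schema_hash("Trade", {"symbol": "string", "price": "number", "volume": "number"})
```

Two nodes can compare `declared` and `reordered` and conclude they agree on `Trade` without exchanging the definition itself.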

How the Linking Pass Works

The linker takes a content-addressed Program and produces a flat LinkedProgram:

Topological sort: order blobs by their dependency graph
Flatten: concatenate all instruction arrays into one
Remap constants: merge per-blob constant pools, adjust Const operands
Remap strings: merge per-blob string pools, adjust Property operands
Resolve functions: replace hash-based function references with flat indices

After linking, the VM runs the exact same dispatch loop as always. Absolute IP, flat instruction array, global constant pool. No performance regression.
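The linking steps can be sketched in Python as follows. This is a simplified model under stated assumptions: blobs are plain dicts, instructions are `(opcode, operand)` tuples, `Const` and `Call` are hypothetical opcode names, and string-pool remapping is omitted because it would mirror the constant-pool step.

```python
from graphlib import TopologicalSorter

def link(blobs, entry_hash):
    """Flatten {hash: blob} into one instruction array with absolute offsets."""
    # Topological sort: dependencies come before the functions that call them.
    graph = {h: set(b["dependencies"]) for h, b in blobs.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Assign each function a flat index and an absolute entry offset.
    index_of, entry_offset, offset = {}, [], 0
    for i, h in enumerate(order):
        index_of[h] = i
        entry_offset.append(offset)
        offset += len(blobs[h]["instructions"])

    # Concatenate code, merging constant pools and rewriting operands.
    code, constants = [], []
    for h in order:
        base = len(constants)
        constants.extend(blobs[h]["constants"])
        for opcode, operand in blobs[h]["instructions"]:
            if opcode == "Const":
                operand += base                 # per-blob index -> global index
            elif opcode == "Call":
                operand = index_of[operand]     # callee hash -> flat index
            code.append((opcode, operand))

    return {
        "code": code,
        "constants": constants,
        "entry_offset": entry_offset,
        "function_hashes": order,   # kept so frames can be captured by hash
        "entry": index_of[entry_hash],
    }

blobs = {
    "h_main": {
        "instructions": [("Const", 0), ("Call", "h_helper"), ("Return", None)],
        "constants": [10],
        "dependencies": ["h_helper"],
    },
    "h_helper": {
        "instructions": [("Const", 0), ("Return", None)],
        "constants": [2],
        "dependencies": [],
    },
}
linked = link(blobs, "h_main")
```

In this example `h_helper` is placed first, so `h_main` starts at offset 2 and its `Const 0` becomes `Const 1` in the merged pool.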

Portable Execution State

With content-addressed functions, the call stack becomes a chain of (function_hash, local_ip) pairs instead of absolute instruction pointers. This state is meaningful on any node that has the referenced function blobs.
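One way to picture the conversion between the two IP models is the Python sketch below. It assumes the linked program keeps a sorted list of function entry offsets next to the matching list of function hashes; both helper names are hypothetical.

```python
from bisect import bisect_right

def to_portable(absolute_ip, entry_offsets, function_hashes):
    """Convert an absolute IP into a (function_hash, local_ip) pair."""
    i = bisect_right(entry_offsets, absolute_ip) - 1
    return function_hashes[i], absolute_ip - entry_offsets[i]

def to_absolute(function_hash, local_ip, entry_offsets, function_hashes):
    """Convert a portable pair back into an absolute IP for this layout."""
    return entry_offsets[function_hashes.index(function_hash)] + local_ip

# Layout on the capturing node: h_main occupies absolute offsets 2..6.
offsets_a, hashes_a = [0, 2, 7], ["h_helper", "h_main", "h_util"]
frame = to_portable(4, offsets_a, hashes_a)

# A node with a different layout maps the same frame to a different absolute IP.
offsets_b, hashes_b = [0, 10], ["h_main", "h_helper"]
resumed_ip = to_absolute(frame[0], frame[1], offsets_b, hashes_b)
```

The absolute values differ between the two nodes (4 versus 2), but the pair `("h_main", 2)` names the same instruction on both.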

State Capture

from std::core::state use { capture, capture_all }

// Capture current function frame
let frame = state::capture()
// frame.function.hash is the content hash of the current function

// Capture full VM state (all frames)
let vm = state::capture_all()
// vm.frames is an array of (function_hash, local_ip, locals) tuples

State Resume

// Resume from captured state — execution continues from the captured point
state::resume(vm)  // does not return

Cross-Node State Transfer

Because state is expressed in terms of content hashes rather than absolute offsets, it can be transferred to any node:

Node A captures state:
  frames = [
    { fn: "a1b2c3...", local_ip: 42, locals: [v1, v2] },
    { fn: "d4e5f6...", local_ip: 17, locals: [v3] },
  ]

Node A sends state to Node B (function hashes, local_ip values, and locals; no bytecode).

Node B resolves function hashes:
  - "a1b2c3..." → found in local cache ✓
  - "d4e5f6..." → not found → fetches blob from A

Node B reconstructs VM and resumes from exact point.

The key insight: Node B doesn’t need Node A’s “program”. It just needs the function blobs — which could come from A, from C, from a global cache, from anywhere.
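The resolution step on the receiving node can be sketched in Python like this. The frame layout follows the example above; `resolve_frames` and the `fetch_blob` callback are hypothetical names, and a fuller version would also walk each blob's dependency list.

```python
def resolve_frames(frames, local_cache, fetch_blob):
    """Ensure every frame's function blob is available, fetching only misses.

    frames: list of {"fn": hash, "local_ip": int, "locals": list}
    local_cache: dict mapping hash -> blob, updated in place
    fetch_blob: callable(hash) -> blob, for example a request to the sender
    """
    fetched = []
    for frame in frames:
        h = frame["fn"]
        if h not in local_cache:
            local_cache[h] = fetch_blob(h)
            fetched.append(h)
    return fetched

# Node A's store and the state it captured (hashes shortened for readability).
node_a_store = {"a1b2c3": "blob-for-a1b2c3", "d4e5f6": "blob-for-d4e5f6"}
frames = [
    {"fn": "a1b2c3", "local_ip": 42, "locals": ["v1", "v2"]},
    {"fn": "d4e5f6", "local_ip": 17, "locals": ["v3"]},
]

# Node B already holds one of the two blobs, so only the other is fetched.
node_b_cache = {"a1b2c3": "blob-for-a1b2c3"}
fetched = resolve_frames(frames, node_b_cache, node_a_store.__getitem__)
refetched = resolve_frames(frames, node_b_cache, node_a_store.__getitem__)
```

The `fetch_blob` callback could just as well query a third node or a shared cache, since any source that returns the blob for a given hash is equivalent.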

Content Hashing

Shape provides built-in content hashing for any value:

from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

// Hash any value
let h = state::hash(42)              // SHA-256 of the number
let h2 = state::hash("hello")        // SHA-256 of the string
let h3 = state::hash(my_object)      // SHA-256 of type hash + field hashes

// Hash a function (returns its FunctionBlob content hash)
let fh = state::fn_hash(my_function)

// Hash a type schema
let th = state::schema_hash("Trade")

Content hashing is structural: for objects, the hash is derived from the type schema hash plus the recursive hashes of each field value. For arrays, each element is hashed. This creates a hash tree — the foundation for efficient diffing.
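The hash-tree idea can be illustrated with a short Python sketch. It models objects as dicts and arrays as lists; the tagging scheme and the `__schema__` entry that carries the type hash are assumptions made for the example, not Shape's wire format.

```python
import copy
import hashlib

def sha(*parts: bytes) -> bytes:
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
    return digest.digest()

def value_hash(value) -> bytes:
    """Hash a value bottom-up so equal subtrees have equal hashes."""
    if isinstance(value, dict):
        children = [sha(name.encode()) + value_hash(child)
                    for name, child in sorted(value.items())]
        return sha(b"object", *children)
    if isinstance(value, list):
        return sha(b"array", *[value_hash(child) for child in value])
    # Scalars: include the type name so 1 and "1" hash differently.
    return sha(b"scalar", type(value).__name__.encode(), b":", repr(value).encode())

before = {
    "__schema__": "hash-of-Portfolio-schema",
    "name": "main",
    "cash": 100.0,
    "positions": [{"symbol": "AAPL", "qty": 10}, {"symbol": "MSFT", "qty": 5}],
}
after = copy.deepcopy(before)
after["cash"] = 90.0
```

The root hashes of `before` and `after` differ, while the `positions` subtrees still hash identically, so a comparison can skip that whole branch.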

State Diffing

Given two values (or two states), state::diff computes a minimal delta by walking their hash trees and only descending into branches that differ:

from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

type Portfolio {
    name: string,
    positions: Array<Position>,
    cash: number,
}

let before = portfolio
// ... mutations happen ...
let after = portfolio

let delta = state::diff(before, after)
// delta.changed: Map of field name → new value (only changed fields)
// delta.removed: Array of removed keys

// Apply delta to reconstruct
let reconstructed = state::patch(before, delta)

For large objects where only a few fields changed, the delta is tiny. This is the foundation for efficient state synchronization — transfer only what changed.
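As a rough model of the delta shape, the Python sketch below computes a field-level diff with the same `changed` and `removed` parts and applies it again. It compares values directly for brevity; a hash-tree implementation would compare subtree hashes instead. The function names mirror `state::diff` and `state::patch` but are plain Python stand-ins.

```python
def diff(before, after):
    """Return the fields of `after` that differ from `before`, plus removed keys."""
    changed = {k: v for k, v in after.items() if k not in before or before[k] != v}
    removed = [k for k in before if k not in after]
    return {"changed": changed, "removed": removed}

def patch(before, delta):
    """Apply a delta produced by diff() without mutating `before`."""
    result = {k: v for k, v in before.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result

before = {"name": "main", "cash": 100.0, "positions": ["AAPL", "MSFT"], "note": "temp"}
after = {"name": "main", "cash": 90.0, "positions": ["AAPL", "MSFT"]}

delta = diff(before, after)
reconstructed = patch(before, delta)
```

Here the delta carries one changed field and one removed key, regardless of how large `positions` is.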

Serialization

Shape uses MessagePack for wire serialization:

from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

let bytes = state::serialize(my_value)    // Value → Array<int> (MessagePack)
let value = state::deserialize(bytes)     // Array<int> → Value

Combined with content hashing, this enables hash-addressed storage:

let key = state::hash(my_value)
store.put(key, state::serialize(my_value))

// Later, on any node:
let bytes = store.get(key)
let value = state::deserialize(bytes)

`__original__` in `replace body`

When an annotation uses replace body, the compiler creates a shadow function containing the original body. This shadow function:

Has its own FunctionBlob with its own content hash
Is a normal function in the function store — callable, transferable
Is accessible as __original__ in the replacement body

annotation remote(transport) {
    targets: [function]

    comptime post(target, ctx) {
        replace body {
            // __original__ references the shadow function (original body)
            if should_run_locally() {
                return __original__(args)
            } else {
                let payload = state::capture_call(__original__, args)
                return transport::call(state::serialize(payload))
            }
        }
    }
}

@remote(my_transport)
fn train(data: Array<Sample>, epochs: int) -> Model {
    // This body becomes __original__
    // The replacement wraps it with remote dispatch
    let model = Model.new()
    for epoch in 0..epochs { model.fit(data) }
    model
}

This is true aspect-oriented programming: the original function is wrapped, not discarded. The shadow function can be transferred to a remote node (just send its blob) and executed there.

Building Distributed Systems

The combination of content-addressed functions, portable state, and __original__ means you can build sophisticated distributed systems with just annotations.

FaaS in 15 Lines

annotation faas(cluster) {
    targets: [function]
    comptime pre(target, ctx) {
        for p in target.params {
            if !serializable(p.type) { error(f"@faas: '{p.name}' not serializable") }
        }
    }
    comptime post(target, ctx) {
        replace body {
            let node = cluster.schedule()
            let payload = state::capture_call(__original__, args)
            state::deserialize(cluster.transport::call(node, state::serialize(payload)))
        }
    }
}

@faas(my_cluster)
fn train(data: Array<Sample>) -> Model {
    // Automatically dispatched to a cluster node
    heavy_computation(data)
}

Content-Addressed Memoization in 10 Lines

annotation memoized(store) {
    targets: [function]
    comptime post(target, ctx) {
        replace body {
            let key = state::hash([state::fn_hash(__original__), ...args])
            match store.get(key) {
                Some(cached) => state::deserialize(cached),
                None => {
                    let result = __original__(args)
                    store.put(key, state::serialize(result))
                    result
                }
            }
        }
    }
}

The cache key is derived from the function’s content hash plus the argument hashes. Same function + same args = same key, forever, across any node.
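The key derivation can be mimicked in Python with a decorator. This is an analogy rather than Shape code: JSON stands in for the serializer, the function hash is passed in as a literal string, and `content_key` and `memoized` are hypothetical helpers.

```python
import hashlib
import json

def content_key(fn_hash, args):
    """Derive a cache key from the function's content hash and its arguments."""
    payload = json.dumps([fn_hash, list(args)], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def memoized(store, fn_hash):
    """Decorator: cache serialized results under content_key(fn_hash, args)."""
    def wrap(fn):
        def wrapper(*args):
            key = content_key(fn_hash, args)
            if key in store:
                return json.loads(store[key])
            result = fn(*args)
            store[key] = json.dumps(result)
            return result
        return wrapper
    return wrap

store = {}
calls = []

@memoized(store, fn_hash="hash-of-square-v1")
def square(x):
    calls.append(x)   # records how often the body actually runs
    return x * x

first = square(12)
second = square(12)   # served from the store; the body does not run again
```

Because the function hash is part of the key, a new version of `square` with a different hash would miss the cache instead of returning results computed by the old body.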

Distributed State Sync in 12 Lines

annotation synced(peers) {
    targets: [function]
    comptime post(target, ctx) {
        replace body {
            let before = state::capture_module()
            let result = __original__(args)
            let after = state::capture_module()
            let delta = state::diff(before, after)
            if delta.changed.len() > 0 {
                for peer in peers {
                    peer.send(state::serialize(delta))
                }
            }
            result
        }
    }
}

After any annotated function modifies module state, only the changed fields are sent to peers.

Live Migration in 20 Lines

from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }
from std::core::transport use { tcp, send, connect, connection_send, connection_recv, connection_close }

annotation migratable(scheduler) {
    targets: [function]
    comptime post(target, ctx) {
        replace body {
            scheduler.register(state::fn_hash(__original__))
            let result = __original__(args)
            scheduler.unregister(state::fn_hash(__original__))
            result
        }
    }
}

// The scheduler, running in a separate coroutine:
fn migration_loop(scheduler) {
    let tcp = transport::tcp()
    loop {
        let target = scheduler.check_migration_needed()
        if target != None {
            let vm = state::capture_all()
            let node = scheduler.pick_node(target)
            transport::send(tcp, node, state::serialize(vm))
        }
        yield
    }
}

How Distribution Works (Step by Step)

When Node A wants to call function F on Node B:

1. A has F’s FunctionHash (from compilation)
2. A builds a CallPayload: { fn: hash_F, args: [hash_v1, hash_v2] }
3. A sends payload to B (tiny: just hashes)
4. B checks: do I have hash_F?
   - Yes → proceed to step 5
   - No → A sends the FunctionBlob for F. B checks F’s dependencies recursively, fetching any missing blobs.
5. B checks argument value hashes and fetches any it doesn’t have
6. B executes F(v1, v2), returns result hash
7. A fetches result value if not already cached

What gets transferred depends on what B already holds:

First call: function blobs + argument values (one-time cost).
Same function, new args: only the new argument values.
Same function, same args: nothing, because the result is already cached.
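The three transfer cases can be reproduced with a small in-memory simulation in Python. Everything here is a stand-in: a Python callable plays the role of a FunctionBlob, JSON plus SHA-256 plays the role of value hashing, and `Node` and `remote_call` are hypothetical names.

```python
import hashlib
import json

def h(value):
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()

class Node:
    def __init__(self):
        self.blobs = {}     # function hash -> callable (stands in for a blob)
        self.values = {}    # value hash -> value
        self.results = {}   # (function hash, argument hashes) -> result hash

    def put_value(self, value):
        key = h(value)
        self.values[key] = value
        return key

def remote_call(a, b, fn_hash, args):
    """Node a asks node b to run fn_hash; returns (result, transferred hashes)."""
    transferred = []
    arg_hashes = [a.put_value(v) for v in args]
    if fn_hash not in b.blobs:                  # B lacks the blob: send it once
        b.blobs[fn_hash] = a.blobs[fn_hash]
        transferred.append(fn_hash)
    for ah in arg_hashes:                       # B fetches unknown arguments
        if ah not in b.values:
            b.values[ah] = a.values[ah]
            transferred.append(ah)
    call_key = (fn_hash, tuple(arg_hashes))
    if call_key not in b.results:               # B executes at most once per key
        result = b.blobs[fn_hash](*[b.values[ah] for ah in arg_hashes])
        b.results[call_key] = b.put_value(result)
    result_hash = b.results[call_key]
    if result_hash not in a.values:             # A fetches an uncached result
        a.values[result_hash] = b.values[result_hash]
        transferred.append(result_hash)
    return a.values[result_hash], transferred

a, b = Node(), Node()
a.blobs["hash_F"] = lambda x, y: x + y

r1, t1 = remote_call(a, b, "hash_F", [1, 2])   # blob, both arguments, result
r2, t2 = remote_call(a, b, "hash_F", [1, 5])   # one new argument, new result
r3, t3 = remote_call(a, b, "hash_F", [1, 2])   # nothing left to transfer
```

The `transferred` lists shrink from four entries to two to none, which matches the three cases listed above.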

Blob Negotiation on Persistent Connections

On a persistent connection, the wire protocol optimizes step 4 via blob negotiation. Before sending a call, the caller offers the content hashes of all function blobs it would include. The remote checks its blob cache and replies with which hashes it already has. The caller then strips known blobs from the request:

Caller                                   Remote
  │── BlobNegotiation({offered}) ───────>│  check cache
  │<── BlobNegotiationReply({known}) ────│
  │  strip known blobs                   │
  │── Call(only missing blobs) ─────────>│  cache + execute
  │<── CallResponse ────────────────────│

This uses simple hash lists (not bloom filters) because typical calls reference 10-200 blobs. Each remote connection maintains an LRU blob cache (default 4096 entries) — no invalidation needed because content hashes make stale entries harmless. See Wire Protocol & Optimization for full protocol details.
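A minimal Python sketch of the negotiation and the per-connection LRU cache follows. The class and function names are hypothetical, and the cache is called directly instead of exchanging `BlobNegotiation` messages over a connection.

```python
from collections import OrderedDict

class BlobCache:
    """LRU cache keyed by content hash; entries never need invalidation."""

    def __init__(self, capacity=4096):
        self.capacity = capacity
        self._entries = OrderedDict()

    def known(self, offered):
        """Negotiation reply: which of the offered hashes are already cached."""
        hits = [h for h in offered if h in self._entries]
        for h in hits:
            self._entries.move_to_end(h)            # mark as recently used
        return set(hits)

    def insert(self, blobs):
        for h, blob in blobs.items():
            self._entries[h] = blob
            self._entries.move_to_end(h)
            if len(self._entries) > self.capacity:
                self._entries.popitem(last=False)   # evict least recently used

def build_call(required_blobs, remote_cache):
    """Offer hashes, then keep only the blobs the remote did not report."""
    known = remote_cache.known(list(required_blobs))
    return {h: b for h, b in required_blobs.items() if h not in known}

remote = BlobCache(capacity=2)
required = {"h1": b"blob-1", "h2": b"blob-2"}

first = build_call(required, remote)     # cold cache: both blobs are sent
remote.insert(first)
second = build_call(required, remote)    # warm cache: nothing is sent

remote.insert({"h3": b"blob-3"})         # over capacity: h1 is evicted
third = build_call(required, remote)     # only the evicted blob is resent
```

Eviction only costs a retransmission. A cached entry can never be wrong, because its key is the hash of its contents.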

Closure Transfer

Closures follow the same minimal-blob protocol. When a closure is dispatched remotely, the call request includes only the blobs that the closure and its captured function need — not the entire program. The system looks up the closure’s function by name, builds the minimal blob set from its dependency graph, and sends a stub program with just metadata. If the remote node already has the blobs cached (common for repeated calls), the transfer is just a hash check.

How Hot-Patching Works

When a function F is updated to a new version F’:

In-flight calls still reference hash_old — valid, untouched
New calls resolve F’s name to hash_new
hash_old stays in the function store until no frames reference it
Smooth transition: no downtime, no IP corruption, no coordination

Both versions coexist in the function store. The content-addressed model makes this natural — there’s no “overwriting”, just adding a new blob with a new hash.
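The coexistence of versions can be modelled with a small Python function store. Reference counting of live frames is an assumption made for this sketch; the text above only requires that an old blob stays until no frame references it.

```python
class FunctionStore:
    def __init__(self):
        self.blobs = {}        # content hash -> blob
        self.names = {}        # function name -> current content hash
        self.frame_refs = {}   # content hash -> number of live frames

    def publish(self, name, blob_hash, blob):
        """Add a new version; the old blob is kept while frames still use it."""
        self.blobs[blob_hash] = blob
        old = self.names.get(name)
        self.names[name] = blob_hash
        if old is not None and old != blob_hash:
            self._collect(old)

    def enter(self, name):
        """Start a call: resolve the name once and pin the resulting hash."""
        h = self.names[name]
        self.frame_refs[h] = self.frame_refs.get(h, 0) + 1
        return h

    def leave(self, blob_hash):
        self.frame_refs[blob_hash] -= 1
        self._collect(blob_hash)

    def _collect(self, blob_hash):
        in_use = self.frame_refs.get(blob_hash, 0) > 0
        current = blob_hash in self.names.values()
        if not in_use and not current:
            self.blobs.pop(blob_hash, None)
            self.frame_refs.pop(blob_hash, None)

store = FunctionStore()
store.publish("F", "hash_old", "old body")
frame = store.enter("F")                      # in-flight call pinned to hash_old
store.publish("F", "hash_new", "new body")    # hot patch while the call runs
during = sorted(store.blobs)
new_frame = store.enter("F")                  # new calls resolve to hash_new
store.leave(frame)                            # last old frame returns
after = sorted(store.blobs)
```

The in-flight frame never sees its code change, and the old blob disappears only after that frame returns.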

Scoped JIT

With per-function blobs, JIT compilation is naturally scoped:

Each function is independently assessed for JIT compatibility
JIT-compatible functions get native code; others stay interpreted
The function table is mixed: JIT pointers and VM markers coexist
JIT output is cached by FunctionHash — compile once, reuse forever

Function Table (after linking + selective JIT):
  [0] train_model    → JIT native ptr   (numeric, hot, JIT-compatible)
  [1] parse_config   → VM interpreter   (complex object ops)
  [2] compute_signal → JIT native ptr   (inner loop, numeric)
  [3] format_output  → VM interpreter   (strings, objects)

Same blob hash → same JIT output. If the same utility function appears in 10 different programs, it’s JIT-compiled once and reused everywhere.

Transport Interface

Shape provides a transport abstraction for inter-node communication:

from std::core::transport use { tcp, send, connect, connection_send, connection_recv, connection_close }

// Create a TCP transport
let tcp = transport::tcp()

// One-shot: send payload and wait for response
let response = transport::send(tcp, "10.0.0.5:9000", payload)?

// Persistent connection
let conn = transport::connect(tcp, "10.0.0.5:9000")?
transport::connection_send(conn, data)?
let received = transport::connection_recv(conn, 5000)?  // 5s timeout
transport::connection_close(conn)?

The transport layer uses length-prefixed framing with transparent zstd compression. Function blobs (highly regular bytecode) typically compress 3-8x, significantly reducing transfer cost for first-time blob delivery. See Wire Protocol & Optimization for compression, blob negotiation, and sidecar splitting details.
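Length-prefixed framing with compression can be sketched in Python as below. Two assumptions are made for the example: a 4-byte big-endian length prefix, and `zlib` from the standard library standing in for zstd. Shape's actual frame layout is described in Wire Protocol & Optimization, not here.

```python
import struct
import zlib

def encode_frame(payload: bytes) -> bytes:
    """Compress a payload and prefix it with the compressed length."""
    body = zlib.compress(payload)
    return struct.pack(">I", len(body)) + body

def decode_frames(buffer: bytes):
    """Split a byte stream into payloads; returns (payloads, unconsumed tail)."""
    payloads, pos = [], 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break   # frame is incomplete; wait for more bytes
        start = pos + 4
        payloads.append(zlib.decompress(buffer[start:start + length]))
        pos = start + length
    return payloads, buffer[pos:]

stream = encode_frame(b"first") + encode_frame(b"second" * 100)

# Two trailing bytes model the start of a frame that has not fully arrived.
payloads, tail = decode_frames(stream + b"\x00\x00")

# Cutting the stream mid-frame yields only the complete first payload.
partial, rest = decode_frames(stream[:-1])
```

The length prefix lets the receiver find frame boundaries on a byte stream such as TCP, and compression stays invisible to callers because it is applied inside the frame.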

Introspection

Shape provides runtime introspection into the current execution state:

from std::core::state use { capture, capture_all, hash, fn_hash, schema_hash, serialize, deserialize, diff, patch, resume, caller, args, locals }

fn my_function(x: int, y: string) {
    // Who called me?
    let c = state::caller()  // FunctionRef? (None if top-level)

    // What are my arguments?
    let a = state::args()    // [x, y] as Array<Any>

    // What are my local variables?
    let l = state::locals()  // Map<string, Any> of current scope
}

Architecture Summary

┌─────────────────────────────────────────────────┐
│  Types:     SHA-256(name + sorted fields)        │  Global identity
│  Functions: SHA-256(bytecode + deps + types)     │  Portable, cacheable
│  Values:    SHA-256(type_hash + field_data)       │  Deduplicable
│  Frames:    (function_hash, local_ip, locals)    │  Portable
│  State:     ordered list of frames               │  Resumable anywhere
└─────────────────────────────────────────────────┘

Transfer = exchanging hashes + lazily fetching what you don't have
Cache    = hash → blob (trivial, global, permanent)
Equality = compare hashes (O(1))
Diffing  = walk hash trees, only descend into branches that differ