Wire Protocol & Optimization

Shape’s wire protocol governs how data moves between nodes during distributed execution. This chapter covers the four optimization layers that minimize bandwidth and latency: transparent compression, blob negotiation, efficient binary serialization for collections, and sidecar splitting for large payloads.

Wire Path Overview

Every remote call follows this path:

Shape code
  → state::capture_call(f, args)     # build CallPayload
  → state::serialize(payload)        # MessagePack encode
  → encode_framed(bytes)            # zstd compress (if beneficial)
  → write_length_prefixed(framed)   # TCP/QUIC transport
  → network
  → read_length_prefixed(framed)    # receive
  → decode_framed(bytes)            # decompress (if compressed)
  → state::deserialize(bytes)        # MessagePack decode
  → execute

The optimizations described here operate at three different layers of this pipeline: framing (compression), protocol (blob negotiation), serialization (binary collections), and transport (sidecar splitting).

Transparent Compression

All wire frames are transparently compressed using zstd. This happens automatically — no Shape code changes required.

Frame Format

[4-byte big-endian payload length] [flags: u8] [body...]

The flags byte is the first byte of the payload:

Bit	Meaning
`0x01`	Body is zstd-compressed
`0x00`	Body is uncompressed

Compression Policy

Not all payloads benefit from compression. The framing layer applies a simple policy:

Small payloads (< 256 bytes): stored uncompressed. The overhead of compression headers would negate any savings.
Large payloads (>= 256 bytes): compressed with zstd at level 3. If the compressed form is not smaller than the original, the uncompressed form is used instead.

This means compression is always beneficial — it never makes payloads larger.

Decompression Safety

A MAX_DECOMPRESSED_SIZE limit of 256 MB prevents decompression bombs. If a compressed frame claims to decompress to more than 256 MB, the transport returns a TransportError.

Compression Ratios

Typical compression ratios for Shape wire payloads:

Payload Type	Typical Ratio	Notes
MessagePack (structured data)	2-4x	Good — repeated field names, type tags
Raw numeric arrays	1.5-3x	Moderate — floating-point data has some redundancy
Function blobs (bytecode)	3-8x	Excellent — instruction patterns are highly regular
Already-compressed data	~1x	Falls back to uncompressed (no penalty)

Example

Compression is fully transparent to Shape code:

from std::core::state use { serialize, deserialize }
from std::core::transport use { tcp, send }

let tcp = transport::tcp()

// This payload is automatically compressed on the wire:
let bytes = state::serialize(large_dataset)
let response = transport::send(tcp, "10.0.0.5:9000", bytes)?

// And automatically decompressed when received:
let result = state::deserialize(response)

Blob Negotiation

When using persistent connections for repeated remote calls, blob negotiation avoids sending function blobs the remote node already has.

The Problem

Every remote call includes the function’s FunctionBlob (bytecode, constants, string pool) plus the blobs of all transitive dependencies. For a function with 10 dependencies, the first call might transfer 50 KB of blobs. Without negotiation, every subsequent call to the same function transfers those same 50 KB — even though the remote already has them cached.

The Protocol

Blob negotiation uses a simple hash-list exchange before the call:

Caller                                   Remote
  │                                        │
  │── BlobNegotiation({offered_hashes}) ──>│  check cache
  │<── BlobNegotiationReply({known}) ──────│
  │  strip known blobs from request        │
  │── Call(only missing blobs) ───────────>│  merge into cache + execute
  │<── CallResponse ──────────────────────│
  │                                        │

Caller sends the content hashes of all blobs it would include in the call.
Remote checks its blob cache, responds with which hashes it already has.
Caller strips known blobs from the request, sending only missing ones.
Remote merges new blobs into its cache and executes the call.

Wire Messages

All wire communication is wrapped in a WireMessage envelope:

pub enum WireMessage {
    // Distributed execution (V1)
    BlobNegotiation(BlobNegotiationRequest),
    BlobNegotiationReply(BlobNegotiationResponse),
    Call(RemoteCallRequest),
    CallResponse(RemoteCallResponse),
    Sidecar(BlobSidecar),

    // Execution server (V2)
    Execute(ExecuteRequest),
    ExecuteResponse(ExecuteResponse),
    Validate(ValidateRequest),
    ValidateResponse(ValidateResponse),
    Auth(AuthRequest),
    AuthResponse(AuthResponse),
    ExecuteFile(ExecuteFileRequest),
    ExecuteProject(ExecuteProjectRequest),
    ValidatePath(ValidatePathRequest),
    Ping(PingRequest),
    Pong(ServerInfo),
}

The V2 variants power the shape serve command, which provides in-process code execution over the same wire protocol. See that page for details on the execution server API.

The negotiation request and response carry hash lists:

pub struct BlobNegotiationRequest {
    pub offered_hashes: Vec<FunctionHash>,
}

pub struct BlobNegotiationResponse {
    pub known_hashes: Vec<FunctionHash>,
}

Remote Blob Cache

Each persistent connection maintains a RemoteBlobCache — an LRU cache of function blobs indexed by content hash:

Default capacity: 4096 entries
Eviction: least-recently-used when at capacity
No invalidation needed: content hashes make stale entries harmless — if a blob’s content changes, its hash changes, so the old entry simply becomes unreachable

When Negotiation Applies

Scenario	Negotiation?	Reason
Persistent connection, repeated calls	Yes	Cache accumulates known blobs
Persistent connection, first call	Skipped	Empty cache, nothing to negotiate
One-shot `transport::send(...)`	No	No persistent connection = no cache

Savings

For a typical workflow where the same function is called repeatedly with different arguments:

First call: transfers all blobs (no savings)
Second call: negotiation strips all blobs (saves ~50 KB for a 10-dependency function)
Nth call: same as second (the savings compound with frequency)

For long-running connections calling the same function thousands of times, blob negotiation reduces per-call overhead to just the argument payload.

Binary Collection Serialization

Shape’s wire format uses MessagePack for general values, but collection types (typed arrays, matrices, hashmaps) use optimized binary serialization that avoids per-element overhead.

The Problem

Consider a Vec<number> with 1 million elements. Naively, each element is serialized as an individual MessagePack Number value — adding type tags and encoding overhead per element. A million f64 values that should be 8 MB of raw data becomes ~16 MB of MessagePack.

Typed Array Serialization

All 11 typed array kinds are serialized as raw byte buffers:

Element Kind	Byte Size	Shape Type
`F64`	8	`Vec<number>`
`I64`	8	`Vec<int>`
`F32`	4	`Vec<f32>`
`I32`	4	`Vec<i32>`
`I16`	2	`Vec<i16>`
`I8`	1	`Vec<i8>`
`U64`	8	`Vec<u64>`
`U32`	4	`Vec<u32>`
`U16`	2	`Vec<u16>`
`U8`	1	`Vec<u8>`
`Bool`	1	`Vec<bool>`

The serialized form is:

TypedArray {
    element_kind: TypedArrayElementKind,  // which type
    blob: BlobRef,                        // content-addressed raw bytes
    len: usize,                           // element count
}

The raw bytes are stored via the content-addressed blob store, chunked into 256 KB pieces for deduplication. For a 1M-element Vec<number>:

Before: 1M individual MessagePack Number values (~16 MB)
After: 8 MB raw bytes, content-addressed and chunked (~8 MB, further reduced by zstd compression)

Matrix Serialization

Matrices (Mat<number>) are serialized as raw f64 bytes with dimension metadata:

Matrix {
    blob: BlobRef,   // raw f64 bytes, row-major order
    rows: u32,
    cols: u32,
}

Previously, matrices could not be serialized at all — attempting to serialize a matrix would produce an error. With this change, matrices are fully serializable and round-trip correctly, preserving dimensions.

HashMap Serialization

HashMaps use a dedicated parallel-array format:

HashMap {
    keys: Vec<SerializableVMValue>,
    values: Vec<SerializableVMValue>,
}

Previously, a HashMap was serialized as Array([Array([k, v]), ...]) — an array of key-value pair arrays. This lost the HashMap type identity on deserialization (it came back as a generic array). With the dedicated format:

Type identity is preserved: deserialization reconstructs a proper HashMap with rebuilt hash index
No wrapper allocations: keys and values are in flat parallel arrays instead of N pair-arrays

Backward Compatibility

Old snapshots using the previous formats (element-by-element arrays for typed arrays, array-of-pairs for hashmaps) are still deserializable. The new format is used only for new serializations.

Sidecar Splitting

For payloads containing large blobs (> 1 MB), sidecar splitting sends the bulk data as separate messages before the call request. This enables parallel delivery on QUIC and avoids blocking the main message channel.

The Problem

A remote call with a 50 MB DataTable argument would serialize the entire table inline in the Call message. On TCP, this blocks the connection until the entire 50 MB is transmitted. On QUIC, it prevents multiplexing other work on the same connection.

How It Works

The sidecar system walks the serialized value tree looking for any BlobRef whose content exceeds 1 MB. Each large blob is replaced with a lightweight SidecarRef placeholder, and the raw bytes are sent as a separate BlobSidecar message.

Caller                                   Remote
  │                                        │
  │── Sidecar({id:0, data: 30MB}) ───────>│  buffer
  │── Sidecar({id:1, data: 15MB}) ───────>│  buffer
  │── Call(request with SidecarRefs) ────>│  reassemble + execute
  │<── CallResponse ─────────────────────│
  │                                        │

SidecarRef

When a large blob is extracted, it is replaced with a SidecarRef — a variant of the SerializableVMValue tree — that preserves the metadata needed for reassembly:

SerializableVMValue::SidecarRef {
    sidecar_id: u32,       // matches the BlobSidecar.sidecar_id
    blob_kind: BlobKind,   // DataTable | TypedArray(TypedArrayElementKind) | Matrix
    original_hash: HashDigest,
    meta_a: u32,           // TypedArray: element count. Matrix: rows. Otherwise 0.
    meta_b: u32,           // Matrix: columns. Otherwise 0.
}

BlobKind has exactly three variants — DataTable, TypedArray (carrying a TypedArrayElementKind), and Matrix. The other blob-carrying value types (TypedTable, ColumnRef, RowView, IndexedTable) reuse BlobKind::DataTable since they are all backed by Arrow IPC data.

What Gets Extracted

The sidecar system handles all blob-carrying value types:

Value Type	Blob Source	Example
DataTable	Arrow IPC data	Large query results
TypedTable	Arrow IPC data	Typed table with schema
TypedArray	Raw element bytes	Large `Vec<number>` (> 131K f64 elements)
Matrix	Raw f64 bytes	Large matrix (> 131K f64 cells)
ColumnRef	Parent table’s Arrow data	Column view of large table
RowView	Parent table’s Arrow data	Row view of large table
IndexedTable	Index + table data	Indexed table

Nested blobs are found by recursive descent — a HashMap whose values contain large DataTables will have those tables extracted as sidecars.

QUIC Parallel Delivery

On TCP, sidecars are sent sequentially on the same stream. On QUIC, the Connection trait’s sidecar support enables parallel delivery:

pub trait Connection: Send {
    fn send(&mut self, payload: &[u8]) -> Result<(), TransportError>;
    fn recv(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError>;
    fn close(&mut self) -> Result<(), TransportError>;

    /// Whether this connection supports parallel sidecar delivery.
    fn supports_sidecars(&self) -> bool { false }

    /// Send a sidecar payload (possibly on a parallel stream).
    fn send_sidecar(&mut self, payload: &[u8]) -> Result<(), TransportError> {
        self.send(payload)
    }

    /// Receive any incoming message (regular or sidecar).
    fn recv_any(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError> {
        self.recv(timeout)
    }
}

QUIC connections override these methods to use separate unidirectional streams for each sidecar, taking advantage of QUIC’s built-in multiplexing. A 30 MB sidecar and a 15 MB sidecar can be delivered in parallel, with the main call message following on its own stream.

Zero Overhead When Not Needed

If a call has no large blobs, no sidecars are extracted and the protocol behaves exactly as before — zero overhead. The sidecar extraction only activates when a blob exceeds the 1 MB threshold.

Compression Interaction

Sidecar messages pass through the same framing layer as regular messages, so they benefit from zstd compression. Raw numeric data (typed arrays, matrices) typically compresses 1.5-3x, providing additional savings on top of the binary serialization.

Putting It All Together

Consider a remote call that sends a function with 5 dependencies and a 10 MB Vec<number> argument, over a persistent QUIC connection:

First Call

Serialization: The Vec<number> is serialized as raw bytes (10 MB instead of ~20 MB MessagePack)
Sidecar extraction: The 10 MB blob exceeds the 1 MB threshold, so it becomes a sidecar
No blob negotiation: first call on this connection, cache is empty
Wire transfer:
- Sidecar (10 MB raw bytes, zstd-compressed to ~5 MB) on a parallel QUIC stream
- Call message (~50 KB of function blobs + sidecar placeholder, zstd-compressed)
Remote: reassembles sidecar, caches blobs, executes

Total wire: ~5 MB (vs ~20 MB without optimizations)

Second Call (same function, different data)

Serialization: new Vec<number> serialized as raw bytes
Sidecar extraction: large blob becomes a sidecar
Blob negotiation: remote already has all 5 blobs, strips them from request
Wire transfer:
- Sidecar (new data, compressed) on a parallel stream
- Call message (~1 KB — just arguments and sidecar ref, no blobs)

Total wire: ~5 MB (vs ~20 MB, with near-zero protocol overhead)

Nth Call (same function, same data)

If using MemoizedTransport, the call hits the cache:

Total wire: 0 bytes — instant cache hit, no network traffic at all.

Configuration

The wire protocol optimizations are not configurable from Shape code — they operate transparently at the infrastructure level. The relevant constants are:

Constant	Value	Purpose
`COMPRESSION_THRESHOLD`	256 bytes	Payloads below this size are not compressed
`ZSTD_LEVEL`	3	zstd compression level (1-22 scale)
`MAX_DECOMPRESSED_SIZE`	256 MB	Safety limit for decompression
`MAX_PAYLOAD_SIZE`	64 MB	Maximum single-frame payload
`SIDECAR_THRESHOLD`	1 MB	Blobs larger than this become sidecars
`RemoteBlobCache` capacity	4096	Max cached function blobs per connection

Rust API

For embedders building custom transports or extending the wire protocol:

Framing

use shape_wire::transport::framing::{encode_framed, decode_framed};

// Compress a payload for wire transmission
let framed = encode_framed(&payload_bytes);

// Decompress a received frame
let original = decode_framed(&framed_bytes)?;

Blob Negotiation

use shape_vm::remote::{
    RemoteBlobCache, BlobNegotiationRequest,
    build_call_request_negotiated, handle_negotiation,
};

// Server side: maintain a cache per connection
let mut cache = RemoteBlobCache::default_cache();

// Handle incoming negotiation request
let request = BlobNegotiationRequest { offered_hashes: vec![...] };
let response = handle_negotiation(&request, &cache);

// Client side: build request with only missing blobs
let call = build_call_request_negotiated(
    &program, "my_function", args, &response.known_hashes
)?;

Sidecar Extraction

use shape_vm::remote::{extract_sidecars, reassemble_sidecars};
use shape_runtime::snapshot::SnapshotStore;

let store = SnapshotStore::open("/path/to/store")?;

// Extract large blobs from serialized args
let mut args = serialized_args;
let sidecars = extract_sidecars(&mut args, &store);

// Send sidecars first, then the call with SidecarRefs

// On the receiving end: reassemble
let sidecar_map: HashMap<u32, BlobSidecar> = sidecars
    .into_iter()
    .map(|s| (s.sidecar_id, s))
    .collect();
reassemble_sidecars(&mut args, &sidecar_map, &store)?;

Wire Protocol & Optimization

Wire Path Overview

Transparent Compression

Frame Format

Compression Policy

Decompression Safety

Compression Ratios

Example

Blob Negotiation

The Problem

The Protocol

Wire Messages

Remote Blob Cache

When Negotiation Applies

Savings

Binary Collection Serialization

The Problem

Typed Array Serialization

Matrix Serialization

HashMap Serialization

Backward Compatibility

Sidecar Splitting

The Problem

How It Works

SidecarRef

What Gets Extracted

QUIC Parallel Delivery

Zero Overhead When Not Needed

Compression Interaction

Putting It All Together

First Call

Second Call (same function, different data)

Nth Call (same function, same data)

Configuration

Rust API

Framing

Blob Negotiation

Sidecar Extraction

See Also