Wire Protocol & Optimization
Shape’s wire protocol governs how data moves between nodes during distributed execution. This chapter covers the four optimization layers that minimize bandwidth and latency: transparent compression, blob negotiation, efficient binary serialization for collections, and sidecar splitting for large payloads.
Wire Path Overview
Section titled “Wire Path Overview”Every remote call follows this path:
Shape code → state::capture_call(f, args) # build CallPayload → state::serialize(payload) # MessagePack encode → encode_framed(bytes) # zstd compress (if beneficial) → write_length_prefixed(framed) # TCP/QUIC transport → network → read_length_prefixed(framed) # receive → decode_framed(bytes) # decompress (if compressed) → state::deserialize(bytes) # MessagePack decode → executeThe optimizations described here operate at three different layers of this pipeline: framing (compression), protocol (blob negotiation), serialization (binary collections), and transport (sidecar splitting).
Transparent Compression
Section titled “Transparent Compression”All wire frames are transparently compressed using zstd. This happens automatically — no Shape code changes required.
Frame Format
Section titled “Frame Format”[4-byte big-endian payload length] [flags: u8] [body...]The flags byte is the first byte of the payload:
| Bit | Meaning |
|---|---|
0x01 | Body is zstd-compressed |
0x00 | Body is uncompressed |
Compression Policy
Section titled “Compression Policy”Not all payloads benefit from compression. The framing layer applies a simple policy:
- Small payloads (< 256 bytes): stored uncompressed. The overhead of compression headers would negate any savings.
- Large payloads (>= 256 bytes): compressed with zstd at level 3. If the compressed form is not smaller than the original, the uncompressed form is used instead.
This means compression is always beneficial — it never makes payloads larger.
Decompression Safety
Section titled “Decompression Safety”A MAX_DECOMPRESSED_SIZE limit of 256 MB prevents decompression bombs. If a
compressed frame claims to decompress to more than 256 MB, the transport
returns a TransportError.
Compression Ratios
Section titled “Compression Ratios”Typical compression ratios for Shape wire payloads:
| Payload Type | Typical Ratio | Notes |
|---|---|---|
| MessagePack (structured data) | 2-4x | Good — repeated field names, type tags |
| Raw numeric arrays | 1.5-3x | Moderate — floating-point data has some redundancy |
| Function blobs (bytecode) | 3-8x | Excellent — instruction patterns are highly regular |
| Already-compressed data | ~1x | Falls back to uncompressed (no penalty) |
Example
Section titled “Example”Compression is fully transparent to Shape code:
from std::core::state use { serialize, deserialize }from std::core::transport use { tcp, send }
let tcp = transport::tcp()
// This payload is automatically compressed on the wire:let bytes = state::serialize(large_dataset)let response = transport::send(tcp, "10.0.0.5:9000", bytes)?
// And automatically decompressed when received:let result = state::deserialize(response)Blob Negotiation
Section titled “Blob Negotiation”When using persistent connections for repeated remote calls, blob negotiation avoids sending function blobs the remote node already has.
The Problem
Section titled “The Problem”Every remote call includes the function’s FunctionBlob (bytecode, constants,
string pool) plus the blobs of all transitive dependencies. For a function with
10 dependencies, the first call might transfer 50 KB of blobs. Without
negotiation, every subsequent call to the same function transfers those same
50 KB — even though the remote already has them cached.
The Protocol
Section titled “The Protocol”Blob negotiation uses a simple hash-list exchange before the call:
Caller Remote │ │ │── BlobNegotiation({offered_hashes}) ──>│ check cache │<── BlobNegotiationReply({known}) ──────│ │ strip known blobs from request │ │── Call(only missing blobs) ───────────>│ merge into cache + execute │<── CallResponse ──────────────────────│ │ │- Caller sends the content hashes of all blobs it would include in the call.
- Remote checks its blob cache, responds with which hashes it already has.
- Caller strips known blobs from the request, sending only missing ones.
- Remote merges new blobs into its cache and executes the call.
Wire Messages
Section titled “Wire Messages”All wire communication is wrapped in a WireMessage envelope:
pub enum WireMessage { // Distributed execution (V1) BlobNegotiation(BlobNegotiationRequest), BlobNegotiationReply(BlobNegotiationResponse), Call(RemoteCallRequest), CallResponse(RemoteCallResponse), Sidecar(BlobSidecar),
// Execution server (V2) Execute(ExecuteRequest), ExecuteResponse(ExecuteResponse), Validate(ValidateRequest), ValidateResponse(ValidateResponse), Auth(AuthRequest), AuthResponse(AuthResponse), ExecuteFile(ExecuteFileRequest), ExecuteProject(ExecuteProjectRequest), ValidatePath(ValidatePathRequest), Ping(PingRequest), Pong(ServerInfo),}The V2 variants power the shape serve command,
which provides in-process code execution over the same wire protocol. See that
page for details on the execution server API.
The negotiation request and response carry hash lists:
pub struct BlobNegotiationRequest { pub offered_hashes: Vec<FunctionHash>,}
pub struct BlobNegotiationResponse { pub known_hashes: Vec<FunctionHash>,}Remote Blob Cache
Section titled “Remote Blob Cache”Each persistent connection maintains a RemoteBlobCache — an LRU cache of
function blobs indexed by content hash:
- Default capacity: 4096 entries
- Eviction: least-recently-used when at capacity
- No invalidation needed: content hashes make stale entries harmless — if a blob’s content changes, its hash changes, so the old entry simply becomes unreachable
When Negotiation Applies
Section titled “When Negotiation Applies”| Scenario | Negotiation? | Reason |
|---|---|---|
| Persistent connection, repeated calls | Yes | Cache accumulates known blobs |
| Persistent connection, first call | Skipped | Empty cache, nothing to negotiate |
One-shot transport::send(...) | No | No persistent connection = no cache |
Savings
Section titled “Savings”For a typical workflow where the same function is called repeatedly with different arguments:
- First call: transfers all blobs (no savings)
- Second call: negotiation strips all blobs (saves ~50 KB for a 10-dependency function)
- Nth call: same as second (the savings compound with frequency)
For long-running connections calling the same function thousands of times, blob negotiation reduces per-call overhead to just the argument payload.
Binary Collection Serialization
Section titled “Binary Collection Serialization”Shape’s wire format uses MessagePack for general values, but collection types (typed arrays, matrices, hashmaps) use optimized binary serialization that avoids per-element overhead.
The Problem
Section titled “The Problem”Consider a Vec<number> with 1 million elements. Naively, each element is
serialized as an individual MessagePack Number value — adding type tags and
encoding overhead per element. A million f64 values that should be 8 MB of raw
data becomes ~16 MB of MessagePack.
Typed Array Serialization
Section titled “Typed Array Serialization”All 11 typed array kinds are serialized as raw byte buffers:
| Element Kind | Byte Size | Shape Type |
|---|---|---|
F64 | 8 | Vec<number> |
I64 | 8 | Vec<int> |
F32 | 4 | Vec<f32> |
I32 | 4 | Vec<i32> |
I16 | 2 | Vec<i16> |
I8 | 1 | Vec<i8> |
U64 | 8 | Vec<u64> |
U32 | 4 | Vec<u32> |
U16 | 2 | Vec<u16> |
U8 | 1 | Vec<u8> |
Bool | 1 | Vec<bool> |
The serialized form is:
TypedArray { element_kind: TypedArrayElementKind, // which type blob: BlobRef, // content-addressed raw bytes len: usize, // element count}The raw bytes are stored via the content-addressed blob store, chunked into
256 KB pieces for deduplication. For a 1M-element Vec<number>:
- Before: 1M individual MessagePack Number values (~16 MB)
- After: 8 MB raw bytes, content-addressed and chunked (~8 MB, further reduced by zstd compression)
Matrix Serialization
Section titled “Matrix Serialization”Matrices (Mat<number>) are serialized as raw f64 bytes with dimension
metadata:
Matrix { blob: BlobRef, // raw f64 bytes, row-major order rows: u32, cols: u32,}Previously, matrices could not be serialized at all — attempting to serialize a matrix would produce an error. With this change, matrices are fully serializable and round-trip correctly, preserving dimensions.
HashMap Serialization
Section titled “HashMap Serialization”HashMaps use a dedicated parallel-array format:
HashMap { keys: Vec<SerializableVMValue>, values: Vec<SerializableVMValue>,}Previously, a HashMap was serialized as Array([Array([k, v]), ...]) — an
array of key-value pair arrays. This lost the HashMap type identity on
deserialization (it came back as a generic array). With the dedicated format:
- Type identity is preserved: deserialization reconstructs a proper HashMap with rebuilt hash index
- No wrapper allocations: keys and values are in flat parallel arrays instead of N pair-arrays
Backward Compatibility
Section titled “Backward Compatibility”Old snapshots using the previous formats (element-by-element arrays for typed arrays, array-of-pairs for hashmaps) are still deserializable. The new format is used only for new serializations.
Sidecar Splitting
Section titled “Sidecar Splitting”For payloads containing large blobs (> 1 MB), sidecar splitting sends the bulk data as separate messages before the call request. This enables parallel delivery on QUIC and avoids blocking the main message channel.
The Problem
Section titled “The Problem”A remote call with a 50 MB DataTable argument would serialize the entire table
inline in the Call message. On TCP, this blocks the connection until the
entire 50 MB is transmitted. On QUIC, it prevents multiplexing other work on
the same connection.
How It Works
Section titled “How It Works”The sidecar system walks the serialized value tree looking for any BlobRef
whose content exceeds 1 MB. Each large blob is replaced with a lightweight
SidecarRef placeholder, and the raw bytes are sent as a separate BlobSidecar
message.
Caller Remote │ │ │── Sidecar({id:0, data: 30MB}) ───────>│ buffer │── Sidecar({id:1, data: 15MB}) ───────>│ buffer │── Call(request with SidecarRefs) ────>│ reassemble + execute │<── CallResponse ─────────────────────│ │ │SidecarRef
Section titled “SidecarRef”When a large blob is extracted, it is replaced with a SidecarRef — a
variant of the SerializableVMValue tree — that preserves the metadata
needed for reassembly:
SerializableVMValue::SidecarRef { sidecar_id: u32, // matches the BlobSidecar.sidecar_id blob_kind: BlobKind, // DataTable | TypedArray(TypedArrayElementKind) | Matrix original_hash: HashDigest, meta_a: u32, // TypedArray: element count. Matrix: rows. Otherwise 0. meta_b: u32, // Matrix: columns. Otherwise 0.}BlobKind has exactly three variants — DataTable, TypedArray (carrying
a TypedArrayElementKind), and Matrix. The other blob-carrying value types
(TypedTable, ColumnRef, RowView, IndexedTable) reuse BlobKind::DataTable
since they are all backed by Arrow IPC data.
What Gets Extracted
Section titled “What Gets Extracted”The sidecar system handles all blob-carrying value types:
| Value Type | Blob Source | Example |
|---|---|---|
| DataTable | Arrow IPC data | Large query results |
| TypedTable | Arrow IPC data | Typed table with schema |
| TypedArray | Raw element bytes | Large Vec<number> (> 131K f64 elements) |
| Matrix | Raw f64 bytes | Large matrix (> 131K f64 cells) |
| ColumnRef | Parent table’s Arrow data | Column view of large table |
| RowView | Parent table’s Arrow data | Row view of large table |
| IndexedTable | Index + table data | Indexed table |
Nested blobs are found by recursive descent — a HashMap whose values contain large DataTables will have those tables extracted as sidecars.
QUIC Parallel Delivery
Section titled “QUIC Parallel Delivery”On TCP, sidecars are sent sequentially on the same stream. On QUIC, the
Connection trait’s sidecar support enables parallel delivery:
pub trait Connection: Send { fn send(&mut self, payload: &[u8]) -> Result<(), TransportError>; fn recv(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError>; fn close(&mut self) -> Result<(), TransportError>;
/// Whether this connection supports parallel sidecar delivery. fn supports_sidecars(&self) -> bool { false }
/// Send a sidecar payload (possibly on a parallel stream). fn send_sidecar(&mut self, payload: &[u8]) -> Result<(), TransportError> { self.send(payload) }
/// Receive any incoming message (regular or sidecar). fn recv_any(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError> { self.recv(timeout) }}QUIC connections override these methods to use separate unidirectional streams for each sidecar, taking advantage of QUIC’s built-in multiplexing. A 30 MB sidecar and a 15 MB sidecar can be delivered in parallel, with the main call message following on its own stream.
Zero Overhead When Not Needed
Section titled “Zero Overhead When Not Needed”If a call has no large blobs, no sidecars are extracted and the protocol behaves exactly as before — zero overhead. The sidecar extraction only activates when a blob exceeds the 1 MB threshold.
Compression Interaction
Section titled “Compression Interaction”Sidecar messages pass through the same framing layer as regular messages, so they benefit from zstd compression. Raw numeric data (typed arrays, matrices) typically compresses 1.5-3x, providing additional savings on top of the binary serialization.
Putting It All Together
Section titled “Putting It All Together”Consider a remote call that sends a function with 5 dependencies and a 10 MB
Vec<number> argument, over a persistent QUIC connection:
First Call
Section titled “First Call”- Serialization: The
Vec<number>is serialized as raw bytes (10 MB instead of ~20 MB MessagePack) - Sidecar extraction: The 10 MB blob exceeds the 1 MB threshold, so it becomes a sidecar
- No blob negotiation: first call on this connection, cache is empty
- Wire transfer:
- Sidecar (10 MB raw bytes, zstd-compressed to ~5 MB) on a parallel QUIC stream
- Call message (~50 KB of function blobs + sidecar placeholder, zstd-compressed)
- Remote: reassembles sidecar, caches blobs, executes
Total wire: ~5 MB (vs ~20 MB without optimizations)
Second Call (same function, different data)
Section titled “Second Call (same function, different data)”- Serialization: new
Vec<number>serialized as raw bytes - Sidecar extraction: large blob becomes a sidecar
- Blob negotiation: remote already has all 5 blobs, strips them from request
- Wire transfer:
- Sidecar (new data, compressed) on a parallel stream
- Call message (~1 KB — just arguments and sidecar ref, no blobs)
Total wire: ~5 MB (vs ~20 MB, with near-zero protocol overhead)
Nth Call (same function, same data)
Section titled “Nth Call (same function, same data)”If using MemoizedTransport, the call hits the cache:
Total wire: 0 bytes — instant cache hit, no network traffic at all.
Configuration
Section titled “Configuration”The wire protocol optimizations are not configurable from Shape code — they operate transparently at the infrastructure level. The relevant constants are:
| Constant | Value | Purpose |
|---|---|---|
COMPRESSION_THRESHOLD | 256 bytes | Payloads below this size are not compressed |
ZSTD_LEVEL | 3 | zstd compression level (1-22 scale) |
MAX_DECOMPRESSED_SIZE | 256 MB | Safety limit for decompression |
MAX_PAYLOAD_SIZE | 64 MB | Maximum single-frame payload |
SIDECAR_THRESHOLD | 1 MB | Blobs larger than this become sidecars |
RemoteBlobCache capacity | 4096 | Max cached function blobs per connection |
Rust API
Section titled “Rust API”For embedders building custom transports or extending the wire protocol:
Framing
Section titled “Framing”use shape_wire::transport::framing::{encode_framed, decode_framed};
// Compress a payload for wire transmissionlet framed = encode_framed(&payload_bytes);
// Decompress a received framelet original = decode_framed(&framed_bytes)?;Blob Negotiation
Section titled “Blob Negotiation”use shape_vm::remote::{ RemoteBlobCache, BlobNegotiationRequest, build_call_request_negotiated, handle_negotiation,};
// Server side: maintain a cache per connectionlet mut cache = RemoteBlobCache::default_cache();
// Handle incoming negotiation requestlet request = BlobNegotiationRequest { offered_hashes: vec![...] };let response = handle_negotiation(&request, &cache);
// Client side: build request with only missing blobslet call = build_call_request_negotiated( &program, "my_function", args, &response.known_hashes)?;Sidecar Extraction
Section titled “Sidecar Extraction”use shape_vm::remote::{extract_sidecars, reassemble_sidecars};use shape_runtime::snapshot::SnapshotStore;
let store = SnapshotStore::open("/path/to/store")?;
// Extract large blobs from serialized argslet mut args = serialized_args;let sidecars = extract_sidecars(&mut args, &store);
// Send sidecars first, then the call with SidecarRefs
// On the receiving end: reassemblelet sidecar_map: HashMap<u32, BlobSidecar> = sidecars .into_iter() .map(|s| (s.sidecar_id, s)) .collect();reassemble_sidecars(&mut args, &sidecar_map, &store)?;See Also
Section titled “See Also”- Execution Server —
shape servecommand using the V2 protocol - Transport Layer — TCP, QUIC, and
Connectiontrait - Content-Addressed Bytecode —
FunctionBlob, content hashes, and distribution - Standard Library: State —
state::serialize,CallPayload, and collection types - Module Distribution & Signatures — manifests and blob stores
- Resumability & Distributed Computing — snapshot/resume basics