Skip to content

Wire Protocol & Optimization

Shape’s wire protocol governs how data moves between nodes during distributed execution. This chapter covers the four optimization layers that minimize bandwidth and latency: transparent compression, blob negotiation, efficient binary serialization for collections, and sidecar splitting for large payloads.

Every remote call follows this path:

Shape code
→ state::capture_call(f, args) # build CallPayload
→ state::serialize(payload) # MessagePack encode
→ encode_framed(bytes) # zstd compress (if beneficial)
→ write_length_prefixed(framed) # TCP/QUIC transport
→ network
→ read_length_prefixed(framed) # receive
→ decode_framed(bytes) # decompress (if compressed)
→ state::deserialize(bytes) # MessagePack decode
→ execute

The optimizations described here operate at three different layers of this pipeline: framing (compression), protocol (blob negotiation), serialization (binary collections), and transport (sidecar splitting).

All wire frames are transparently compressed using zstd. This happens automatically — no Shape code changes required.

[4-byte big-endian payload length] [flags: u8] [body...]

The flags byte is the first byte of the payload:

BitMeaning
0x01Body is zstd-compressed
0x00Body is uncompressed

Not all payloads benefit from compression. The framing layer applies a simple policy:

  1. Small payloads (< 256 bytes): stored uncompressed. The overhead of compression headers would negate any savings.
  2. Large payloads (>= 256 bytes): compressed with zstd at level 3. If the compressed form is not smaller than the original, the uncompressed form is used instead.

This means compression is always beneficial — it never makes payloads larger.

A MAX_DECOMPRESSED_SIZE limit of 256 MB prevents decompression bombs. If a compressed frame claims to decompress to more than 256 MB, the transport returns a TransportError.

Typical compression ratios for Shape wire payloads:

Payload TypeTypical RatioNotes
MessagePack (structured data)2-4xGood — repeated field names, type tags
Raw numeric arrays1.5-3xModerate — floating-point data has some redundancy
Function blobs (bytecode)3-8xExcellent — instruction patterns are highly regular
Already-compressed data~1xFalls back to uncompressed (no penalty)

Compression is fully transparent to Shape code:

from std::core::state use { serialize, deserialize }
from std::core::transport use { tcp, send }
let tcp = transport::tcp()
// This payload is automatically compressed on the wire:
let bytes = state::serialize(large_dataset)
let response = transport::send(tcp, "10.0.0.5:9000", bytes)?
// And automatically decompressed when received:
let result = state::deserialize(response)

When using persistent connections for repeated remote calls, blob negotiation avoids sending function blobs the remote node already has.

Every remote call includes the function’s FunctionBlob (bytecode, constants, string pool) plus the blobs of all transitive dependencies. For a function with 10 dependencies, the first call might transfer 50 KB of blobs. Without negotiation, every subsequent call to the same function transfers those same 50 KB — even though the remote already has them cached.

Blob negotiation uses a simple hash-list exchange before the call:

Caller Remote
│ │
│── BlobNegotiation({offered_hashes}) ──>│ check cache
│<── BlobNegotiationReply({known}) ──────│
│ strip known blobs from request │
│── Call(only missing blobs) ───────────>│ merge into cache + execute
│<── CallResponse ──────────────────────│
│ │
  1. Caller sends the content hashes of all blobs it would include in the call.
  2. Remote checks its blob cache, responds with which hashes it already has.
  3. Caller strips known blobs from the request, sending only missing ones.
  4. Remote merges new blobs into its cache and executes the call.

All wire communication is wrapped in a WireMessage envelope:

pub enum WireMessage {
// Distributed execution (V1)
BlobNegotiation(BlobNegotiationRequest),
BlobNegotiationReply(BlobNegotiationResponse),
Call(RemoteCallRequest),
CallResponse(RemoteCallResponse),
Sidecar(BlobSidecar),
// Execution server (V2)
Execute(ExecuteRequest),
ExecuteResponse(ExecuteResponse),
Validate(ValidateRequest),
ValidateResponse(ValidateResponse),
Auth(AuthRequest),
AuthResponse(AuthResponse),
ExecuteFile(ExecuteFileRequest),
ExecuteProject(ExecuteProjectRequest),
ValidatePath(ValidatePathRequest),
Ping(PingRequest),
Pong(ServerInfo),
}

The V2 variants power the shape serve command, which provides in-process code execution over the same wire protocol. See that page for details on the execution server API.

The negotiation request and response carry hash lists:

pub struct BlobNegotiationRequest {
pub offered_hashes: Vec<FunctionHash>,
}
pub struct BlobNegotiationResponse {
pub known_hashes: Vec<FunctionHash>,
}

Each persistent connection maintains a RemoteBlobCache — an LRU cache of function blobs indexed by content hash:

  • Default capacity: 4096 entries
  • Eviction: least-recently-used when at capacity
  • No invalidation needed: content hashes make stale entries harmless — if a blob’s content changes, its hash changes, so the old entry simply becomes unreachable
ScenarioNegotiation?Reason
Persistent connection, repeated callsYesCache accumulates known blobs
Persistent connection, first callSkippedEmpty cache, nothing to negotiate
One-shot transport::send(...)NoNo persistent connection = no cache

For a typical workflow where the same function is called repeatedly with different arguments:

  • First call: transfers all blobs (no savings)
  • Second call: negotiation strips all blobs (saves ~50 KB for a 10-dependency function)
  • Nth call: same as second (the savings compound with frequency)

For long-running connections calling the same function thousands of times, blob negotiation reduces per-call overhead to just the argument payload.

Shape’s wire format uses MessagePack for general values, but collection types (typed arrays, matrices, hashmaps) use optimized binary serialization that avoids per-element overhead.

Consider a Vec<number> with 1 million elements. Naively, each element is serialized as an individual MessagePack Number value — adding type tags and encoding overhead per element. A million f64 values that should be 8 MB of raw data becomes ~16 MB of MessagePack.

All 11 typed array kinds are serialized as raw byte buffers:

Element KindByte SizeShape Type
F648Vec<number>
I648Vec<int>
F324Vec<f32>
I324Vec<i32>
I162Vec<i16>
I81Vec<i8>
U648Vec<u64>
U324Vec<u32>
U162Vec<u16>
U81Vec<u8>
Bool1Vec<bool>

The serialized form is:

TypedArray {
element_kind: TypedArrayElementKind, // which type
blob: BlobRef, // content-addressed raw bytes
len: usize, // element count
}

The raw bytes are stored via the content-addressed blob store, chunked into 256 KB pieces for deduplication. For a 1M-element Vec<number>:

  • Before: 1M individual MessagePack Number values (~16 MB)
  • After: 8 MB raw bytes, content-addressed and chunked (~8 MB, further reduced by zstd compression)

Matrices (Mat<number>) are serialized as raw f64 bytes with dimension metadata:

Matrix {
blob: BlobRef, // raw f64 bytes, row-major order
rows: u32,
cols: u32,
}

Previously, matrices could not be serialized at all — attempting to serialize a matrix would produce an error. With this change, matrices are fully serializable and round-trip correctly, preserving dimensions.

HashMaps use a dedicated parallel-array format:

HashMap {
keys: Vec<SerializableVMValue>,
values: Vec<SerializableVMValue>,
}

Previously, a HashMap was serialized as Array([Array([k, v]), ...]) — an array of key-value pair arrays. This lost the HashMap type identity on deserialization (it came back as a generic array). With the dedicated format:

  • Type identity is preserved: deserialization reconstructs a proper HashMap with rebuilt hash index
  • No wrapper allocations: keys and values are in flat parallel arrays instead of N pair-arrays

Old snapshots using the previous formats (element-by-element arrays for typed arrays, array-of-pairs for hashmaps) are still deserializable. The new format is used only for new serializations.

For payloads containing large blobs (> 1 MB), sidecar splitting sends the bulk data as separate messages before the call request. This enables parallel delivery on QUIC and avoids blocking the main message channel.

A remote call with a 50 MB DataTable argument would serialize the entire table inline in the Call message. On TCP, this blocks the connection until the entire 50 MB is transmitted. On QUIC, it prevents multiplexing other work on the same connection.

The sidecar system walks the serialized value tree looking for any BlobRef whose content exceeds 1 MB. Each large blob is replaced with a lightweight SidecarRef placeholder, and the raw bytes are sent as a separate BlobSidecar message.

Caller Remote
│ │
│── Sidecar({id:0, data: 30MB}) ───────>│ buffer
│── Sidecar({id:1, data: 15MB}) ───────>│ buffer
│── Call(request with SidecarRefs) ────>│ reassemble + execute
│<── CallResponse ─────────────────────│
│ │

When a large blob is extracted, it is replaced with a SidecarRef — a variant of the SerializableVMValue tree — that preserves the metadata needed for reassembly:

SerializableVMValue::SidecarRef {
sidecar_id: u32, // matches the BlobSidecar.sidecar_id
blob_kind: BlobKind, // DataTable | TypedArray(TypedArrayElementKind) | Matrix
original_hash: HashDigest,
meta_a: u32, // TypedArray: element count. Matrix: rows. Otherwise 0.
meta_b: u32, // Matrix: columns. Otherwise 0.
}

BlobKind has exactly three variants — DataTable, TypedArray (carrying a TypedArrayElementKind), and Matrix. The other blob-carrying value types (TypedTable, ColumnRef, RowView, IndexedTable) reuse BlobKind::DataTable since they are all backed by Arrow IPC data.

The sidecar system handles all blob-carrying value types:

Value TypeBlob SourceExample
DataTableArrow IPC dataLarge query results
TypedTableArrow IPC dataTyped table with schema
TypedArrayRaw element bytesLarge Vec<number> (> 131K f64 elements)
MatrixRaw f64 bytesLarge matrix (> 131K f64 cells)
ColumnRefParent table’s Arrow dataColumn view of large table
RowViewParent table’s Arrow dataRow view of large table
IndexedTableIndex + table dataIndexed table

Nested blobs are found by recursive descent — a HashMap whose values contain large DataTables will have those tables extracted as sidecars.

On TCP, sidecars are sent sequentially on the same stream. On QUIC, the Connection trait’s sidecar support enables parallel delivery:

pub trait Connection: Send {
fn send(&mut self, payload: &[u8]) -> Result<(), TransportError>;
fn recv(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError>;
fn close(&mut self) -> Result<(), TransportError>;
/// Whether this connection supports parallel sidecar delivery.
fn supports_sidecars(&self) -> bool { false }
/// Send a sidecar payload (possibly on a parallel stream).
fn send_sidecar(&mut self, payload: &[u8]) -> Result<(), TransportError> {
self.send(payload)
}
/// Receive any incoming message (regular or sidecar).
fn recv_any(&mut self, timeout: Option<Duration>) -> Result<Vec<u8>, TransportError> {
self.recv(timeout)
}
}

QUIC connections override these methods to use separate unidirectional streams for each sidecar, taking advantage of QUIC’s built-in multiplexing. A 30 MB sidecar and a 15 MB sidecar can be delivered in parallel, with the main call message following on its own stream.

If a call has no large blobs, no sidecars are extracted and the protocol behaves exactly as before — zero overhead. The sidecar extraction only activates when a blob exceeds the 1 MB threshold.

Sidecar messages pass through the same framing layer as regular messages, so they benefit from zstd compression. Raw numeric data (typed arrays, matrices) typically compresses 1.5-3x, providing additional savings on top of the binary serialization.

Consider a remote call that sends a function with 5 dependencies and a 10 MB Vec<number> argument, over a persistent QUIC connection:

  1. Serialization: The Vec<number> is serialized as raw bytes (10 MB instead of ~20 MB MessagePack)
  2. Sidecar extraction: The 10 MB blob exceeds the 1 MB threshold, so it becomes a sidecar
  3. No blob negotiation: first call on this connection, cache is empty
  4. Wire transfer:
    • Sidecar (10 MB raw bytes, zstd-compressed to ~5 MB) on a parallel QUIC stream
    • Call message (~50 KB of function blobs + sidecar placeholder, zstd-compressed)
  5. Remote: reassembles sidecar, caches blobs, executes

Total wire: ~5 MB (vs ~20 MB without optimizations)

Second Call (same function, different data)

Section titled “Second Call (same function, different data)”
  1. Serialization: new Vec<number> serialized as raw bytes
  2. Sidecar extraction: large blob becomes a sidecar
  3. Blob negotiation: remote already has all 5 blobs, strips them from request
  4. Wire transfer:
    • Sidecar (new data, compressed) on a parallel stream
    • Call message (~1 KB — just arguments and sidecar ref, no blobs)

Total wire: ~5 MB (vs ~20 MB, with near-zero protocol overhead)

If using MemoizedTransport, the call hits the cache:

Total wire: 0 bytes — instant cache hit, no network traffic at all.

The wire protocol optimizations are not configurable from Shape code — they operate transparently at the infrastructure level. The relevant constants are:

ConstantValuePurpose
COMPRESSION_THRESHOLD256 bytesPayloads below this size are not compressed
ZSTD_LEVEL3zstd compression level (1-22 scale)
MAX_DECOMPRESSED_SIZE256 MBSafety limit for decompression
MAX_PAYLOAD_SIZE64 MBMaximum single-frame payload
SIDECAR_THRESHOLD1 MBBlobs larger than this become sidecars
RemoteBlobCache capacity4096Max cached function blobs per connection

For embedders building custom transports or extending the wire protocol:

use shape_wire::transport::framing::{encode_framed, decode_framed};
// Compress a payload for wire transmission
let framed = encode_framed(&payload_bytes);
// Decompress a received frame
let original = decode_framed(&framed_bytes)?;
use shape_vm::remote::{
RemoteBlobCache, BlobNegotiationRequest,
build_call_request_negotiated, handle_negotiation,
};
// Server side: maintain a cache per connection
let mut cache = RemoteBlobCache::default_cache();
// Handle incoming negotiation request
let request = BlobNegotiationRequest { offered_hashes: vec![...] };
let response = handle_negotiation(&request, &cache);
// Client side: build request with only missing blobs
let call = build_call_request_negotiated(
&program, "my_function", args, &response.known_hashes
)?;
use shape_vm::remote::{extract_sidecars, reassemble_sidecars};
use shape_runtime::snapshot::SnapshotStore;
let store = SnapshotStore::open("/path/to/store")?;
// Extract large blobs from serialized args
let mut args = serialized_args;
let sidecars = extract_sidecars(&mut args, &store);
// Send sidecars first, then the call with SidecarRefs
// On the receiving end: reassemble
let sidecar_map: HashMap<u32, BlobSidecar> = sidecars
.into_iter()
.map(|s| (s.sidecar_id, s))
.collect();
reassemble_sidecars(&mut args, &sidecar_map, &store)?;