JIT Compilation
Shape’s JIT compilation system operates at three levels: scoped per-function JIT, tiered compilation with background promotion, and cross-function optimization with inlining and constant propagation. The content-addressed bytecode architecture makes each of these levels natural — every function blob is an independent compilation unit with a stable identity.
Scoped Per-Function JIT
Because every function lives in its own FunctionBlob, JIT compilation is
naturally scoped to individual functions. Each blob is independently assessed
for JIT compatibility via a per-blob preflight check before any code generation
occurs.
JIT-compatible functions — those containing only supported operations (arithmetic, comparisons, local variable access, direct calls, control flow) — are compiled via Cranelift to native machine code.
JIT-incompatible functions — those using async operations, unsupported builtins, or complex runtime features — remain interpreted by the VM. There is no penalty; the interpreter handles them exactly as before.
MixedFunctionTable
The function table after selective JIT contains three entry types that coexist in a single lookup structure:
enum FunctionEntry {
    /// JIT-compiled function pointer: call directly via native ABI
    Native(*const u8),
    /// VM interpreter fallback: execute via bytecode dispatch (function index)
    Interpreted(u16),
    /// Awaiting background compilation: currently interpreted, will promote
    Pending(u16),
}
- Native(*const u8) holds a raw pointer to JIT-compiled machine code. The VM calls through this pointer directly, bypassing the interpreter entirely.
- Interpreted(u16) holds a function index into the linked program. The VM dispatches these through its normal bytecode loop.
- Pending(u16) marks a function that has been submitted for background JIT compilation but has not yet completed. It behaves as Interpreted until the compiled result is ready.
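To make the "Pending behaves as Interpreted" rule concrete, here is a minimal sketch of how a caller might resolve a table entry to an execution path. `CallPath` and `resolve` are hypothetical names used only for illustration; they are not taken from Shape's source, and the sketch never dereferences the native pointer.

```rust
/// The three entry kinds from the enum above. The `Native` payload is a raw
/// code address; this sketch only carries it around and never calls it.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FunctionEntry {
    Native(*const u8),
    Interpreted(u16),
    Pending(u16),
}

/// Hypothetical: which execution path a call takes for a given entry.
#[derive(Debug, PartialEq)]
enum CallPath {
    /// Jump straight into JIT-compiled machine code.
    NativeCall(*const u8),
    /// Run the function's bytecode in the interpreter loop.
    Bytecode(u16),
}

/// `Pending` is treated exactly like `Interpreted` until the background
/// compiler delivers native code for it.
fn resolve(entry: FunctionEntry) -> CallPath {
    match entry {
        FunctionEntry::Native(code) => CallPath::NativeCall(code),
        FunctionEntry::Interpreted(index) | FunctionEntry::Pending(index) => CallPath::Bytecode(index),
    }
}
```

Folding `Pending` into the interpreted arm keeps promotion cheap: swapping an entry to `Native` is the only change the caller ever observes.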
Function Table (after selective JIT):
  [0] train_model     → Native       (numeric, hot, JIT-compatible)
  [1] parse_config    → Interpreted  (complex object ops)
  [2] compute_signal  → Native       (inner loop, numeric)
  [3] format_output   → Interpreted  (strings, objects)
VM Fallback Trampoline
When JIT-compiled code calls a function that is Interpreted or Pending, the
runtime uses a fallback trampoline to bridge the two execution modes:
- The trampoline reads the function_id from the call stub embedded in the JIT code.
- It marshals arguments from the JIT (native) stack layout to the VM stack layout.
- It invokes the VM interpreter for that function.
- When the interpreter returns, the trampoline marshals the result back to the JIT calling convention and returns control to the native caller.
This trampoline is transparent to both the JIT and interpreted sides — mixed call chains work seamlessly regardless of which functions are native and which are interpreted.
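The four trampoline steps can be sketched with a toy VM. Everything here is an assumption for illustration: values are modeled as plain `u64` words, native arguments arrive as a slice, and `Vm`, `BytecodeFn`, and `fallback_trampoline` are hypothetical names rather than Shape's real types or ABI.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an interpreted function body: it reads its
/// arguments from a slice of the VM stack and produces one value.
type BytecodeFn = fn(&[u64]) -> u64;

/// Hypothetical toy VM: an operand stack plus bytecode bodies by function id.
struct Vm {
    stack: Vec<u64>,
    functions: HashMap<u16, BytecodeFn>,
}

impl Vm {
    /// Run `function_id` over its top `argc` stack slots, then pop them.
    fn interpret(&mut self, function_id: u16, argc: usize) -> u64 {
        let base = self.stack.len() - argc;
        let body = self.functions[&function_id];
        let result = body(&self.stack[base..]);
        self.stack.truncate(base);
        result
    }
}

/// Sketch of the fallback trampoline for a call from native code into an
/// `Interpreted` or `Pending` function. `function_id` plays the role of the
/// id read from the call stub.
fn fallback_trampoline(vm: &mut Vm, function_id: u16, native_args: &[u64]) -> u64 {
    // Marshal: copy the arguments from the native layout onto the VM stack.
    vm.stack.extend_from_slice(native_args);
    // Invoke the interpreter; its result goes back to the native caller
    // as an ordinary return value.
    vm.interpret(function_id, native_args.len())
}
```

The VM stack is left exactly as it was before the call, which is what lets native and interpreted frames interleave freely.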
JIT Dispatch Table
The VM maintains a dispatch table that maps function IDs to JIT-compiled native code pointers:
pub type JitFnPtr = unsafe extern "C" fn(*mut u8, *const u8) -> u64;
// On VirtualMachine:
jit_dispatch_table: HashMap<u16, JitFnPtr>
External code (e.g., the shape-jit crate) registers compiled functions via
vm.register_jit_function(function_id, ptr). When the VM’s Call opcode
handler encounters a function with a dispatch table entry, it attempts JIT
dispatch. If the marshaling bridge is not yet implemented for a particular
calling convention, the VM falls through to bytecode interpretation, so registered
JIT entries never cause hard errors.
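A minimal sketch of the register-then-dispatch flow, reusing the `JitFnPtr` signature from above. The stripped-down `VirtualMachine`, the exact `register_jit_function` parameter type, and `try_jit_dispatch` are assumptions for illustration. The sketch models only the "no entry, fall through" case, not the missing-marshaling-bridge case the text also mentions.

```rust
use std::collections::HashMap;

/// Signature from the text above.
pub type JitFnPtr = unsafe extern "C" fn(*mut u8, *const u8) -> u64;

/// Hypothetical, stripped-down VM holding only the dispatch table.
#[derive(Default)]
pub struct VirtualMachine {
    jit_dispatch_table: HashMap<u16, JitFnPtr>,
}

impl VirtualMachine {
    /// Record native code for `function_id` (modeled on `register_jit_function`).
    pub fn register_jit_function(&mut self, function_id: u16, ptr: JitFnPtr) {
        self.jit_dispatch_table.insert(function_id, ptr);
    }

    /// What a `Call` handler might do: `Some(result)` when native dispatch ran,
    /// `None` to fall through to bytecode interpretation. A missing entry is
    /// never an error.
    pub fn try_jit_dispatch(&self, function_id: u16, ctx: *mut u8, args: *const u8) -> Option<u64> {
        let native = *self.jit_dispatch_table.get(&function_id)?;
        // SAFETY: the caller guarantees `ctx` and `args` are valid for the
        // registered function's calling convention.
        Some(unsafe { native(ctx, args) })
    }
}
```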
Content-Addressed JIT Cache
JIT output is cached by blob content hash. JitCodeCache
(crates/shape-jit/src/jit_cache.rs) keeps one entry per function hash,
carrying the native code pointer plus enough metadata to invalidate it:
pub struct CacheEntry {
    /// Native code pointer.
    pub code_ptr: *const u8,
    /// Content hash of the function blob.
    pub function_hash: FunctionHash,
    /// Schema version at compilation time (for shape guard invalidation).
    pub schema_version: u32,
    /// Feedback epoch at compilation time (for speculation invalidation).
    pub feedback_epoch: u32,
    /// Hashes of functions this compiled code depends on (inlined callees).
    pub dependencies: Vec<FunctionHash>,
    /// Tier 2 cache key, present for optimizing-compiler output.
    pub tier2_key: Option<Tier2CacheKey>,
}
For baseline code, the same blob hash means the same native code. If the same
utility function appears in ten different programs, its baseline (Tier 1) code
is JIT-compiled exactly once and reused everywhere: Tier 1 carries no
speculation, so its output is stable for a given content hash. Tier 2 entries
embed speculative shape guards; they are invalidated via invalidate_by_dependency()
when an inlined callee changes or when the schema version / feedback epoch advances.
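A minimal sketch of the two cache behaviors described above: compile-once lookup by content hash, and dependency-driven invalidation. `FunctionHash` is assumed to be 32 bytes, the entry is cut down to a code address plus dependencies, and the method signatures are hypothetical; only the name `invalidate_by_dependency` comes from the text.

```rust
use std::collections::HashMap;

/// Assumption: a 32-byte content hash, like `Tier2CacheKey::root_hash`.
type FunctionHash = [u8; 32];

/// Cut-down entry: a code address plus the inlined callees it depends on.
#[derive(Clone, Debug)]
struct CacheEntry {
    code_addr: usize,                // stand-in for `code_ptr`
    dependencies: Vec<FunctionHash>, // hashes of inlined callees
}

/// Hypothetical cache keyed by blob content hash.
#[derive(Default)]
struct JitCodeCache {
    entries: HashMap<FunctionHash, CacheEntry>,
}

impl JitCodeCache {
    /// Return cached code for `hash`, running `compile` only on a miss.
    fn get_or_compile(&mut self, hash: FunctionHash, compile: impl FnOnce() -> CacheEntry) -> usize {
        self.entries.entry(hash).or_insert_with(compile).code_addr
    }

    /// Drop every entry that inlined `changed`; returns how many were removed.
    fn invalidate_by_dependency(&mut self, changed: &FunctionHash) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, entry| !entry.dependencies.contains(changed));
        before - self.entries.len()
    }
}
```

Loading the same utility function from ten programs runs `compile` once; changing an inlined callee removes only the entries that list it as a dependency.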
Tiered Compilation
Shape uses a three-tier compilation strategy. Every function starts interpreted and is promoted to higher tiers based on observed call frequency.
Tier Definitions
| Tier | Name | Threshold | Description |
|---|---|---|---|
| 0 | Interpreted | 0 calls | All functions start here. Full bytecode interpretation. |
| 1 | BaselineJit | 100 calls | Per-function JIT compilation. No cross-function optimization. |
| 2 | OptimizingJit | 10,000 calls | Feedback-guided inlining and constant propagation. (Devirtualization is a planned v0.4 addition.) |
Promotion thresholds are checked at function entry. When a function’s call count crosses a tier boundary, a compilation request is submitted for the next tier.
Per-Function Tier State
Each function tracks its own compilation state:
struct FunctionTierState {
    /// Current execution tier
    tier: Tier,
    /// Cumulative call count since program start
    call_count: u32,
    /// Whether a compilation request is already in flight
    compilation_pending: bool,
}
call_count is a u32: u32::MAX (about 4.3 billion) is far above any realistic
single-function call count, and a 32-bit counter keeps FunctionTierState
compact for the per-function dispatch path.
The compilation_pending flag prevents duplicate submissions. Once a
compilation completes, the flag is cleared and the function’s tier is updated
atomically.
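The promotion check and the duplicate-suppression flag can be sketched as a small single-threaded state machine, using the thresholds from the tier table (100 and 10,000 calls). The method names `on_entry` and `on_compiled` are hypothetical, the saturating increment is this sketch's own choice, and the atomic tier update the text describes is not modeled.

```rust
/// The three tiers from the tier table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tier {
    Interpreted,
    BaselineJit,
    OptimizingJit,
}

const TIER1_THRESHOLD: u32 = 100;
const TIER2_THRESHOLD: u32 = 10_000;

struct FunctionTierState {
    tier: Tier,
    call_count: u32,
    compilation_pending: bool,
}

impl FunctionTierState {
    fn new() -> Self {
        Self { tier: Tier::Interpreted, call_count: 0, compilation_pending: false }
    }

    /// Called at function entry. Returns the tier to request when the call
    /// count crosses the next threshold and no request is already in flight.
    fn on_entry(&mut self) -> Option<Tier> {
        // Saturating increment: the text does not say what happens at u32::MAX.
        self.call_count = self.call_count.saturating_add(1);
        if self.compilation_pending {
            return None; // duplicate submissions are suppressed
        }
        let target = match self.tier {
            Tier::Interpreted if self.call_count >= TIER1_THRESHOLD => Tier::BaselineJit,
            Tier::BaselineJit if self.call_count >= TIER2_THRESHOLD => Tier::OptimizingJit,
            _ => return None,
        };
        self.compilation_pending = true;
        Some(target)
    }

    /// Called when the background compiler delivers code for `tier`.
    fn on_compiled(&mut self, tier: Tier) {
        self.tier = tier;
        self.compilation_pending = false;
    }
}
```

Running 150 calls produces exactly one Tier 1 request, because calls 101 through 150 find `compilation_pending` already set.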
Background Compilation
Compilation happens off the hot path, on a dedicated background thread:
- When a function crosses a tier threshold, the VM creates a CompilationRequest containing the function blob, target tier, and any profiling data collected so far.
- This request is sent via an mpsc channel to the background compilation thread.
- The background thread owns the JIT compiler instance. It processes requests sequentially, producing a CompilationResult with the native code pointer (or an error if compilation fails).
- The result is sent back via a second mpsc channel.
- The VM checks try_recv() at safe points (function entry and loop back-edges) to pick up completed compilations without blocking.
- On receiving a successful result, the VM calls promote_to_native(id, ptr) to atomically swap the function table entry from Pending (or Interpreted) to Native.
VM hot path                         Background thread
    │                                       │
    ├─ call_count hits 100 ────────────────►│
    │      CompilationRequest               │
    │                                       ├─ Cranelift compile
    │  (function continues                  │
    │   interpreted)                        │
    │◄───────────────── CompilationResult ──┤
    ├─ try_recv() at safe point             │
    │  promote_to_native(id, ptr)           │
    │                                       │
    ├─ next call → Native                   │
Functions continue executing as interpreted while compilation proceeds in the background. There is no stop-the-world pause. The transition from interpreted to native is atomic and takes effect on the next call to that function.
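The request/result round trip can be sketched with two `std::sync::mpsc` channels and a non-blocking `try_recv()` drain at a safe point. The message structs are trimmed to a function id and a fake code address; codegen is a placeholder, and the function table is simplified to `Option<usize>` slots. All of these names and shapes are assumptions for illustration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message shapes; the real ones also carry the blob and profile data.
struct CompilationRequest {
    function_id: u16,
}

struct CompilationResult {
    function_id: u16,
    code: Result<usize, String>, // stand-in for a native code pointer
}

/// Spawn a background "compiler" that handles requests one at a time.
fn spawn_compiler() -> (Sender<CompilationRequest>, Receiver<CompilationResult>, JoinHandle<()>) {
    let (req_tx, req_rx) = mpsc::channel::<CompilationRequest>();
    let (res_tx, res_rx) = mpsc::channel::<CompilationResult>();
    let handle = thread::spawn(move || {
        // The loop ends once every request sender has been dropped.
        for req in req_rx {
            // Placeholder for Cranelift codegen: fabricate a code address.
            let code = Ok(0x1000 + usize::from(req.function_id));
            if res_tx.send(CompilationResult { function_id: req.function_id, code }).is_err() {
                break; // the VM side has gone away
            }
        }
    });
    (req_tx, res_rx, handle)
}

/// Called at safe points: drain finished compilations without blocking and
/// promote each successful one in a simplified function table.
fn poll_at_safe_point(res_rx: &Receiver<CompilationResult>, table: &mut [Option<usize>]) -> usize {
    let mut promoted = 0;
    while let Ok(result) = res_rx.try_recv() {
        if let Ok(addr) = result.code {
            table[usize::from(result.function_id)] = Some(addr); // promote_to_native
            promoted += 1;
        }
    }
    promoted
}
```

`try_recv()` returns immediately when nothing is ready, so a safe-point poll costs almost nothing on the hot path.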
--mode jit semantics
The CLI --mode jit flag (the default) requests JIT compilation for the toplevel
script and every reachable function. The semantics are:
- The toplevel script and its functions attempt JIT compilation when the bytecode is JIT-compatible (that is, it passes compile_program_selective’s per-function and main-code preflight).
- On JIT-compile failure, the executor falls through to the bytecode interpreter for the whole program. This is not silent empty output: the interpreter re-runs the same parsed Program and produces the same observable result a --mode vm invocation would.
- A one-line diagnostic is emitted to stderr at tracing::info level when fall-through fires:
  [jit-fallback] function main failed JIT compile: <reason>; running under interpreter
  The diagnostic is always visible (it uses eprintln!, so no tracing subscriber is required). Verbose JIT pipeline tracing is gated behind --trace-jit=shape_jit=debug, which replaces the legacy SHAPE_JIT_DEBUG environment variable per the closure-wave-F migration.
- Tier-up promotion is preserved on hot functions per the T1@100 / T2@10k thresholds. Fall-through fires only when the entire program cannot be JIT-compiled at all (e.g. the toplevel main code contains an opcode that the JIT preflight rejects, such as AllocSharedModuleBinding). Programs that JIT-compile successfully run the JIT path, and tier promotion happens transparently on functions that cross the call-count thresholds.
The fall-through path is implemented in JITExecutor::execute_program
(crates/shape-jit/src/executor.rs). It catches every Err from the JIT
sub-pipeline — preflight rejection, Cranelift codegen failure, FFI linking
failure, JIT runtime signal, RETURN_TAG_NANBOXED surface-and-stop — and
re-dispatches to BytecodeExecutor::execute_program with the same Program.
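The control flow of the fall-through can be sketched as a single `match` on the JIT result. Everything except the diagnostic text is a hypothetical stand-in: `Program` is reduced to a list of main-code opcode names, the "preflight" rejects only AllocSharedModuleBinding, and both executors share one toy `run` function so their outputs match by construction.

```rust
/// Hypothetical stand-in for a parsed program: just its main-code opcodes.
struct Program {
    main_opcodes: Vec<&'static str>,
}

/// Shared toy semantics so both execution paths print the same thing.
fn run(program: &Program) -> String {
    format!("ran {} ops", program.main_opcodes.len())
}

/// Stand-in for the JIT sub-pipeline: preflight rejects one opcode.
fn jit_execute(program: &Program) -> Result<String, String> {
    if program.main_opcodes.contains(&"AllocSharedModuleBinding") {
        return Err("preflight rejected AllocSharedModuleBinding".to_string());
    }
    Ok(run(program))
}

/// Stand-in for BytecodeExecutor::execute_program.
fn bytecode_execute(program: &Program) -> String {
    run(program)
}

/// Any JIT-side Err re-dispatches the same Program to the interpreter.
/// Returns (stdout, fell_back).
fn execute_program(program: &Program) -> (String, bool) {
    match jit_execute(program) {
        Ok(out) => (out, false),
        Err(reason) => {
            eprintln!("[jit-fallback] function main failed JIT compile: {reason}; running under interpreter");
            (bytecode_execute(program), true)
        }
    }
}
```

Because every `Err` funnels through one arm, no JIT failure mode can bypass the interpreter and end in silent empty output.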
Verifying fall-through behavior
The supervisor-ratified smoke harness keeps only the last line of stdout (via
tail -1) and captures shape’s exit code separately. This avoids the
tail | echo EXIT=$? pattern, which does not report shape’s own exit status and
masked silent no-output failures across the project before W12:
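out=$(timeout 30 ./target/release/shape run --mode "$mode" "$file" 2>/dev/null)
ec=$?                                   # shape's own exit status (124 if timeout fired)
out=$(printf '%s\n' "$out" | tail -1)   # keep only the last line of stdout
echo "$mode/$name: $out (exit=$ec)"
VM and JIT should produce identical stdout for any program that runs without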
runtime error in either mode; [jit-fallback] appears on stderr only when the
JIT path could not compile the program at all.
Cross-Function Optimization (Tier 2)
Tier 2 compilation goes beyond per-function code generation. It uses
per-function feedback to specialize call sites, inlines hot callees through
the CallPathPlan and HOF-inline passes, and (planned for v0.4) devirtualizes
indirect calls.
Inlining Policy
Inlining is governed by a per-program CallPathPlan produced during the JIT
optimizer’s call-path analysis phase:
pub struct CallPathPlan {
    /// Call instruction indices that should prefer direct-call lowering.
    pub prefer_direct_call_sites: HashSet<usize>,
    /// Per-call-site parameter local slots that must be restored after a
    /// direct-call argument write into ctx.locals[0..argc).
    pub restore_param_slots_by_call_site: HashMap<usize, Vec<u16>>,
    /// Depth guard for nested inlining.
    pub inline_depth_limit: u8,
}
analyze_call_path (crates/shape-jit/src/optimizer/call_path.rs) walks
every Call instruction and decides per call site:
- A call site is added to prefer_direct_call_sites when its argument count is ≤ 4 or when it sits inside a hot loop body (a loop the loop-lowering pass marked with an unroll factor greater than 1).
- inline_depth_limit defaults to 4, capping how deep the inliner will recurse from any root call site. The pass bumps the limit to 6 when the whole program has ≤ 8 call instructions, since small programs can afford a deeper inline budget.
There is no separate Tier 1 vs Tier 2 instruction budget — the JIT consults
the same CallPathPlan regardless of tier, and the depth guard is the only
hard ceiling. There is no stand-alone InlinePolicy type; the heuristics
above live entirely in the call_path analysis pass.
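The two heuristics above can be sketched as a pure function over a summary of the program's Call instructions. `CallSite`, `CallPathPlanSketch`, and `analyze_call_path_sketch` are hypothetical names; the real pass walks bytecode, and `restore_param_slots_by_call_site` is deliberately not modeled here.

```rust
use std::collections::HashSet;

/// Hypothetical summary of one `Call` instruction seen by the pass.
struct CallSite {
    index: usize,           // instruction index of the Call
    argc: usize,            // argument count
    loop_unroll_factor: u8, // 1 = not in a hot loop (per the loop-lowering pass)
}

/// The subset of `CallPathPlan` this sketch fills in.
struct CallPathPlanSketch {
    prefer_direct_call_sites: HashSet<usize>,
    inline_depth_limit: u8,
}

fn analyze_call_path_sketch(calls: &[CallSite]) -> CallPathPlanSketch {
    // Direct-call lowering for small arities or calls inside hot loops.
    let prefer_direct_call_sites = calls
        .iter()
        .filter(|c| c.argc <= 4 || c.loop_unroll_factor > 1)
        .map(|c| c.index)
        .collect();
    // Small programs (at most 8 call instructions) get a deeper inline budget.
    let inline_depth_limit = if calls.len() <= 8 { 6 } else { 4 };
    CallPathPlanSketch { prefer_direct_call_sites, inline_depth_limit }
}
```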
Tier 2 Cache Key
Because Tier 2 compilation includes inlined callees, the cache key must account for the full compilation scope, not just the root function:
pub struct Tier2CacheKey {
    /// Hash of the root function blob.
    pub root_hash: [u8; 32],
    /// Sorted hashes of all inlined callee blobs.
    pub inlined_hashes: Vec<[u8; 32]>,
    /// Compiler version for invalidation.
    pub compiler_version: u32,
    /// Schema version at compilation time: bumped when object shapes
    /// change, staling any code that embedded shape guards.
    pub schema_version: u32,
    /// Feedback epoch at compilation time: bumped when a speculation
    /// assumption (e.g. a type guard) is invalidated.
    pub feedback_epoch: u32,
}
The combined_hash() method produces a single SHA-256 digest from these fields,
used as the cache lookup key. If the root function or any inlined callee
changes, or the schema version or feedback epoch advances, the combined hash
changes and the cached output is invalidated.
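A dependency-free sketch of how such a key behaves. The real `combined_hash()` produces a SHA-256 digest; this sketch substitutes std's `DefaultHasher` purely to stay self-contained, so its 64-bit output is not the real key format. The `new` constructor that sorts callee hashes is an assumption that makes "sorted" hold regardless of inlining order.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Field layout from the text above.
struct Tier2CacheKey {
    root_hash: [u8; 32],
    inlined_hashes: Vec<[u8; 32]>,
    compiler_version: u32,
    schema_version: u32,
    feedback_epoch: u32,
}

impl Tier2CacheKey {
    /// Sort callee hashes so inlining order does not change the key.
    fn new(root_hash: [u8; 32], mut inlined_hashes: Vec<[u8; 32]>, compiler_version: u32, schema_version: u32, feedback_epoch: u32) -> Self {
        inlined_hashes.sort();
        Self { root_hash, inlined_hashes, compiler_version, schema_version, feedback_epoch }
    }

    /// Stand-in for `combined_hash()`: folds every field into one digest.
    /// DefaultHasher replaces SHA-256 only to avoid external crates.
    fn combined_hash(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.root_hash.hash(&mut h);
        self.inlined_hashes.hash(&mut h);
        self.compiler_version.hash(&mut h);
        self.schema_version.hash(&mut h);
        self.feedback_epoch.hash(&mut h);
        h.finish()
    }
}
```

Two keys that inline the same callees in a different order collide on purpose, while bumping the feedback epoch or swapping a callee yields a different key.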
Constant Propagation
When the Tier 2 compiler inlines a callee, arguments that are compile-time
constants at the call site (PushConst instructions) are propagated into the
inlined body. Parameter reads become the known constant value, which exposes
further optimization opportunities in the inlined region: dead branch
elimination, strength reduction, and constant folding. Cranelift’s own
constant-folding and dead-code passes then run over the merged IR.
This happens as part of the optimizing-compilation path
(compile_optimizing_function) and is keyed by Tier2CacheKey, so the same
root-plus-inlined-callees scope is compiled at most once.
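The idea can be illustrated on a toy expression IR: substitute call-site arguments for parameter reads, then fold. This is a hypothetical model, not Shape's bytecode or Cranelift IR, and it demonstrates only constant folding and dead-branch elimination (no strength reduction).

```rust
/// Toy expression IR; `Param(i)` reads the callee's i-th parameter.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(i64),
    Param(usize),
    /// A value only known at run time (named for readability).
    Runtime(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    /// `if cond == 0 { then } else { otherwise }`
    IfZero(Box<Expr>, Box<Expr>, Box<Expr>),
}

use Expr::*;

fn sub(e: &Expr, args: &[Expr]) -> Box<Expr> {
    Box::new(substitute(e, args))
}

/// Inline a callee body: replace each parameter read with the call-site argument.
fn substitute(body: &Expr, args: &[Expr]) -> Expr {
    match body {
        Param(i) => args[*i].clone(),
        Add(a, b) => Add(sub(a, args), sub(b, args)),
        Mul(a, b) => Mul(sub(a, args), sub(b, args)),
        IfZero(c, t, f) => IfZero(sub(c, args), sub(t, args), sub(f, args)),
        Const(_) | Runtime(_) => body.clone(),
    }
}

/// Constant folding plus dead-branch elimination over the inlined body.
fn fold(e: Expr) -> Expr {
    match e {
        Add(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x.wrapping_add(y)),
            (a, b) => Add(Box::new(a), Box::new(b)),
        },
        Mul(a, b) => match (fold(*a), fold(*b)) {
            (Const(x), Const(y)) => Const(x.wrapping_mul(y)),
            (a, b) => Mul(Box::new(a), Box::new(b)),
        },
        IfZero(c, t, f) => match fold(*c) {
            Const(0) => fold(*t), // condition known zero: keep `then`
            Const(_) => fold(*f), // condition known non-zero: keep `else`
            c => IfZero(Box::new(c), Box::new(fold(*t)), Box::new(fold(*f))),
        },
        leaf => leaf,
    }
}
```

For a callee `if k == 0 { 0 } else { x * k }` called with a runtime `x` and the constant `k = 3`, the branch disappears and only `x * 3` remains; with both arguments constant the whole call folds to a single constant.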
Devirtualization (planned — v0.4)
When the bytecode contains CallValue (an indirect call through a variable),
a future Tier 2 pass could resolve the target statically and rewrite the call.
This is not implemented in v0.3; IC devirtualization is a v0.4 candidate
(§Q25.C.6 of the round-2 budget). The sketch below describes the intended
shape; no DevirtAnalysis or DevirtResult type exists in the source today:
DevirtAnalysis (planned, v0.4):
  - inspect a CallValue site and trace its target binding
  - Direct       → target traces to a single known function; rewrite CallValue as a direct Call
  - Polymorphic  → target traces to a small set of functions; emit an inline cache that checks common targets first
  - Unknown      → target cannot be resolved; leave as indirect call
Until devirtualization lands, indirect calls through CallValue are lowered
as indirect dispatch and are not inlining candidates.
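The three-way classification in the planned sketch can be expressed as a small function. This is entirely hypothetical, since the feature does not exist yet: the input is assumed to be the set of function ids a CallValue binding may hold (or `None` if tracing failed), and `MAX_POLYMORPHIC_TARGETS = 4` is an invented cut-off for "a small set".

```rust
use std::collections::BTreeSet;

/// The three outcomes named in the planned sketch above.
#[derive(Debug, PartialEq)]
enum DevirtResult {
    Direct(u16),
    Polymorphic(Vec<u16>),
    Unknown,
}

/// Assumed cut-off for a "small set" of targets; the text gives no number.
const MAX_POLYMORPHIC_TARGETS: usize = 4;

/// `traced` is the set of function ids a CallValue's target binding can hold,
/// or `None` when the binding could not be traced at all.
fn classify_call_value(traced: Option<&BTreeSet<u16>>) -> DevirtResult {
    match traced {
        Some(targets) if targets.len() == 1 => {
            DevirtResult::Direct(*targets.iter().next().expect("len is 1"))
        }
        Some(targets) if (2..=MAX_POLYMORPHIC_TARGETS).contains(&targets.len()) => {
            DevirtResult::Polymorphic(targets.iter().copied().collect())
        }
        _ => DevirtResult::Unknown,
    }
}
```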
Deoptimization
Tier 2 optimizations are speculative: feedback-guided compilation embeds
shape guards so that inline-cached object-property accesses can run as
direct loads. If an object shape transitions at runtime (for example, a
HashMap gains a property), any compiled code that guarded on the old shape
must be invalidated.
DeoptTracker (crates/shape-vm/src/deopt.rs) is the index that makes this
possible — it maps function IDs to the shape IDs they depend on, and keeps a
reverse index from shape ID back to the dependent functions:
pub struct DeoptTracker {
    /// function_id → set of ShapeIds it depends on
    dependencies: HashMap<u16, HashSet<ShapeId>>,
    /// shape_id → set of function_ids that depend on it
    shape_dependents: HashMap<ShapeId, HashSet<u16>>,
}
After a successful Tier 2 compilation, register(function_id, shape_ids)
records the shape_guards reported in the CompilationResult. When a shape
transition occurs, invalidate_shape(shape_id) is called:
- The DeoptTracker looks up the shape ID in shape_dependents.
- It returns the list of dependent function IDs and clears their dependency entries (including the reverse mappings for any other shapes they guarded).
- The caller removes those functions’ native code so the next call falls back to interpreted execution.
- The function’s tier state is reset to allow re-promotion once execution stabilizes on the new shape.
This guarantees correctness: optimized code that guarded on a shape is never executed after that shape transitions. The cost is a one-time recompilation if the function remains hot.
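A minimal sketch of the two-map bookkeeping, using the field layout shown above. `ShapeId` is assumed to be a `u32` (the real type is not shown), `register` is assumed to take a slice of shape ids, and the returned function ids are sorted only to make the output deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Assumption: the real ShapeId type is not shown in the text.
type ShapeId = u32;

#[derive(Default)]
pub struct DeoptTracker {
    /// function_id → set of ShapeIds it depends on
    dependencies: HashMap<u16, HashSet<ShapeId>>,
    /// shape_id → set of function_ids that depend on it
    shape_dependents: HashMap<ShapeId, HashSet<u16>>,
}

impl DeoptTracker {
    /// Record the shape guards embedded in a freshly compiled function.
    pub fn register(&mut self, function_id: u16, shape_ids: &[ShapeId]) {
        for &shape in shape_ids {
            self.dependencies.entry(function_id).or_default().insert(shape);
            self.shape_dependents.entry(shape).or_default().insert(function_id);
        }
    }

    /// Return the functions that guarded on `shape_id` and forget all of their
    /// guards, including reverse mappings for the other shapes they guarded.
    pub fn invalidate_shape(&mut self, shape_id: ShapeId) -> Vec<u16> {
        let Some(functions) = self.shape_dependents.remove(&shape_id) else {
            return Vec::new();
        };
        for &function_id in &functions {
            for other in self.dependencies.remove(&function_id).unwrap_or_default() {
                let now_empty = match self.shape_dependents.get_mut(&other) {
                    Some(deps) => {
                        deps.remove(&function_id);
                        deps.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.shape_dependents.remove(&other);
                }
            }
        }
        let mut invalidated: Vec<u16> = functions.into_iter().collect();
        invalidated.sort_unstable();
        invalidated
    }
}
```

Clearing the reverse mappings matters: once function 1 is invalidated for shape 10, a later transition of shape 11 should not report function 1 again.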
Performance Characteristics
| Tier | Throughput | Notes |
|---|---|---|
| Tier 0 (Interpreted) | ~100 ns/instruction (illustrative; awaiting v0.4 benchmark anchor) | Full bytecode interpretation with dispatch overhead |
| Tier 1 (BaselineJit) | Near-native for numeric code | Function call overhead reduced; no cross-function optimization |
| Tier 2 (OptimizingJit) | Native-class | Cross-function inlining eliminates call overhead; constant folding reduces work |
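Content-addressed caching amplifies the benefit across programs: the same blob hash produces the same baseline native code, so a function’s Tier 1 code is compiled at most once globally. Shared utility functions that appear in many programs are compiled on first encounter and reused from cache for every subsequent load. Tier 2 entries also depend on inlined callees, the schema version, and the feedback epoch, so they can be recompiled after invalidation.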
Typical promotion timeline for a hot inner loop:
- First 100 calls: interpreted (Tier 0).
- Calls 100-10,000: Tier 1 native code (compiled in background, available within milliseconds of crossing the threshold).
- Beyond 10,000 calls: Tier 2 optimized code with inlined callees and propagated constants.
Fully Typed Native Values
The runtime is fully typed and zero-tag: every value has a compile-time-determined type, and there are no runtime type tags or tag-bit dispatch. The opcode encodes the type; the JIT generates code accordingly.
How values are represented
Values are native machine types. Scalars are raw f64 in XMM registers,
raw i64/i32/i8/bool in GPR registers, and typed pointers to heap
objects. The opcode carries the type; there is no runtime type classification.
Arrays are typed contiguous buffers. Array<number> maps to
TypedArray<f64> — a contiguous f64 buffer with a refcounted header. Element
access is a single load instruction: movsd xmm0, [data + i*8]. No per-element
type checking.
Structs are C-compatible fixed layouts. type Point { x: number, y: number }
produces a #[repr(C)] layout with field offsets computed at compile time.
point.x compiles to load f64 [ptr + 8] — no schema lookup, no field name
resolution.
FFI uses typed signatures. The JIT-to-runtime FFI functions are
monomorphized per type rather than passing untyped words — for example
jit_v2_struct_get_f64(ptr: *const u8, offset: u32) -> f64
(crates/shape-jit/src/ffi/v2_struct.rs).
Heap objects share a unified header. All heap-allocated objects start with
an 8-byte HeapHeader containing an AtomicU32 refcount at offset 0 for
single-cycle access. Clone is atomic_add 1; drop is atomic_sub 1.
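A sketch tying the struct layout, the typed FFI getter, and the refcounted header together. Assumptions, all hypothetical: the second 4 bytes of the header are a reserved field (the text does not say what they hold), a `Point` object is laid out as the header followed by its fields, and `struct_get_f64_sketch` only copies the shape of the `jit_v2_struct_get_f64` signature, not its implementation. Under the header-first assumption, `point.x` lands at offset 8, matching the `[ptr + 8]` load described earlier.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};

/// 8-byte header with the refcount at offset 0.
#[repr(C)]
struct HeapHeader {
    refcount: AtomicU32,
    /// Hypothetical: the text does not say what the other 4 bytes hold.
    _reserved: u32,
}

impl HeapHeader {
    fn new() -> Self {
        Self { refcount: AtomicU32::new(1), _reserved: 0 }
    }

    /// Clone: atomic add of 1.
    fn retain(&self) {
        self.refcount.fetch_add(1, Ordering::Relaxed);
    }

    /// Drop: atomic sub of 1; returns true when the last reference is gone.
    fn release(&self) -> bool {
        if self.refcount.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire); // synchronize with other releases before freeing
            return true;
        }
        false
    }
}

/// Assumed heap layout for `type Point { x: number, y: number }`:
/// header first, then the fields in declaration order.
#[repr(C)]
struct PointObject {
    header: HeapHeader,
    x: f64,
    y: f64,
}

/// Shaped like `jit_v2_struct_get_f64`: a typed load at a fixed offset.
///
/// # Safety
/// `ptr + offset` must point to a live, aligned f64.
unsafe extern "C" fn struct_get_f64_sketch(ptr: *const u8, offset: u32) -> f64 {
    unsafe { ptr.add(offset as usize).cast::<f64>().read() }
}
```

With `#[repr(C)]`, field offsets are fixed at compile time (header at 0, `x` at 8, `y` at 16), so a field read is a single load with no name lookup.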
For the authoritative description of the typed runtime, see the runtime v2 spec.
Generics
Generics are monomorphized. Array<number> and Array<i32> are different types
with different TypedArray instantiations, different opcodes, and different JIT
code paths. There is no type erasure or boxing at generic boundaries.
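Rust's own generics are monomorphized in the same way, so a small Rust analogue can show what "different instantiations" means in practice. `TypedArray` here is a hypothetical stand-in for Shape's type, and the mapping of Array<number> to `TypedArray<f64>` and Array<i32> to `TypedArray<i32>` is an illustration, not Shape's code.

```rust
use std::mem::size_of;

/// Hypothetical typed buffer: each element type gets its own compiled copy
/// of these methods, with no boxing or type erasure.
struct TypedArray<T: Copy> {
    data: Vec<T>,
}

impl<T: Copy> TypedArray<T> {
    fn from_slice(items: &[T]) -> Self {
        Self { data: items.to_vec() }
    }

    /// Element access is a plain indexed load; no per-element type check.
    fn get(&self, i: usize) -> T {
        self.data[i]
    }

    /// Byte distance between consecutive elements for this instantiation.
    fn stride() -> usize {
        size_of::<T>()
    }
}
```

`TypedArray::<f64>` uses an 8-byte stride and `TypedArray::<i32>` a 4-byte stride; each instantiation is compiled separately, so neither pays for the other's element type.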
See Also
- Content-Addressed Bytecode: the blob architecture that JIT compilation builds on
- Security Permissions: capability controls for JIT and native code execution