Native C Interop

This chapter is the single normative source for Shape native C interop.

If another chapter needs native interop details, it should link here instead of redefining syntax, marshalling, or lock behavior.

Use explicit language syntax, not annotation indirection:

extern C fn cos(x: number) -> number from "libm" as "cos";
extern C fn getenv(name: cstring) -> cstring? from "libc" as "getenv";
extern C fn hash_bytes(data: Vec<byte>) -> u64 from "libhash";
type C QuoteC {
    bid: f64,
    ask: f64,
}
extern C fn quote_mid(q: cview<QuoteC>) -> f64 from "libquote";
extern C fn quote_fill(q: cmut<QuoteC>, v: f64) -> void from "libquote";

  • extern C fn ... from "<alias-or-library>" [as "<symbol>"]; declares a native call.
  • type C defines ABI layout-compatible data types.
  • cview<T> and cmut<T> are pointer-backed zero-copy view carriers for type C.
  • Vec<T> in native signatures compiles to cslice<T> ({ data, len } descriptor by value).
  • Explicit slice annotations are CSlice<T> and CMutSlice<T> (ABI names: cslice<T>, cmut_slice<T>).
  • CMutSlice<T> parameters are mutable-reference parameters in Shape call semantics.
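
To make the { data, len } descriptor concrete, here is a minimal sketch of what the C side of hash_bytes might look like. The struct name cslice_u8, its field order, and the FNV-1a body are assumptions made for illustration; the chapter only states that the descriptor carries data and len and is passed by value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of cslice<u8>: a pointer plus an element count,
   passed by value. The type name and field order are assumptions. */
typedef struct {
    const uint8_t *data;
    size_t len;
} cslice_u8;

/* Illustrative body only: 64-bit FNV-1a over the slice contents. */
uint64_t hash_bytes(cslice_u8 s) {
    assert(s.data != NULL || s.len == 0);
    uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
    for (size_t i = 0; i < s.len; i++) {
        h ^= s.data[i];
        h *= 1099511628211ULL;            /* FNV-1a prime */
    }
    return h;
}
```

Because the descriptor is a small struct passed by value, the callee sees only a borrowed pointer and a length; it should not keep the pointer after returning.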

The extern ABI may be written quoted (extern "C" fn ...) or unquoted (extern C fn ...); both forms parse. The book uses the unquoted form consistently.

Native libraries are declared in [native-dependencies] in a project shape.toml or script frontmatter. Shape uses one shared resolver for compile time and runtime; the CLI, compiler, and VM do not resolve native aliases independently.

[native-dependencies]
libm = "libm.so.6"
duckdb = { provider = "system", version = "1.1.3", linux = "libduckdb.so", macos = "libduckdb.dylib", windows = "duckdb.dll" }
fastmath = { provider = "path", path = "./native/libfastmath.so" }
myrt = { provider = "vendored", cache_key = "myrt-2.0.1", targets = { "linux-x86_64" = "vendor/linux-x86_64/libmyrt.so", "linux-aarch64" = "vendor/linux-aarch64/libmyrt.so", "macos-aarch64" = "vendor/macos-aarch64/libmyrt.dylib" } }
openssl = { provider = "system", targets = { "linux-x86_64" = { value = "libssl.so.3" }, "macos-aarch64" = { value = "libssl.3.dylib" } } }

  • A shorthand string value means provider system, unless the value looks like a filesystem path, in which case it means path.
  • Detailed tables may set provider = "system" | "path" | "vendored". If omitted, Shape infers path for path-like values and system otherwise.
  • targets keys use normalized host IDs os-arch[-env]. Current host IDs usually look like linux-x86_64, linux-aarch64, macos-aarch64, or windows-x86_64.
  • Target selection tries an exact os-arch-env match first, then os-arch, then os. After that, the legacy linux / macos / windows fields are still accepted as a compatibility fallback: the field for the current OS is tried first, then path, and only then the remaining legacy OS fields.
  • system loads by soname or by an explicit path-like value.
  • path resolves relative to the declaring package root when not absolute.
  • vendored resolves relative to the declaring package root, then copies the selected library into Shape’s native cache before loading it.
  • .shapec bundles and shape publish currently embed only native dependency metadata, not the referenced .so / .dylib / .dll files themselves. path and vendored entries therefore require those native files to be distributed separately on disk. Registry-published packages do not yet carry native assets inside the uploaded bundle.
  • Resolution is transitive: the root project, dependency packages, and embedded .shapec bundle scopes all contribute [native-dependencies].
  • Native alias resolution is package-scoped. Two active packages may both declare the same alias name, and each extern C fn ... from "alias" resolves against the package that declared that foreign binding.
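
The first three steps of the target selection order can be sketched as a small lookup. This is not the resolver's implementation: the function name select_target and its signature are hypothetical, and the legacy linux / macos / windows fallback is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the index of the best-matching targets key, or -1 if none
   matches. Tries "os-arch-env", then "os-arch", then "os".
   env may be NULL when the host ID has no environment part. */
int select_target(const char *const keys[], int n,
                  const char *os, const char *arch, const char *env) {
    assert(keys != NULL && os != NULL && arch != NULL);
    char candidates[3][64];
    int count = 0;
    if (env != NULL)
        snprintf(candidates[count++], sizeof candidates[0],
                 "%s-%s-%s", os, arch, env);
    snprintf(candidates[count++], sizeof candidates[0], "%s-%s", os, arch);
    snprintf(candidates[count++], sizeof candidates[0], "%s", os);

    /* Most specific candidate wins, regardless of key order. */
    for (int c = 0; c < count; c++)
        for (int i = 0; i < n; i++)
            if (strcmp(keys[i], candidates[c]) == 0)
                return i;
    return -1;
}
```

The outer loop runs over candidates rather than keys so that a more specific key always wins, whatever order the keys were declared in.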

The current duckdb package uses provider = "system" with libduckdb.so / libduckdb.dylib / duckdb.dll, so the host must already provide DuckDB unless the package switches to path or vendored.

  • Project mode stores native artifacts in shape.lock.
  • Standalone scripts store native artifacts in <script>.lock.
  • When [build.external].mode = "update", Shape probes native libraries for the current host target and writes or refreshes matching lock artifacts. If multiple native prerequisites are broken, Shape reports all of them in one preflight error grouped by package.
  • When [build.external].mode = "frozen", Shape requires a matching artifact for the current target, provider, and fingerprint.
  • system entries that use loader names instead of paths should declare version; frozen mode errors if a system alias has no declared version.
  • One committed lockfile may contain multiple native artifacts for the same package@version::alias across different targets or fingerprints. Shape never replays an absolute path recorded on one machine on a different machine.
  • Standalone scripts currently resolve native dependencies in update mode and refresh <script>.lock when they run.
  • There is no separate native lockfile; native artifacts stay in the normal lock pipeline.

Many C APIs use out-parameters (T* out) — the caller allocates a slot and the function writes its result into it. Shape supports this directly with the out keyword on extern C fn parameters.

Mark pointer-typed parameters with out to let the compiler handle cell allocation, the C call, value readback, and cleanup automatically:

extern C fn duckdb_open(path: string, out out_db: ptr) -> i32 from "duckdb" as "duckdb_open";
extern C fn duckdb_connect(db: ptr, out out_conn: ptr) -> i32 from "duckdb" as "duckdb_connect";

Callers supply only the non-out arguments. The return value is an array containing the original return value followed by each out parameter’s value:

let [status, db] = duckdb_open("pricing_data.duckdb")
let [_, conn] = duckdb_connect(db)

Rules:

  • out parameters must have type ptr.
  • out cannot combine with const or &.
  • out parameters cannot have default values.
  • The generated stub allocates a pointer cell, passes its address to the C function, reads back the value, and frees the cell.
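
Seen from the C side, the stub's four steps (allocate a cell, pass its address, read back, free) might look roughly like the sketch below. demo_open is a hypothetical stand-in for an API such as duckdb_open, and call_with_out_cell is an illustrative name, not the code the compiler actually generates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C API with an out-parameter: on success it writes a
   handle into *out_handle and returns 0. */
static int demo_handle_storage = 42;

int demo_open(const char *path, void **out_handle) {
    if (path == NULL || out_handle == NULL) return 1;
    *out_handle = &demo_handle_storage;
    return 0;
}

/* Sketch of the stub: allocate a pointer-sized cell, pass its address
   to the C function, read the value back, then free the cell. */
int call_with_out_cell(const char *path, void **result) {
    assert(result != NULL);
    void **cell = malloc(sizeof *cell);
    if (cell == NULL) return -1;
    *cell = NULL;                        /* start from a null handle  */
    int status = demo_open(path, cell);  /* C writes into the cell    */
    *result = *cell;                     /* read back the out value   */
    free(cell);                          /* release the cell          */
    return status;
}
```

The caller ends up with the status and the out value together, which corresponds to the [status, db] array shape described above.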

For cases that need finer control, use pointer cells directly:

from std::core::native use { ptr_new_cell, ptr_free_cell, ptr_read, ptr_write }
extern C fn duckdb_open(path: string, out_db: ptr) -> i32 from "duckdb" as "duckdb_open";
let cell = ptr_new_cell()
ptr_write(cell, 0)
duckdb_open("pricing_data.duckdb", cell)
let db = ptr_read(cell)
ptr_free_cell(cell)

The pointer cell is a pointer-sized memory slot allocated by the runtime.

Shape type | C ABI representation | Notes
--- | --- | ---
i8 / char | int8_t | char aliases i8 at C boundary
u8 / byte | uint8_t | byte is an alias of u8
i16 | int16_t | Range checked
u16 | uint16_t | Range checked
i32 | int32_t | Range checked
u32 | uint32_t | Range checked
i64 / int | int64_t | int aliases i64
u64 | uint64_t | Range checked
isize | intptr_t | Pointer-width signed integer
usize | uintptr_t | Pointer-width unsigned integer
ptr | void* | Opaque pointer carrier
f32 | float | Preserved width
f64 / number | double | number aliases f64
bool | _Bool / uint8_t ABI equivalent | Normalized to boolean
cstring | const char* | Null return is runtime error
cstring? | const char* nullable | Marshals to Option<string>
callback(fn(...)->R) | Function pointer | Call-scoped callback trampoline
Vec<T> | cslice<T> ({ data: T*, len: usize }) | T must be scalar/pointer/cstring family
CSlice<T> | cslice<T> ({ data: T*, len: usize }) | Explicit read-only slice ABI
CMutSlice<T> | cmut_slice<T> ({ data: T*, len: usize }) | Explicit mutable slice ABI
cview<T> | const T* | T must be type C
cmut<T> | T* | Mutable view; write allowed
void | void | Marshals to ()

Rules:

  • All narrowing conversions are explicit and range checked.
  • cstring rejects interior NUL on outbound conversion.
  • Use cstring? when null pointers are valid.
  • Width-aware scalars are preserved across VM/wire/native paths.
  • Vec<T>/cslice<T> marshalling is copy-in.
  • cmut_slice<T> marshalling is copy-in/copy-out with mandatory writeback into the referenced Shape variable after the call returns.
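
Two of these rules, range-checked narrowing and the interior-NUL check for cstring, can be sketched in C as follows. The helper names are hypothetical and only illustrate the kind of check involved, not the runtime's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Range-checked narrowing from a 64-bit value to int16_t.
   Returns false instead of silently truncating. */
bool narrow_to_i16(int64_t v, int16_t *out) {
    assert(out != NULL);
    if (v < INT16_MIN || v > INT16_MAX) return false;
    *out = (int16_t)v;
    return true;
}

/* A byte buffer of length len can only become a C string if it has
   no NUL byte inside it, since C would treat that as the terminator. */
bool has_interior_nul(const char *bytes, size_t len) {
    assert(bytes != NULL);
    return memchr(bytes, '\0', len) != NULL;
}
```

A marshaller built this way reports an error at the boundary rather than passing a truncated integer or a silently shortened string into native code.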

Shape -> C (arguments and type C field writes)

Target C type | Accepted Shape values | Implicit coercions rejected
--- | --- | ---
i8, i16, i32, i64, isize | exact integer-domain values (int, fitting native ints) and bool (false -> 0, true -> 1) | floating-point values
u8, u16, u32, u64, usize, ptr | exact non-negative integer-domain values and bool (0/1) | negative integers, floating-point values
f32, f64 | number, f32, and inline language int (i64) | native-width i64/u64/isize/usize/ptr to float without explicit cast
bool | bool, or exact integer-domain value (0 => false, non-zero => true) | floating-point values
cstring | string without interior NUL | None, non-string values
cstring? | None, Some(string), or bare string | non-string/non-option values
cslice<T> | Vec<T> | non-array values, nested/object element types
cmut_slice<T> | mutable reference to Vec<T> | non-reference arguments, nested/object element types
cview<T> | matching native view cview<T> | object copies or mismatched layout names
cmut<T> | matching native view cmut<T> | read-only view when mutable required

Notes:

  • Narrowing integer conversions are range checked.
  • For lossy numeric changes, use explicit casts (as number, as i64, etc.).
  • type C field writes use the same coercion table as call arguments.
  • cmut_slice<T> supports writeback for all supported slice element types (i8/u8/i16/u16/i32/i64/u32/u64/isize/usize/f32/f64/bool/ptr/cstring/cstring?).
  • Name-based calls to extern C functions auto-insert the required reference for cmut_slice<T> params; dynamic call-value sites must pass a reference explicitly.

C -> Shape (return values and type C field reads)

C ABI type | Shape value
--- | ---
i8/u8/i16/u16/i32/u32 | width-aware native scalar
i64 | native i64 scalar (not auto-coerced to number)
u64 | native u64 scalar (not auto-coerced to number)
isize/usize | native pointer-width scalar
f32 | native f32 scalar
f64 | number
ptr / callback pointer | ptr
cstring | string; null pointer is runtime error
cstring? | Option<string> (None on null)
cslice<T> / cmut_slice<T> | Vec<T> (copied from native memory)
cview<T> / cmut<T> | zero-copy native view wrapper; null pointer is runtime error

  • Integer-domain operations stay in integer domain when both operands are integer-domain values.
  • Mixed integer/float operations are allowed only when integer values are losslessly representable as f64 (|value| <= 2^53).
  • Otherwise an explicit cast is required; without one, execution fails with a runtime type/coercion error.
  • Wire payloads preserve width-aware integer variants (i64, u64, isize, usize, ptr) as typed values.
  • These variants are intentionally not auto-coerced by generic as_number helpers.

type C uses C ABI layout semantics:

  • deterministic field order
  • computed size, align, and per-field offset
  • pointer-based field access via cview<T>/cmut<T> without object materialization

Example:

type C StatC {
    size: i64,
    mode: u32,
}
extern C fn stat(path: cstring, buf: cmut<StatC>) -> i32 from "libc";
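
For reference, the computed size, alignment, and per-field offsets are the same quantities a C compiler derives for the equivalent struct. The sketch below mirrors only the two-field layout from the example (it is not the platform's real struct stat), and the LayoutInfo type and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the two-field StatC layout from the example. */
typedef struct {
    int64_t size;
    uint32_t mode;
} StatC;

typedef struct {
    size_t size, align, size_offset, mode_offset;
} LayoutInfo;

/* Report the layout facts the compiler computed for StatC. */
LayoutInfo statc_layout(void) {
    LayoutInfo info = { sizeof(StatC), _Alignof(StatC),
                        offsetof(StatC, size), offsetof(StatC, mode) };
    return info;
}

/* Field read through a const pointer, analogous to cview<StatC>. */
int64_t statc_size(const StatC *view) {
    assert(view != NULL);
    return view->size;
}

/* Field write through a mutable pointer, analogous to cmut<StatC>. */
void statc_set_mode(StatC *view, uint32_t mode) {
    assert(view != NULL);
    view->mode = mode;
}
```

Both accessors work directly on the caller's memory, which is the sense in which pointer-based field access avoids object materialization.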

type C is the production path for zero-copy native struct interop.

Core builtins provide Arrow C import:

from std::core::native use { table_from_arrow_c_typed }
let result: Result<Table<MyRow>, AnyError> =
    table_from_arrow_c_typed(schema_ptr, array_ptr, "MyRow")

  • schema_ptr/array_ptr must point to Arrow C Data Interface ArrowSchema/ArrowArray.
  • type_name must match a registered Shape row type.
  • Schema mismatches return Result::Err (strict contract).

The builtin is defined in crates/shape-runtime/stdlib-src/core/native.shape (public wrapper table_from_arrow_c_typed) backed by the intrinsic __native_table_from_arrow_c_typed in crates/shape-runtime/stdlib-src/core/intrinsics.shape.

A full package-level DuckDB proof-of-concept is included at: shape/examples/packages/duckdb-native.

The compiler auto-registers conversion pairs for compatible object/layout names:

  • type C FooC <-> type Foo
  • type C CFoo <-> type Foo
  • type C FooLayout <-> type Foo

For compatible fields/types, conversion traits are generated in both directions (From/Into + TryInto wrappers as needed). If a Shape type Foo matches more than one type C companion by name (e.g. both FooC and FooLayout exist), the compiler reports a hard error and requires the project to pick one canonical companion name.

The pairing logic lives in crates/shape-vm/src/compiler/statements.rs::maybe_generate_native_type_conversions (invoked from register_native_struct_layout), with the name candidates in object_type_name_for_native_layout and native_layout_name_candidates_for_object in the same file.

Callbacks are declared inline:

extern C fn qsort_i32(
    base: ptr,
    count: usize,
    elem_size: usize,
    cmp: (a: ptr, b: ptr) => i32
) -> void from "libc" as "qsort";

  • Passing a Shape callable to callback(...) creates a call-scoped native trampoline.
  • Callback argument/return types follow the same marshalling matrix.
  • cstring/cstring?/cslice<_>/cmut_slice<_> callback return types are currently disallowed.
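
For comparison, this is how the same qsort call looks in plain C. A trampoline created for a Shape callable has to present a function pointer of the comparator's shape to libc; the helper name sort_i32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparator with the signature qsort expects: two pointers to
   elements in, a negative / zero / positive int out. */
static int cmp_i32(const void *a, const void *b) {
    int32_t x = *(const int32_t *)a;
    int32_t y = *(const int32_t *)b;
    return (x > y) - (x < y);  /* avoids overflow from x - y */
}

/* Sort an int32_t array in ascending order, in place. */
void sort_i32(int32_t *base, size_t count) {
    if (count == 0) return;
    assert(base != NULL);
    qsort(base, count, sizeof base[0], cmp_i32);
}
```

qsort only invokes the comparator while the call is in progress, which is the situation a call-scoped trampoline is suited to.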

As of February 25, 2026:

  • CallForeign is lowered in JIT (jit_call_foreign) and no longer hard-falls back to VM dispatch.
  • Foreign functions are linked once per execution into a JIT foreign bridge state.
  • Native extern C entries run through the shared native ABI invoker from JIT, including callback trampolines.
  • Dynamic-language foreign entries (non-native ABI) still marshal through the shared runtime marshal/unmarshal path.
  • Signature-specialized direct native lowering (eliminating the generic bridge call per known C signature) remains an optimization milestone.

This chapter remains the source of truth as JIT native lowering lands.

Native dependency resolution writes lock artifacts with:

  • package identity (package_name, package_version, package_key)
  • alias
  • host
  • provider
  • load target
  • fingerprint
  • optional declared version/cache key

Artifact keys are namespaced as <package>@<version>::<alias> to keep transitive package native dependencies collision-safe in one lockfile.
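
As a small illustration of why this namespacing avoids collisions, the sketch below formats keys for two packages that declare the same alias. The function names and the sample package names are hypothetical; only the <package>@<version>::<alias> shape comes from this chapter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "<package>@<version>::<alias>" into buf. Returns what
   snprintf returns: the length the full key needs, or a negative
   value on error. */
int artifact_key(char *buf, size_t cap, const char *package,
                 const char *version, const char *alias) {
    assert(buf != NULL && package != NULL && version != NULL && alias != NULL);
    return snprintf(buf, cap, "%s@%s::%s", package, version, alias);
}

/* Two artifacts collide only if their full keys are identical. */
int same_artifact_key(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

Two packages that both use the alias duckdb still produce different keys because the package name and version are part of the key.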

When [build.external].mode = "frozen":

  • unresolved/failed native probes are rejected
  • missing native lock artifacts are rejected
  • system aliases without declared versions are rejected

Native library file extensions vary by platform:

  • Linux: typically .so
  • macOS: typically .dylib
  • Windows: typically .dll

Always declare explicit per-platform entries when library names differ.