
Polyglot Functions

Shape supports inline foreign-language code through polyglot function blocks. Write Python, Julia, SQL, or any other language that has a runtime extension directly inside Shape files, and call the result like a regular function.

A polyglot function has the form fn <language> name(params) -> ReturnType { body }:

fn python std_dev(values: Vec<number>) -> Result<number> {
    import math
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)
}

// Call it like any Shape function
let sigma = std_dev([4.0, 7.0, 13.0, 2.0, 1.0])?

The body is written in the foreign language. Shape handles all data marshaling automatically. Every fn python function (and every function targeting another runtime with a dynamic error model) must declare its return type as Result<T>: the compiler rejects bare return types because the foreign runtime can throw on any call. See Python Extension :: Return Type Annotation for the error model details.

  1. The parser captures the foreign body as raw text (braces, strings, and nested blocks are tracked for correct boundary detection).
  2. The body is dedented automatically, stripping common leading whitespace so Python’s indentation sensitivity works correctly even though the block is nested inside Shape code.
  3. At compile time, the language runtime extension pre-compiles the body and surfaces syntax errors early.
  4. At call time, Shape marshals arguments to native objects, invokes the compiled function, and marshals the return value back.
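The steps above can be imitated in plain Python. This is a minimal sketch, not Shape's implementation: compile_body is a hypothetical helper, and it assumes the runtime wraps the dedented body in a generated def before compiling it.

```python
import textwrap

def compile_body(name, params, raw_body):
    """Dedent a captured body, wrap it in a def, and pre-compile it."""
    # Step 2: strip the common leading whitespace left by the Shape nesting.
    body = textwrap.dedent(raw_body).strip("\n")
    source = f"def {name}({', '.join(params)}):\n" + textwrap.indent(body, "    ")
    # Step 3: compile() raises SyntaxError here, before any user code runs.
    code = compile(source, f"<shape:{name}>", "exec")
    namespace = {}
    exec(code, namespace)  # defines the function; the body has not run yet
    return namespace[name]

# Step 1: the body as it might be captured from inside nested Shape code.
raw = """
        import math
        mean = sum(values) / len(values)
        variance = sum((x - mean) ** 2 for x in values) / len(values)
        return math.sqrt(variance)
"""
std_dev = compile_body("std_dev", ["values"], raw)

# Step 4: invoke the compiled function with native Python objects.
sigma = std_dev([4.0, 7.0, 13.0, 2.0, 1.0])
```

Because the syntax check happens in compile_body, a malformed body fails when the function is defined and not on its first call.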

Shape values are converted to native types at the language boundary and back on return.

| Shape Type | Python Type | Strategy |
| --- | --- | --- |
| number | float | Direct copy |
| int | int | Direct copy |
| bool | bool | Direct copy |
| string | str | UTF-8 via MessagePack |
| none | None | Sentinel |
| Vec<T> | list[T] | Recursive element marshaling |
| DataTable | pd.DataFrame | Planned: Arrow IPC zero-copy bridge (not yet implemented; see DataTable Bridge) |
| Option<T> | Optional[T] | None or marshaled value |
| Struct types | TypedDict | Schema-driven field mapping (runtime path validates dicts against the declared object type; .pyi stub generation emits TypedDict declarations for editor support) |
| HashMap<K,V> | dict[K,V] | Recursive key/value marshaling |

| Python Type | Shape Type | Strategy |
| --- | --- | --- |
| float | number | Direct copy |
| int | int (i64) or number | Range check (Python int outside i64 range → number) |
| str | string | UTF-8 via MessagePack |
| bool | bool | Direct copy |
| None | none | Sentinel |
| list | Vec | Recursive, element type from return annotation |
| dict | Struct type | Schema-guided: validates keys, constructs TypedObject |
| pd.DataFrame | DataTable | Planned: Arrow IPC zero-copy bridge (not yet implemented; see DataTable Bridge) |
| np.ndarray (1D) | Vec<number> | Marshalled via .tolist(): numpy arrays are converted to Python lists first, then to Vec<number> (not zero-copy) |
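The Python-to-Shape rules can be read as a dispatch on the returned value. The sketch below is illustrative only: shape_type_of is a hypothetical helper that names the target Shape type, and it treats any object with a .tolist() method as a stand-in for the 1D np.ndarray path so that it runs without NumPy installed.

```python
I64_MIN, I64_MAX = -(2 ** 63), 2 ** 63 - 1

def shape_type_of(value):
    """Name the Shape type a Python return value would map to."""
    if value is None:
        return "none"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        # Range check: ints that do not fit in i64 fall back to number.
        return "int" if I64_MIN <= value <= I64_MAX else "number"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "Vec"
    if isinstance(value, dict):
        return "struct"
    if hasattr(value, "tolist"):
        # Assumed stand-in for np.ndarray: convert to a list first.
        return shape_type_of(value.tolist())
    return "unsupported"

kinds = [shape_type_of(v) for v in (1.5, 7, 2 ** 63, True, None, [1.0], {"a": 1})]
```

The ordering of the checks matters: testing int before bool would classify True as an int.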

Closures and function objects cannot cross the language boundary (compile-time error).

Type annotations on parameters and return values serve double duty: Shape uses them for its own type checking, and the language runtime uses them to generate typed stubs for the child language server.

type Measurement {
    timestamp: string,
    value: number,
    sensor_id: string,
}

fn python outlier_ratio(readings: Vec<Measurement>, z_threshold: number) -> Result<number> {
    values = [r['value'] for r in readings]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    outliers = [v for v in values if abs(v - mean) > z_threshold * std]
    return len(outliers) / len(values)
}

Shape struct types like Measurement are exported as Python TypedDict classes, so r['value'] autocompletes in your editor.

A Python function can return a list of dicts to populate a Table<T>:

type Element {
    symbol: string,
    atomic_number: int,
    atomic_mass: number,
}

fn python periodic_subset(min_z: int, max_z: int) -> Result<Vec<Element>> {
    elements = [
        {"symbol": "H", "atomic_number": 1, "atomic_mass": 1.008},
        {"symbol": "He", "atomic_number": 2, "atomic_mass": 4.003},
        {"symbol": "Li", "atomic_number": 3, "atomic_mass": 6.941},
        {"symbol": "Be", "atomic_number": 4, "atomic_mass": 9.012},
        {"symbol": "B", "atomic_number": 5, "atomic_mass": 10.81},
        {"symbol": "C", "atomic_number": 6, "atomic_mass": 12.01},
        {"symbol": "N", "atomic_number": 7, "atomic_mass": 14.01},
        {"symbol": "O", "atomic_number": 8, "atomic_mass": 16.00},
    ]
    return [e for e in elements if min_z <= e["atomic_number"] <= max_z]
}

let light_elements = periodic_subset(1, 4)?

Each dict in the returned list is validated against the Element schema, and the result can be used as a typed collection in Shape.
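To make the validation step concrete, here is a rough Python sketch of checking one returned dict against the Element fields. The schema dictionary and the check_row helper are hypothetical. Rejecting extra keys and accepting a Python int where number is declared are assumptions of this sketch, not documented Shape behavior.

```python
ELEMENT_SCHEMA = {"symbol": str, "atomic_number": int, "atomic_mass": float}

def matches(value, expected):
    """Type test that keeps bool out of int/float fields."""
    if isinstance(value, bool):
        return expected is bool
    if expected is float:
        return isinstance(value, (int, float))  # assumption: int widens to number
    return isinstance(value, expected)

def check_row(row, schema):
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    for field in sorted(schema.keys() - row.keys()):
        problems.append(f"missing field: {field}")
    for field in sorted(row.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    for field, expected in schema.items():
        if field in row and not matches(row[field], expected):
            got = type(row[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {got}")
    return problems

good = check_row({"symbol": "H", "atomic_number": 1, "atomic_mass": 1.008}, ELEMENT_SCHEMA)
bad = check_row({"symbol": "He", "atomic_number": "2"}, ELEMENT_SCHEMA)
```

Returning a list of problems instead of raising on the first one lets a caller report every bad field in a row at once.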

Mark a polyglot function async and it runs on the async executor:

async fn python fetch_json(url: string) -> Result<Vec<number>> {
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.json()
    return data['values']
}

let values = await fetch_json("https://api.example.com/data")?

Polyglot functions support the same annotations as regular Shape functions:

fn python percentile(values: Vec<number>, pct: number) -> Result<number> {
    sorted_v = sorted(values)
    k = (len(sorted_v) - 1) * (pct / 100.0)
    f = int(k)
    c = f + 1
    if c >= len(sorted_v):
        return sorted_v[-1]
    return sorted_v[f] + (k - f) * (sorted_v[c] - sorted_v[f])
}

The parser correctly handles nested braces, string literals, and escaped characters inside foreign bodies:

fn python process(data: Vec<number>) -> Result<number> {
    config = {"threshold": 1.5, "method": "zscore"}
    message = "processing {len(data)} items"
    nested = {"inner": {"deep": True}}
    return sum(data) / len(data)
}
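One way to picture the boundary detection is a small scanner that counts braces while skipping string literals. The function below is a simplified sketch under stated assumptions, not Shape's parser: find_body_end is a hypothetical name, and it handles only single-line quoted strings with backslash escapes, ignoring comments and treating triple-quoted strings naively.

```python
def find_body_end(source, open_index):
    """Return the index of the '}' that closes the '{' at open_index."""
    if source[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    quote = None  # active quote character, or None when outside a string
    i = open_index
    while i < len(source):
        ch = source[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated foreign body")

sample = 'fn python f() -> Result<number> {\n    d = {"a": {"b": "}"}}\n    return 1\n}\nlet x = f()?'
start = sample.index("{")
end = find_body_end(sample, start)
body = sample[start + 1:end]
```

The closing brace inside the string "}" does not change the depth, so the body ends at the brace that precedes the let statement.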

Polyglot functions can be exported from modules:

pub fn python normalize(values: Vec<number>) -> Result<Vec<number>> {
    lo = min(values)
    hi = max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]
}

Python packages like NumPy work seamlessly inside polyglot functions. Vec<number> is marshaled to a Python list, and np.array(...) constructs an ndarray from that list (an O(n) copy — there is no zero-copy buffer path today; see DataTable Bridge Status for the Arrow IPC bridge that is planned but not yet implemented):

fn python correlation(xs: Vec<number>, ys: Vec<number>) -> Result<number> {
    import numpy as np
    return float(np.corrcoef(xs, ys)[0, 1])
}

fn python moving_average(values: Vec<number>, window: int) -> Result<Vec<number>> {
    import numpy as np
    arr = np.array(values)
    kernel = np.ones(window) / window
    smoothed = np.convolve(arr, kernel, mode='valid')
    return smoothed.tolist()
}

let r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])?
let smoothed = moving_average([10, 20, 30, 40, 50, 60, 70], 3)?

Any package installed in the active Python environment is available.

Language runtimes are provided by extensions. Two first-party extensions ship today: Python and TypeScript.

Each extension chapter covers install (shape ext install <name>), runtime loading (CLI shorthand, frontmatter, shape.toml), error model details, and LSP delegation.

Python exceptions are caught and mapped back to Shape source locations. The traceback line numbers inside __shape_fn__ are offset by the body_span start line, so errors point to the correct line in your Shape file.

Error: Python runtime error in 'process'
  --> src/pipeline.shape:14:8
   |
14 | result = compute(invalid_input)
   |          ^^^^^^^^^^^^^^^^^^^^^^
   = TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
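The line-offset arithmetic can be sketched in Python as follows. This is an illustration under assumptions: call_with_shape_lines is a hypothetical helper, and it assumes the generated wrapper adds exactly one def line above the body, which is why the traceback line number is shifted by two before the body's start line is added.

```python
import textwrap
import traceback

def call_with_shape_lines(body, body_start_line, params, args):
    """Run a body as __shape_fn__; report failures at Shape file lines."""
    source = f"def __shape_fn__({', '.join(params)}):\n" + textwrap.indent(body, "    ")
    namespace = {}
    exec(compile(source, "<shape_fn>", "exec"), namespace)
    try:
        return ("ok", namespace["__shape_fn__"](*args))
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        # Innermost frame that belongs to the compiled body.
        frame = [f for f in frames if f.filename == "<shape_fn>"][-1]
        # Virtual line 1 is the generated def, so body line 1 is virtual
        # line 2 and corresponds to body_start_line in the Shape file.
        shape_line = body_start_line + (frame.lineno - 2)
        return ("error", shape_line, f"{type(exc).__name__}: {exc}")

# A body whose first line sits on line 13 of a hypothetical Shape file.
body = "total = None\nresult = total + 1\nreturn result"
outcome = call_with_shape_lines(body, 13, ["data"], [[1.0, 2.0]])
```

Here the failing statement is the second body line, so the reported position is line 14 of the Shape file.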

The Shape language server delegates to child language servers for foreign blocks:

  • Completions: Type pd. inside a Python block and get pandas autocomplete
  • Diagnostics: Python syntax errors appear at the correct Shape file position
  • Hover: Hover over pd.DataFrame and see the Python docstring
  • Type stubs: Shape struct types are exported as .pyi stubs so pyright understands your parameter types

The LSP creates a virtual Python document for each foreign block, wrapping the body in a typed def with proper imports. This gives the child language server full type context without any extra configuration.
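A virtual document of this kind might be assembled roughly as below. The helper name virtual_document, its parameters, and the exact layout of the generated text are assumptions for illustration; the real stub format is defined by the extension.

```python
import textwrap

def virtual_document(name, params, return_type, body, typed_dicts=()):
    """Build Python text that a child language server could analyze."""
    lines = ["from typing import TypedDict", ""]
    for type_name, fields in typed_dicts:
        lines.append(f"class {type_name}(TypedDict):")
        lines += [f"    {field}: {py_type}" for field, py_type in fields.items()]
        lines.append("")
    signature = ", ".join(f"{p}: {t}" for p, t in params)
    lines.append(f"def {name}({signature}) -> {return_type}:")
    lines.append(textwrap.indent(textwrap.dedent(body).strip("\n"), "    "))
    return "\n".join(lines) + "\n"

doc = virtual_document(
    "peak_value",  # hypothetical function name
    [("readings", "list[Measurement]"), ("z_threshold", "float")],
    "float",
    """
        values = [r['value'] for r in readings]
        return max(values)
    """,
    typed_dicts=[("Measurement", {"timestamp": "str", "value": "float", "sensor_id": "str"})],
)
code = compile(doc, "<virtual>", "exec")  # the generated text is valid Python
```

Because the parameters carry TypedDict annotations, a type checker reading this text can resolve r['value'] to float without any project configuration.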

For explicit LSP configuration keys (alwaysLoadExtensions / always_load_extensions) and Neovim examples, see Python Extension.

Nothing about Python (or any language) is hardcoded in Shape’s core. The system is entirely extension-driven:

  1. Parser: The foreign_language_id grammar rule accepts any identifier. fn python, fn julia, fn sql all parse identically.
  2. Extension loading: When a language runtime extension is loaded, it calls language_id() to self-declare its language string. The runtime registers this in a lookup table.
  3. Compiler resolution: When the compiler encounters fn <language> ..., it looks up the language in the registered runtimes. If not found, it produces a clear error: “No language runtime registered for ‘julia’. Install the julia extension.”
  4. LSP delegation: The extension provides get_lsp_config() declaring its child language server command. The Shape LSP discovers this dynamically.

A third party can publish a Julia extension that registers language_id() = "julia" and provides lsp_command = ["julia", "[email protected]", "-e", "..."], and fn julia ... works with full LSP support without any Shape core changes.
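The lookup-table behavior described in steps 2 and 3 can be modeled in a few lines of Python. This is a toy model, not the host's actual data structure: RuntimeRegistry and PythonExtension are hypothetical classes, and the LSP command shown is only an assumed example value.

```python
class RuntimeRegistry:
    """Lookup table from language identifier to a loaded runtime."""

    def __init__(self):
        self._runtimes = {}

    def load(self, extension):
        # The extension self-declares its language string.
        self._runtimes[extension.language_id()] = extension

    def resolve(self, language):
        """Return (runtime, None) on success or (None, error message)."""
        runtime = self._runtimes.get(language)
        if runtime is None:
            return None, (
                f"No language runtime registered for '{language}'. "
                f"Install the {language} extension."
            )
        return runtime, None

class PythonExtension:
    def language_id(self):
        return "python"

    def get_lsp_config(self):
        # Assumed command, for illustration only.
        return {"lsp_command": ["pyright-langserver", "--stdio"]}

registry = RuntimeRegistry()
registry.load(PythonExtension())
runtime, error = registry.resolve("python")
missing, message = registry.resolve("julia")
```

Nothing in RuntimeRegistry mentions a specific language, which mirrors the point that adding Julia requires only loading another extension.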

Language runtime extensions implement the shape.language_runtime capability:

| VTable Function | Purpose |
| --- | --- |
| init | Initialize the language interpreter |
| register_types | Receive Shape type schemas for stub generation |
| compile | Pre-compile a function body, returning a handle |
| invoke | Call a compiled function with marshaled arguments |
| dispose_function | Release a compiled function handle |
| language_id | Return the language identifier string |
| get_lsp_config | Return child LSP server configuration |
| free_buffer | Free a buffer allocated by compile/invoke/get_lsp_config/get_shape_source |
| drop | Tear down the runtime instance |
| error_model | Whether the runtime is Dynamic (every call can fail; return types are automatically wrapped in Result<T>) or Static (compile-time type safety; runtime errors not expected). Defaults to Dynamic when zero-initialized. |
| get_shape_source | Returns a bundled .shape module source registered under the extension’s own namespace (e.g. python, typescript), not under std::*. This is how import { eval } from typescript becomes importable. Optional; set to None if the extension has no bundled namespace. |

Lifecycle: init → register_types → compile (per function) → invoke (per call) → dispose_function → drop.

error_model and get_shape_source are read by the host during extension registration; they are not part of the per-call dispatch path.
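The lifecycle order can be traced with an in-process stand-in. ToyRuntime below is a hypothetical Python class that borrows the vtable function names; the real interface is a native vtable with marshaled buffers, so integer handles and Python lists here are simplifying assumptions.

```python
import builtins
import textwrap

class ToyRuntime:
    """In-process stand-in for a language runtime; records each call."""

    def __init__(self):
        self.calls = []
        self._functions = {}
        self._next_handle = 1

    def init(self):
        self.calls.append("init")

    def register_types(self, schemas):
        self.calls.append("register_types")
        self.schemas = dict(schemas)

    def compile(self, name, params, body):
        self.calls.append("compile")
        source = f"def {name}({', '.join(params)}):\n" + textwrap.indent(body, "    ")
        namespace = {}
        exec(builtins.compile(source, f"<{name}>", "exec"), namespace)
        handle = self._next_handle
        self._next_handle += 1
        self._functions[handle] = namespace[name]
        return handle

    def invoke(self, handle, args):
        self.calls.append("invoke")
        return self._functions[handle](*args)

    def dispose_function(self, handle):
        self.calls.append("dispose_function")
        del self._functions[handle]

    def drop(self):
        self.calls.append("drop")
        self._functions.clear()

body = "lo = min(values)\nspan = max(values) - lo\nreturn [(v - lo) / span for v in values]"
rt = ToyRuntime()
rt.init()
rt.register_types({})
handle = rt.compile("normalize", ["values"], body)
result = rt.invoke(handle, [[1.0, 2.0, 3.0]])
rt.dispose_function(handle)
rt.drop()
```

One compile call produces a handle that any number of invoke calls can reuse, which is why compilation is per function and invocation is per call.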

Polyglot functions participate in Shape’s content-addressed bytecode system. Each foreign function is assigned a content hash based on its language, body text, parameter types, and return type. When a Shape function calls a polyglot function, the foreign function’s hash is recorded in the caller’s foreign_dependencies field and included in the caller’s blob hash computation.

This means two Shape functions that are otherwise identical but call different foreign implementations will produce different content hashes — preserving correct identity across the language boundary.

fn process(data: Vec<number>) -> number {
    // calls: fn python std_dev(...) → foreign hash a1b2...
    // calls: fn python normalize(...) → foreign hash c3d4...
    // process.foreign_dependencies = [a1b2..., c3d4...]
    // process.content_hash = SHA-256(bytecode + deps + foreign_deps + ...)
}

See Content-Addressed Bytecode for the full hashing architecture.
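As a rough illustration of the idea, the sketch below hashes the four inputs named above and then folds the foreign hashes into a caller hash. The byte layout is an assumption of this sketch (length-prefixed UTF-8 fields and sorted dependencies), and foreign_hash and caller_hash are hypothetical helpers, not Shape's actual encoding.

```python
import hashlib

def foreign_hash(language, body, param_types, return_type):
    """Hash the language, body text, parameter types, and return type."""
    h = hashlib.sha256()
    for part in (language, body, ",".join(param_types), return_type):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()

def caller_hash(bytecode, foreign_dependencies):
    """Fold the foreign hashes into the caller's hash (assumed layout)."""
    h = hashlib.sha256(bytecode)
    for dep in sorted(foreign_dependencies):
        h.update(bytes.fromhex(dep))
    return h.hexdigest()

v1 = foreign_hash("python", "return sum(values)", ["Vec<number>"], "Result<number>")
v2 = foreign_hash("python", "return max(values)", ["Vec<number>"], "Result<number>")

# Identical caller bytecode, different foreign implementation.
same_bytecode = b"\x01\x02\x03"
caller_a = caller_hash(same_bytecode, [v1])
caller_b = caller_hash(same_bytecode, [v2])
```

Changing only the foreign body changes v2, and therefore caller_b, even though the caller's own bytecode is byte-for-byte identical.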

DataTable ↔ pd.DataFrame marshalling is planned but not yet implemented. The intent is to use Arrow IPC for zero-copy bridging between Shape’s columnar DataTable and pandas DataFrames (or pyarrow RecordBatches). Both directions currently return an error at the bridge boundary; see extensions/python/src/arrow_bridge.rs (datatable_to_python_ipc and python_ipc_to_datatable are stubs).

For tabular interchange today, transfer the data as Vec<T> of struct rows; that path goes through the standard list/dict marshalling and works end-to-end.

  • Polyglot functions compile to the same callable thunks as regular functions. The JIT does not need changes.
  • The compile step runs at Shape compile time, catching syntax errors before any code executes.
  • When Shape types change (file save), the LSP re-collects types and regenerates stubs for the child language server.
  • Closures and function objects cannot cross the language boundary. Use concrete types at the interface.