Polyglot Functions
Shape supports inline foreign-language code through polyglot function blocks. Write Python, Julia, SQL, or any other language that has a runtime extension directly inside Shape files and call the result like a regular function.
Syntax
A polyglot function uses fn <language> name(params) -> ReturnType { body }:

    fn python std_dev(values: Vec<number>) -> Result<number> {
        import math
        mean = sum(values) / len(values)
        variance = sum((x - mean) ** 2 for x in values) / len(values)
        return math.sqrt(variance)
    }

    // Call it like any Shape function
    let sigma = std_dev([4.0, 7.0, 13.0, 2.0, 1.0])?

The body is written in the foreign language. Shape handles all data marshaling automatically. Every fn python (and any other dynamic-error-model language runtime) must declare its return type as Result<T>: the compiler rejects bare return types because the foreign runtime can throw on every call. See Python Extension :: Return Type Annotation for the error model details.
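The reason for the Result<T> requirement can be illustrated in plain Python. This is only a sketch of the idea, not Shape's implementation: call_as_result and the ("Ok", ...) / ("Err", ...) tuples are hypothetical names chosen for illustration.

```python
import math

def std_dev(values):
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

def call_as_result(fn, *args):
    """Hypothetical helper: turn any raised exception into an Err value."""
    try:
        return ("Ok", fn(*args))
    except Exception as exc:
        return ("Err", f"{type(exc).__name__}: {exc}")

ok = call_as_result(std_dev, [4.0, 7.0, 13.0, 2.0, 1.0])
err = call_as_result(std_dev, [])  # empty input divides by zero
```

The same function succeeds or fails depending only on its input, which is why the caller has to unwrap the result (the trailing ? in the Shape call) rather than assume a bare number.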
How It Works
- The parser captures the foreign body as raw text (braces, strings, and nested blocks are tracked for correct boundary detection).
- The body is dedented automatically, stripping common leading whitespace so Python’s indentation sensitivity works correctly even though the block is nested inside Shape code.
- At compile time, the language runtime extension pre-compiles the body and surfaces syntax errors early.
- At call time, Shape marshals arguments to native objects, invokes the compiled function, and marshals the return value back.
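The dedent, pre-compile, and invoke steps above can be approximated in a few lines of Python. This is a sketch under assumptions: compile_body is a hypothetical helper, and textwrap.dedent plus the built-in compile stand in for whatever the real extension does internally.

```python
import textwrap

def compile_body(name, params, body):
    """Dedent a foreign body, wrap it in a def, and compile it once."""
    dedented = textwrap.dedent(body).strip("\n")
    source = f"def {name}({', '.join(params)}):\n" + textwrap.indent(dedented, "    ")
    # A SyntaxError in the body surfaces here, before any call happens.
    code = compile(source, f"<shape:{name}>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace[name]

# Body as it might appear nested inside a Shape file (extra leading indent).
body = """
        total = sum(values)
        return total / len(values)
"""

mean = compile_body("mean", ["values"], body)
result = mean([2.0, 4.0, 6.0])
```

Without the dedent step, the wrapped source would have inconsistent indentation and Python would reject it.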
Type Marshaling
Shape values are converted to native types at the language boundary and back on return.
Shape to Python
| Shape Type | Python Type | Strategy |
|---|---|---|
| number | float | Direct copy |
| int | int | Direct copy |
| bool | bool | Direct copy |
| string | str | UTF-8 via MessagePack |
| none | None | Sentinel |
| Vec<T> | list[T] | Recursive element marshaling |
| DataTable | pd.DataFrame | Planned: Arrow IPC zero-copy bridge (not yet implemented; see DataTable Bridge) |
| Option<T> | Optional[T] | None or marshaled value |
| Struct types | TypedDict | Schema-driven field mapping (runtime path validates dicts against the declared object type; .pyi stub generation emits TypedDict declarations for editor support) |
| HashMap<K,V> | dict[K,V] | Recursive key/value marshaling |
Python to Shape
| Python Type | Shape Type | Strategy |
|---|---|---|
| float | number | Direct copy |
| int | int (i64) or number | Range check (Python int outside i64 range → number) |
| str | string | UTF-8 via MessagePack |
| bool | bool | Direct copy |
| None | none | Sentinel |
| list | Vec | Recursive, element type from return annotation |
| dict | Struct type | Schema-guided: validates keys, constructs TypedObject |
| pd.DataFrame | DataTable | Planned: Arrow IPC zero-copy bridge (not yet implemented; see DataTable Bridge) |
| np.ndarray (1D) | Vec<number> | Marshalled via .tolist(): numpy arrays are converted to Python lists first, then to Vec<number> (not zero-copy) |
Closures and function objects cannot cross the language boundary (compile-time error).
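The int range check in the Python to Shape table can be sketched as follows. marshal_python_int and its tagged-tuple return value are hypothetical, and checking bool first is an assumption made here because bool is a subclass of int in Python.

```python
I64_MIN = -(2 ** 63)
I64_MAX = 2 ** 63 - 1

def marshal_python_int(value):
    """Hypothetical: choose the Shape type for a Python int by range."""
    if isinstance(value, bool):
        return ("bool", value)  # keep bools out of the int path
    if I64_MIN <= value <= I64_MAX:
        return ("int", value)
    # Outside i64: fall back to a float-backed number (precision may be lost).
    return ("number", float(value))

small = marshal_python_int(42)
large = marshal_python_int(2 ** 63)
```

Values that land on the number path are floats, so integers beyond 2**53 may no longer round-trip exactly.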
Parameters and Return Types
Type annotations on parameters and return values serve double duty: Shape uses them for its own type checking, and the language runtime uses them to generate typed stubs for the child language server.

    type Measurement {
        timestamp: string,
        value: number,
        sensor_id: string,
    }

    fn python outlier_ratio(readings: Vec<Measurement>, z_threshold: number) -> Result<number> {
        values = [r['value'] for r in readings]
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        outliers = [v for v in values if abs(v - mean) > z_threshold * std]
        return len(outliers) / len(values)
    }

Shape struct types like Measurement are exported as Python TypedDict classes, so r['value'] autocompletes in your editor.
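For orientation, the Python side of that export might look roughly like the following. The exact shape of the generated stub is an assumption; only the field-type mapping (string to str, number to float) is taken from the marshaling tables.

```python
from typing import List, TypedDict

class Measurement(TypedDict):
    # Assumed stub shape: one key per Shape struct field.
    timestamp: str
    value: float
    sensor_id: str

def outlier_ratio(readings: List[Measurement], z_threshold: float) -> float:
    values = [r["value"] for r in readings]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    outliers = [v for v in values if abs(v - mean) > z_threshold * std]
    return len(outliers) / len(values)

readings: List[Measurement] = [
    {"timestamp": "t0", "value": 1.0, "sensor_id": "a"},
    {"timestamp": "t1", "value": 1.0, "sensor_id": "a"},
    {"timestamp": "t2", "value": 1.0, "sensor_id": "a"},
    {"timestamp": "t3", "value": 1.0, "sensor_id": "a"},
    {"timestamp": "t4", "value": 11.0, "sensor_id": "b"},
]
ratio = outlier_ratio(readings, 1.5)
```

With a TypedDict in scope, a type checker such as pyright can flag a misspelled key like r["valeu"] before the function ever runs.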
Returning Structured Data
A Python function can return a list of dicts that Shape validates row by row against a struct type; the example below returns Vec<Element>:
    type Element {
        symbol: string,
        atomic_number: int,
        atomic_mass: number,
    }

    fn python periodic_subset(min_z: int, max_z: int) -> Result<Vec<Element>> {
        elements = [
            {"symbol": "H", "atomic_number": 1, "atomic_mass": 1.008},
            {"symbol": "He", "atomic_number": 2, "atomic_mass": 4.003},
            {"symbol": "Li", "atomic_number": 3, "atomic_mass": 6.941},
            {"symbol": "Be", "atomic_number": 4, "atomic_mass": 9.012},
            {"symbol": "B", "atomic_number": 5, "atomic_mass": 10.81},
            {"symbol": "C", "atomic_number": 6, "atomic_mass": 12.01},
            {"symbol": "N", "atomic_number": 7, "atomic_mass": 14.01},
            {"symbol": "O", "atomic_number": 8, "atomic_mass": 16.00},
        ]
        return [e for e in elements if min_z <= e["atomic_number"] <= max_z]
    }

    let light_elements = periodic_subset(1, 4)?

Each dict in the returned list is validated against the Element schema, and the result can be used as a typed collection in Shape.
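A rough picture of what per-row validation involves, written in Python for illustration. ELEMENT_SCHEMA and validate_row are hypothetical, and the strictness shown here (rejecting extra keys, no int-to-float coercion) is an assumption rather than documented Shape behavior.

```python
ELEMENT_SCHEMA = {"symbol": str, "atomic_number": int, "atomic_mass": float}

def validate_row(row, schema):
    """Hypothetical check: exact key set and matching field types."""
    missing = schema.keys() - row.keys()
    extra = row.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        if not isinstance(row[field], expected):
            actual = type(row[field]).__name__
            raise TypeError(f"{field}: expected {expected.__name__}, got {actual}")
    return row

good = validate_row(
    {"symbol": "He", "atomic_number": 2, "atomic_mass": 4.003}, ELEMENT_SCHEMA
)
```

A row with a missing field or a string where a number is expected fails before it reaches Shape code, which is the property the schema check is there to provide.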
Async Foreign Functions
Mark a polyglot function async and it runs on the async executor:

    async fn python fetch_json(url: string) -> Result<Vec<number>> {
        import aiohttp
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                data = await response.json()
                return data['values']
    }

    let values = await fetch_json("https://api.example.com/data")?

Annotations
Polyglot functions support the same annotations as regular Shape functions:

    fn python percentile(values: Vec<number>, pct: number) -> Result<number> {
        sorted_v = sorted(values)
        k = (len(sorted_v) - 1) * (pct / 100.0)
        f = int(k)
        c = f + 1
        if c >= len(sorted_v):
            return sorted_v[-1]
        return sorted_v[f] + (k - f) * (sorted_v[c] - sorted_v[f])
    }

Nested Braces and Strings
The parser correctly handles nested braces, string literals, and escaped characters inside foreign bodies:

    fn python process(data: Vec<number>) -> Result<number> {
        config = {"threshold": 1.5, "method": "zscore"}
        message = "processing {len(data)} items"
        nested = {"inner": {"deep": True}}
        return sum(data) / len(data)
    }

Visibility
Polyglot functions can be exported from modules:

    pub fn python normalize(values: Vec<number>) -> Result<Vec<number>> {
        lo = min(values)
        hi = max(values)
        span = hi - lo
        if span == 0:
            return [0.0] * len(values)
        return [(v - lo) / span for v in values]
    }

Using NumPy
Python packages like NumPy can be used directly inside polyglot functions. Vec<number> is marshaled to a Python list, and np.array(...) constructs an ndarray from that list. This is an O(n) copy; there is no zero-copy buffer path today (see DataTable Bridge Status for the Arrow IPC bridge, which is planned but not yet implemented):

    fn python correlation(xs: Vec<number>, ys: Vec<number>) -> Result<number> {
        import numpy as np
        return float(np.corrcoef(xs, ys)[0, 1])
    }

    fn python moving_average(values: Vec<number>, window: int) -> Result<Vec<number>> {
        import numpy as np
        arr = np.array(values)
        kernel = np.ones(window) / window
        smoothed = np.convolve(arr, kernel, mode='valid')
        return smoothed.tolist()
    }

    let r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])?
    let smoothed = moving_average([10, 20, 30, 40, 50, 60, 70], 3)?

Any package installed in the active Python environment is available.
Language Extension Setup
Language runtimes are provided by extensions. Two first-party extensions ship today:
- Python Extension: fn python ... via PyO3
- TypeScript Extension: fn typescript ... via V8 (deno_core)

Each extension chapter covers install (shape ext install <name>), runtime
loading (CLI shorthand, frontmatter, shape.toml), error model details, and
LSP delegation.
Error Handling
Python exceptions are caught and mapped back to Shape source locations. The traceback line numbers inside __shape_fn__ are offset by the body_span start line, so errors point to the correct line in your Shape file.

    Error: Python runtime error in 'process'
      --> src/pipeline.shape:14:8
       |
    14 |     result = compute(invalid_input)
       |              ^^^^^^^^^^^^^^^^^^^^^^
       = TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

LSP Support
The Shape language server delegates to child language servers for foreign blocks:
- Completions: Type pd. inside a Python block and get pandas autocomplete
- Diagnostics: Python syntax errors appear at the correct Shape file position
- Hover: Hover over pd.DataFrame and see the Python docstring
- Type stubs: Shape struct types are exported as .pyi stubs so pyright understands your parameter types

The LSP creates a virtual Python document for each foreign block, wrapping the body in a typed def with proper imports. This gives the child language server full type context without any extra configuration.
For explicit LSP configuration keys (alwaysLoadExtensions / always_load_extensions) and Neovim examples, see Python Extension.
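The virtual-document idea, and the line mapping that lets diagnostics and tracebacks point back into the Shape file, can be sketched like this. build_virtual_document is a hypothetical helper; the header contents, the float return annotation, and the assumption that the first body line sits at body_start_line are all illustrative.

```python
import textwrap

def build_virtual_document(name, params, return_type, body, body_start_line):
    """Hypothetical: wrap a foreign body in a typed def and map lines back."""
    signature = ", ".join(f"{p}: {t}" for p, t in params)
    header = [
        "from typing import List",
        f"def {name}({signature}) -> {return_type}:",
    ]
    body_lines = textwrap.dedent(body).strip("\n").splitlines()
    text = "\n".join(header + ["    " + line for line in body_lines])

    def to_shape_line(virtual_line):
        # Lines are 1-based; header lines have no position in the Shape file.
        offset = virtual_line - len(header) - 1
        if 0 <= offset < len(body_lines):
            return body_start_line + offset
        return None

    return text, to_shape_line

body = """
    import math
    mean = sum(values) / len(values)
    return math.sqrt(mean)
"""
text, to_shape_line = build_virtual_document(
    "std_dev", [("values", "List[float]")], "float", body, body_start_line=12
)
```

A diagnostic reported on virtual line 3 (the first body line) maps to line 12 of the Shape file, while positions inside the generated header map to nothing and can be dropped.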
Extension-Driven Architecture
Nothing about Python (or any language) is hardcoded in Shape’s core. The system is entirely extension-driven:
- Parser: The foreign_language_id grammar rule accepts any identifier. fn python, fn julia, fn sql all parse identically.
- Extension loading: When a language runtime extension is loaded, it calls language_id() to self-declare its language string. The runtime registers this in a lookup table.
- Compiler resolution: When the compiler encounters fn <language> ..., it looks up the language in the registered runtimes. If not found, it produces a clear error: “No language runtime registered for ‘julia’. Install the julia extension.”
- LSP delegation: The extension provides get_lsp_config() declaring its child language server command. The Shape LSP discovers this dynamically.

A third party can publish a Julia extension that registers language_id() = "julia" and provides lsp_command = ["julia", "[email protected]", "-e", "..."], and fn julia ... works with full LSP support without any Shape core changes.
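The registration and lookup flow described in this section amounts to a small table keyed by language string. The sketch below models it in Python; RuntimeRegistry and PythonRuntime are hypothetical stand-ins, and only language_id() and the error text come from the description above.

```python
class RuntimeRegistry:
    """Hypothetical lookup table from language string to runtime."""

    def __init__(self):
        self._runtimes = {}

    def register(self, runtime):
        # The extension self-declares its language string.
        self._runtimes[runtime.language_id()] = runtime

    def resolve(self, language):
        try:
            return self._runtimes[language]
        except KeyError:
            raise LookupError(
                f"No language runtime registered for '{language}'. "
                f"Install the {language} extension."
            ) from None

class PythonRuntime:
    def language_id(self):
        return "python"

registry = RuntimeRegistry()
runtime = PythonRuntime()
registry.register(runtime)
```

Because the key is whatever string the extension reports, adding a new language only requires registering another runtime; the lookup code does not change.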
Capability Contract
Language runtime extensions implement the shape.language_runtime capability:
| VTable Function | Purpose |
|---|---|
| init | Initialize the language interpreter |
| register_types | Receive Shape type schemas for stub generation |
| compile | Pre-compile a function body, returning a handle |
| invoke | Call a compiled function with marshaled arguments |
| dispose_function | Release a compiled function handle |
| language_id | Return the language identifier string |
| get_lsp_config | Return child LSP server configuration |
| free_buffer | Free a buffer allocated by compile/invoke/get_lsp_config/get_shape_source |
| drop | Tear down the runtime instance |
| error_model | Whether the runtime is Dynamic (every call can fail, so functions return Result<T>) or Static (compile-time type safety; runtime errors not expected). Defaults to Dynamic when zero-initialized. |
| get_shape_source | Returns a bundled .shape module source registered under the extension’s own namespace (e.g. python, typescript), not under std::*. This is how import { eval } from typescript becomes importable. Optional; set to None if the extension has no bundled namespace. |
Lifecycle: init → register_types → compile (per function) → invoke (per call) → dispose_function → drop.
error_model and get_shape_source are read by the host during extension registration; they are not part of the per-call dispatch path.
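The lifecycle order can be made concrete with a toy runtime that records each call. ToyRuntime is a hypothetical in-process stand-in written in Python; a real extension exposes these entries through a vtable, and the argument shapes used here are assumptions.

```python
class ToyRuntime:
    """Hypothetical stand-in that follows the lifecycle and logs each step."""

    def __init__(self):
        self.calls = []
        self._functions = {}
        self._next_handle = 0

    def init(self):
        self.calls.append("init")

    def register_types(self, schemas):
        self.calls.append("register_types")

    def compile(self, name, source):
        self.calls.append("compile")
        namespace = {}
        exec(compile(source, f"<{name}>", "exec"), namespace)
        handle = self._next_handle
        self._next_handle += 1
        self._functions[handle] = namespace[name]
        return handle  # callers hold the handle, not the function

    def invoke(self, handle, args):
        self.calls.append("invoke")
        return self._functions[handle](*args)

    def dispose_function(self, handle):
        self.calls.append("dispose_function")
        del self._functions[handle]

    def drop(self):
        self.calls.append("drop")
        self._functions.clear()

runtime = ToyRuntime()
runtime.init()
runtime.register_types({})
handle = runtime.compile("double", "def double(x):\n    return x * 2\n")
first = runtime.invoke(handle, [21])
second = runtime.invoke(handle, [4])
runtime.dispose_function(handle)
runtime.drop()
```

Compilation happens once per function and returns a handle; each later call reuses that handle, which is why invoke appears twice in the log while compile appears once.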
Content-Addressed Identity
Polyglot functions participate in Shape’s content-addressed bytecode system.
Each foreign function is assigned a content hash based on its language, body
text, parameter types, and return type. When a Shape function calls a polyglot
function, the foreign function’s hash is recorded in the caller’s
foreign_dependencies field and included in the caller’s blob hash computation.
This means two Shape functions that are otherwise identical but call different foreign implementations will produce different content hashes — preserving correct identity across the language boundary.
    fn process(data: Vec<number>) -> number {
        // calls: fn python std_dev(...) → foreign hash a1b2...
        // calls: fn python normalize(...) → foreign hash c3d4...
        // process.foreign_dependencies = [a1b2..., c3d4...]
        // process.content_hash = SHA-256(bytecode + deps + foreign_deps + ...)
    }

See Content-Addressed Bytecode for the full hashing architecture.
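To show how a foreign function's identity can feed into its caller's hash, here is a small Python sketch using hashlib. The byte layout is an assumption: foreign_hash, caller_hash, and the length-prefixed encoding are hypothetical and are not Shape's actual serialization.

```python
import hashlib

def foreign_hash(language, body, param_types, return_type):
    """Hypothetical: hash language, body text, parameter types, return type."""
    h = hashlib.sha256()
    for part in (language, body, *param_types, return_type):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))  # length prefix avoids ambiguity
        h.update(data)
    return h.hexdigest()

def caller_hash(bytecode, foreign_dependencies):
    """Hypothetical: mix the foreign hashes into the caller's blob hash."""
    h = hashlib.sha256(bytecode)
    for dep in foreign_dependencies:
        h.update(bytes.fromhex(dep))
    return h.hexdigest()

v1 = foreign_hash("python", "return sum(xs)", ["Vec<number>"], "Result<number>")
v2 = foreign_hash("python", "return max(xs)", ["Vec<number>"], "Result<number>")
same_bytecode = b"\x01\x02\x03"
hash_with_v1 = caller_hash(same_bytecode, [v1])
hash_with_v2 = caller_hash(same_bytecode, [v2])
```

The two callers share identical bytecode, yet their hashes differ because the foreign bodies differ, which is the property the section describes.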
DataTable Bridge Status
DataTable ↔ pd.DataFrame marshalling is planned but not yet implemented.
The intent is to use Arrow IPC for zero-copy bridging between Shape’s columnar
DataTable and pandas DataFrames (or pyarrow RecordBatches). Both directions
currently return an error at the bridge boundary — see
extensions/python/src/arrow_bridge.rs (datatable_to_python_ipc and
python_ipc_to_datatable are stubs).
For tabular interchange today, transfer the data as Vec<T> of struct rows;
that path goes through the standard list/dict marshalling and works end-to-end.
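On the Python side, struct rows arrive as a list of dicts. If column-oriented access is needed inside the function body, the rows can be pivoted by hand, as in the sketch below. rows_to_columns is a hypothetical helper; when pandas is installed, pandas.DataFrame(rows) accepts the same list of dicts.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    fields = list(rows[0].keys())
    return {field: [row[field] for row in rows] for field in fields}

rows = [
    {"symbol": "H", "atomic_number": 1, "atomic_mass": 1.008},
    {"symbol": "He", "atomic_number": 2, "atomic_mass": 4.003},
]
columns = rows_to_columns(rows)
```

This copies every value once, which matches the list and dict marshaling path described above; it is not a substitute for the planned zero-copy bridge.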
- Polyglot functions compile to the same callable thunks as regular functions. The JIT does not need changes.
- The compile step runs at Shape compile time, catching syntax errors before any code executes.
- When Shape types change (file save), the LSP re-collects types and regenerates stubs for the child language server.
- Closures and function objects cannot cross the language boundary. Use concrete types at the interface.