Python Quickstart
A complete async example using the Yoda Python bindings (yoda). Compatible with both asyncio and anyio.
Install
pip install yoda
# or with uv:
uv pip install yoda
Build from source (requires a Rust toolchain and maturin):
uv run maturin develop # debug build
uv run maturin develop --release # optimised build
End-to-End Example
The snippet below is a complete, runnable program. It creates an in-memory engine, registers a users table, writes rows via OLTP, syncs CDC events to OLAP, and then runs an aggregate query returning PyArrow batches.
import asyncio

import pyarrow as pa
import yoda


async def main():
    # 1. Configure the engine.
    #    olap_backend="datafusion": default, pure-Rust, no C deps.
    #    storage_mode="inmemory": no files written; ideal for demos/tests.
    config = yoda.HtapConfig(
        oltp_path="quickstart.db",
        olap_backend="datafusion",
        storage_mode="inmemory",
        sync_mode="destructive",
    )

    # 2. Create the engine (synchronous constructor; the Tokio runtime starts here).
    engine = yoda.HtapEngine(config)

    # 3. Register a table schema.
    #    columns is a list of (name, type_string) tuples.
    #    Supported scalar types: int8/16/32/64, uint8/16/32/64,
    #    float32/64, utf8/string/text, bool/boolean, binary/bytes.
    schema = yoda.TableSchema(
        name="users",
        columns=[
            ("id", "int64"),
            ("name", "utf8"),
            ("age", "int32"),
        ],
        pk=["id"],
    )
    await engine.register_table(schema)

    # 4. Write rows (routed to SQLite OLTP).
    await engine.execute("INSERT INTO users VALUES (1, 'Alice', 30)")
    await engine.execute("INSERT INTO users VALUES (2, 'Bob', 25)")
    await engine.execute("INSERT INTO users VALUES (3, 'Carol', 35)")

    # 5. Flush CDC events from _yoda_cdc_log to the OLAP mirror.
    result = await engine.sync_now()
    print(
        f"Synced {result.events_processed} events "
        f"({result.rows_inserted} inserted)"
    )

    # 6. Analytical query: the router sends aggregate queries to OLAP automatically.
    batches = await engine.query("SELECT COUNT(*) AS n FROM users")

    # batches is a list of pyarrow.RecordBatch objects.
    table = pa.Table.from_batches(batches)
    print(table.to_pandas())

    # 7. Force a query to OLAP explicitly (skip the router).
    batches2 = await engine.query_olap(
        "SELECT AVG(age) AS avg_age FROM users"
    )
    print(pa.Table.from_batches(batches2).to_pandas())


asyncio.run(main())

anyio compatibility
The bindings work transparently with anyio — just replace asyncio.run(main()) with anyio.run(main). The Tokio runtime is managed internally; no configuration is needed on the Python side.
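The two entry points take different arguments, which is easy to miss: asyncio.run expects a coroutine object, while anyio.run expects the coroutine function itself. The sketch below shows both calls side by side. It uses a stand-in main that does not touch Yoda, and the optional-import guard around anyio is an illustrative assumption, not something the bindings require.

```python
import asyncio


async def main() -> str:
    # Stand-in for the quickstart's main(); a real program would
    # create the engine and await its methods here.
    return "done"


# asyncio.run expects a coroutine object, so main is called first.
asyncio_result = asyncio.run(main())
print("asyncio:", asyncio_result)

# anyio.run expects the coroutine function itself (no parentheses).
# anyio is a separate package, so the import is guarded in this sketch.
try:
    import anyio
except ImportError:
    anyio = None

if anyio is not None:
    anyio_result = anyio.run(main)
    print("anyio:", anyio_result)
```

Passing main() to anyio.run, or main to asyncio.run, raises an error at startup, so a mix-up shows up immediately rather than at query time.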
Batch Writes
For loading many rows, execute_batch wraps all statements in a single SQLite transaction — substantially faster than individual execute calls because the per-commit fsync is amortized over the whole batch (see execute_batch in the Python API guide):
statements = [
    f"INSERT INTO users VALUES ({i}, 'User{i}', {20 + i % 30})"
    for i in range(4, 1004)
]
await engine.execute_batch(statements)
Durable Parquet Storage
To persist OLAP data across restarts, switch the storage mode:
config = yoda.HtapConfig(
    oltp_path="app.db",
    olap_backend="datafusion",
    storage_mode="parquet",
    storage_path="/var/lib/myapp/olap",
    sync_mode="destructive",
)
Accepted storage_mode values: "inmemory", "arrow_ipc", "parquet". The latter two require storage_path.
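The rule above can be checked before a config is built, for example when the mode comes from an environment variable or a settings file. The following is a minimal sketch in plain Python. The helper name check_storage and its error messages are hypothetical and not part of the yoda package, and this guide does not say how the bindings themselves report an invalid combination.

```python
from typing import Optional

# Modes listed in this guide; the two durable ones need a storage_path.
STORAGE_MODES = ("inmemory", "arrow_ipc", "parquet")
NEEDS_PATH = ("arrow_ipc", "parquet")


def check_storage(storage_mode: str, storage_path: Optional[str] = None) -> None:
    """Hypothetical pre-flight check; not part of the yoda package."""
    if storage_mode not in STORAGE_MODES:
        raise ValueError(
            f"unknown storage_mode {storage_mode!r}; expected one of {STORAGE_MODES}"
        )
    if storage_mode in NEEDS_PATH and not storage_path:
        raise ValueError(f"storage_mode={storage_mode!r} requires storage_path")


check_storage("inmemory")                         # passes, no path needed
check_storage("parquet", "/var/lib/myapp/olap")   # passes, path supplied

try:
    check_storage("parquet")                      # missing storage_path
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Running a check like this first gives a clear message at startup, whatever the engine itself does with a missing path.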
Schema Evolution
Add or drop columns on a live engine without recreating it:
await engine.add_column("users", "email", "utf8")
await engine.drop_column("users", "age")
Next Steps
- Architecture: CDC pipeline and engine internals
- Configuration Reference: all HtapConfig fields
- Sidecar Mode: follow an existing Postgres/SQLite DB
- Python API Reference: full method signatures and types