
Python Quickstart

A complete async example using the Yoda Python bindings (yoda). Compatible with both asyncio and anyio.

Install

sh
pip install yoda
# or with uv:
uv pip install yoda

Build from source (requires a Rust toolchain and maturin):

sh
uv run maturin develop           # debug build
uv run maturin develop --release # optimised build

End-to-End Example

The snippet below is a complete, runnable program. It creates an engine with an in-memory OLAP mirror, registers a users table, writes rows via OLTP, syncs CDC events to OLAP, and then runs aggregate queries that return PyArrow record batches.

python
import asyncio
import pyarrow as pa
import yoda

async def main():
    # 1. Configure the engine.
    #    olap_backend="datafusion": the default; pure Rust, no C dependencies.
    #    storage_mode="inmemory":   OLAP data stays in memory and is not
    #                               persisted; ideal for demos and tests.
    #                               (The SQLite OLTP store still uses oltp_path.)
    config = yoda.HtapConfig(
        oltp_path="quickstart.db",
        olap_backend="datafusion",
        storage_mode="inmemory",
        sync_mode="destructive",
    )

    # 2. Create the engine (synchronous constructor — Tokio runtime starts here).
    engine = yoda.HtapEngine(config)

    # 3. Register a table schema.
    #    columns is a list of (name, type_string) tuples.
    #    Supported scalar types: int8/16/32/64, uint8/16/32/64,
    #    float32/64, utf8/string/text, bool/boolean, binary/bytes.
    schema = yoda.TableSchema(
        name="users",
        columns=[
            ("id",   "int64"),
            ("name", "utf8"),
            ("age",  "int32"),
        ],
        pk=["id"],
    )
    await engine.register_table(schema)

    # 4. Write rows — routed to SQLite OLTP.
    await engine.execute("INSERT INTO users VALUES (1, 'Alice', 30)")
    await engine.execute("INSERT INTO users VALUES (2, 'Bob', 25)")
    await engine.execute("INSERT INTO users VALUES (3, 'Carol', 35)")

    # 5. Flush CDC events from _yoda_cdc_log to the OLAP mirror.
    result = await engine.sync_now()
    print(f"Synced {result.events_processed} events "
          f"({result.rows_inserted} inserted)")

    # 6. Analytical query: GROUP BY queries route automatically to OLAP.
    batches = await engine.query(
        "SELECT age, COUNT(*) AS n FROM users GROUP BY age"
    )
    # batches is a list of pyarrow.RecordBatch objects.
    table = pa.Table.from_batches(batches)
    print(table.to_pandas())  # to_pandas() also requires pandas to be installed

    # 7. Force a query to OLAP explicitly (skip the router).
    batches2 = await engine.query_olap(
        "SELECT AVG(age) AS avg_age FROM users"
    )
    print(pa.Table.from_batches(batches2).to_pandas())

if __name__ == "__main__":
    asyncio.run(main())
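Step 3 passes columns as plain (name, type_string) tuples. A typo in a type string therefore only surfaces when register_table runs. Below is a minimal sketch of a pre-flight check. It is a hypothetical helper, not part of the yoda API. It expands the supported scalar types listed in the step 3 comment into a set, then rejects unknown types, duplicate column names, and primary-key entries that name no declared column. The quickstart does not say whether yoda normalizes type strings, so this sketch compares them exactly as written.

```python
# Hypothetical pre-flight check for TableSchema arguments; not part of yoda.
# The accepted type strings mirror the list in the quickstart's step 3 comment.
SUPPORTED_TYPES = (
    {f"int{bits}" for bits in (8, 16, 32, 64)}
    | {f"uint{bits}" for bits in (8, 16, 32, 64)}
    | {"float32", "float64"}
    | {"utf8", "string", "text"}
    | {"bool", "boolean"}
    | {"binary", "bytes"}
)


def check_columns(columns: list[tuple[str, str]], pk: list[str]) -> None:
    """Raise ValueError if the column spec or primary key looks invalid."""
    seen: set[str] = set()
    for name, type_string in columns:
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        if type_string not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type {type_string!r} for column {name!r}")
        seen.add(name)
    # Every primary-key entry must name a declared column.
    missing = [col for col in pk if col not in seen]
    if missing:
        raise ValueError(f"primary key refers to unknown columns: {missing}")


if __name__ == "__main__":
    check_columns([("id", "int64"), ("name", "utf8"), ("age", "int32")], pk=["id"])
    print("schema OK")
```

You could call it in main() just before building yoda.TableSchema(...), passing the same columns and pk arguments.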

anyio compatibility

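The bindings work transparently with anyio: replace asyncio.run(main()) with anyio.run(main). Note that anyio.run takes the coroutine function itself (plus any positional arguments), not a coroutine object. The Tokio runtime is managed internally, so no configuration is needed on the Python side.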

Batch Writes

For loading many rows, execute_batch wraps all statements in a single SQLite transaction — substantially faster than individual execute calls because the per-commit fsync is amortized over the whole batch (see execute_batch in the Python API guide):

python
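# Inside an async function such as main() above; ids 4..1003 follow on
# from the three rows inserted in the end-to-end example.
statements = [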
    f"INSERT INTO users VALUES ({i}, 'User{i}', {20 + i % 30})"
    for i in range(4, 1004)
]
await engine.execute_batch(statements)
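The effect execute_batch relies on can be reproduced with Python's standard sqlite3 module alone. This sketch does not use yoda and does not call execute_batch. It loads the same generated INSERT statements into two throwaway SQLite files. The first run commits after every statement; the second wraps them all in one BEGIN/COMMIT. It then reports row counts and elapsed time. Timings depend heavily on the disk and filesystem, so treat the numbers as indicative only.

```python
import os
import sqlite3
import tempfile
import time
from contextlib import closing


def make_statements(start: int, stop: int) -> list[str]:
    # Same statement shape as the execute_batch example above.
    return [
        f"INSERT INTO users VALUES ({i}, 'User{i}', {20 + i % 30})"
        for i in range(start, stop)
    ]


def load(path: str, statements: list[str], single_txn: bool) -> float:
    """Run the inserts against a SQLite file and return elapsed seconds."""
    # isolation_level=None disables sqlite3's implicit transactions, so each
    # statement commits on its own unless we issue BEGIN/COMMIT ourselves.
    with closing(sqlite3.connect(path, isolation_level=None)) as conn:
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
        start = time.perf_counter()
        if single_txn:
            conn.execute("BEGIN")
            for sql in statements:
                conn.execute(sql)
            conn.execute("COMMIT")  # one commit, with its disk syncs, for all rows
        else:
            for sql in statements:
                conn.execute(sql)  # a separate commit, with its own syncs, per row
        return time.perf_counter() - start


def compare(n: int = 1000) -> dict[str, tuple[int, float]]:
    """Return {mode: (row_count, seconds)} for both loading strategies."""
    statements = make_statements(4, 4 + n)
    results: dict[str, tuple[int, float]] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, single_txn in (("per_statement", False), ("single_txn", True)):
            path = os.path.join(tmp, f"{label}.db")
            elapsed = load(path, statements, single_txn)
            with closing(sqlite3.connect(path)) as conn:
                (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
            results[label] = (count, elapsed)
    return results


if __name__ == "__main__":
    for label, (count, seconds) in compare().items():
        print(f"{label:>13}: {count} rows in {seconds:.3f}s")
```

On a typical file-backed disk, the single-transaction run finishes much sooner because it commits once instead of once per row.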

Durable Parquet Storage

To persist OLAP data across restarts, switch the storage mode:

python
config = yoda.HtapConfig(
    oltp_path="app.db",
    olap_backend="datafusion",
    storage_mode="parquet",
    storage_path="/var/lib/myapp/olap",
    sync_mode="destructive",
)

Accepted storage_mode values: "inmemory", "arrow_ipc", "parquet". The latter two require storage_path.
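A common pattern is to use "inmemory" in tests and a durable mode in production. The following is a minimal sketch of that switch. It uses hypothetical helpers, not part of yoda, that enforce the two rules stated above: only the three listed modes are accepted, and "arrow_ipc" and "parquet" need a storage_path. The environment variable names MYAPP_STORAGE_MODE and MYAPP_STORAGE_PATH are illustrative assumptions. Because the quickstart does not say whether a path is accepted with "inmemory", the sketch forwards storage_path only for the modes that require it.

```python
# Hypothetical helpers, not part of yoda: build the storage-related keyword
# arguments for the engine config and enforce the rules stated in the guide.
from collections.abc import Mapping
from typing import Optional

STORAGE_MODES = ("inmemory", "arrow_ipc", "parquet")
MODES_REQUIRING_PATH = {"arrow_ipc", "parquet"}


def storage_kwargs(storage_mode: str, storage_path: Optional[str] = None) -> dict[str, str]:
    """Return {'storage_mode': ..., ['storage_path': ...]} or raise ValueError."""
    if storage_mode not in STORAGE_MODES:
        raise ValueError(
            f"storage_mode must be one of {STORAGE_MODES}, got {storage_mode!r}"
        )
    kwargs = {"storage_mode": storage_mode}
    if storage_mode in MODES_REQUIRING_PATH:
        if not storage_path:
            raise ValueError(f"storage_mode={storage_mode!r} requires storage_path")
        kwargs["storage_path"] = storage_path
    return kwargs


def storage_kwargs_from_env(env: Mapping[str, str]) -> dict[str, str]:
    # MYAPP_STORAGE_MODE / MYAPP_STORAGE_PATH are hypothetical variable names;
    # defaulting to "inmemory" keeps tests free of on-disk state.
    return storage_kwargs(
        env.get("MYAPP_STORAGE_MODE", "inmemory"),
        env.get("MYAPP_STORAGE_PATH"),
    )
```

The result can be splatted into the config alongside the other options, e.g. yoda.HtapConfig(oltp_path="app.db", olap_backend="datafusion", sync_mode="destructive", **storage_kwargs_from_env(os.environ)).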

Schema Evolution

Add or drop columns on a live engine without recreating it:

python
# Inside an async function, with engine created as in the example above.
await engine.add_column("users", "email", "utf8")
await engine.drop_column("users", "age")
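Application code may keep its own copy of a table's column list, such as the (name, type_string) tuples passed to TableSchema. If so, that copy has to change in step with add_column and drop_column. Below is a minimal sketch of that bookkeeping, using hypothetical helpers that are not part of yoda. It assumes a new column is appended at the end, as SQL's ALTER TABLE ... ADD COLUMN does. It also refuses to drop a primary-key column as a conservative choice. The quickstart documents neither behaviour for yoda.

```python
# Hypothetical local bookkeeping, not part of yoda. Assumes add_column appends
# the new column at the end, mirroring SQL's ALTER TABLE ... ADD COLUMN.
Column = tuple[str, str]  # (name, type_string), as passed to yoda.TableSchema


def with_column_added(columns: list[Column], name: str, type_string: str) -> list[Column]:
    if any(existing == name for existing, _ in columns):
        raise ValueError(f"column {name!r} already exists")
    return [*columns, (name, type_string)]


def with_column_dropped(columns: list[Column], name: str, pk: list[str]) -> list[Column]:
    if name in pk:
        # Conservative choice for this sketch; yoda's behaviour is not documented here.
        raise ValueError(f"refusing to drop primary-key column {name!r}")
    remaining = [col for col in columns if col[0] != name]
    if len(remaining) == len(columns):
        raise ValueError(f"no such column: {name!r}")
    return remaining


if __name__ == "__main__":
    users = [("id", "int64"), ("name", "utf8"), ("age", "int32")]
    users = with_column_added(users, "email", "utf8")
    users = with_column_dropped(users, "age", pk=["id"])
    print(users)  # [('id', 'int64'), ('name', 'utf8'), ('email', 'utf8')]
```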

Next Steps

Released under the Apache-2.0 License.