TOML Configuration Reference
Full schema reference for the TOML file consumed by yd serve --config <path>. The file has three top-level sections: [engine], [[tables]], and the optional [sidecar].
[engine]
Core engine settings. All keys are optional except where noted.
oltp_path
| Type | Default |
|---|---|
| string | "yoda.db" |
Filesystem path to the SQLite database used for OLTP writes. Created if it does not exist. WAL mode and PRAGMA synchronous=NORMAL are applied automatically.
[engine]
oltp_path = "/var/lib/yoda/htap.db"

olap_backend
| Type | Default |
|---|---|
| string | "datafusion" |
Which OLAP engine to instantiate. Accepted values:
- "datafusion": Apache DataFusion (pure Rust, natively async). Always available in default builds.
- "duckdb": DuckDB (C++ bundled, requires --features duckdb-backend).
[engine]
olap_backend = "datafusion"

olap_in_memory
| Type | Default |
|---|---|
| bool | true |
Keep the OLAP backend entirely in memory. For DataFusion, prefer the [engine.datafusion_storage] sub-table instead; this flag is primarily for the DuckDB backend. Set to false and provide olap_path for durable DuckDB storage.
olap_path
| Type | Default |
|---|---|
| string (optional) | — |
Filesystem path for durable OLAP storage when olap_in_memory = false. For DuckDB this is a .duckdb file. Ignored for DataFusion when [engine.datafusion_storage] is set.
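As a minimal sketch of the durable DuckDB setup described above, the fragment below combines olap_backend, olap_in_memory, and olap_path. The file path is illustrative only, and the binary is assumed to have been built with --features duckdb-backend.

```toml
[engine]
olap_backend   = "duckdb"                     # needs a build with --features duckdb-backend
olap_in_memory = false                        # persist the OLAP mirror instead of keeping it in RAM
olap_path      = "/var/lib/yoda/olap.duckdb"  # illustrative path, not a default
```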
sync_interval_ms
| Type | Default |
|---|---|
| integer (ms) | 500 |
How often the background CDC sync loop polls _yoda_cdc_log and applies events to the OLAP mirror.
- ≤ 50: near real-time; adds measurable SQLite I/O pressure.
- 100–500: balanced; recommended for most workloads.
- ≥ 1000: low overhead; suitable for batch analytics.
[engine]
sync_interval_ms = 250

sync_batch_size
| Type | Default |
|---|---|
| integer | 1000 |
Maximum number of CDC events consumed per sync cycle. Larger batches amortise per-transaction overhead and increase throughput at the cost of higher per-cycle latency.
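A sketch of how sync_interval_ms and sync_batch_size might be tuned together for a throughput-oriented workload. The numbers are illustrative assumptions, not benchmarked recommendations.

```toml
[engine]
sync_interval_ms = 1000   # poll once per second (the low-overhead range)
sync_batch_size  = 5000   # illustrative: up to 5000 CDC events per sync cycle
```

If exactly one cycle runs per interval, these values cap the mirror at roughly 5000 events per second, and a burst larger than one batch takes several cycles to drain.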
read_pool_size
| Type | Default |
|---|---|
| integer | 4 |
Number of read connections in the OLTP connection pool (round-robin). Increase for workloads with many concurrent OLTP reads.
sync_mode
| Type | Default |
|---|---|
| string | "destructive" |
How CDC events are applied to the OLAP mirror:
- "destructive" (alias "mirror"): standard mirror semantics. UPDATE overwrites the row, DELETE removes it.
- "temporal" (aliases "scd2", "scd_type_2"): SCD Type 2 append-only mode. Every change appends a new version with _yoda_valid_from, _yoda_valid_to, and _yoda_operation columns. Enables point-in-time queries.
See Sync modes for details.
[engine]
sync_mode = "temporal"

rocksdb_cdc_path
| Type | Default | Feature |
|---|---|---|
| string (optional) | — | rocksdb-cdc |
Path to a RocksDB directory used as a durable CDC event buffer. When set, SQLite triggers still fire into _yoda_cdc_log, but a bridge drains them into RocksDB on each poll cycle. The sync engine then reads exclusively from RocksDB, which gives crash-durable event buffering and a CDC write path 5–7x faster than SQLite triggers alone. Ignored in sidecar mode.
[engine]
rocksdb_cdc_path = "/var/lib/yoda/cdc-log"

flight_port
| Type | Default | Feature |
|---|---|---|
| integer (optional) | — | flight-sql |
TCP port on which the Arrow Flight SQL gRPC server listens. When set, yd serve starts the Flight SQL endpoint at 0.0.0.0:<port>. Requires the binary to be compiled with --features flight-sql.
See FlightSQL for client examples.
[engine]
flight_port = 50051

flight_auth_token
| Type | Default | Feature |
|---|---|---|
| string (optional) | — | flight-sql |
Bearer token that clients must supply in the authorization: Bearer <token> gRPC metadata header. Falls back to the YODA_FLIGHT_AUTH_TOKEN environment variable if this key is absent, so operators can avoid storing tokens on disk.
Prefer the environment variable
Store the token in YODA_FLIGHT_AUTH_TOKEN rather than in the TOML file to avoid leaking it via config-file access logs or version control.
log_format
| Type | Default |
|---|---|
| string | "text" |
Log format used in headless / stdout mode:
- "text": human-readable tracing output (default).
- "json": structured JSON log lines compatible with Loki, Datadog, and the ELK stack.
[engine]
log_format = "json"

schema_registry_path
| Type | Default |
|---|---|
| string (optional) | — |
Path to a JSON file where the schema registry is persisted between restarts. When set, HtapEngine::register_table writes the file atomically after each registration. On the next start, all previously registered tables are restored automatically — no need to re-declare them in [[tables]].
[engine]
schema_registry_path = "/var/lib/yoda/registry.json"

metrics_port
| Type | Default | Feature |
|---|---|---|
| integer (optional) | — | metrics-exporter |
TCP port for the Prometheus metrics HTTP endpoint. When set, yd serve exposes counters and gauges from yoda-sync and yoda at http://0.0.0.0:<port>/metrics. Requires the binary to be compiled with --features metrics-exporter.
[engine]
metrics_port = 9100

[engine.datafusion_storage]
Optional sub-table controlling where DataFusion persists Arrow data. When absent, DataFusion defaults to in-memory storage. Ignored when olap_backend = "duckdb".
mode
| Type | Default |
|---|---|
| string | "in_memory" (when section absent) |
Storage mode. Accepted values:
| Value | Aliases | Description | Requires |
|---|---|---|---|
| "in_memory" | "memory", "inmemory" | No persistence (default) | — |
| "arrow_ipc" | "ipc" | Arrow IPC files on local disk | path |
| "parquet" | — | Parquet files on local disk | path |
| "s3-parquet" | "s3_parquet" | Parquet on Amazon S3 | url, cloud-storage feature |
| "gcs-parquet" | "gcs_parquet" | Parquet on Google Cloud Storage | url, cloud-storage feature |
path
| Type | Default |
|---|---|
| string (optional) | — |
Local filesystem path for arrow_ipc and parquet modes. Yoda creates the directory on first use.
url
| Type | Default |
|---|---|
| string (optional) | — |
Object-store URL for cloud storage modes. Examples:
- "s3://my-bucket/yoda-data" for s3-parquet
- "gs://my-bucket/yoda-data" for gcs-parquet
[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/data"

[engine.datafusion_storage]
mode = "s3-parquet"
url = "s3://analytics-bucket/htap"

See Configuration for cloud-storage credential setup.
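For local persistence without Parquet, the arrow_ipc mode listed in the mode table takes the same path key. A minimal sketch, using an illustrative directory:

```toml
[engine.datafusion_storage]
mode = "arrow_ipc"             # the alias "ipc" is also listed as accepted
path = "/var/lib/yoda/arrow"   # illustrative directory; created on first use
```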
[[tables]]
Declares HTAP tables. Each entry is an element of the [[tables]] array. At least one entry is required for the engine to replicate data.
name
| Type | Required |
|---|---|
| string | yes |
Table name. Must match the SQLite table name exactly and consist only of [A-Za-z0-9_] characters.
ddl
| Type | Default |
|---|---|
| string (optional) | — |
CREATE TABLE statement executed on the OLTP layer at startup before registering the table. Use CREATE TABLE IF NOT EXISTS to make it idempotent.
[[tables]]
name = "orders"
ddl = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"

[[tables.columns]]
Array of column definitions. At least one column must have primary_key = true; the engine rejects tables with no primary key at startup.
name
| Type | Required |
|---|---|
| string | yes |
Column name. Must consist only of [A-Za-z0-9_] characters.
type
| Type | Required |
|---|---|
| string | yes |
Arrow data type. Accepted values:
| Value | Aliases | Arrow type |
|---|---|---|
| "int64" | "integer", "bigint" | Int64 |
| "int32" | "int" | Int32 |
| "int16" | "smallint" | Int16 |
| "int8" | "tinyint" | Int8 |
| "uint64" | — | UInt64 |
| "uint32" | — | UInt32 |
| "utf8" | "text", "string", "varchar" | Utf8 |
| "float64" | "double", "real" | Float64 |
| "float32" | "float" | Float32 |
| "boolean" | "bool" | Boolean |
| "date" | "date32" | Date32 |
| "timestamp" | — | Timestamp(Microsecond, None) |
| "binary" | "blob" | Binary |
nullable
| Type | Default |
|---|---|
| bool | false |
Whether the column allows NULL values.
primary_key
| Type | Default |
|---|---|
| bool | false |
If true, includes this column in the table's primary key. Composite primary keys are supported by setting primary_key = true on multiple columns.
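A sketch of a composite primary key declared through [[tables.columns]]. The table and column names (order_items, tenant_id, order_id, quantity) are hypothetical and serve only to illustrate the shape of the entry.

```toml
[[tables]]
name = "order_items"
ddl = "CREATE TABLE IF NOT EXISTS order_items (tenant_id INTEGER NOT NULL, order_id INTEGER NOT NULL, quantity INTEGER, PRIMARY KEY (tenant_id, order_id))"

[[tables.columns]]
name = "tenant_id"
type = "int64"
primary_key = true   # first part of the composite key

[[tables.columns]]
name = "order_id"
type = "int64"
primary_key = true   # second part of the composite key

[[tables.columns]]
name = "quantity"
type = "int64"
nullable = true      # nullable defaults to false, so it is set explicitly here
```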
[sidecar]
Optional section that switches Yoda into sidecar mode: it follows an external database via timestamp-based CDC instead of using local SQLite triggers. Requires the sidecar Cargo feature (--features sidecar).
See Sidecar mode for a conceptual overview.
source_path
| Type | Required |
|---|---|
| string | yes |
Connection path or DSN for the external database:
- SQLite: local filesystem path, e.g. "/data/app.db"
- PostgreSQL: connection string, e.g. "postgres://user:pass@host:5432/mydb"
source_type
| Type | Default |
|---|---|
| string | "sqlite" |
Source database type: "sqlite" or "postgres".
enable_local_oltp
| Type | Default |
|---|---|
| bool | false |
When true, a local Rusqlite engine is also started at oltp_path. This lets you perform local OLTP writes alongside the sidecar CDC source. Defaults to false for pure sidecar mode.
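A sketch of a sidecar that follows an external SQLite file while also accepting local writes. The oltp_path value is illustrative, and source_path reuses the SQLite example path given under source_path. A complete file would presumably also declare the tables to follow.

```toml
[engine]
oltp_path = "/var/lib/yoda/local.db"   # illustrative; used because enable_local_oltp = true

[sidecar]
source_path       = "/data/app.db"     # external SQLite database to follow
source_type       = "sqlite"           # the default, shown for clarity
enable_local_oltp = true               # also start the local engine at oltp_path
```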
poll_batch_size
| Type | Default |
|---|---|
| integer | 500 |
Maximum number of rows fetched from the source database per poll cycle.
watermark_path
| Type | Default |
|---|---|
| string (optional) | — |
Path to a RocksDB directory for persisting the CDC watermark between restarts. When absent, the watermark is in-memory only and polling restarts from the beginning on each restart. Requires the rocksdb-watermark feature on yoda-sidecar (automatically enabled when the sidecar top-level feature is on).
[sidecar.delete_detection]
Optional sub-table controlling how deleted rows in the source database are detected.
mode
| Value | Description |
|---|---|
| "disabled" | No delete detection (default when section absent) |
| "soft_delete" | Detects deletes via a boolean/flag column |
| "full_diff" | Periodically scans the source to detect missing rows |
column
| Type | When required |
|---|---|
| string | mode = "soft_delete" |
Name of the soft-delete flag column. A non-zero integer or non-empty string value indicates the row is deleted.
every_n_cycles
| Type | Default | When used |
|---|---|---|
| integer | 60 | mode = "full_diff" |
How many poll cycles between full-diff scans.
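A sketch of the full_diff mode with a non-default scan cadence. The value 120 is an illustrative assumption.

```toml
[sidecar.delete_detection]
mode           = "full_diff"
every_n_cycles = 120   # illustrative: scan every 120 poll cycles, half as often as the default of 60
```

A larger value lowers load on the source database, but deleted rows stay in the OLAP mirror for longer before they are detected.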
[[sidecar.tables]]
Per-table configuration for the sidecar CDC consumer. Mirrors [[tables]] but describes the source database schema (not the OLAP target).
table_name
| Type | Required |
|---|---|
| string | yes |
Name of the table in the source database.
primary_key
| Type | Required |
|---|---|
| array of strings | yes |
Column names forming the primary key. Composite primary keys are supported.
primary_key = ["tenant_id", "order_id"]

created_at_column
| Type | Default |
|---|---|
| string | "created_at" |
Column carrying the row's creation timestamp. Used with updated_at_column to distinguish INSERTs from UPDATEs: if created_at == updated_at the event is classified as an INSERT.
updated_at_column
| Type | Default |
|---|---|
| string | "updated_at" |
Column carrying the row's last-update timestamp. Used as the watermark for incremental polling.
columns
| Type | Default |
|---|---|
| array of strings | [] (all columns) |
Explicit list of column names to replicate. When empty, all columns are replicated.
Complete examples
(a) Standard HTAP with DataFusion
[engine]
oltp_path = "app.db"
olap_backend = "datafusion"
sync_interval_ms = 500
sync_mode = "destructive"
read_pool_size = 4
log_format = "text"
[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/data"
[[tables]]
name = "users"
ddl = "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
[[tables.columns]]
name = "id"
type = "int64"
nullable = false
primary_key = true
[[tables.columns]]
name = "name"
type = "utf8"
nullable = false
[[tables.columns]]
name = "email"
type = "utf8"
nullable = true
[[tables]]
name = "events"
ddl = "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT, ts TEXT)"
[[tables.columns]]
name = "id"
type = "int64"
nullable = false
primary_key = true
[[tables.columns]]
name = "user_id"
type = "int64"
nullable = true
[[tables.columns]]
name = "type"
type = "utf8"
nullable = true
[[tables.columns]]
name = "ts"
type = "utf8"
nullable = true

(b) Sidecar mode polling a PostgreSQL source
[engine]
oltp_path = ":memory:" # not used; OLTP is disabled
olap_backend = "datafusion"
sync_interval_ms = 1000
sync_mode = "temporal" # keep full history
log_format = "json"
[engine.datafusion_storage]
mode = "parquet"
path = "/var/lib/yoda/sidecar-data"
[sidecar]
source_path = "postgres://app_user:secret@db.internal:5432/production"
source_type = "postgres"
enable_local_oltp = false
poll_batch_size = 1000
watermark_path = "/var/lib/yoda/watermarks"
[sidecar.delete_detection]
mode = "soft_delete"
column = "deleted_at"
[[sidecar.tables]]
table_name = "orders"
primary_key = ["id"]
created_at_column = "created_at"
updated_at_column = "updated_at"
columns = ["id", "customer_id", "amount", "status", "created_at", "updated_at", "deleted_at"]
[[sidecar.tables]]
table_name = "customers"
primary_key = ["id"]
created_at_column = "created_at"
updated_at_column = "updated_at"

(c) HTAP + FlightSQL + temporal mode + Prometheus metrics
[engine]
oltp_path = "/data/htap.db"
olap_backend = "datafusion"
sync_interval_ms = 200
sync_mode = "temporal"
read_pool_size = 8
log_format = "json"
schema_registry_path = "/data/registry.json"
# Arrow Flight SQL gRPC server (--features flight-sql)
flight_port = 50051
# flight_auth_token = "my-secret" # or set YODA_FLIGHT_AUTH_TOKEN env var
# Prometheus metrics (--features metrics-exporter)
metrics_port = 9100
[engine.datafusion_storage]
mode = "parquet"
path = "/data/olap"
[[tables]]
name = "transactions"
ddl = """
CREATE TABLE IF NOT EXISTS transactions (
id INTEGER PRIMARY KEY,
account_id INTEGER NOT NULL,
amount REAL NOT NULL,
currency TEXT,
ts TEXT NOT NULL
)
"""
[[tables.columns]]
name = "id"
type = "int64"
nullable = false
primary_key = true
[[tables.columns]]
name = "account_id"
type = "int64"
nullable = false
[[tables.columns]]
name = "amount"
type = "float64"
nullable = false
[[tables.columns]]
name = "currency"
type = "utf8"
nullable = true
[[tables.columns]]
name = "ts"
type = "utf8"
nullable = false

Temporal columns are added automatically
When sync_mode = "temporal", Yoda appends _yoda_valid_from, _yoda_valid_to, and _yoda_operation columns to the OLAP table automatically. Do not declare them in [[tables.columns]].
Feature-gated keys
flight_port, flight_auth_token, and metrics_port are silently ignored if the corresponding feature flag (flight-sql, metrics-exporter) was not enabled at compile time. Build with the appropriate --features flags or use a pre-built binary that includes them.