
Configuration Reference

HtapConfig is the single configuration struct passed to HtapEngine::new. It implements Default, so you only need to set the fields that differ from the defaults and fill in the rest with ..HtapConfig::default().

```rust
use yoda::HtapConfig;
use std::time::Duration;

let config = HtapConfig {
    oltp_path: "/var/lib/myapp/oltp.db".to_string(),
    olap_in_memory: false,
    olap_path: Some("/var/lib/myapp/olap".to_string()),
    sync_interval: Some(Duration::from_millis(200)),
    ..HtapConfig::default()
};
```

Core Engine Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `oltp_path` | `String` | `"yoda.db"` | Path to the SQLite database file. Created on first use. WAL mode and PRAGMA synchronous=NORMAL are applied automatically. |
| `olap_in_memory` | `bool` | `true` | Keep the OLAP engine entirely in memory. Set to false for durability across restarts. |
| `olap_path` | `Option<String>` | `None` | Filesystem path for durable OLAP storage. For DuckDB: a .duckdb file. For DataFusion: a directory. Ignored when olap_in_memory = true. |
| `olap_backend` | `OlapBackendType` | `DataFusion` | Which OLAP engine to instantiate. See OLAP Backends. |
| `read_pool_size` | `usize` | `4` | Number of SQLite read connections. Increase for workloads with heavy concurrent OLTP reads. Values below 1 are clamped to 1. |
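
The two interaction rules in this table (the clamp on read_pool_size, and olap_path being ignored in in-memory mode) can be sketched as plain functions. This is only an illustration of the documented behaviour: both helper names are hypothetical and are not part of the yoda API.

```rust
/// Hypothetical helper: applies the documented clamp, so a requested
/// pool size of 0 still yields one read connection.
fn effective_read_pool_size(requested: usize) -> usize {
    requested.max(1)
}

/// Hypothetical helper: `olap_path` only takes effect when the OLAP
/// engine is not kept in memory; otherwise it is ignored.
fn effective_olap_path(olap_in_memory: bool, olap_path: Option<&str>) -> Option<&str> {
    if olap_in_memory {
        None
    } else {
        olap_path
    }
}
```

Under these assumptions, a config that sets olap_path but leaves olap_in_memory at its default of true ends up with no durable OLAP storage, which is why the example at the top of this page sets both fields.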

CDC Sync Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `sync_interval` | `Option<Duration>` | `None` | When Some(d), a background loop polls CDC every d. When None, you call sync_now() manually. Values of 100–500 ms are typical for balanced HTAP workloads. |
| `sync_batch_size` | `u32` | `1000` | Maximum CDC events consumed per sync cycle. Larger values increase throughput at the cost of per-cycle latency. For append-heavy workloads, 5 000–50 000 is common. |
| `sync_mode` | `SyncMode` | `Destructive` | How CDC events are applied to OLAP. Destructive mirrors current state; Temporal (SCD Type 2) appends a history row per change. See Sync Modes. |
| `prune_after_sync` | `bool` | `true` | Delete processed events from _yoda_cdc_log after each successful cycle. Set to false only for debugging or replay scenarios. |
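
To reason about how sync_interval and sync_batch_size interact, a rough back-of-the-envelope calculation can help. The sketch below assumes each cycle consumes at most one batch and that the cycle's own processing time is negligible. Both function names are hypothetical and the numbers are an upper bound, not a measured throughput.

```rust
use std::time::Duration;

/// Upper bound on CDC events applied per second, assuming one batch per
/// cycle and negligible per-cycle processing time.
/// A zero interval yields infinity, so pass a non-zero duration.
fn max_events_per_second(sync_batch_size: u32, sync_interval: Duration) -> f64 {
    f64::from(sync_batch_size) / sync_interval.as_secs_f64()
}

/// Number of sync cycles needed to drain a backlog of pending events
/// (ceiling division; a batch size of 0 is treated as 1).
fn cycles_to_drain(backlog: u64, sync_batch_size: u32) -> u64 {
    let batch = u64::from(sync_batch_size).max(1);
    (backlog + batch - 1) / batch
}
```

With the default batch size of 1000 and a 200 ms interval, this bound works out to roughly 5000 events per second. If the write rate exceeds it, the backlog in _yoda_cdc_log grows until the batch size is raised or the interval shortened.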

DataFusion Storage Mode

Available when the datafusion-backend feature is enabled (the default).

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `datafusion_storage` | `StorageMode` | `InMemory` | Controls how DataFusion persists tables between queries. Ignored when olap_backend = DuckDb. |

StorageMode Variants

DataFusion supports five storage modes — InMemory, ArrowIpc { path }, Parquet { path }, S3Parquet { url }, and GcsParquet { url }. The cloud variants require the cloud-storage feature. See OLAP Backends → DataFusion storage modes for the full comparison (durability, write speed, predicate pushdown, UPDATE/DELETE characteristics).

UPDATE/DELETE on cloud backends

S3Parquet and GcsParquet perform a full read-modify-write cycle for every UPDATE or DELETE. They are designed for append-heavy analytics, not high-frequency point mutations.
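
As an illustration of the variant list above, the following stand-in enum mirrors the five documented variant names. It is not the real yoda::StorageMode: the payload types (String here) and the is_cloud helper are assumptions made for the sketch.

```rust
/// Stand-in for the documented variants; payload types are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum StorageModeSketch {
    InMemory,
    ArrowIpc { path: String },
    Parquet { path: String },
    S3Parquet { url: String },
    GcsParquet { url: String },
}

impl StorageModeSketch {
    /// True for the variants that, per the text above, require the
    /// cloud-storage feature and do a full read-modify-write cycle
    /// for every UPDATE or DELETE.
    fn is_cloud(&self) -> bool {
        matches!(self, Self::S3Parquet { .. } | Self::GcsParquet { .. })
    }
}
```

A check like this could be used at startup to warn when a cloud variant is combined with a workload that issues frequent point mutations.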


Schema Registry Persistence

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `schema_registry_path` | `Option<String>` | `None` | Path to a JSON file where registered table schemas are persisted across restarts. When set, register_table writes the updated registry atomically after each call. On restart, all previously registered tables are restored automatically, so there is no need to call register_table again. |
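
The source does not say how the atomic write is implemented. A common technique for this kind of guarantee is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern using only the standard library. The function name is hypothetical, and this is not necessarily what register_table does internally.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` so that readers see either the old file or
/// the complete new one, never a partial write.
fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Temp file lives next to the target so the rename stays on one filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(contents)?;
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
    }
    // Replaces the destination in one step (atomic on POSIX filesystems).
    fs::rename(&tmp, path)
}
```

If the process crashes before the rename, the previous registry file is left untouched and only a stray .tmp file remains.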

RocksDB CDC Buffer

Available when the rocksdb-cdc feature is enabled.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `rocksdb_cdc_path` | `Option<String>` | `None` | Path to a RocksDB directory used as a durable CDC event buffer. SQLite triggers still fire into _yoda_cdc_log; a bridge drains them atomically into RocksDB on every poll cycle. The sync engine then reads exclusively from RocksDB, giving crash-durable event buffering. Ignored in sidecar mode. |

Sidecar Mode Fields

Available when the sidecar feature is enabled. Set sidecar: Some(SidecarConfig { … }) to switch into sidecar mode.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `sidecar` | `Option<SidecarConfig>` | `None` | Sidecar CDC configuration. When Some, CDC events come from an external SQLite or PostgreSQL database via timestamp polling instead of local triggers. |

SidecarConfig Fields

| Field | Type | Description |
| --- | --- | --- |
| `source` | `SidecarSource` | External database to follow. Use SidecarSource::Sqlite(path) or SidecarSource::Postgres(conn_str). |
| `timestamp_config` | `TimestampCdcConfig` | Per-table polling settings: which tables, timestamp columns, primary keys, batch size, and delete detection. See Sidecar Mode. |
| `enable_local_oltp` | `bool` | false by default. Set to true to also create a local SQLite write path at oltp_path. |
| `watermark_path` | `Option<String>` | Path to a RocksDB directory for persisting the CDC watermark between restarts. Without this, the watermark is in-memory only and polling restarts from scratch after a process restart. |
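
To make the role of the watermark concrete, here is a minimal in-memory sketch of timestamp polling. It is an illustration of the general technique only: the Row type, the poll_changes function, and the use of integer timestamps are all assumptions, and the real sidecar reads from SQLite or PostgreSQL rather than a slice.

```rust
/// Hypothetical source row with a primary key and a last-modified timestamp.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u64,
    updated_at: u64,
}

/// Return up to `batch_size` rows strictly newer than `watermark`
/// (oldest first), together with the advanced watermark.
/// Caveat of this sketch: rows sharing a timestamp that straddle a batch
/// boundary would be skipped by the strict `>` comparison.
fn poll_changes(source: &[Row], watermark: u64, batch_size: usize) -> (Vec<Row>, u64) {
    let mut changed: Vec<Row> = source
        .iter()
        .filter(|row| row.updated_at > watermark)
        .cloned()
        .collect();
    changed.sort_by_key(|row| row.updated_at);
    changed.truncate(batch_size);
    // Keep the old watermark when nothing changed.
    let new_watermark = changed.last().map_or(watermark, |row| row.updated_at);
    (changed, new_watermark)
}
```

In this model, losing the watermark on restart means the next poll starts again from the initial value and re-reads every row. Persisting the watermark is what avoids that.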

TOML Configuration (yd serve)

When running the engine as a service via yd serve --config config/htap.toml, all fields map to TOML keys under [engine]:

```toml
[engine]
oltp_path            = "/var/lib/myapp/oltp.db"
olap_in_memory       = false
olap_path            = "/var/lib/myapp/olap"
sync_interval_ms     = 200          # milliseconds
sync_batch_size      = 5000
read_pool_size       = 4
sync_mode            = "destructive" # or "temporal"
schema_registry_path = "/var/lib/myapp/schema_registry.json"
log_format           = "json"        # "text" (default) or "json"
metrics_port         = 9100          # Prometheus endpoint (metrics-exporter feature)
```

See CLI Reference for the full service configuration and signal handling.

Released under the Apache-2.0 License.