# Architecture
Yoda is a fully in-process HTAP (Hybrid Transactional/Analytical Processing) engine that embeds a low-latency SQLite write path alongside a high-throughput analytical engine (DuckDB or Apache DataFusion), bridged by trigger-based Change Data Capture (CDC) so that analytical queries always see a near-real-time view of your OLTP data — without a separate server, ETL job, or network hop.
## Standard HTAP Mode
```
Client
  │
  ▼
HtapEngine (facade — crates/yoda/)
  │
  ├─ SqlParserRouter ──────────────────► AST-based routing (sqlparser-rs)
  │    ├── writes / DDL / simple SELECT → OLTP
  │    └── aggregates / JOIN / CTE / window → OLAP
  │
  ├─ RusqliteEngine (OLTP)
  │    ├── write connection (WAL mode, dedicated OS thread)
  │    ├── read pool (round-robin, 4 connections by default)
  │    └── CDC connection ──► _yoda_cdc_log (trigger-populated)
  │
  ├─ OlapBackend (enum — crates/yoda-olap/)
  │    ├── DuckDbEngine (feature: duckdb-backend)
  │    └── DataFusionEngine (feature: datafusion-backend, default)
  │
  └─ CdcSyncEngine (crates/yoda-sync/)
       ├── polls _yoda_cdc_log (seq watermark)
       ├── SyncMode::Destructive — mirror (UPDATE/DELETE in-place)
       ├── SyncMode::Temporal — SCD Type 2 (append-only history)
       ├── bulk INSERT via Arrow batch → OlapEngine::load_arrow()
       └── background loop with CancellationToken shutdown
```

## Sidecar Mode
In sidecar mode the local OLTP write path is optional. Instead, a `TimestampCdcConsumer` polls an external database (SQLite or PostgreSQL) for rows whose `updated_at` timestamp exceeds the last recorded watermark:
```
External DB (SQLite / PostgreSQL)
  │  SELECT … WHERE updated_at > watermark ORDER BY updated_at, pk
  ▼
TimestampCdcConsumer<S: SourceConnector>  (crates/yoda-sidecar/)
  │  emits CdcEvent stream (Insert / Update / Delete)
  │  watermark persisted in RocksDB (optional)
  ▼
CdcSyncEngine (temporal or destructive)
  │  same DML pipeline as standard mode
  ▼
OlapBackend (DuckDB / DataFusion)
```

No schema changes are required on the source database — only that each tracked table has an `updated_at` column (and optionally a `deleted_at` column for soft-delete detection).
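The polling contract can be sketched as follows. This is an illustrative, std-only sketch: the `SourceConnector` trait and row shape here are simplified stand-ins for the real trait in `yoda-sidecar`, and an in-memory source replaces SQLite/PostgreSQL.

```rust
// Simplified stand-ins for the yoda-sidecar types, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    pk: i64,
    updated_at: i64, // Unix seconds
}

trait SourceConnector {
    // Rows strictly newer than the watermark, ordered by (updated_at, pk).
    fn fetch_since(&self, watermark: i64) -> Vec<Row>;
}

struct MemSource {
    rows: Vec<Row>,
}

impl SourceConnector for MemSource {
    fn fetch_since(&self, watermark: i64) -> Vec<Row> {
        let mut out: Vec<Row> = self
            .rows
            .iter()
            .filter(|r| r.updated_at > watermark)
            .cloned()
            .collect();
        out.sort_by_key(|r| (r.updated_at, r.pk));
        out
    }
}

// One poll cycle: fetch new rows, emit them downstream, and advance the
// watermark to the newest updated_at seen in the batch.
fn poll_once<S: SourceConnector>(src: &S, watermark: &mut i64) -> Vec<Row> {
    let batch = src.fetch_since(*watermark);
    if let Some(last) = batch.last() {
        *watermark = last.updated_at;
    }
    batch
}

fn main() {
    let src = MemSource {
        rows: vec![
            Row { pk: 1, updated_at: 100 },
            Row { pk: 2, updated_at: 250 },
        ],
    };
    let mut wm = 0;
    assert_eq!(poll_once(&src, &mut wm).len(), 2);
    assert_eq!(wm, 250);
    // Nothing new on the second cycle.
    assert!(poll_once(&src, &mut wm).is_empty());
    println!("watermark polling ok");
}
```

Note the `ORDER BY updated_at, pk` in the diagram: with second-granularity timestamps, several rows can share the watermark value, so a real consumer needs a deterministic tie-break to avoid skipping or re-emitting rows across cycles.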
## Why Two Engines?
SQLite (via Rusqlite) excels at low-latency point writes: WAL mode gives sub-millisecond single-row commits, and the embedded nature means zero network overhead. DuckDB and DataFusion excel at vectorised scan-heavy queries over large tables: column-oriented storage, predicate pushdown, and SIMD execution make aggregates orders of magnitude faster than row-store databases.
Yoda lets each engine do what it does best. The `SqlParserRouter` makes the routing invisible to the caller — one `query()` call, no manual dispatch.
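As a concrete illustration of the routing split, here is a deliberately simplified, keyword-based sketch. The real `SqlParserRouter` walks the full sqlparser-rs AST rather than matching substrings, and the `QueryTarget` enum below is a stand-in for the one in `yoda-core`:

```rust
// Toy stand-in for yoda-core's routing target.
#[derive(Debug, PartialEq)]
enum QueryTarget {
    Oltp, // writes, DDL, simple SELECTs
    Olap, // aggregates, JOINs, CTEs, window functions
}

// Keyword heuristic only; the real router classifies a parsed AST.
fn route(sql: &str) -> QueryTarget {
    let upper = sql.to_uppercase();
    // Writes and DDL always go to the SQLite write path.
    if ["INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"]
        .iter()
        .any(|kw| upper.trim_start().starts_with(*kw))
    {
        return QueryTarget::Oltp;
    }
    // Analytical shapes go to the columnar engine.
    if ["GROUP BY", " JOIN ", "WITH ", " OVER ", "COUNT(", "SUM(", "AVG("]
        .iter()
        .any(|kw| upper.contains(*kw))
    {
        return QueryTarget::Olap;
    }
    // Simple point SELECTs stay on SQLite.
    QueryTarget::Oltp
}

fn main() {
    assert_eq!(route("INSERT INTO t VALUES (1)"), QueryTarget::Oltp);
    assert_eq!(route("SELECT * FROM t WHERE id = 1"), QueryTarget::Oltp);
    assert_eq!(route("SELECT a, SUM(b) FROM t GROUP BY a"), QueryTarget::Olap);
    println!("routing ok");
}
```

A substring heuristic like this misfires on edge cases (keywords inside string literals, for one), which is exactly why routing on a parsed AST is the sturdier design.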
## Why SQLite for OLTP?
- Serverless: no separate process, no port, no credentials.
- WAL mode: concurrent reads never block writes.
- Trigger-based CDC: SQLite's `AFTER INSERT` / `AFTER UPDATE` / `AFTER DELETE` triggers write compact JSON arrays to `_yoda_cdc_log` with ~10 % less overhead than `json_object()` equivalents.
- `forbid(unsafe_code)`: the async wrapper (`yoda-tokio-rusqlite`) uses dedicated OS threads + `crossbeam` channels to make `rusqlite::Connection` (which is `!Send`) safely usable from async code with zero unsafe blocks.
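The `forbid(unsafe_code)` point rests on a dedicated-thread actor pattern. Here is a minimal std-only sketch, with a dummy `!Send` struct standing in for `rusqlite::Connection` and `std::sync::mpsc` standing in for the crossbeam channels used in `yoda-tokio-rusqlite`:

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread;

// Stand-in for rusqlite::Connection; PhantomData<*mut ()> makes it !Send,
// just like the real connection type.
struct Conn {
    rows: u64,
    _not_send: PhantomData<*mut ()>,
}

// A job is a closure executed on the dedicated thread that owns the connection.
type Job = Box<dyn FnOnce(&mut Conn) + Send>;

struct Handle {
    tx: mpsc::Sender<Job>,
}

impl Handle {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        // The connection is created *on* the dedicated thread and never
        // moves, so it never needs to be Send. No unsafe anywhere.
        thread::spawn(move || {
            let mut conn = Conn { rows: 0, _not_send: PhantomData };
            for job in rx {
                job(&mut conn);
            }
        });
        Handle { tx }
    }

    // Run a closure against the connection and block for its result.
    fn call<R: Send + 'static>(
        &self,
        f: impl FnOnce(&mut Conn) -> R + Send + 'static,
    ) -> R {
        let (done_tx, done_rx) = mpsc::channel();
        self.tx
            .send(Box::new(move |conn: &mut Conn| {
                let _ = done_tx.send(f(conn));
            }))
            .unwrap();
        done_rx.recv().unwrap()
    }
}

fn main() {
    let h = Handle::new();
    h.call(|c| c.rows += 3);
    assert_eq!(h.call(|c| c.rows), 3);
    println!("dedicated-thread pattern ok");
}
```

The real wrapper returns futures instead of blocking on `recv()`, but the ownership story is the same: only one thread ever touches the connection.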
## Why DuckDB or DataFusion for OLAP?
DuckDB brings a battle-tested columnar SQL engine with ACID transaction support, native Arrow `Appender` bulk-loading (zero-copy), and MVCC-based concurrent reads. All operations run on blocking threads via `spawn_blocking`.

DataFusion is a pure-Rust, natively async engine with pluggable storage backends (InMemory, Arrow IPC, Parquet, S3, GCS). It integrates naturally with Tokio and streams results without buffering the entire result set (`RecordBatchBoxStream`). It is the default because it has zero C/C++ dependencies.
See OLAP Backends for a detailed comparison and guidance on which to pick.
## How CDC Works
- When `register_table` is called, Yoda installs three SQLite triggers on the target table — `AFTER INSERT`, `AFTER UPDATE`, and `AFTER DELETE`.
- Each trigger appends one row to `_yoda_cdc_log`: a monotonically increasing sequence number, a Unix timestamp, the operation code (I/U/D), the table name, and a JSON array snapshot of the row data.
- `CdcSyncEngine` polls `_yoda_cdc_log` using a stored watermark (`last_synced_seq`). On each cycle it fetches up to `sync_batch_size` events, converts them to OLAP DML, and advances the watermark.
- Consecutive INSERTs to the same table are batched into a single Arrow `RecordBatch` and loaded via `OlapEngine::load_arrow()` — an Arrow-native path that avoids SQL string construction entirely (5–7× faster than individual `INSERT` statements for bulk workloads).
- Processed events are pruned from `_yoda_cdc_log` after each successful cycle (`prune_after_sync = true` by default) to keep the log table small.
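The INSERT-coalescing step can be sketched independently of Arrow. This toy `CdcEvent` mirrors only the fields needed for grouping (it is not the real type from `yoda-core`); in Yoda each resulting run of INSERTs would become one Arrow `RecordBatch` handed to `OlapEngine::load_arrow()`:

```rust
// Simplified CDC event: just enough shape to show the grouping rule.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Insert,
    Update,
    Delete,
}

#[derive(Debug, Clone)]
struct CdcEvent {
    seq: u64,
    table: String,
    op: Op,
}

// Consecutive INSERTs into the same table join the open batch; any other
// event (or a table switch) closes it and starts a new one.
fn batch_events(events: &[CdcEvent]) -> Vec<Vec<CdcEvent>> {
    let mut batches: Vec<Vec<CdcEvent>> = Vec::new();
    for ev in events {
        let extend = matches!(
            batches.last(),
            Some(b) if ev.op == Op::Insert
                && b[0].op == Op::Insert
                && b[0].table == ev.table
        );
        if extend {
            batches.last_mut().unwrap().push(ev.clone());
        } else {
            batches.push(vec![ev.clone()]);
        }
    }
    batches
}

fn main() {
    let ev = |seq, table: &str, op| CdcEvent { seq, table: table.into(), op };
    let log = vec![
        ev(1, "users", Op::Insert),
        ev(2, "users", Op::Insert),
        ev(3, "users", Op::Update),
        ev(4, "orders", Op::Insert),
    ];
    let batches = batch_events(&log);
    // The two user INSERTs coalesce; the UPDATE and the orders INSERT
    // each stand alone.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].len(), 2);
    println!("batching ok");
}
```

Preserving event order across batch boundaries is what keeps the OLAP replica consistent: an UPDATE must never be applied before the INSERT it modifies.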
For crash-durable CDC event buffering, set `rocksdb_cdc_path` — SQLite triggers still fire into `_yoda_cdc_log`, but a bridge drains them atomically into RocksDB before the sync engine reads them. See RocksDB CDC for details.
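A rough sketch of that drain-then-prune ordering, with plain `Vec`s standing in for `_yoda_cdc_log` and the RocksDB buffer (hypothetical names; the real bridge commits the durable write before pruning):

```rust
// Toy bridge: events move from a trigger-populated source log into a
// durable buffer, and are removed from the source only after the buffered
// write has succeeded, so a crash between the two steps loses nothing.
struct Bridge {
    source_log: Vec<String>,  // stand-in for _yoda_cdc_log
    durable_buf: Vec<String>, // stand-in for the RocksDB buffer
}

impl Bridge {
    // Returns how many events were moved.
    fn drain(&mut self) -> usize {
        let n = self.source_log.len();
        // 1. Copy into the durable store first...
        self.durable_buf.extend(self.source_log.iter().cloned());
        // 2. ...then prune the source. With a real store this step runs
        //    only after the durable write is committed.
        self.source_log.clear();
        n
    }
}

fn main() {
    let mut b = Bridge {
        source_log: vec!["I:users".into(), "U:users".into()],
        durable_buf: vec![],
    };
    assert_eq!(b.drain(), 2);
    assert!(b.source_log.is_empty());
    assert_eq!(b.durable_buf.len(), 2);
    println!("bridge drain ok");
}
```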
## Key Crates
| Crate | Role |
|---|---|
| `yoda` | `HtapEngine` facade, `HtapConfig`, integration tests |
| `yoda-core` | Shared traits and types (`CdcEvent`, `SyncMode`, `QueryTarget`, …) |
| `yoda-tokio-rusqlite` | Async SQLite wrapper (dedicated thread per connection) |
| `yoda-oltp-rusqlite` | `RusqliteEngine` + CDC trigger setup |
| `yoda-sync` | `CdcSyncEngine`, `SqlParserRouter`, CDC-to-DML converter |
| `yoda-datafusion` | DataFusion OLAP engine with pluggable `StorageMode` |
| `yoda-duckdb` | DuckDB OLAP engine with Arrow `Appender` bulk-load |
| `yoda-sidecar` | `TimestampCdcConsumer`, watermark store, `SourceConnector` trait |
| `yoda-flight` | Arrow Flight SQL gRPC server (`flight-sql` feature) |
| `yoda-tui` | `yd` CLI + TUI dashboard |