Sidecar Mode
Sidecar mode lets Yoda follow an existing database (SQLite or PostgreSQL) and maintain a continuously updated OLAP mirror, optionally with full SCD Type 2 history. No schema changes are required on the source database, provided each tracked table already has an updated_at column, a created_at column for the INSERT/UPDATE heuristic, and optionally a deleted_at column for soft-delete detection.
Requires the sidecar Cargo feature:
yoda = { version = "1", features = ["sidecar"] }
How It Works
Instead of installing SQLite triggers, the sidecar consumer polls the source database on each cycle:
SELECT <columns>
FROM <table>
WHERE (updated_at, pk1, pk2, …) > (watermark_ts, last_pk1, last_pk2, …)
ORDER BY updated_at, pk1, pk2, …
LIMIT <poll_batch_size>
The watermark advances to the (updated_at, pk1, pk2, …) tuple of the last row seen. Composite primary keys are supported via SQL tuple comparison, so ties on updated_at are broken correctly.
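The polling loop described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Yoda's implementation: the `poll_once` helper, the `users` table layout, and the integer timestamps are all assumptions made for the example. It shows how the row-value comparison pages through rows that share the same updated_at without skipping or repeating any.

```python
import sqlite3

def poll_once(conn, watermark, batch_size):
    """Fetch the next batch of rows strictly after `watermark`.

    `watermark` is an (updated_at, id) tuple. The row-value comparison
    breaks ties on updated_at using the primary key.
    (Hypothetical helper, written for this example only.)
    """
    rows = conn.execute(
        """
        SELECT id, name, updated_at
        FROM users
        WHERE (updated_at, id) > (?, ?)
        ORDER BY updated_at, id
        LIMIT ?
        """,
        (watermark[0], watermark[1], batch_size),
    ).fetchall()
    if rows:
        last = rows[-1]
        watermark = (last[2], last[0])  # advance to the last row seen
    return rows, watermark

# Illustrative source table: three rows share updated_at = 100.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 100), (3, "c", 100), (4, "d", 200)],
)

watermark = (0, 0)
batch1, watermark = poll_once(conn, watermark, 2)  # ids 1 and 2
batch2, watermark = poll_once(conn, watermark, 2)  # ids 3 and 4
batch3, watermark = poll_once(conn, watermark, 2)  # nothing new
```

With a plain `updated_at > watermark` filter, the second poll would skip row 3, because it shares its timestamp with the last row of the first batch. Comparing the full tuple avoids that.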
INSERT vs UPDATE Heuristic
Because a WHERE updated_at > watermark query cannot inherently distinguish a first-ever insert from an update, the consumer uses:
- created_at == updated_at → emit as CdcOperation::Insert
- created_at != updated_at → emit as CdcOperation::Update
This heuristic requires that the source table sets created_at only once at row creation and that updated_at is updated on every subsequent change.
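As a rough sketch, the heuristic reduces to a single comparison. The Python `CdcOperation` enum and the `classify` function below are illustrative stand-ins that mirror the names used above; they are not part of Yoda's Python API.

```python
from enum import Enum

class CdcOperation(Enum):
    # Illustrative enum mirroring the operation names in the text.
    INSERT = "insert"
    UPDATE = "update"

def classify(created_at, updated_at):
    """Apply the created_at/updated_at heuristic to one polled row."""
    if created_at == updated_at:
        return CdcOperation.INSERT
    return CdcOperation.UPDATE

fresh_row = classify(created_at=100, updated_at=100)
changed_row = classify(created_at=100, updated_at=150)
```

One consequence of this rule: a row that is created and then modified before the consumer first polls it already has differing timestamps, so its first appearance is classified as an update rather than an insert.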
Configuration
Rust
use yoda::{
    HtapConfig, SidecarConfig, SidecarSource, SyncMode,
    TimestampCdcConfig, TimestampTableConfig, DeleteDetection,
};

let config = HtapConfig {
    // OLAP engine: receives the replicated data
    olap_in_memory: false,
    olap_path: Some("/var/lib/myapp/olap".to_string()),
    sync_mode: SyncMode::Temporal, // optional: keep full history
    sync_interval: Some(std::time::Duration::from_millis(500)),
    sidecar: Some(SidecarConfig {
        source: SidecarSource::Postgres(
            "host=db.example.com user=analytics dbname=production".to_string()
        ),
        timestamp_config: TimestampCdcConfig {
            tables: vec![
                TimestampTableConfig {
                    table_name: "users".to_string(),
                    primary_key: vec!["id".to_string()],
                    created_at_column: "created_at".to_string(),
                    updated_at_column: "updated_at".to_string(),
                    columns: vec![], // empty = SELECT *
                },
                TimestampTableConfig {
                    table_name: "orders".to_string(),
                    primary_key: vec!["order_id".to_string()],
                    created_at_column: "created_at".to_string(),
                    updated_at_column: "updated_at".to_string(),
                    columns: vec![
                        "order_id".to_string(),
                        "user_id".to_string(),
                        "total".to_string(),
                        "created_at".to_string(),
                        "updated_at".to_string(),
                    ],
                },
            ],
            poll_batch_size: 500,
            delete_detection: DeleteDetection::SoftDelete {
                column: "deleted_at".to_string(),
            },
        },
        enable_local_oltp: false, // pure sidecar: no local SQLite write path
        watermark_path: Some("/var/lib/myapp/watermark-db".to_string()),
    }),
    ..HtapConfig::default()
};
Python
import yoda

config = yoda.HtapConfig(
    olap_backend="datafusion",
    storage_mode="parquet",
    storage_path="/var/lib/myapp/olap",
    sync_mode="temporal",
    sidecar_source="host=db.example.com user=analytics dbname=production",
    sidecar_source_type="postgres",
    sidecar_tables=[
        yoda.TimestampTableConfig(
            table_name="users",
            primary_key=["id"],
            created_at_column="created_at",
            updated_at_column="updated_at",
        ),
    ],
    sidecar_poll_batch_size=500,
    sidecar_delete_detection="soft_delete:deleted_at",
    sidecar_enable_oltp=False,
)
engine = yoda.HtapEngine(config)
TimestampCdcConfig Fields
| Field | Type | Description |
|---|---|---|
| tables | Vec<TimestampTableConfig> | One entry per table to replicate. |
| poll_batch_size | u32 | Rows fetched per table per cycle. Smaller values reduce memory pressure; larger values speed up initial bulk sync. |
| delete_detection | DeleteDetection | Strategy for detecting deleted rows. See below. |
TimestampTableConfig Fields
| Field | Type | Description |
|---|---|---|
| table_name | String | Table name in the source database. |
| primary_key | Vec<String> | Primary key columns (at least one required). Used for watermark tie-breaking and CDC event keying. |
| created_at_column | String | Column set once at row creation; used with updated_at for the INSERT/UPDATE heuristic. |
| updated_at_column | String | Column updated on every change; the primary watermark column. |
| columns | Vec<String> | Columns to SELECT. Empty means SELECT *. Provide an explicit list to reduce bandwidth or exclude irrelevant columns. |
Delete Detection
| Variant | Behaviour |
|---|---|
| DeleteDetection::Disabled | Hard deletes are not detected. Rows deleted in the source remain in OLAP unchanged. Use when the source never hard-deletes rows or when stale data is acceptable. |
| DeleteDetection::SoftDelete { column } | A second query polls WHERE column IS NOT NULL AND column > watermark. Rows returned are emitted as Delete events. The column value is used as the event timestamp so SCD Type 2 validity boundaries are correct. |
| DeleteDetection::FullDiff { every_n_cycles } | Not yet implemented. Reserved for a future release. Configuring it logs a warning on each cycle and produces no delete events. |
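The SoftDelete strategy can be sketched as a second poll over the soft-delete column. This is an illustration under assumptions, not Yoda's code: the `poll_soft_deletes` helper, the event dictionaries, the single `id` primary key, and the integer timestamps are invented for the example.

```python
import sqlite3

def poll_soft_deletes(conn, table, column, watermark):
    """Return delete events for rows soft-deleted after `watermark`.

    Hypothetical helper. Table and column names are interpolated into
    the SQL text, so only pass trusted identifiers.
    """
    rows = conn.execute(
        f"SELECT id, {column} FROM {table} "
        f"WHERE {column} IS NOT NULL AND {column} > ? "
        f"ORDER BY {column}, id",
        (watermark,),
    ).fetchall()
    # The soft-delete timestamp doubles as the event timestamp.
    events = [{"op": "delete", "pk": pk, "ts": ts} for pk, ts in rows]
    if rows:
        watermark = rows[-1][1]  # advance to the latest deletion seen
    return events, watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, deleted_at INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, None), (2, 150), (3, 300)],
)

events, delete_watermark = poll_soft_deletes(conn, "users", "deleted_at", 0)
again, delete_watermark = poll_soft_deletes(
    conn, "users", "deleted_at", delete_watermark
)
```

Row 1 is never reported because its deleted_at is NULL, and the second poll returns nothing because no deletion is newer than the advanced watermark.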
Watermark Persistence
By default (watermark_path = None), the polling watermark is kept in memory and is lost on process restart. On the next start, the consumer replays from (updated_at = 0, pk = min), which re-processes all historical rows.
Set watermark_path to a RocksDB directory to persist the watermark durably:
watermark_path: Some("/var/lib/myapp/watermark-db".to_string()),
With persistence, the consumer resumes from the last seen (updated_at, pk…) tuple after a restart, avoiding a full replay. This requires the rocksdb-watermark sub-feature, which is enabled automatically when the sidecar feature is on.
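To illustrate why a persisted watermark avoids a replay, here is a small sketch of the save-and-resume cycle. Yoda stores the watermark in RocksDB; the JSON file, the `save_watermark` and `load_watermark` helpers, and the `(0, 0)` starting value below are stand-ins chosen only to keep the example self-contained.

```python
import json
import os
import tempfile

def save_watermark(path, watermark):
    """Write the watermark tuple atomically (temp file, then rename)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(list(watermark), f)
    os.replace(tmp, path)  # atomic replace, so readers never see a partial file

def load_watermark(path, default=(0, 0)):
    """Return the stored watermark, or `default` on first start."""
    try:
        with open(path) as f:
            return tuple(json.load(f))
    except FileNotFoundError:
        return default

path = os.path.join(tempfile.mkdtemp(), "watermark.json")

first_start = load_watermark(path)   # no file yet: replay from the beginning
save_watermark(path, (200, 4))       # after processing a batch
after_restart = load_watermark(path) # resume where polling stopped
```

Without the saved file, every start behaves like `first_start` and re-reads all historical rows.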
Pure Sidecar vs Hybrid
| Mode | enable_local_oltp | OLTP write path | Use case |
|---|---|---|---|
| Pure sidecar | false | Not available | Read-only OLAP layer on top of an existing app DB |
| Hybrid | true | Local SQLite at oltp_path | Yoda as both a local write store and a sidecar follower |
In pure sidecar mode, calling execute() returns HtapError::OltpNotAvailable. SELECT queries fall through to OLAP automatically.
Source Requirements
- Each tracked table must have a reliable updated_at column that is updated on every row modification. Rows with a stale or missing updated_at will be missed between poll cycles.
- Composite primary keys are fully supported.
- The source database must allow the polling connection to run SELECT queries on the tracked tables.
- For PostgreSQL, the connection string follows the libpq format (host=… user=… dbname=…).
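Before pointing the sidecar at a SQLite source, it can help to confirm that each tracked table actually has the columns listed above. The check below is a hypothetical pre-flight helper, not a Yoda feature; it uses SQLite's PRAGMA table_info, and the table layout is invented for the example.

```python
import sqlite3

def missing_columns(conn, table, required):
    """List the required columns that `table` does not have.

    Hypothetical helper. The table name is interpolated into the SQL
    text, so only pass trusted identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [col for col in required if col not in present]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, created_at INTEGER, updated_at INTEGER)"
)

missing = missing_columns(
    conn, "users", ["id", "created_at", "updated_at", "deleted_at"]
)
```

Here the table lacks deleted_at, so soft-delete detection could not be configured for it without first adding that column.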
Next Steps
- Sync Modes: use SyncMode::Temporal with sidecar for full history
- Configuration Reference: SidecarConfig and HtapConfig fields
- OLAP Backends: choosing DataFusion or DuckDB for sidecar workloads