Skip to content

Temporal Mode (SCD Type 2)

Temporal mode turns Yoda's OLAP mirror into an append-only, time-versioned history. Instead of overwriting or deleting rows when data changes, every mutation is recorded as a new version with validity timestamps. This enables point-in-time queries, audit trails, and slowly-changing dimension (SCD Type 2) analytics.

Enable it by setting sync_mode = "temporal" in your config.

How it works

When SyncMode::Temporal is configured, three extra columns are automatically appended to every OLAP table at registration time via temporalize_schema():

ColumnArrow typeNullableMeaning
_yoda_valid_fromInt64NoTimestamp when this version became active
_yoda_valid_toInt64YesTimestamp when this version was superseded (NULL = currently active)
_yoda_operationUtf8No'I' insert · 'U' update · 'D' delete

Reserved column names

register_table() rejects any schema that already contains _yoda_valid_from, _yoda_valid_to, or _yoda_operation. Rename conflicting columns before registering.

Operation semantics

INSERT

A single row is appended with _yoda_valid_from = timestamp, _yoda_valid_to = NULL, _yoda_operation = 'I'.

sql
INSERT INTO users (id, name, _yoda_valid_from, _yoda_valid_to, _yoda_operation)
VALUES (1, 'Alice', 1700000000, NULL, 'I')

UPDATE

Two statements run in sequence:

  1. Close the previous version — set _yoda_valid_to on the currently-active row.
  2. Insert the new version — append the updated row with _yoda_operation = 'U'.
sql
-- 1. Close the previous version
UPDATE users
SET _yoda_valid_to = 1700000100
WHERE id = 1 AND _yoda_valid_to IS NULL

-- 2. Insert the new version
INSERT INTO users (id, name, _yoda_valid_from, _yoda_valid_to, _yoda_operation)
VALUES (1, 'Alice Smith', 1700000100, NULL, 'U')

DELETE

Two statements run in sequence:

  1. Close the current version — same UPDATE as for an update.
  2. Insert a tombstone — a row with every schema column populated from the deleted row's last-known values (taken from the CDC event's old_data) plus _yoda_operation = 'D'. Columns that the source CDC event omits fall back to NULL.
sql
-- 1. Close the current version
UPDATE users
SET _yoda_valid_to = 1700000200
WHERE id = 1 AND _yoda_valid_to IS NULL

-- 2. Insert a tombstone (non-PK columns carry the row's pre-delete values)
INSERT INTO users (id, name, _yoda_valid_from, _yoda_valid_to, _yoda_operation)
VALUES (1, 'Ada', 1700000200, NULL, 'D')

TIP

SQLite triggers in yoda-oltp-rusqlite capture the full pre-image on DELETE (json_array(OLD.*)), so all non-PK columns are present in the event. The NULL fallback only fires for sidecar / external sources that don't carry the deleted row's values.

Atomicity

BackendTemporal multi-statement ops
DuckDBAtomic — close + insert run inside a single transaction
DataFusionBest-effort — transactions are no-ops; a mid-op crash may leave a partially-closed version

For production temporal workloads requiring strong consistency, use the DuckDB backend.

Point-in-time queries

To reconstruct a table as it existed at timestamp T:

sql
SELECT *
FROM users
WHERE _yoda_valid_from <= T
  AND (_yoda_valid_to IS NULL OR _yoda_valid_to > T)

This returns exactly the rows that were active at T — including rows that have since been updated or deleted.

Configuration

toml
[engine]
oltp_path    = "app.db"
olap_backend = "datafusion"
sync_mode    = "temporal"   # enables SCD Type 2; default is "destructive"

See Configuration and Sync modes for full option details.

Worked example

1. Register a users table in temporal mode

toml
# htap.toml
[engine]
oltp_path    = "app.db"
olap_backend = "datafusion"
sync_mode    = "temporal"

[[tables]]
name = "users"
ddl  = "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"

  [[tables.columns]]
  name        = "id"
  type        = "int64"
  nullable    = false
  primary_key = true

  [[tables.columns]]
  name     = "name"
  type     = "utf8"
  nullable = true

  [[tables.columns]]
  name     = "email"
  type     = "utf8"
  nullable = true

2. Write data through the OLTP path

sh
# Insert a user
yd exec --db app.db "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')"

# Update their email
yd exec --db app.db "UPDATE users SET email = 'alice@new.com' WHERE id = 1"

# Delete the user
yd exec --db app.db "DELETE FROM users WHERE id = 1"

After a sync cycle, the OLAP users table will contain three rows — the original INSERT, the UPDATE version, and the DELETE tombstone.

3. Point-in-time query

sh
# Show users as they existed at epoch timestamp 1700000050
# (between the INSERT at 1700000000 and the UPDATE at 1700000100)
yd query --db app.db \
  "SELECT id, name, email
   FROM users
   WHERE _yoda_valid_from <= 1700000050
     AND (_yoda_valid_to IS NULL OR _yoda_valid_to > 1700000050)"

4. Full audit history

sh
# Show all versions of user id = 1 in chronological order
yd query --db app.db \
  "SELECT id, name, email, _yoda_valid_from, _yoda_valid_to, _yoda_operation
   FROM users
   WHERE id = 1
   ORDER BY _yoda_valid_from"

Use cases

  • Audit history: know exactly what every row looked like at any past timestamp.
  • Slowly-changing dimensions: classic SCD Type 2 pattern for data warehouse fact tables.
  • Regulatory compliance: immutable record of every data change for GDPR / SOX / HIPAA audit trails.
  • Debugging: reproduce query results from a past point in time to diagnose data quality issues.
  • Point-in-time analytics: compare metrics across two historical snapshots without snapshots.

Source

crates/yoda-sync/src/temporal_converter.rstemporalize_schema, cdc_event_to_temporal_dml, cdc_events_to_temporal_batch.

Released under the Apache-2.0 License.