Add snapshots to your DAG
Related documentation
What are snapshots?
Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, snapshots, which records changes to a mutable table over time.
Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an orders table where the status field can be overwritten as the order is processed.
| id | status | updated_at |
|---|---|---|
| 1 | pending | 2024-01-01 |
Now, imagine that the order goes from "pending" to "shipped". That same record will now look like:
| id | status | updated_at |
|---|---|---|
| 1 | shipped | 2024-01-02 |
This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example:
| id | status | updated_at | dbt_valid_from | dbt_valid_to |
|---|---|---|---|---|
| 1 | pending | 2024-01-01 | 2024-01-01 | 2024-01-02 |
| 1 | shipped | 2024-01-02 | 2024-01-02 | null |
Configuring snapshots
Configuration best practices
How snapshots work
When you run the dbt snapshot command:
- On the first run: dbt will create the initial snapshot table — this will be the result set of your
selectstatement, with additional columns includingdbt_valid_fromanddbt_valid_to. All records will have adbt_valid_to = nullor the value specified indbt_valid_to_current(available in dbt Core 1.9+) if configured. - On subsequent runs: dbt will check which records have changed or if any new records have been created:
- The
dbt_valid_tocolumn will be updated for any existing records that have changed. - The updated record and any new records will be inserted into the snapshot table. These records will now have
dbt_valid_to = nullor the value configured indbt_valid_to_current(available in dbt Core v1.9+).
- The
Snapshots can be referenced in downstream models the same way as referencing models — by using the ref function.
Detecting row changes
Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt:
- Timestamp — Uses an
updated_atcolumn to determine if a row has changed. - Check — Compares a list of columns between their current and historical values to determine if a row has changed.
Timestamp strategy (recommended)
The timestamp strategy uses an updated_at field to determine if a row has changed. If the configured updated_at column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action.
The timestamp strategy requires the following configurations:
| Config | Description | Example |
|---|---|---|
| updated_at | A column which represents when the source row was last updated. May support ISO date strings and unix epoch integers, depending on the data platform you use. | updated_at |
Example usage:
Check strategy
The check strategy is useful for tables which do not have a reliable updated_at column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.
The check strategy requires the following configurations:
| Config | Description | Example |
|---|---|---|
| check_cols | A list of columns to check for changes, or all to check all columns | ["name", "email"] |
The check snapshot strategy can be configured to track changes to all columns by supplying check_cols = 'all'. It is better to explicitly enumerate the columns that you want to check. Consider using a surrogate key to condense many columns into a single column.
Example usage
Hard deletes (opt-in)
Snapshot meta-fields
Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*.
In dbt Core v1.9+ (or available sooner in the "Latest" release track in dbt Cloud):
- These column names can be customized to your team or organizational conventions using the
snapshot_meta_column_namesconfig. - Use the
dbt_valid_to_currentconfig to set a custom indicator for the value ofdbt_valid_toin current snapshot records (like a future date such as9999-12-31). By default, this value isNULL. When set, dbt will use this specified value instead ofNULLfordbt_valid_tofor current records in the snapshot table. - Use the
hard_deletesconfig to track deleted records as new rows with thedbt_is_deletedmeta field when using thehard_deletes='new_record'field.
| Field | Meaning | Notes | Example |
|---|---|---|---|
dbt_valid_from | The timestamp when this snapshot row was first inserted and became valid. | This column can be used to order the different "versions" of a record. | snapshot_meta_column_names: {dbt_valid_from: start_date} |
dbt_valid_to | The timestamp when this row became invalidated. For current records, this is NULL by default or the value specified in dbt_valid_to_current. | The most recent snapshot record will have dbt_valid_to set to NULL or the specified value. | snapshot_meta_column_names: {dbt_valid_to: end_date} |
dbt_scd_id | A unique key generated for each snapshot row. | This is used internally by dbt. | snapshot_meta_column_names: {dbt_scd_id: scd_id} |
dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | snapshot_meta_column_names: {dbt_updated_at: modified_date} |
dbt_is_deleted | A string value indicating if the record has been deleted. (True if deleted, False if not deleted). | Added when hard_deletes='new_record' is configured. | snapshot_meta_column_names: {dbt_is_deleted: is_deleted} |
All of these column names can be customized using the snapshot_meta_column_names config. Refer to this example for more details.
*The timestamps used for each column are subtly different depending on the strategy you use:
-
For the
timestampstrategy, the configuredupdated_atcolumn is used to populate thedbt_valid_from,dbt_valid_toanddbt_updated_atcolumns. -
For the
checkstrategy, the current timestamp is used to populate each column. If configured, thecheckstrategy uses theupdated_atcolumn instead, as with the timestamp strategy.