📘 Introduction

In many data warehouses, source tables only show the current state of a record. A customer might move from trial to paid, a subscription plan might change from basic to pro, or an account might become inactive. If the source system overwrites the old value, your analytics team can lose the history.

In this beginner-friendly tutorial, you will learn what dbt snapshots are, why they matter, and how to use them to track historical changes in your data warehouse. We will use a simple customer status example and create a snapshot that keeps previous versions of each customer record.

📌
This topic is also useful for the dbt Analytics Engineering Certification, because snapshots help you understand slowly changing dimensions, historical tracking, source freshness thinking, and how dbt manages changes in mutable source data.

💡 Why dbt snapshots matter

A dbt snapshot records how rows change over time. This is useful when a source table is mutable, which means old values can be overwritten.

For example, imagine a customers table that looks like this today:

customer_idcustomer_namestatusplanupdated_at
1Ada GmbHtrialstarter2026-05-01 09:00:00
2Byte AGpaidpro2026-05-01 10:30:00

One week later, customer 1 becomes a paid customer. The source table now shows only the new value:

customer_idcustomer_namestatusplanupdated_at
1Ada GmbHpaidstarter2026-05-08 12:00:00
2Byte AGpaidpro2026-05-01 10:30:00

Without a snapshot, it is hard to answer questions like: When did the customer convert from trial to paid? How long was the customer in trial? Which plan did the customer have at a specific point in time?

dbt snapshots solve this by storing multiple versions of the same business record.

💡
Think of a snapshot as historical memory for a table that normally only shows the latest state.

🧭 What are we implementing?

We will create a dbt snapshot for a customer source table. The snapshot will track changes to customer status and plan over time.

The workflow will look like this:

source customers table
        ↓
dbt snapshot
        ↓
snapshot table with history
        ↓
analytics query for current and historical states

We will use the timestamp strategy, because our example source table has a reliable updated_at column. This is the recommended strategy when the source system updates that timestamp whenever a row changes.

✅ Prerequisites

Before you begin, make sure you have:

  • A dbt project set up
  • dbt installed locally or available in dbt Cloud
  • A data warehouse connection configured
  • Basic knowledge of dbt sources and models
  • A source table or staging model with a unique key and an updated_at column

You can follow the structure even if your warehouse is Snowflake, BigQuery, Databricks, Postgres, or another dbt-supported platform. The exact SQL data loading step may differ by platform, but the dbt snapshot concept stays the same.

CTA Image

If you’d like to dive deeper into dbt (data build tool), our book Building Modern Data Pipelines with dbt: From Raw Data to Gold Standard with the Medallion Architecture provides a hands-on guide to designing modern data pipelines. It covers dbt’s core concepts and best practices, including building Bronze, Silver, and Gold layers with the Medallion Architecture. It also serves as a hands-on study guide for the dbt Analytics Engineering Certification.

View on Amazon

📁1️⃣ Create the snapshots folder

By default, dbt expects snapshot files in a folder called snapshots/ at the root of your dbt project.

Your project can look like this:

my_dbt_project/
├── dbt_project.yml
├── models/
│   └── staging/
│       └── stg_customers.sql
└── snapshots/
    └── customers_snapshot.sql

If your project does not have the folder yet, create it:

mkdir snapshots

🧾2️⃣ Prepare the source data

For this tutorial, imagine you have a source or staging model with customer records.

At the first snapshot run, the data looks like this:

customer_idcustomer_namestatusplanupdated_at
1Ada GmbHtrialstarter2026-05-01 09:00:00
2Byte AGpaidpro2026-05-01 10:30:00
3Cloud Factorytrialstarter2026-05-02 08:15:00

Later, customer 1 changes from trial to paid, and customer 3 upgrades from starter to pro. A snapshot will preserve the old versions instead of losing them.

This is the useful part: even before writing the snapshot file, you should know which column identifies each row and which column tells dbt that a row changed. In this example, customer_id is the unique key and updated_at is the change timestamp.

📝3️⃣ Define the snapshot

Create the file snapshots/customers_snapshot.sql:

{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select
    customer_id,
    customer_name,
    status,
    plan,
    updated_at
from {{ ref('stg_customers') }}

{% endsnapshot %}

This tells dbt to track the records returned by the select statement. When a customer row changes and the updated_at value becomes newer, dbt closes the previous version and inserts a new version into the snapshot table.

🎓
Academy tip: before you build a snapshot, always identify the unique key and the change detection column. If either one is wrong, your history can become misleading.

You can view this post with the tier: Academy Membership

Join academy now to read the post and get access to the full library of premium posts for academy members only.

Join Academy Already have an account? Sign In