📘 Introduction

When you build a data pipeline, loading data is not only about moving rows from a source to a destination. You also need to decide what should happen when the pipeline runs again.

Should the table be replaced completely? Should new rows be appended? Should existing rows be updated when the same customer or order appears again?

In this tutorial, you will learn when to use replace, append, or merge in a dlt pipeline. We will use small customer records and DuckDB so you can see the difference clearly.

💡 Why write dispositions matter

In dlt, the loading behavior is controlled with write_disposition. It tells dlt how to write data into the destination table.

The three most common options are:

  • replace: replace the table with the latest data
  • append: add new rows to the existing table
  • merge: update existing rows and insert new rows based on a key

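The right choice depends on your source system and what the data means. Choosing the wrong option can cause problems: appending a full snapshot on every run creates duplicate rows, replacing a table of events deletes useful history, and appending changed records leaves the old values in your table next to the new ones.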
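Before touching dlt, it can help to picture the three options with plain Python. The sketch below is only a mental model: the function names, the list-of-dicts "table", and the extra sample customer are illustrative assumptions, not how dlt implements loading.

```python
# Conceptual model only: a "table" is a list of dicts, and each function
# returns the table as it would look after one load.

def load_replace(table, incoming):
    """Discard the old rows and keep only the incoming ones."""
    return list(incoming)


def load_append(table, incoming):
    """Keep the old rows and add every incoming row after them."""
    return list(table) + list(incoming)


def load_merge(table, incoming, key):
    """Update rows whose key already exists and insert the rest."""
    by_key = {row[key]: row for row in table}
    for row in incoming:
        by_key[row[key]] = row  # the newer row wins for a repeated key
    return list(by_key.values())


existing = [
    {"customer_id": 1, "name": "Ana Silva"},
    {"customer_id": 2, "name": "John Miller"},
]
incoming = [
    {"customer_id": 2, "name": "John A. Miller"},  # changed row (made up)
    {"customer_id": 3, "name": "Mei Tanaka"},      # new row (made up)
]

print(len(load_replace(existing, incoming)))               # 2 rows
print(len(load_append(existing, incoming)))                # 4 rows
print(len(load_merge(existing, incoming, "customer_id")))  # 3 rows
```

Notice that the append result contains customer 2 twice. That is fine for event-style data, but it is a duplicate when the rows describe the current state of a customer.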

✅ Prerequisites

Before we start, you should have:

☑️ Python 3.9 or newer installed
☑️ Basic knowledge of running terminal commands
☑️ A text editor such as VS Code

You do not need an API key or a cloud account.

⚙️1️⃣ Create a project folder

First, create a new folder for the project:

mkdir dlt-write-dispositions-duckdb
cd dlt-write-dispositions-duckdb

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

On Windows, activate it with:

.venv\Scripts\activate

📦2️⃣ Install the package

Install dlt with DuckDB support:

pip install "dlt[duckdb]"

This installs dlt together with the DuckDB Python package and the dependencies needed for the DuckDB destination. You do not need to install DuckDB separately for this tutorial.
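If you want to confirm that both packages landed in the active virtual environment, a small check like the one below can help. It uses only the standard library. The file name check_install.py and the helper name installed_version are assumptions made for this sketch.

```python
# check_install.py (hypothetical file name)
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("dlt", "duckdb"):
    print(package, installed_version(package) or "not installed")
```

If either line prints "not installed", check that the virtual environment is activated and run the pip command again.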

🧠3️⃣ What is write_disposition?

write_disposition is the setting that controls how dlt writes data to the destination table.

A simple pipeline run can look like this:

pipeline.run(
    customers,
    table_name="customers",
    write_disposition="replace",
)

Changing only this one value can change the behavior of the whole load.

replace is useful for full loads, append adds new data to the destination, and merge can update or deduplicate records by using a primary_key or merge_key.

🔁4️⃣ Use replace for full refresh loads

Use replace when the source gives you the full current dataset and you want the destination table to match it exactly.

Create a file named load_replace.py:

import dlt


customers = [
    {
        "customer_id": 1,
        "name": "Ana Silva",
        "country": "Portugal",
        "created_at": "2026-01-10",
        "updated_at": "2026-01-10",
    },
    {
        "customer_id": 2,
        "name": "John Miller",
        "country": "United States",
        "created_at": "2026-01-12",
        "updated_at": "2026-01-12",
    },
]

pipeline = dlt.pipeline(
    pipeline_name="write_dispositions",
    destination="duckdb",
    dataset_name="sales",
)

load_info = pipeline.run(
    customers,
    table_name="customers_replace",
    write_disposition="replace",
)

print(load_info)

Run it:

python load_replace.py

If you run this pipeline again with a different list of customers, dlt replaces the contents of the destination table with the new list. Rows that are not in the new list are no longer in the table.

Use replace when:

  • the source sends a complete snapshot
  • you do not need old rows from previous runs
  • the table is small enough to reload
  • you want simple and predictable behavior

➕5️⃣ Use append for event-style data

Use append when each pipeline run brings new rows that should be added to the table.
