How to Load JSON Data into DuckDB using dlt

📘 Introduction

JSON is one of the most common formats for exchanging data. You will see it in API responses, configuration files, exported records, and small data projects.

In this tutorial, you will learn how to load JSON data into DuckDB using dlt. DuckDB gives us a fast local analytics database, and dlt helps us turn a simple JSON file into a repeatable data pipeline.

We will create a small JSON file, build a Python pipeline, load the records into DuckDB, and check the result with SQL.

💡 What are we implementing?

We will build a local pipeline with this flow:

JSON file -> Python JSON reader -> dlt -> DuckDB table -> SQL check

The goal is not only to import a file once. The goal is to understand a simple pattern that appears in many data engineering workflows:

read data from a local JSON file
turn the JSON data into records
load the records into a database
inspect the loaded table with SQL

For a beginner project, this is a useful combination because everything runs locally. You do not need a cloud account, database server, or API key.

✅ Prerequisites

Before we start, you should have:

☑️ Python 3.9 or newer installed
☑️ Basic knowledge of running terminal commands
☑️ A text editor such as VS Code
☑️ No API key or cloud account

⚙️1️⃣ Create a project folder

First, create a new folder for the project:

mkdir dlt-json-duckdb
cd dlt-json-duckdb

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

On Windows, activate it with:

.venv\Scripts\activate

📦2️⃣ Install package

Now install dlt with DuckDB support:

pip install "dlt[duckdb]"

This installs dlt together with the DuckDB Python package and the dependencies needed for the DuckDB destination. You do not need to install DuckDB separately for this tutorial.

📄3️⃣ Create a small JSON file

Create a folder named data:

mkdir data

Inside that folder, create a file named customers.json:

[
  {
    "customer_id": 1,
    "name": "Ana Silva",
    "country": "Portugal",
    "created_at": "2026-01-10"
  },
  {
    "customer_id": 2,
    "name": "John Miller",
    "country": "United States",
    "created_at": "2026-01-12"
  },
  {
    "customer_id": 3,
    "name": "Mina Tanaka",
    "country": "Japan",
    "created_at": "2026-01-14"
  },
  {
    "customer_id": 4,
    "name": "Sofia Garcia",
    "country": "Spain",
    "created_at": "2026-01-15"
  }
]

This file contains a JSON array. Each object in the array represents one customer record.

Your project should now look like this:

dlt-json-duckdb/
├── .venv/
└── data/
    └── customers.json

This public part already gives us a clean project setup and a realistic JSON file. In the Academy section, we continue by building the dlt pipeline, loading the file into DuckDB, checking the table with SQL, and fixing common beginner mistakes.

You can view this post with the tier: Academy Membership

Join academy now to read the post and get access to the full library of premium posts for academy members only.

Join Academy Already have an account? Sign In

How to Load JSON Data into DuckDB using dlt

Data Engineer

GitHub Copilot vs Codex vs Claude Code: What Is the Difference?

How to Use GitHub Copilot in Visual Studio Code

What Is GitHub Copilot? Explained for Beginners

📘 Introduction

💡 What are we implementing?

✅ Prerequisites

⚙️1️⃣ Create a project folder

📦2️⃣ Install package

📄3️⃣ Create a small JSON file

You can view this post with the tier: Academy Membership

Understanding _dlt_id and _dlt_parent_id in dlt

How dlt Loads Nested JSON into Tables in DuckDB

When to Use replace, append, or merge in a dlt Pipeline

How to Load CSV Data into DuckDB using dlt

How to Load API Data into DuckDB with dlt