📘 Introduction

JSON is one of the most common formats for exchanging data. You will see it in API responses, configuration files, exported records, and small data projects.

In this tutorial, you will learn how to load JSON data into DuckDB using dlt. DuckDB gives us a fast local analytics database, and dlt helps us turn a simple JSON file into a repeatable data pipeline.

We will create a small JSON file, build a Python pipeline, load the records into DuckDB, and check the result with SQL.

💡 What are we implementing?

We will build a local pipeline with this flow:

JSON file -> Python JSON reader -> dlt -> DuckDB table -> SQL check

The goal is not only to import a file once. The goal is to understand a simple pattern that appears in many data engineering workflows:

  • read data from a local JSON file
  • turn the JSON data into records
  • load the records into a database
  • inspect the loaded table with SQL

For a beginner project, this is a useful combination because everything runs locally. You do not need a cloud account, database server, or API key.

✅ Prerequisites

Before we start, you should have:

☑️ Python 3.9 or newer installed
☑️ Basic knowledge of running terminal commands
☑️ A text editor such as VS Code
☑️ No API key or cloud account

⚙️1️⃣ Create a project folder

First, create a new folder for the project:

mkdir dlt-json-duckdb
cd dlt-json-duckdb

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

On Windows, activate it with:

.venv\Scripts\activate

📦2️⃣ Install package

Now install dlt with DuckDB support:

pip install "dlt[duckdb]"

This installs dlt together with the DuckDB Python package and the dependencies needed for the DuckDB destination. You do not need to install DuckDB separately for this tutorial.

📄3️⃣ Create a small JSON file

Create a folder named data:

mkdir data

Inside that folder, create a file named customers.json:

[
  {
    "customer_id": 1,
    "name": "Ana Silva",
    "country": "Portugal",
    "created_at": "2026-01-10"
  },
  {
    "customer_id": 2,
    "name": "John Miller",
    "country": "United States",
    "created_at": "2026-01-12"
  },
  {
    "customer_id": 3,
    "name": "Mina Tanaka",
    "country": "Japan",
    "created_at": "2026-01-14"
  },
  {
    "customer_id": 4,
    "name": "Sofia Garcia",
    "country": "Spain",
    "created_at": "2026-01-15"
  }
]

This file contains a JSON array. Each object in the array represents one customer record.

Your project should now look like this:

dlt-json-duckdb/
├── .venv/
└── data/
    └── customers.json

This public part already gives us a clean project setup and a realistic JSON file. In the Academy section, we continue by building the dlt pipeline, loading the file into DuckDB, checking the table with SQL, and fixing common beginner mistakes.

You can view this post with the tier: Academy Membership

Join academy now to read the post and get access to the full library of premium posts for academy members only.

Join Academy Already have an account? Sign In