📘 Introduction

One of DuckDB’s most useful features is the ability to query CSV files directly, without loading them into a database first. This tutorial walks you through running SQL queries on a CSV file from Python.

✅ Prerequisites

Before you begin, make sure you have:

🐍☑️ Installed Python
🌐☑️ Created and activated a virtual environment (venv)
📦☑️ Installed DuckDB and pandas via pip inside that environment

📁1️⃣ Create Sample CSV file

First, create a sample CSV file called student.csv within your project folder:

project-folder/
├── student.csv

Add the following content to the .csv file:

id,name,date_of_birth,major
s1,Lara Fitzgerald,05.08.02,Data Analytics
s2,Mike Meyer,20.05.03,Data Engineering
s3,Eliza Gomez,01.01.02,Data Science
s4,Travis Robinson,19.12.01,Data Analytics
s5,Jackie Brown,23.05.03,Data Engineering
s6,Caleb Pearson,02.03.00,Data Science
s7,Ava Johansson,26.07.00,Data Science
s8,Nathan Williams,20.11.02,Data Engineering
s9,Cooper Harris,08.06.01,Data Analytics
s10,Murphy Fraser,01.12.03,Data Science
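If you would rather generate the file from Python than paste it by hand, the following is a minimal sketch using only the standard library. The helper name write_sample_csv and the ROWS constant are illustrative assumptions, not part of the original tutorial; the data is the same as listed above.

```python
import csv
from pathlib import Path

# Sample rows copied from the listing above (header first).
ROWS = [
    ("id", "name", "date_of_birth", "major"),
    ("s1", "Lara Fitzgerald", "05.08.02", "Data Analytics"),
    ("s2", "Mike Meyer", "20.05.03", "Data Engineering"),
    ("s3", "Eliza Gomez", "01.01.02", "Data Science"),
    ("s4", "Travis Robinson", "19.12.01", "Data Analytics"),
    ("s5", "Jackie Brown", "23.05.03", "Data Engineering"),
    ("s6", "Caleb Pearson", "02.03.00", "Data Science"),
    ("s7", "Ava Johansson", "26.07.00", "Data Science"),
    ("s8", "Nathan Williams", "20.11.02", "Data Engineering"),
    ("s9", "Cooper Harris", "08.06.01", "Data Analytics"),
    ("s10", "Murphy Fraser", "01.12.03", "Data Science"),
]


def write_sample_csv(path="student.csv"):
    """Write the sample rows to `path` and return it as a Path (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # lineterminator="\n" keeps the file identical to the hand-typed version
        csv.writer(f, lineterminator="\n").writerows(ROWS)
    return Path(path)


# Creates student.csv in the current working directory.
write_sample_csv()
```

Run it once from inside project-folder so that student.csv lands next to your script or notebook.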

🐍2️⃣ Create Python script

In the same folder as the CSV file, create a Python file (.py) or Jupyter notebook (.ipynb):

project-folder/
├── student.csv
└── query_csv.ipynb

📥3️⃣ Import Libraries

Start by importing the DuckDB module:

import duckdb

🐤4️⃣ Query CSV file via DuckDB

DuckDB lets you run SQL queries directly against a CSV file by referencing the file in the FROM clause of the query. For example, you can select all students whose major is Data Engineering.
