Build your first Python model in dbt: A Step-by-Step Tutorial

📘 Introduction

If you're new to dbt (data build tool) and want to transform raw data into clean, analytics-ready tables using Python, you're in the right place. This tutorial walks you through building your first Python model in dbt, step by step.

📌 This is a must-know topic for the dbt Analytics Engineering Certification Exam, so mastering it now puts you one step closer to passing the exam and leveling up your data engineering skills! 👨‍🎓

✅ Prerequisites

Before you start, make sure you have:

☑️ A dbt project set up
☑️ A Medallion Architecture set up
☑️ Source data loaded into your data warehouse
☑️ Source configurations defined in sources.yml

🐍 What are Python Models?

In dbt, a Python model is a .py file that defines a function named model(). The function takes two arguments, dbt and session, and returns a single DataFrame. Depending on the platform, this can be a pandas DataFrame, a Snowpark DataFrame (Snowflake), or a PySpark DataFrame (for example on Databricks). When you run dbt run, the Python code is executed on your data platform and dbt materializes the returned DataFrame in your data warehouse.
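To make the shape of such a file concrete, here is a minimal sketch of a Python model. It assumes pandas is available on your platform, and the two rows of data are made up purely for illustration; a real model would usually read from a source or another model instead.

```python
import pandas as pd


def model(dbt, session):
    # dbt calls this function with two objects:
    #   dbt     - gives access to refs, sources and config
    #   session - the platform's connection (e.g. a Spark or Snowpark session)
    # Neither is used in this bare-bones sketch.

    # Hypothetical placeholder data, only so the function returns something.
    df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]})

    # The returned DataFrame is what dbt materializes in the warehouse.
    return df
```

The key points are the function name model, its two parameters, and the single DataFrame it returns.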

Python models only support the following materializations:

table
incremental
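One way to pick a materialization is to call dbt.config() inside the model function. The sketch below sets it to table; the upstream model name my_upstream_model is a hypothetical placeholder, not something defined in this tutorial.

```python
def model(dbt, session):
    # Choose one of the two supported materializations: "table" or "incremental".
    dbt.config(materialized="table")

    # "my_upstream_model" is a hypothetical model name used for illustration.
    df = dbt.ref("my_upstream_model")

    return df
```

Switching to an incremental model would start by changing the value to "incremental", although an incremental model typically also needs logic to filter for new rows.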

✍️1️⃣ Specify Requirements

Let’s define the objective of our model:

We want to create a model that selects specific columns from the student table of the source udc. This source table lives in the landing schema of our data warehouse.

The table student contains the following data:

Here’s what we want our model to do:

✅ Select the columns: ID, Name, Major, Number

📁2️⃣ Create Python model

In your dbt project, navigate to the models folder and create a new .py file in the appropriate layer. In our example, we create a file named cleaned_student.py in the folder 02_bronze.
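As a rough sketch of what cleaned_student.py could contain, the model below reads the student table from the udc source and keeps the four required columns. It assumes the DataFrame returned by dbt.source() supports list-style column selection, as pandas and PySpark DataFrames do; on other platforms you may need a select() call instead, and column-name casing may differ in your warehouse.

```python
# Columns listed in the requirements.
SELECTED_COLUMNS = ["ID", "Name", "Major", "Number"]


def model(dbt, session):
    # Materialize the result as a table (one of the two supported options).
    dbt.config(materialized="table")

    # Read the table "student" from the source "udc" defined in sources.yml.
    student_df = dbt.source("udc", "student")

    # Keep only the required columns.
    # Assumption: the DataFrame type supports df[[...]] column selection.
    return student_df[SELECTED_COLUMNS]
```

Keeping the column list in a constant at the top of the file makes the requirement easy to spot and to change later.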