📘 Introduction

Renaming columns is one of the most common transformations you’ll perform when cleaning or standardizing data in PySpark. Whether you’re aligning tables from different systems, preparing data for machine learning, or simply making column names more readable, updating many column names at once can quickly become tedious if done one by one.

Fortunately, PySpark’s withColumnsRenamed() method provides a clean and efficient way to rename multiple columns in a single step. Instead of chaining multiple withColumnRenamed() calls or rebuilding the schema manually, this method lets you pass in one dictionary that maps existing column names to their new names. This keeps your code shorter, clearer, and easier to maintain in larger pipelines.

💡
Did you know?
The withColumnsRenamed() method is a fairly new addition to PySpark.

It was introduced in Apache Spark 3.4.0, finally giving users an official, built-in way to rename multiple columns at once—without loops or verbose code.

💡 Why Use withColumnsRenamed()?

You’ll benefit from withColumnsRenamed() when you want to:

  • Standardize column names from different data sources
  • Apply multiple renames in one operation
  • Clean messy or inconsistent schemas
  • Avoid repetitive withColumnRenamed() chains
  • Ensure transformations stay simple and declarative
💡
withColumnsRenamed() gives you an elegant way to express “rename these columns to these new names,” without writing loops or complex logic.

✅ Prerequisites

Before starting, make sure you have:

🐍☑️ Python installed
🔥☑️ A working Spark environment

📦1️⃣ Install Libraries

Install the following Python packages using pip:

pip install pyspark

📥2️⃣ Import Libraries

Start by importing the required Python modules:

from pyspark.sql import SparkSession

⚙️3️⃣ Build a Spark Session

Next, initialize your Spark session — the entry point for working with DataFrames:

spark = SparkSession.builder \
    .appName("PySparkTutorial") \
    .getOrCreate()

✍️4️⃣ Create a Sample DataFrame

Let’s create a simple DataFrame.

data = [
    (1, "Alice", 34),
    (2, "Bob", 29),
    (3, "Carol", 41)
]

df = spark.createDataFrame(
    data,
    ["id", "full_name", "years_old"]
)

df.show()

Output:

+---+---------+---------+
| id|full_name|years_old|
+---+---------+---------+
|  1|    Alice|       34|
|  2|      Bob|       29|
|  3|    Carol|       41|
+---+---------+---------+

🔄5️⃣ Rename Multiple Columns

Suppose you want to standardize these column names to:

  • full_name → name
  • years_old → age

This can be done in one clean line:
