Pandas - Group a DataFrame and apply Aggregations

Introduction

One of the key tasks in data analysis is grouping data to gain insights and make informed decisions. In this tutorial, we will show you how to group the rows of a Pandas DataFrame and apply different aggregations on the grouped data. In order to do this, we will use the groupby() method of Pandas in combination with various aggregation functions.

Import Libraries

First, we import the pandas library:

import pandas as pd

Create Pandas DataFrame

Next, we create a Pandas DataFrame with some example data from a dictionary:

data = {
    "language": ["Python", "Python", "Java", "JavaScript", "Python"],
    "framework": ["Django", "FastAPI", "Spring", "ReactJS", "FastAPI"],
    "users": [20000, 9000, 7000, 5000, 13000]
}
df = pd.DataFrame(data)
df

Group DataFrame and Apply Aggregations

The groupby() method of Pandas allows you to group data of a Pandas DataFrame based on one or more columns.

Once grouped, you can use various aggregation functions to summarize the grouped data. For example, you could use one of the following aggregation functions:

Count the non-null values for each group: count()
Calculate minimum of values for each group: min()
Calculate maximum of values for each group: max()
Calculate sum of values for each group: sum()
Calculate mean of values for each group: mean()
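As a rough illustration, several of these aggregations can be computed in a single call with the agg() method. The sketch below rebuilds the example data from above so that it runs on its own; the variable name summary is only chosen for this illustration.

```python
import pandas as pd

# Same example data as in the DataFrame created earlier
df = pd.DataFrame({
    "language": ["Python", "Python", "Java", "JavaScript", "Python"],
    "framework": ["Django", "FastAPI", "Spring", "ReactJS", "FastAPI"],
    "users": [20000, 9000, 7000, 5000, 13000],
})

# Apply all five aggregation functions to "users" for each language
summary = df.groupby("language")["users"].agg(["count", "min", "max", "sum", "mean"])
print(summary)
```

The result is a DataFrame with one row per language and one column per aggregation function.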

Group DataFrame by Single Column

We want to group the rows of the Pandas DataFrame based on the column "language". In addition, we want to calculate the mean of the column "users" for each group.

To do this, we use the groupby() method in combination with the mean() method of Pandas:

grouped_df = df.groupby("language")["users"].mean()
grouped_df
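Note that selecting a single column before aggregating returns a Pandas Series indexed by the group keys, not a DataFrame. The following sketch, which rebuilds the example data so that it is self-contained, shows one possible way to inspect the result and turn it back into a DataFrame with reset_index(). The names mean_users and mean_df are only used for this illustration.

```python
import pandas as pd

# Same example data as in the DataFrame created earlier
df = pd.DataFrame({
    "language": ["Python", "Python", "Java", "JavaScript", "Python"],
    "framework": ["Django", "FastAPI", "Spring", "ReactJS", "FastAPI"],
    "users": [20000, 9000, 7000, 5000, 13000],
})

# The result is a Series whose index holds the group keys
mean_users = df.groupby("language")["users"].mean()
print(mean_users["Python"])  # (20000 + 9000 + 13000) / 3 = 14000.0

# Move the group keys from the index back into a regular column
mean_df = mean_users.reset_index()
print(mean_df)
```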

Group DataFrame by Multiple Columns

We want to group the rows of the Pandas DataFrame based on the columns "language" and "framework". In addition, we want to calculate the sum of the column "users" for each group.

To do this, we use the groupby() method in combination with the sum() method of Pandas:

grouped_df = df.groupby(["language", "framework"])["users"].sum()
grouped_df
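When grouping by more than one column, the result is indexed by a MultiIndex built from the group keys. The sketch below, again rebuilding the example data so that it runs on its own, shows how a single group could be looked up with a tuple and how passing as_index=False keeps the group keys as regular columns instead. The names sum_users and flat_df are only used for this illustration.

```python
import pandas as pd

# Same example data as in the DataFrame created earlier
df = pd.DataFrame({
    "language": ["Python", "Python", "Java", "JavaScript", "Python"],
    "framework": ["Django", "FastAPI", "Spring", "ReactJS", "FastAPI"],
    "users": [20000, 9000, 7000, 5000, 13000],
})

# The result is a Series with a two-level index (language, framework)
sum_users = df.groupby(["language", "framework"])["users"].sum()

# Look up one group by passing both keys as a tuple
print(sum_users.loc[("Python", "FastAPI")])  # 9000 + 13000 = 22000

# as_index=False returns a flat DataFrame with the keys as columns
flat_df = df.groupby(["language", "framework"], as_index=False)["users"].sum()
print(flat_df)
```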

Conclusion

Congratulations! Now you are one step closer to becoming an AI Expert. In this blog post, we've explored the basics of grouping data in Pandas DataFrames. This functionality is crucial for data analysis and for gaining insights into large datasets.

You have seen that it is very easy to group data of a Pandas DataFrame and apply different aggregations to the grouped data. We can simply use the groupby() method in combination with specific aggregation methods of Pandas like count(), sum() or mean(). Try it yourself!
