Introduction

In this tutorial, we filter rows from a PySpark DataFrame based on specific conditions. To do this, we use the filter() method of PySpark.

Import Libraries

First, we import the following Python module:

from pyspark.sql import SparkSession

Create SparkSession

Before we can work with PySpark, we need to create a SparkSession. A SparkSession is the entry point to all Spark functionality.

In order to create a basic SparkSession programmatically, we use the following command:

spark = SparkSession \
    .builder \
    .appName("Python PySpark Example") \
    .getOrCreate()

Create PySpark DataFrame

Next, we create a PySpark DataFrame with some example data from a list. To do this, we use the method createDataFrame() and pass the data and the column names as arguments.

column_names = ["language", "framework", "users"]
data = [
    ("Python", "FastAPI", 9000),
    ("JavaScript", "ReactJS", 7000),
    ("Python", "Django", 20000),
    ("Java", "Spring", 12000),
]
df = spark.createDataFrame(data, column_names)
df.show()

Filtering with Column Conditions

Now, we would like to filter the rows of the DataFrame based on multiple conditions.

To do this, we use the filter() method of PySpark and pass the column conditions as an argument. Each condition is wrapped in parentheses, and the conditions are combined with the & operator:

df_filtered = df.filter((df.language == "Python") & (df.users >= 10000))
df_filtered.show()

Filtering with SQL Expression

Next, we would like to filter the rows of the DataFrame based on multiple conditions by using a SQL expression.

To do this, we use the filter() method of PySpark and pass the SQL expression as a string argument:

df_filtered = df.filter("language = 'Python' AND users >= 10000")
df_filtered.show()

Conclusion

Congratulations! Now you are one step closer to becoming an AI expert. You have seen that filtering rows from a PySpark DataFrame takes a single call to the filter() method, using either column conditions or a SQL expression. Try it yourself!
