
In this tutorial, we want to read a CSV file into a PySpark DataFrame. To do this, we use the csv() method and the format("csv").load() method of the PySpark DataFrameReader. In addition, we use spark.read to create a DataFrameReader instance.

Import Libraries

First, we import SparkSession from the pyspark.sql module:

from pyspark.sql import SparkSession

Create SparkSession

Before we can work with PySpark, we need to create a SparkSession. A SparkSession is the entry point to all Spark functionality.

In order to create a basic SparkSession programmatically, we use the following command:

spark = SparkSession \
    .builder \
    .appName("Python PySpark Example") \
    .getOrCreate()

CSV File

We consider the CSV file "frameworks.csv" containing the following data:

We have to keep in mind the following attributes of the CSV file:

  • File includes a header with the column names.
  • Columns of the file are separated by a semicolon (;).
  • File path is "data/frameworks.csv".
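
For readers who do not have the file at hand, a stand-in with the same three attributes can be generated with Python's built-in csv module. This is only a sketch: the column names and rows below are hypothetical placeholders, not the actual contents of frameworks.csv, and the helper name write_sample_csv is made up for this example. Note that running it overwrites any existing file at the given path.

```python
import csv
import os


def write_sample_csv(path):
    """Write a small semicolon-separated CSV file with a header row."""
    rows = [
        ["name", "language", "stars"],  # header (hypothetical column names)
        ["FrameworkA", "Python", 5],    # hypothetical placeholder rows
        ["FrameworkB", "Scala", 4],
        ["FrameworkC", "Java", 3],
    ]
    # Create the target directory (e.g. "data") if it does not exist yet.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    write_sample_csv("data/frameworks.csv")
```

The delimiter=";" argument mirrors the separator described above, and the first row plays the role of the header.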

Read CSV File into PySpark DataFrame

Next, we would like to read the CSV file into a PySpark DataFrame. The schema of the DataFrame should be inferred automatically from the underlying data. We can do this in two different ways.

Option 1: csv()

To do this, we first create a DataFrameReader instance with spark.read. Afterwards, we use the csv() method in combination with the option() method of DataFrameReader:

df = spark.read.option("header", True) \
    .option("delimiter", ";") \
    .option("inferSchema", True) \
    .csv("data/frameworks.csv")

Option 2: format("csv").load()

Now, we consider another option to read the CSV file into a PySpark DataFrame.

First, we create a DataFrameReader instance with spark.read. Afterwards, we use the load() method in combination with the format() method and the option() method of DataFrameReader:

df = spark.read.option("header", True) \
    .option("delimiter", ";") \
    .option("inferSchema", True) \
    .format("csv") \
    .load("data/frameworks.csv")


Congratulations! Now you are one step closer to becoming an AI expert. You have seen that it is very easy to read a CSV file into a PySpark DataFrame. We can simply use the csv() method or the format("csv").load() method of the PySpark DataFrameReader. A DataFrameReader instance can be created with spark.read. Try it yourself!

