Introduction

In this tutorial, we want to write a PySpark DataFrame to a CSV file. To do this, we use either the csv() method or the format("csv").save() method of the PySpark DataFrameWriter. We get a DataFrameWriter instance from the DataFrame.write property.

Import Libraries

First, we import the following Python module:

from pyspark.sql import SparkSession

Create SparkSession

Before we can work with PySpark, we need to create a SparkSession. A SparkSession is the entry point to all Spark functionality.

To create a basic SparkSession programmatically, we use the following command:

spark = SparkSession \
    .builder \
    .appName("Python PySpark Example") \
    .getOrCreate()

Create PySpark DataFrame

Next, we create a PySpark DataFrame from a list of example data. To do this, we use the createDataFrame() method and pass the data and the column names as arguments.

column_names = ["language", "framework", "users"]
data = [
    ("Python", "Django", 20000),
    ("Python", "FastAPI", 9000),
    ("Java", "Spring", 7000),
    ("JavaScript", "ReactJS", 5000)
]
df = spark.createDataFrame(data, column_names)
df.show()

Write PySpark DataFrame to CSV File

Next, we would like to write the PySpark DataFrame to a CSV file. The file should have the following attributes:

  • The file should include a header with the column names.
  • Columns should be separated by a semicolon (;).
  • An existing file should be overwritten.
  • The file path should be "data/frameworks.csv".

We can do this in two different ways.

Option 1: csv()

First, we create a DataFrameWriter instance with df.write. Then we chain the option() and mode() methods of DataFrameWriter and finish with the csv() method:

# Write a header row, use ";" as the delimiter and replace any existing output
df.write.option("header", True) \
    .option("delimiter", ";") \
    .mode("overwrite") \
    .csv("data/frameworks.csv")

Option 2: format("csv").save()

Now, we consider another option to write the PySpark DataFrame to a CSV file.

First, we create a DataFrameWriter instance with df.write. Then we chain the option(), format() and mode() methods of DataFrameWriter and finish with the save() method:

# Same settings as Option 1, but the format is set explicitly and save() writes the output
df.write.option("header", True) \
    .option("delimiter", ";") \
    .format("csv") \
    .mode("overwrite") \
    .save("data/frameworks.csv")

Conclusion

Congratulations! Now you are one step closer to becoming an AI expert. You have seen that writing a PySpark DataFrame to a CSV file is straightforward: we can use either the csv() method or the format("csv").save() method of the PySpark DataFrameWriter, and we get a DataFrameWriter instance from DataFrame.write. Try it yourself!
