Introduction

In this tutorial, we want to join Pandas DataFrames. To do this, we use the merge() method of Pandas.

Import Libraries

First, we import the following Python module:

import pandas as pd

Create Pandas DataFrames

Next, we create two Pandas DataFrames with some example data from dictionaries.

First, we create the Pandas DataFrame "df_languages":

data = {
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"]
}
df_languages = pd.DataFrame(data)
df_languages

Next, we create the Pandas DataFrame "df_frameworks":

data = {
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2]
}
df_frameworks = pd.DataFrame(data)
df_frameworks

Inner Join

Now, we would like to join the two DataFrames using an inner join. The DataFrame "df_languages" has the primary key "id", and the matching foreign key in the DataFrame "df_frameworks" is "language_id". An inner join keeps only the rows whose key value appears in both DataFrames.

To join the DataFrames, we use the merge() method of Pandas. We have to specify the join type and the key columns of both DataFrames. For an inner join, we set the parameter "how" to "inner":

df_joined = df_languages.merge(
    df_frameworks,
    how="inner",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
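To make the effect of the inner join more concrete, the following sketch checks which rows survive and which are dropped. It rebuilds the example data so that it runs on its own; the names "df_inner", "dropped_languages" and "dropped_frameworks" are only chosen for this illustration.

```python
import pandas as pd

# Same example data as above, repeated so the snippet is self-contained
df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_inner = df_languages.merge(
    df_frameworks,
    how="inner",
    left_on="id",
    right_on="language_id",
)

# Rows without a partner on the other side do not appear in the result
dropped_languages = sorted(set(df_languages["language"]) - set(df_inner["language"]))
dropped_frameworks = sorted(set(df_frameworks["framework"]) - set(df_inner["framework"]))

print(len(df_inner))        # 5
print(dropped_languages)    # ['C++', 'Visual Basic']
print(dropped_frameworks)   # ['Spring']
```

Python matches three frameworks and JavaScript matches two, which gives the five rows. "C++" and "Visual Basic" have no framework, and "Spring" points to the language id 5, which does not exist in "df_languages".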

Full Outer Join

A full outer join keeps all rows of both DataFrames and fills the columns without a match with NaN. To join the two DataFrames using a full outer join, we have to set the parameter "how" to "outer":

df_joined = df_languages.merge(
    df_frameworks,
    how="outer",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
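If we want to see where each row of the full outer join comes from, one option is the "indicator" parameter of merge(), which adds a column named "_merge". The following is a minimal sketch with the example data repeated; the names "df_outer", "origin_counts" and "unmatched" are only chosen for this illustration.

```python
import pandas as pd

# Same example data as above, repeated so the snippet is self-contained
df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_outer = df_languages.merge(
    df_frameworks,
    how="outer",
    left_on="id",
    right_on="language_id",
    indicator=True,  # adds "_merge" with the values left_only, right_only, both
)

# Count how many rows come from the left side, the right side, or both
origin_counts = df_outer["_merge"].value_counts().to_dict()

# Rows that exist in only one of the two DataFrames
unmatched = df_outer[df_outer["_merge"] != "both"]

print(len(df_outer))  # 8
print(unmatched[["language", "framework", "_merge"]])
```

With the example data, we would expect five rows marked "both", two marked "left_only" (C++ and Visual Basic) and one marked "right_only" (Spring).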

Left Join

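A left join keeps all rows of the left DataFrame ("df_languages") and adds the matching rows of the right DataFrame where they exist. To join the two DataFrames using a left join, we have to set the parameter "how" to "left":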

df_joined = df_languages.merge(
    df_frameworks,
    how="left",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
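In the left join result, languages without a framework get NaN in the framework columns, and "framework_id" is therefore converted to a float column. The following sketch shows one possible way to find these rows and to turn the ids back into integers with the nullable "Int64" dtype. The example data is repeated, and the names "df_left" and "languages_without_framework" are only chosen for this illustration.

```python
import pandas as pd

# Same example data as above, repeated so the snippet is self-contained
df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_left = df_languages.merge(
    df_frameworks,
    how="left",
    left_on="id",
    right_on="language_id",
)

# Languages that found no framework have NaN in the "framework" column
languages_without_framework = df_left.loc[
    df_left["framework"].isna(), "language"
].tolist()

# Optional: store the ids as nullable integers instead of floats
df_left["framework_id"] = df_left["framework_id"].astype("Int64")

print(len(df_left))                  # 7
print(languages_without_framework)   # ['C++', 'Visual Basic']
```

The framework "Spring" does not appear here, because its "language_id" has no match in the left DataFrame.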

Right Join

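A right join keeps all rows of the right DataFrame ("df_frameworks") and adds the matching rows of the left DataFrame where they exist. To join the two DataFrames using a right join, we have to set the parameter "how" to "right":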

df_joined = df_languages.merge(
    df_frameworks,
    how="right",
    left_on="id",
    right_on="language_id"
)
df_joined = df_joined[["id", "language", "framework_id", "framework"]]
df_joined
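The right join mirrors the left join: now every framework is kept, and frameworks without a matching language get NaN in "id" and "language". As a small sketch, we can look for these rows before selecting the final columns, because "language_id" still shows which key was missing. The example data is repeated, and the names "df_right" and "frameworks_without_language" are only chosen for this illustration.

```python
import pandas as pd

# Same example data as above, repeated so the snippet is self-contained
df_languages = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "language": ["Python", "JavaScript", "C++", "Visual Basic"],
})
df_frameworks = pd.DataFrame({
    "framework_id": [1, 2, 3, 4, 5, 6],
    "framework": ["Spring", "FastAPI", "ReactJS", "Django", "Flask", "AngularJS"],
    "language_id": [5, 1, 2, 1, 1, 2],
})

df_right = df_languages.merge(
    df_frameworks,
    how="right",
    left_on="id",
    right_on="language_id",
)

# Frameworks whose language_id has no match in df_languages
frameworks_without_language = df_right.loc[
    df_right["language"].isna(), ["framework", "language_id"]
]

print(len(df_right))  # 6
print(frameworks_without_language)
```

With the example data, only "Spring" (language id 5) has no matching language, while "C++" and "Visual Basic" are missing from the result because no framework refers to them.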

Conclusion

Congratulations! Now you are one step closer to becoming an AI Expert. You have seen that it is very easy to join Pandas DataFrames: we simply use the merge() method of Pandas and choose the join type with the parameter "how". Try it yourself!
