Data Engineering

19 posts
PySpark - How to create and use Broadcast Variables
Academy Membership, PySpark, Python

Introduction: In distributed computing environments like Apache Spark, efficient data handling is critical for performance. Broadcast variables are one useful feature for optimizing computations. They let you share large read-only data with every node in a Spark cluster once, instead of shipping a copy of the data with each task. In this tutorial,...

How to load a Sample Dataset for Real-Time Intelligence into an existing KQL Database in Microsoft Fabric

Introduction: Microsoft Fabric offers a gallery of sample datasets for practicing Real-Time Analytics. These sample datasets are a good way to gain experience with the technologies and services in the Real-Time Intelligence experience. In this tutorial we will show you...

PySpark - Concatenate String Columns of a DataFrame
Academy Membership, PySpark, Python

Introduction: In this tutorial, we will show you how to concatenate multiple string columns of a PySpark DataFrame into a single column. To do this, we will use the PySpark functions concat() and concat_ws(). Import Libraries: First, we import the following Python modules: from pyspark.sql...
