Data Engineer - Deep Learning Nerds | The ultimate Learning Platform for AI and Data Science (Page 2)

43 posts

Academy Membership Microsoft Fabric Azure

How to upload Files into a Fabric Lakehouse

Introduction In order to work with data in Microsoft Fabric, you first need to get data into it. There are several ways to do this. One way is to manually upload files or folders from your local drive into a Fabric lakehouse. In this tutorial, we will explain step-by-step how...

by Data Engineer

Academy Membership PySpark Python

Exploring Data Transformation in PySpark: Native Spark Functions vs. UDFs vs. Pandas UDFs

Introduction Data transformation is a fundamental task in any data analysis or processing pipeline. In the realm of big data processing, Apache Spark has emerged as a powerful framework for handling large-scale data processing tasks efficiently. When it comes to transforming data within Spark, developers often have to choose between...

by Data Engineer

Academy Membership PySpark Python

PySpark - How to use Pandas User Defined Function (UDF)

Introduction In the realm of big data processing, PySpark has emerged as a powerful tool for handling large-scale datasets. Its distributed computing framework allows for efficient processing of massive volumes of data. However, despite its capabilities, performing certain data transformations in PySpark can sometimes be cumbersome and complex. That'...

by Data Engineer

Python PySpark

PySpark - Change Column Types of a DataFrame

Introduction: Data manipulation tasks often involve converting column data types to ensure consistency and accuracy in analysis. In this tutorial, we will show you how to change column types of a PySpark DataFrame. In order to do this, we will use the cast() method of PySpark columns. Import Libraries: First, we...

by Data Engineer

Academy Membership Python PySpark

PySpark - Window Functions

Introduction Window functions in PySpark are a powerful feature for data manipulation and analysis. They allow you to perform complex calculations on subsets of data within a DataFrame, without the need for expensive joins or subqueries. In this tutorial, we will show you how to use window functions in PySpark....

by Data Engineer

Academy Membership FastAPI Docker

How to containerize a FastAPI Application with Docker

Introduction FastAPI, a high-performance Python web framework, coupled with Docker, a powerful containerization tool, can significantly boost the efficiency of your development workflow. In this blog post, we'll walk you through the process of setting up a FastAPI project using a Dockerfile, providing a flexible and scalable solution...

by Data Engineer

Academy Membership Python PySpark

PySpark - Add an ID Column to a DataFrame

Introduction A common task when working with large datasets is generating a unique identifier for each record. In this tutorial, we will explore how to easily add an ID column to a PySpark DataFrame. In order to do this, we use the monotonically_increasing_id() function of PySpark....

by Data Engineer

Python Data Engineering Pandas

Pandas - Group a DataFrame and apply Aggregations

Introduction One of the key tasks in data analysis is grouping data to gain insights and make informed decisions. In this tutorial, we will show you how to group the rows of a Pandas DataFrame and apply different aggregations on the grouped data. In order to do this, we will...

by Data Engineer

Academy Membership PySpark Python

PySpark - Group a DataFrame and apply Aggregations

Introduction One of the key tasks in data analysis is grouping data to gain insights and make informed decisions. In this tutorial, we will show you how to group the rows of a PySpark DataFrame and apply different aggregations on the grouped data. In order to do this, we will...

by Data Engineer