PySpark - FP - Course ID 58339 - Hands On 1

Uploaded by Jegadeesan Singaram


Step 1: Import the SparkSession class.

Step 2: Create a SparkSession object.

Step 3: Read the JSON file and create a DataFrame from the JSON data. Display the DataFrame, then save it to a Parquet file named Employees.

Step 4: From the DataFrame, display the associates who are mapped to the ‘JAVA’ stream. Save the resulting DataFrame to a Parquet file named JavaEmployees.
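Step 3 depends on how the input file is laid out. By default `spark.read.json` expects JSON Lines, meaning one complete JSON object per line. If `employees.json` is instead a single pretty-printed JSON array, either read it with `spark.read.option("multiLine", True).json("employees.json")` or convert it first. The sketch below does the conversion with only the standard library. The function name `json_array_to_json_lines` and the sample records (including the `id`, `name` and `stream` fields) are hypothetical, since the exercise does not show the file's contents.

```python
import json
import os
import tempfile


def json_array_to_json_lines(src_path, dst_path):
    """Rewrite a file holding one top-level JSON array as JSON Lines.

    Hypothetical helper: returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # json.dumps without indent keeps each object on a single line
            dst.write(json.dumps(record) + "\n")
    return len(records)


# Invented sample data; the real employees.json may have other fields.
sample = [
    {"id": 1, "name": "Asha", "stream": "JAVA"},
    {"id": 2, "name": "Ravi", "stream": "DOTNET"},
    {"id": 3, "name": "Meena", "stream": "JAVA"},
]

workdir = tempfile.mkdtemp()
array_path = os.path.join(workdir, "employees_array.json")
lines_path = os.path.join(workdir, "employees.json")

# A pretty-printed array spans many lines, so it is NOT JSON Lines.
with open(array_path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2)

count = json_array_to_json_lines(array_path, lines_path)
with open(lines_path, encoding="utf-8") as f:
    converted = f.read().splitlines()
```

After the conversion, `spark.read.json` can read `lines_path` with its default settings, one row per line.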

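# Step 1: Import the SparkSession class
from pyspark.sql import SparkSession

# Step 2: Create a SparkSession object
spark = SparkSession.builder \
    .appName("Employee Data Processing") \
    .getOrCreate()

# Step 3: Read the JSON file and create a DataFrame
# Assuming the JSON file is named 'employees.json' (by default Spark expects one JSON object per line)
df = spark.read.json("employees.json")

# Display the DataFrame
df.show()

# Save the DataFrame as Parquet; Spark writes a directory named Employees containing part files.
# The default save mode fails if Employees already exists; use df.write.mode("overwrite") to rerun.
df.write.parquet("Employees")

# Step 4: Filter associates mapped to the 'JAVA' stream
java_employees_df = df.filter(df.stream == "JAVA")

# Display the filtered DataFrame
java_employees_df.show()

# Save the filtered DataFrame to a Parquet file named JavaEmployees
java_employees_df.write.parquet("JavaEmployees")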
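`df.stream == "JAVA"` is an exact, case-sensitive comparison, so rows whose `stream` is `"Java"` or `" JAVA"` are not selected. A missing or null `stream` makes the condition null, and `filter` keeps only rows where it is true, so those rows are dropped as well. One quick way to confirm what `java_employees_df.count()` should report is to apply the same rule to a small JSON Lines file in plain Python. This is a hypothetical helper (`count_stream`) for small test files only, not a replacement for the Spark job. The `stream` field name follows the code above, and the sample records are invented.

```python
import json
import os
import tempfile


def count_stream(json_lines_path, stream="JAVA"):
    """Count records whose 'stream' equals `stream` exactly (case-sensitive).

    Mirrors df.filter(df.stream == stream): records without a 'stream' key
    never match, just as null values are dropped by the Spark filter.
    """
    matches = 0
    with open(json_lines_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines instead of failing on them
            record = json.loads(line)
            if record.get("stream") == stream:
                matches += 1
    return matches


# Invented sample records covering the edge cases described above.
records = [
    {"id": 1, "stream": "JAVA"},
    {"id": 2, "stream": "Java"},    # different case: not counted as JAVA
    {"id": 3, "stream": "DOTNET"},
    {"id": 4},                      # no stream field: never counted
    {"id": 5, "stream": "JAVA"},
]
path = os.path.join(tempfile.mkdtemp(), "employees.json")
with open(path, "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

expected = count_stream(path)
```

Run the Spark job on the same file and compare `java_employees_df.count()` with `expected`; for these sample records both should be 2.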
