PySpark vs Spark SQL: Scenario-Based Interview

This document compares PySpark and Spark SQL approaches to the same problem: producing one output record for every post in each job position, with unfilled posts shown as 'Vacant'. It includes code for initializing a Spark session, creating DataFrames for job positions and employees, and performing the join in both the DataFrame API and SQL.


Scenario-Based Interview

PySpark vs Spark SQL

Ganesh. R

Scenario: Write a query that produces one output record for every post in each job position. Filled posts show the employee's name; unfilled posts show 'Vacant'. For example, if a position has totalpost = 5 but only 3 employees, the output should contain those 3 names plus 2 'Vacant' rows.

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("JobPositionsAndEmployees").getOrCreate()

# Create schema and data for job_positions
job_positions_schema = ["id", "title", "groups", "levels", "payscale", "totalpost"]
job_positions_data = [
    (1, 'General manager', 'A', 'l-15', 10000, 1),
    (2, 'Manager', 'B', 'l-14', 9000, 5),
    (3, 'Asst. Manager', 'C', 'l-13', 8000, 10)
]

# Create DataFrame for job_positions
job_positions_df = spark.createDataFrame(job_positions_data, schema=job_positions_schema)
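
Passing a plain list of column names makes Spark infer the column types from the data. If explicit types are preferred, a StructType schema can be supplied instead; this is a minimal optional sketch, not part of the original post:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema for job_positions (alternative to type inference)
job_positions_struct = StructType([
    StructField("id", IntegerType(), False),
    StructField("title", StringType(), True),
    StructField("groups", StringType(), True),
    StructField("levels", StringType(), True),
    StructField("payscale", IntegerType(), True),
    StructField("totalpost", IntegerType(), True)
])

# job_positions_df = spark.createDataFrame(job_positions_data, schema=job_positions_struct)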

# Create schema and data for job_employees
job_employees_schema = ["id", "name", "position_id"]
job_employees_data = [
    (1, 'John Smith', 1),
    (2, 'Jane Doe', 2),
    (3, 'Michael Brown', 2),
    (4, 'Emily Johnson', 2),
    (5, 'William Lee', 3),
    (6, 'Jessica Clark', 3),
    (7, 'Christopher Harris', 3),
    (8, 'Olivia Wilson', 3),
    (9, 'Daniel Martinez', 3),
    (10, 'Sophia Miller', 3)
]

# Create DataFrame for job_employees
job_employees_df = spark.createDataFrame(job_employees_data, schema=job_employees_schema)

# Show the DataFrames (display() is Databricks-specific; use show() outside Databricks)
job_positions_df.display()
job_employees_df.display()
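
With this sample data, the expected final output has 16 rows, one per post: 1 row for General manager (John Smith), 5 rows for Manager (Jane Doe, Michael Brown, Emily Johnson, and two 'Vacant' rows), and 10 rows for Asst. Manager (the six named employees and four 'Vacant' rows).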

###PySpark

from pyspark.sql.functions import col, lit, when, row_number
from pyspark.sql.window import Window
from pyspark.sql import Row

# Expand job_positions into one row per post (pos_num = 1 .. totalpost)
expanded_positions = job_positions_df.rdd.flatMap(
    lambda row: [Row(id=row['id'], title=row['title'], groups=row['groups'],
                     levels=row['levels'], payscale=row['payscale'],
                     totalpost=row['totalpost'], pos_num=i + 1)
                 for i in range(row['totalpost'])]
).toDF()

# Number the employees within each position so each one maps to exactly one post
window_spec = Window.partitionBy('position_id').orderBy('id')
job_employees_df_with_pos_num = job_employees_df.withColumn('pos_num', row_number().over(window_spec))

# Left join the posts to the employees and fill unmatched posts with "Vacant"
joined_df = expanded_positions.join(
    job_employees_df_with_pos_num,
    (expanded_positions.id == job_employees_df_with_pos_num.position_id) &
    (expanded_positions.pos_num == job_employees_df_with_pos_num.pos_num),
    'left'
).select(
    'title', 'groups', 'payscale',
    when(col('name').isNull(), lit('Vacant')).otherwise(col('name')).alias('name')
)

# Show the result
joined_df.display()
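
The same expansion can be done without dropping to the RDD API. This is an alternative sketch (not from the original post) that uses explode(sequence(...)) to generate one row per post:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Generate pos_num 1..totalpost for each position directly in the DataFrame API
expanded = job_positions_df.withColumn(
    'pos_num', F.explode(F.sequence(F.lit(1), F.col('totalpost')))
)

# Number employees within each position
emp = job_employees_df.withColumn(
    'pos_num', F.row_number().over(Window.partitionBy('position_id').orderBy('id'))
)

# Left join and fill unmatched posts with "Vacant"
result = expanded.join(
    emp,
    (expanded.id == emp.position_id) & (expanded.pos_num == emp.pos_num),
    'left'
).select(
    expanded.title, expanded.groups, expanded.payscale,
    F.coalesce(emp.name, F.lit('Vacant')).alias('name')
).orderBy(expanded.id, expanded.pos_num)

result.show(truncate=False)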

###SQL

job_positions_df.createOrReplaceTempView("job_positions")
job_employees_df.createOrReplaceTempView("job_employees")

%sql
with emp as (
  -- number employees within each position (1, 2, ...)
  select
    name,
    position_id,
    row_number() over (partition by position_id order by id) as rn
  from job_employees
),
nums as (
  -- number generator 1..N; job_employees has enough rows here (N >= max(totalpost))
  select row_number() over (order by id) as rn
  from job_employees
),
jp as (
  -- one row per post for every position
  select
    a.id,
    a.title,
    a.groups,
    a.payscale,
    a.levels,
    b.rn
  from job_positions as a
  join nums as b on b.rn <= a.totalpost
)
select
  a.title,
  a.groups,
  a.payscale,
  coalesce(b.name, 'Vacant') as name
from jp as a
left join emp as b
  on b.rn = a.rn
  and b.position_id = a.id
order by
  a.id,
  a.rn;
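
The %sql cell is a notebook magic. Outside a notebook, the same statement can be run from Python with spark.sql(); a minimal sketch, assuming the full query text above is stored in a string named vacancy_sql (a hypothetical variable, not from the original post):

# vacancy_sql is assumed to hold the SQL statement above as a Python string
result_df = spark.sql(vacancy_sql)
result_df.show(truncate=False)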
IF YOU FOUND THIS POST USEFUL, PLEASE SAVE IT.

Ganesh. R
+91-9030485102. Hyderabad, Telangana. [email protected]

https://medium.com/@rganesh0203 https://rganesh203.github.io/Portfolio/
https://github.com/rganesh203 https://www.linkedin.com/in/r-ganesh-a86418155/

https://www.instagram.com/rg_data_talks/ https://topmate.io/ganesh_r0203
