Pyspark Vs Spark SQL

Uploaded by

Sozha Vendhan

Scenario-Based Interview

Pyspark vs
Spark SQL

Ganesh. R
#Problem Statement
You are the restaurant owner and you want to analyze a possible expansion. You may assume there is at least one customer every day.

Compute the moving average of how much customers paid in a seven-day window (i.e., the current day plus the six days before it). average_amount should be rounded to two decimal places.

Return the result table ordered by visited_on in ascending order.
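Before reaching for Spark, the seven-day window logic can be sketched in plain Python. The helper name `moving_average_7d` is an assumption for illustration; it relies on the problem's guarantee that there is at least one customer every day, so consecutive rows are consecutive days:

```python
def moving_average_7d(daily_totals):
    """daily_totals: list of (visited_on, total_amount) tuples sorted by date.
    Returns rows from the 7th day onward: (visited_on, 7-day sum, 7-day avg)."""
    results = []
    for i in range(6, len(daily_totals)):
        frame = daily_totals[i - 6 : i + 1]  # current day + 6 days before
        total = sum(amount for _, amount in frame)
        results.append((daily_totals[i][0], total, round(total / 7, 2)))
    return results

# Daily totals from the sample data (2019-01-10 has two visits: 130 + 150)
daily = [
    ("2019-01-01", 100), ("2019-01-02", 110), ("2019-01-03", 120),
    ("2019-01-04", 130), ("2019-01-05", 110), ("2019-01-06", 140),
    ("2019-01-07", 150), ("2019-01-08", 80),  ("2019-01-09", 110),
    ("2019-01-10", 280),
]
print(moving_average_7d(daily))
```

The first reportable day is 2019-01-07 (the seventh day of data), with a 7-day sum of 860 and an average of 122.86; the Spark solutions below should reproduce the same four rows.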

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F
from pyspark.sql.functions import col, sum, avg, round, row_number

# Initialize Spark session
spark = SparkSession.builder.appName("MovingAverage").getOrCreate()

# Sample data
data = [
(1, "Jhon", "2019-01-01", 100),
(2, "Daniel", "2019-01-02", 110),
(3, "Jade", "2019-01-03", 120),
(4, "Khaled", "2019-01-04", 130),
(5, "Winston", "2019-01-05", 110),
(6, "Elvis", "2019-01-06", 140),
(7, "Anna", "2019-01-07", 150),
(8, "Maria", "2019-01-08", 80),
(9, "Jaze", "2019-01-09", 110),
(1, "Jhon", "2019-01-10", 130),
(3, "Jade", "2019-01-10", 150),
]

# Create DataFrame
columns = ["customer_id", "name", "visited_on", "amount"]
df = spark.createDataFrame(data, schema=columns)

df.display()
df.printSchema()

root
|-- customer_id: long (nullable = true)
|-- name: string (nullable = true)
|-- visited_on: string (nullable = true)
|-- amount: long (nullable = true)

# Define a window specification: the current row plus the 6 preceding rows,
# ordered by visit date
window_spec = Window.orderBy("visited_on").rowsBetween(-6, 0)
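`rowsBetween(-6, 0)` defines a frame over physical rows, not calendar days; because the problem guarantees at least one customer per day (so every day appears after the groupBy), row offsets and day offsets coincide. The frame behaves like a Python slice over the ordered daily totals, a sketch of which is:

```python
# Daily totals from the sample data, ordered by visited_on
rows = [100, 110, 120, 130, 110, 140, 150, 80, 110, 280]

# For row i, the frame is rows[max(0, i-6) .. i] inclusive, mirroring
# Window.orderBy("visited_on").rowsBetween(-6, 0)
frames = [rows[max(0, i - 6) : i + 1] for i in range(len(rows))]
print([sum(f) for f in frames])
# [100, 210, 330, 460, 570, 710, 860, 840, 840, 1000]
```

Note that the first six frames cover fewer than seven rows, which is why the solution later filters to row_number >= 7 before returning results.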

# Calculate the rolling sum and average
result_df = (
df.groupBy("visited_on")
.agg(sum("amount").alias("daily_amount"))
.withColumn("amount", sum("daily_amount").over(window_spec))
.withColumn("average_amount",
round(avg("daily_amount").over(window_spec), 2))
)

# Keep only rows with a full 7 days of history (row_number >= 7)
result_df = (
result_df.withColumn("row_number",
row_number().over(Window.orderBy("visited_on")))
.filter(col("row_number") >= 7)
.select("visited_on", "amount", "average_amount")
)

# Show the result
result_df.display()

df.createOrReplaceTempView("Customer")

%sql
WITH CustomerGrouped AS (
SELECT
visited_on,
SUM(amount) AS total_amount
FROM
Customer
GROUP BY
visited_on
),
MovingAverage AS (
SELECT
visited_on,
total_amount,
SUM(total_amount) OVER (
ORDER BY
visited_on ROWS BETWEEN 6 PRECEDING
AND CURRENT ROW
) AS sum_amount_7d
FROM
CustomerGrouped
)
SELECT
visited_on,
sum_amount_7d AS amount,
ROUND(sum_amount_7d / 7, 2) AS average_amount
FROM
MovingAverage
WHERE
DATEDIFF(
visited_on,
(
SELECT
MIN(visited_on)
FROM
CustomerGrouped
)
) >= 6
ORDER BY
visited_on;
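The same query shape runs on any engine with window functions, which makes it easy to sanity-check locally. Below is a sketch against SQLite (3.25+ for window function support, bundled with Python's `sqlite3`); note two portability tweaks that are assumptions of this sketch, not Spark SQL: SQLite has no two-argument `DATEDIFF`, so `julianday` arithmetic stands in for it, and `7.0` forces float division where Spark's `/` already returns a double:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INT, name TEXT, visited_on TEXT, amount INT);
INSERT INTO Customer VALUES
 (1,'Jhon','2019-01-01',100),(2,'Daniel','2019-01-02',110),
 (3,'Jade','2019-01-03',120),(4,'Khaled','2019-01-04',130),
 (5,'Winston','2019-01-05',110),(6,'Elvis','2019-01-06',140),
 (7,'Anna','2019-01-07',150),(8,'Maria','2019-01-08',80),
 (9,'Jaze','2019-01-09',110),(1,'Jhon','2019-01-10',130),
 (3,'Jade','2019-01-10',150);
""")

rows = conn.execute("""
WITH CustomerGrouped AS (
  SELECT visited_on, SUM(amount) AS total_amount
  FROM Customer GROUP BY visited_on
),
MovingAverage AS (
  SELECT visited_on, total_amount,
         SUM(total_amount) OVER (
           ORDER BY visited_on
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
         ) AS sum_amount_7d
  FROM CustomerGrouped
)
SELECT visited_on, sum_amount_7d AS amount,
       ROUND(sum_amount_7d / 7.0, 2) AS average_amount
FROM MovingAverage
WHERE julianday(visited_on)
      - (SELECT julianday(MIN(visited_on)) FROM CustomerGrouped) >= 6
ORDER BY visited_on
""").fetchall()
print(rows)
```

Both the PySpark and Spark SQL versions above should return the same four rows, starting at 2019-01-07.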
IF YOU FOUND THIS POST USEFUL, PLEASE SAVE IT.

Ganesh. R
+91-9030485102. Hyderabad, Telangana. [email protected]

https://medium.com/@rganesh0203 | https://rganesh203.github.io/Portfolio/
https://github.com/rganesh203 | https://www.linkedin.com/in/r-ganesh-a86418155/
https://www.instagram.com/rg_data_talks/ | https://topmate.io/ganesh_r0203
