Databricks vs SQL Cheat Sheet
Azure Databricks Spark vs. SQL Functions Cheat Sheet
1. Syntax Basics
Operation | Spark Example (PySpark) | SQL Equivalent
Select Columns
  Spark (PySpark): df.select("column1", "column2")
  SQL Equivalent: SELECT column1, column2 FROM table
Filter Rows
  Spark (PySpark): df.filter(col("column") > 10)
  SQL Equivalent: SELECT * FROM table WHERE column > 10
Alias Columns
  Spark (PySpark): df.select(col("column").alias("alias"))
  SQL Equivalent: SELECT column AS alias FROM table
3. Aggregations
Operation | Spark Example (PySpark) | SQL Equivalent
Group By and Aggregate
  Spark (PySpark): df.groupBy("column").agg(avg("value"))
  SQL Equivalent: SELECT column, AVG(value) FROM table GROUP BY column
Count Rows
  Spark (PySpark): df.count()
  SQL Equivalent: SELECT COUNT(*) FROM table
Aggregate Functions
  Spark (PySpark): df.agg(sum("value"), max("value"))
  SQL Equivalent: SELECT SUM(value), MAX(value) FROM table
4. Joins
Operation | Spark Example (PySpark) | SQL Equivalent
Inner Join
  Spark (PySpark): df1.join(df2, "key", "inner")
  SQL Equivalent: SELECT * FROM df1 INNER JOIN df2 ON df1.key = df2.key
Left Join
  Spark (PySpark): df1.join(df2, "key", "left")
  SQL Equivalent: SELECT * FROM df1 LEFT JOIN df2 ON df1.key = df2.key
Cross Join
  Spark (PySpark): df1.crossJoin(df2)
  SQL Equivalent: SELECT * FROM df1 CROSS JOIN df2
6. String Functions
Operation | Spark Example (PySpark) | SQL Equivalent
Substring
  Spark (PySpark): df.select(substring("column", 1, 3))
  SQL Equivalent: SELECT SUBSTRING(column, 1, 3) FROM table
String Contains
  Spark (PySpark): df.filter(col("column").contains("value"))
  SQL Equivalent: SELECT * FROM table WHERE column LIKE '%value%'
String Replace
  Spark (PySpark): df.select(regexp_replace("column", "x", "y"))
  SQL Equivalent: SELECT REPLACE(column, 'x', 'y') FROM table
8. Window Functions
Operation | Spark Example (PySpark) | SQL Equivalent
Row Number
  Spark (PySpark): df.withColumn("row_num", row_number().over(Window.partitionBy("column")))
  SQL Equivalent: SELECT column, ROW_NUMBER() OVER (PARTITION BY column) AS row_num FROM table
Rank
  Spark (PySpark): df.withColumn("rank", rank().over(Window.partitionBy("column").orderBy("value")))
  SQL Equivalent: SELECT column, RANK() OVER (PARTITION BY column ORDER BY value) AS rank FROM table
9. Null Handling
Operation | Spark Example (PySpark) | SQL Equivalent
Drop Null Rows
  Spark (PySpark): df.na.drop()
  SQL Equivalent: DELETE FROM table WHERE column IS NULL
Replace Null Values
  Spark (PySpark): df.na.fill("value", ["column"])
  SQL Equivalent: SELECT COALESCE(column, 'value') FROM table
10. Miscellaneous
Operation | Spark Example (PySpark) | SQL Equivalent
Distinct Values
  Spark (PySpark): df.select("column").distinct()
  SQL Equivalent: SELECT DISTINCT column FROM table
Sample Rows
  Spark (PySpark): df.sample(fraction=0.1)
  SQL Equivalent: SELECT * FROM table TABLESAMPLE (10 PERCENT)
Create Temp View
  Spark (PySpark): df.createOrReplaceTempView("view_name")
  SQL Equivalent: Not applicable in SQL directly