Ultimate Developer Command Guide
Complete Python, PySpark & SQL Reference
All essential commands in one optimized cheat sheet
Python Commands
| Command | Description | Example |
|---|---|---|
| `print()` | Outputs data to the console | `print("Welcome to Python!")  # Prints: Welcome to Python!` |
| `len()` | Returns the length of an object | `my_list = [1, 2, 3]; print(len(my_list))  # Output: 3` |
| `range()` | Generates a sequence of numbers | `for i in range(3): print(i)  # Prints: 0, 1, 2` |
| `def` | Defines a custom function | `def greet(name): return f"Hello, {name}"` <br> `print(greet("Alice"))  # Prints: Hello, Alice` |
| `import` | Imports a module or library | `import math; print(math.pi)  # Prints: 3.141592653589793` |
| `[x for x in iterable]` | Creates a list using a comprehension | `squares = [x**2 for x in [1, 2, 3]]; print(squares)  # Prints: [1, 4, 9]` |
| `if`/`elif`/`else` | Conditional logic | `x = 10` <br> `if x > 5: print("Big")` <br> `else: print("Small")` <br> `# Prints: Big` |
| `for` | Iterates over a sequence | `for fruit in ["apple", "banana"]: print(fruit)  # Prints: apple, banana` |
| `while` | Loops until its condition is false | `count = 0` <br> `while count < 3: print(count); count += 1` <br> `# Prints: 0, 1, 2` |
| `try`/`except` | Handles exceptions | `try: print(1/0)` <br> `except ZeroDivisionError: print("Cannot divide by zero")` <br> `# Prints: Cannot divide by zero` |
| `open()` | Opens a file for reading/writing | `with open("example.txt", "w") as f: f.write("Hello")  # Creates a file containing "Hello"` |
| `list.append()` | Adds an item to a list | `my_list = []; my_list.append(5); print(my_list)  # Prints: [5]` |
| `dict.get()` | Retrieves a value from a dictionary | `my_dict = {"key": "value"}; print(my_dict.get("key"))  # Prints: value` |
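These building blocks compose naturally. Below is a minimal, runnable sketch chaining several of them; the file name `example.txt` and the sample values follow the table above and are purely illustrative.

```python
import math

def describe(numbers):
    """Return a short summary string for a list of numbers."""
    squares = [x**2 for x in numbers]      # list comprehension
    return f"{len(numbers)} items, squares={squares}"

numbers = [1, 2, 3]
for n in numbers:                          # for loop over a sequence
    print(n)                               # Prints: 1, 2, 3

if len(numbers) > 2:                       # conditional logic
    print("Big")

try:
    print(1 / 0)
except ZeroDivisionError:                  # exception handling
    print("Cannot divide by zero")

with open("example.txt", "w") as f:        # file I/O
    f.write(describe(numbers))             # writes "3 items, squares=[1, 4, 9]"

print(math.pi)                             # constant from an imported module
```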
PySpark Commands
| Command | Description | Example |
|---|---|---|
| `SparkSession.builder` | Initializes a Spark session | `from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("MyApp").getOrCreate()` |
| `spark.read.csv()` | Loads a CSV file into a DataFrame | `df = spark.read.csv("data.csv", header=True, inferSchema=True); df.show()  # Displays CSV data` |
| `df.show()` | Displays the first n rows of a DataFrame | `df.show(3)  # Shows first 3 rows` |
| `df.printSchema()` | Displays the DataFrame schema | `df.printSchema()  # Shows column names and types` |
| `df.select()` | Selects specific columns | `df.select("name", "age").show()  # Shows name and age columns` |
| `df.filter()` | Filters rows based on a condition | `df.filter(df.age > 25).show()  # Shows rows where age > 25` |
| `df.where()` | Alias for `filter` | `df.where("salary > 50000").show()  # Filters rows where salary > 50000` |
| `df.groupBy().agg()` | Groups data and applies an aggregation | `df.groupBy("department").agg({"salary": "avg"}).show()  # Shows avg salary per dept` |
| `df.join()` | Joins two DataFrames | `df1.join(df2, df1.id == df2.id, "inner").show()  # Inner join on id` |
| `df.withColumn()` | Adds or modifies a column | `df.withColumn("age_plus_10", df.age + 10).show()  # Adds column with age + 10` |
| `df.withColumnRenamed()` | Renames a column | `df.withColumnRenamed("old_name", "new_name").show()  # Renames column` |
| `df.drop()` | Drops specified columns | `df.drop("salary").show()  # Drops salary column` |
| `df.fillna()` | Replaces null values | `df.fillna({"age": 0}).show()  # Replaces null ages with 0` |
| `df.dropDuplicates()` | Removes duplicate rows | `df.dropDuplicates(["name"]).show()  # Drops duplicate names` |
| `df.write.csv()` | Saves a DataFrame as CSV | `df.write.csv("output.csv", mode="overwrite")  # Saves DataFrame to CSV` |
| `df.createOrReplaceTempView()` | Registers the DataFrame as a temporary SQL view | `df.createOrReplaceTempView("temp_table")  # Creates SQL view` |
| `spark.sql()` | Runs a SQL query against registered views | `spark.sql("SELECT name FROM temp_table WHERE age > 30").show()  # Runs SQL query` |
| `Window.partitionBy()` | Defines a window for ranking/aggregation | `from pyspark.sql.window import Window; from pyspark.sql.functions import row_number` <br> `w = Window.partitionBy("dept").orderBy("salary")` <br> `df.withColumn("rank", row_number().over(w)).show()  # Adds rank column` |
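Chained together, these calls form a typical small job. The sketch below assumes a `data.csv` with `name`, `dept`, and `salary` columns, matching the examples above; the paths and column names are illustrative, not prescriptive.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("MyApp").getOrCreate()

# Load and clean the data (assumed columns: name, dept, salary).
df = spark.read.csv("data.csv", header=True, inferSchema=True)
df = df.fillna({"salary": 0}).dropDuplicates(["name"])

# Rank employees by salary within each department.
w = Window.partitionBy("dept").orderBy(df.salary.desc())
ranked = df.withColumn("rank", row_number().over(w))

ranked.filter(ranked["rank"] <= 3).show()     # top 3 earners per dept
ranked.write.csv("output", mode="overwrite")  # persist the result

spark.stop()
```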
SQL Commands
| Command | Description | Example |
|---|---|---|
| `SELECT` | Retrieves data from a table | `SELECT name, age FROM employees  -- Selects name and age columns` |
| `WHERE` | Filters rows based on a condition | `SELECT * FROM employees WHERE age > 30  -- Filters employees older than 30` |
| `ORDER BY` | Sorts the result set | `SELECT * FROM employees ORDER BY salary DESC  -- Sorts by salary in descending order` |
| `GROUP BY` | Groups rows for aggregation | `SELECT department, AVG(salary) FROM employees GROUP BY department  -- Avg salary per dept` |
| `HAVING` | Filters grouped results | `SELECT department, COUNT(*) FROM employees GROUP BY department HAVING COUNT(*) > 5  -- Depts with > 5 employees` |
| `JOIN` | Combines rows from multiple tables | `SELECT e.name, d.dept_name FROM employees e JOIN departments d ON e.dept_id = d.id  -- Joins tables` |
| `LEFT JOIN` | Includes all rows from the left table | `SELECT e.name, d.dept_name FROM employees e LEFT JOIN departments d ON e.dept_id = d.id  -- Left join` |
| `LIMIT` | Restricts the number of returned rows | `SELECT * FROM employees LIMIT 5  -- Returns first 5 rows` |
| `INSERT INTO` | Adds new rows to a table | `INSERT INTO employees (name, age) VALUES ('Alice', 28)  -- Inserts a new employee` |
| `UPDATE` | Modifies existing rows | `UPDATE employees SET salary = 60000 WHERE name = 'Alice'  -- Updates salary` |
| `DELETE` | Removes rows from a table | `DELETE FROM employees WHERE age < 18  -- Deletes rows where age < 18` |
| `CREATE TABLE` | Creates a new table | `CREATE TABLE employees (id INT, name VARCHAR(50), age INT)  -- Creates employees table` |
| `ALTER TABLE` | Modifies a table's structure | `ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2)  -- Adds salary column` |
| `DROP TABLE` | Deletes a table | `DROP TABLE employees  -- Deletes employees table` |
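One convenient way to try these statements end to end is Python's built-in `sqlite3` module, sketched below. The schema and sample rows are invented for illustration, and SQLite's flexible typing accepts the `VARCHAR`/`DECIMAL` declarations used above.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = con.cursor()

# CREATE TABLE, then ALTER TABLE to add a column.
cur.execute("CREATE TABLE employees (id INT, name VARCHAR(50), age INT)")
cur.execute("ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2)")

# INSERT INTO with sample rows (invented for illustration).
cur.executemany(
    "INSERT INTO employees (id, name, age, salary) VALUES (?, ?, ?, ?)",
    [(1, "Alice", 28, 60000), (2, "Bob", 35, 52000), (3, "Cara", 17, 0)],
)

cur.execute("DELETE FROM employees WHERE age < 18")                      # DELETE
cur.execute("UPDATE employees SET salary = 65000 WHERE name = 'Alice'")  # UPDATE

# SELECT with WHERE, ORDER BY, and LIMIT.
for row in cur.execute(
    "SELECT name, salary FROM employees WHERE age > 25 "
    "ORDER BY salary DESC LIMIT 5"
):
    print(row)  # ('Alice', 65000), then ('Bob', 52000)

con.close()
```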
Cheat Sheet Summary
Your comprehensive reference for daily development tasks across Python, PySpark, and SQL
Version 2.0 | Updated: August 2024
Print Tip: Use Ctrl+P (Win) / Cmd+P (Mac) to save as PDF