PySpark Intro
Basics
Initialize SparkSession:
python
DataFrame Operations
Create DataFrame:
python
Show Data:
python
df.show()
Filter Rows:
python
Select Columns:
python
df.select("column_name").show()
Aggregations
python
df.groupBy("column_name").agg({"another_column": "sum"}).show()
SQL Queries
python
df.createOrReplaceTempView("table_name")
String Operations
Concatenate Columns:
python
Window Functions
python
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
window_spec = Window.partitionBy("column_name").orderBy("another_column")
df.withColumn("row_number", row_number().over(window_spec)).show()