Pyspark_tutorial_3
PySpark Tutorial 3: Advanced DataFrame Transformations
Objective
Prerequisites
Students need Python and PySpark installed so that they can run the code examples.
Additional Exercises (optional):
1. Analyze attendance records for each student and compute their attendance percentage for a semester.
2. Use a window function to calculate the cumulative GPA for students across multiple semesters.
3. Identify students who have consistently scored below a certain threshold across all subjects.
Objective:
Exercises:
Additional Exercises:
Objective:
Exercises:
Additional Exercises:
Objective:
Exercises:
Additional Exercises:
Objective:
Exercises:
Objective:
Exercises:
1. Replace missing values with the mean or median for numeric columns.
2. Identify and remove duplicate rows from a dataset.
3. Detect and correct inconsistent data (e.g., invalid scores like -1).
Additional Exercises: