This document summarizes key concepts and APIs in PySpark 3.0. It covers Spark fundamentals like RDDs, DataFrames and Datasets. It also covers PySpark modules for SQL, streaming, machine learning and graph processing. Finally it summarizes common DataFrame transformations and actions for manipulating data as well as Spark SQL functionality.
This document summarizes key concepts and APIs in PySpark 3.0. It covers Spark fundamentals like RDDs, DataFrames and Datasets. It also covers PySpark modules for SQL, streaming, machine learning and graph processing. Finally it summarizes common DataFrame transformations and actions for manipulating data as well as Spark SQL functionality.
Databricks Certified Associate Developer for Apache Spark Using Python: The ultimate guide to getting certified in Apache Spark using practical examples with Python
Mastering Data Engineering and Analytics with Databricks: A Hands-on Guide to Build Scalable Pipelines Using Databricks, Delta Lake, and MLflow (English Edition)