
Spark DataFrame Basics

This document introduces Spark DataFrames, emphasizing their familiarity for users of pandas, R, SQL, or Excel. It highlights the transition from the older RDD syntax to the more user-friendly DataFrame syntax of Spark 2.0 and above, which makes data manipulation and transformation easier. The section is followed by a project analyzing historical stock data, which reinforces the skills learned throughout the section.

Uploaded by

abhimanyu thakur
Copyright
© All Rights Reserved

Spark DataFrames

Let’s learn something!


Python and Spark

● In this course, the main way we will be working with Python and Spark
is through the DataFrame syntax.
● If you’ve worked with pandas in Python,
R, SQL or even Excel, a DataFrame will
feel very familiar!
Python and Spark

● Spark DataFrames hold data in a column-and-row format.
● Each column represents some feature or
variable.
● Each row represents an individual data
point.
Python and Spark

● Spark began with what is known as the “RDD” (Resilient Distributed
Dataset) syntax, which was a little ugly and tricky to learn.
● Spark 2.0 and higher have shifted toward a DataFrame syntax, which is
much cleaner and easier to work with!
Python and Spark

● Spark DataFrames can read data from and write data to a wide variety of
sources, such as CSV, JSON, and Parquet files.
● We can then use these DataFrames to
apply various transformations on the
data.
Python and Spark

● At the end of a chain of transformation calls, we can either show the
results to display them or collect them for some final processing.
● In this section we’ll cover all the main
features of working with DataFrames
that you need to know.
Python and Spark

● Once we have a solid understanding of Spark DataFrames, we can move on
to using the DataFrame-based MLlib API for machine learning.
Python and Spark

● This section is followed by a “DataFrame Project” section.
● The project is an analysis of historical stock data that uses all
the Spark knowledge from this section of the course.
Python and Spark

● It will serve as a quick review exercise to test all the skills
learned in this section.
● Let’s get started with learning the basics
of Spark DataFrames!

You might also like