Py Spark Final

This document walks through entering the PySpark shell, creating a SparkSession and a sample DataFrame, printing the schema, describing the 'Age' column, selecting and ordering columns, and finally stopping Spark and exiting the shell.


Entering PySpark Shell

Enter the pyspark shell:

pyspark

Import the SparkSession class (the other imports in the original listing — the star import, SparkContext, and DataFrameWriter — are unused here and have been dropped):

from pyspark.sql import SparkSession

Create the Spark session:

spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()

A Spark session is created. (In the interactive pyspark shell a session named spark already exists, so getOrCreate() simply returns it.)


Creating a DataFrame
Create a DataFrame:

df = spark.createDataFrame(
    [("1", "Jack", 22, "Data Science"),
     ("2", "Luke", 21, "Data Analytics"),
     ("3", "Leo", 24, "Micro Services"),
     ("4", "Mark", 21, "Data Analytics")],
    ["ID", "Name", "Age", "Area of Interest"])

Display the schema:

df.printSchema()

More Operations
Describe the column 'Age', and observe the various statistical parameters:

df.describe('Age').show()

Select the columns ID, Name, and Age, and order the rows by Name in descending order:

df.select('ID','Name','Age').orderBy('Name',ascending=False).show()

Stop the Spark environment:

spark.stop()

Exit from the shell:

exit()
