
# PySpark Exam Setup and Basic Code Guide
### 1. Unzipping the File

Assuming the file is zipped, unzip it first:


```bash
cd /home/ashok/Documents
unzip <your_zip_file_name>.zip
```
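To double-check what was extracted before moving on, you can list the archive contents (standard `unzip` flag):

```bash
# List the contents of the archive without extracting again
unzip -l <your_zip_file_name>.zip
```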

### 2. Start Hadoop

Start Hadoop before launching PySpark; the HDFS read and write steps later in this guide depend on it:


```bash
start-dfs.sh
start-yarn.sh
```
Check that Hadoop is running by opening these URLs in a browser:
- HDFS NameNode UI: http://localhost:9870
- YARN ResourceManager UI: http://localhost:8088
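If the web UIs don't load, a quick terminal check is to list the running Java daemons with `jps` (part of the JDK). On a healthy single-node setup you would typically expect to see entries such as NameNode, DataNode, ResourceManager, and NodeManager:

```bash
# List running JVM processes; the daemon names above are typical for single-node Hadoop
jps
```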

### 3. Navigate to the Folder

Go to the folder where the notebook file is located:


```bash
cd /home/ashok/Documents/qpaper
```

### 4. Start PySpark Notebook

Start the PySpark Jupyter Notebook using the following command:


```bash
pysparknb
```
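Note that `pysparknb` is presumably a lab-specific alias. If it is not available, a common equivalent (a standard PySpark mechanism) is to point the PySpark driver at Jupyter via environment variables:

```bash
# Launch a Jupyter notebook backed by a PySpark session
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
pyspark
```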
### 5. Basic PySpark Code in the Notebook

Once the notebook (.ipynb) is open, run these basic snippets to confirm that everything is working.

##### a. Import Required Libraries

```python
from pyspark.sql import SparkSession
```

##### b. Initialize Spark Session

```python
spark = SparkSession.builder \
    .appName("Exam Setup") \
    .getOrCreate()

# Check the Spark version to verify the environment
print(spark.version)
```
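As an optional extra check (not in the original guide), the SparkContext exposes the master URL and application name, and the Spark UI is normally served at http://localhost:4040 while the session is alive:

```python
# Confirm which master the session connected to and under what name
print(spark.sparkContext.master)
print(spark.sparkContext.appName)
```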

##### c. Basic DataFrame Setup

Create a small DataFrame to verify that PySpark is working:


```python
# Sample Data
data = [("Ashok", 1), ("John", 2), ("Doe", 3)]

# Creating DataFrame
df = spark.createDataFrame(data, ["Name", "ID"])

# Show DataFrame
df.show()
```
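If you want explicit column types instead of letting Spark infer them, you can pass a schema. A minimal sketch using the standard `pyspark.sql.types` API:

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema: Name as string, ID as integer
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("ID", IntegerType(), True),
])

df_typed = spark.createDataFrame(data, schema)
df_typed.printSchema()
```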

##### d. Reading Data from HDFS

To confirm that HDFS is reachable, list the filesystem root. Inside the notebook, prefix shell commands with `!`; in a terminal, drop the `!`:
```bash
!hdfs dfs -ls /
```
You can also read files from HDFS if needed:
```python
# Example to read from HDFS if a file is stored there
df = spark.read.csv("hdfs://localhost:9000/path/to/your/file.csv", header=True)
df.show()
```
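By default every CSV column is read as a string; adding `inferSchema=True` asks Spark to sample the data and pick types (a standard reader option, at the cost of an extra pass over the file):

```python
# Let Spark infer column types from the data
df = spark.read.csv("hdfs://localhost:9000/path/to/your/file.csv",
                    header=True, inferSchema=True)
df.printSchema()
```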

##### e. Basic DataFrame Operations

You can perform a few basic operations to manipulate data:


```python
# Show schema
df.printSchema()

# Select specific columns
df.select("Name").show()

# Filter rows
df.filter(df["ID"] > 1).show()

# Group by a column
df.groupBy("Name").count().show()
```
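A couple more operations you may need, using the standard `pyspark.sql.functions` module (column names here match the sample DataFrame above):

```python
from pyspark.sql import functions as F

# Add a derived column
df.withColumn("ID_plus_one", F.col("ID") + 1).show()

# Sort rows by ID, descending
df.orderBy(F.col("ID").desc()).show()
```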

##### f. Saving DataFrame to HDFS

If you need to save the DataFrame back to HDFS:


```python
df.write.csv("hdfs://localhost:9000/path/to/output_folder", header=True)
```
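Two writer options worth knowing (both standard API): `mode("overwrite")` replaces an existing output folder instead of failing, and `coalesce(1)` produces a single part file, which is convenient for small exam-sized outputs:

```python
# Overwrite any existing output and write a single part file
df.coalesce(1).write.mode("overwrite").csv(
    "hdfs://localhost:9000/path/to/output_folder", header=True)
```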

### 6. Saving and Renaming the Notebook

Save your progress periodically with Ctrl+S. When you finish your work (or at any point), rename the notebook as instructed:
- Click the title at the top of the notebook.
- Rename it to your roll number (e.g., '123456').

### 7. Shut Down Spark and Hadoop After Completion

Once you finish your work, stop the Spark session and the Hadoop services to free up resources. Stop Hadoop from the terminal:
```bash
stop-yarn.sh
stop-dfs.sh
```
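From within the notebook, the Spark session itself can be stopped with the standard call:

```python
# Release the notebook's Spark resources
spark.stop()
```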
