
spark5

August 20, 2024

[1]: # Install Apache Spark if not already installed


!pip install PySpark

Collecting PySpark
  Downloading pyspark-3.5.2.tar.gz (317.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.3/317.3 MB 4.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) … done
Requirement already satisfied: py4j==0.10.9.7 in /usr/local/lib/python3.10/dist-packages (from PySpark) (0.10.9.7)
Building wheels for collected packages: PySpark
  Building wheel for PySpark (setup.py) … done
  Created wheel for PySpark: filename=pyspark-3.5.2-py2.py3-none-any.whl size=317812365 sha256=098cabe5072f576a6420082b780d1ceeb0d89ade653e76de07fa10971b334ad5
  Stored in directory: /root/.cache/pip/wheels/34/34/bd/03944534c44b677cd5859f248090daa9fb27b3c8f8e5f49574
Successfully built PySpark
Installing collected packages: PySpark
Successfully installed PySpark-3.5.2

[2]: # Import necessary libraries


from pyspark.sql import SparkSession

[3]: # Create a SparkSession


spark = SparkSession.builder.appName("My Pipeline").getOrCreate()

[4]: # Ingest data from a CSV file


df = spark.read.csv("/content/data.csv", header=True, inferSchema=True)

[5]: df.show()

+-----+---+-------+
| Name|Age|Country|
+-----+---+-------+
|Alice| 25|     UK|
| Jhon| 54|    USA|
| Nani| 23|  India|
|  Bob| 16|Germany|
+-----+---+-------+

[6]: # Transform data by filtering and aggregating
# (Spark matches column names case-insensitively by default,
# so "age" and "country" resolve to the "Age" and "Country" columns)


df_transformed = df.filter(df["age"] > 18).groupBy("country").count()

[7]: # Store output in a Parquet file


df_transformed.write.parquet("output.parquet")

[8]: # Show the transformed data


df_transformed.show()

+-------+-----+
|country|count|
+-------+-----+
|  India|    1|
|    USA|    1|
|     UK|    1|
+-------+-----+

[9]: # Stop the SparkSession


spark.stop()

[ ]:
