This document provides instructions for analyzing Walmart stock data with Spark. It covers tasks such as loading a CSV file, examining the schema and columns, computing descriptive statistics and correlations between variables, and inspecting the execution plan and shuffle sizes for selected tasks in the Spark web UI.
Load the Walmart stock CSV file and have Spark infer the data types.
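A minimal sketch, assuming a local file named `walmart_stock.csv` with the usual Date/Open/High/Low/Close/Volume columns (the actual path and column names may differ):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("walmart").getOrCreate()

# header=True reads the first row as column names;
# inferSchema=True lets Spark guess each column's data type.
df = spark.read.csv("walmart_stock.csv", header=True, inferSchema=True)
```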
What are the column names?
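One way to list them, assuming the DataFrame from the loading step is bound to `df`:

```python
# The column names are available as a plain Python list.
df.columns
```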
What does the Schema look like?
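A one-liner sketch:

```python
# printSchema() shows each column with its inferred type and nullability.
df.printSchema()
```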
Print out the first 5 rows.
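For example:

```python
# head(5) returns the first five rows as Row objects.
for row in df.head(5):
    print(row)
```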
Use describe() to learn about the DataFrame.
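A short sketch:

```python
# describe() computes count, mean, stddev, min, and max
# for the numeric columns of the DataFrame.
df.describe().show()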
Format the numbers to just show up to two decimal places.
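One way to do this with `format_number`, assuming the Open/High/Low/Close/Volume column names from the loading step:

```python
from pyspark.sql.functions import format_number

desc = df.describe()

# describe() returns string columns, so cast before formatting.
desc.select(
    desc["summary"],
    format_number(desc["Open"].cast("float"), 2).alias("Open"),
    format_number(desc["High"].cast("float"), 2).alias("High"),
    format_number(desc["Low"].cast("float"), 2).alias("Low"),
    format_number(desc["Close"].cast("float"), 2).alias("Close"),
    desc["Volume"].cast("int").alias("Volume"),
).show()
```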
Create a new DataFrame with a column called HV Ratio that is the ratio of the High price to the Volume of stock traded for a day.
What day had the peak High in price?
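A sketch covering both questions:

```python
# HV Ratio: High divided by Volume for each day.
df2 = df.withColumn("HV Ratio", df["High"] / df["Volume"])
df2.select("HV Ratio").show()

# Peak High: sort by High descending and read the Date of the top row.
df.orderBy(df["High"].desc()).head(1)[0]["Date"]
```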
What is the mean of the Close column?
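For example:

```python
from pyspark.sql.functions import mean

df.select(mean("Close")).show()
```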
What is the max and min of the Volume column?
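For example:

```python
from pyspark.sql.functions import max, min

df.select(max("Volume"), min("Volume")).show()
```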
How many days was the Close lower than 60 dollars?
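A sketch:

```python
# filter() keeps rows where Close < 60; count() returns how many there are.
df.filter(df["Close"] < 60).count()
```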
What percentage of the time was the High greater than 80 dollars?
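A sketch:

```python
# Fraction of rows where High > 80, expressed as a percentage.
df.filter(df["High"] > 80).count() / df.count() * 100
```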
What is the Pearson correlation between High and Volume?
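For example:

```python
from pyspark.sql.functions import corr

# corr() computes the Pearson correlation coefficient by default.
df.select(corr("High", "Volume")).show()
```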
What is the max High per year?
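A sketch, assuming a Date column of date type:

```python
from pyspark.sql.functions import year, max

# Derive a Year column from Date, then take the max High per year.
df.withColumn("Year", year(df["Date"])) \
  .groupBy("Year") \
  .agg(max("High").alias("Max High")) \
  .orderBy("Year") \
  .show()
```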
What is the average Close for each Calendar Month?
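A sketch under the same assumptions:

```python
from pyspark.sql.functions import month, avg

# Derive a Month column from Date, then average Close per month.
df.withColumn("Month", month(df["Date"])) \
  .groupBy("Month") \
  .agg(avg("Close").alias("Avg Close")) \
  .orderBy("Month") \
  .show()
```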
Use the Spark web UI to view the execution plan for task no. 15, and report how much data is shuffled for that task.
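The plan can also be printed from code with `explain()`; the web UI (typically at http://localhost:4040, on the SQL and Stages tabs) shows the same plan along with shuffle read/write sizes per stage. A sketch, reusing the aggregation from task 15:

```python
from pyspark.sql.functions import year, max

# explain() prints the physical plan, including the Exchange
# (shuffle) introduced by the groupBy aggregation.
df.withColumn("Year", year(df["Date"])) \
  .groupBy("Year") \
  .agg(max("High")) \
  .explain()
```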
Total bytes shuffled: 960 B
There were 5 jobs with a total of 8 stages, as shown in the picture. Four of the jobs had 960 B of shuffle read divided between them, and the first job had 960 B of shuffle write.