Assignment I (DataFrame)

Analysis of Stocks Data

Load the Walmart Stock CSV File, have Spark infer the data types.

What are the column names?

What does the Schema look like?

To print the schema, the RDD was first converted to a DataFrame, since RDDs do not carry a
schema.

Print out the first 5 columns.

Use describe() to learn about the DataFrame.

Since describe() does not exist for RDDs, we compute all the statistics manually. We also use
tabulate to print the table in a readable format. Note: tabulate must be installed with pip.
Format the numbers to just show up to two decimal places.
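
The manual statistics can be sketched in plain Python as follows. This is only an illustration on a small made-up list, not the assignment's RDD code: the function names `describe_column` and `format_stats` are hypothetical, and the standard deviation is assumed to be the sample form (dividing by n - 1).

```python
import math

def describe_column(values):
    """Compute count, mean, sample stddev, min and max for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    if n > 1:
        # Sample variance: squared deviations divided by n - 1 (assumption).
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
        stddev = math.sqrt(variance)
    else:
        stddev = float("nan")  # undefined for a single value
    return {"count": n, "mean": mean, "stddev": stddev,
            "min": min(values), "max": max(values)}

def format_stats(stats):
    """Format every statistic except the count to two decimal places."""
    return {k: (str(v) if k == "count" else f"{v:.2f}") for k, v in stats.items()}

closes = [60.0, 61.5, 59.0, 62.5]  # made-up values, not from the Walmart file
formatted = format_stats(describe_column(closes))
print(formatted)
```

On an RDD, the same sums would be obtained with reductions over the column instead of Python's built-in `sum`, `min`, and `max`.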

Create a new dataframe with a column called HV Ratio that is the ratio of
the High Price versus volume of stock traded for a day.
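
As a minimal sketch of the HV Ratio computation, the snippet below adds the ratio to each record in plain Python. The rows are made-up sample values and `add_hv_ratio` is a hypothetical helper; only the column names High and Volume come from the assignment.

```python
def add_hv_ratio(rows):
    """Return new rows with an 'HV Ratio' field equal to High / Volume."""
    # Build new dicts so the input rows are left unchanged,
    # mirroring how a new DataFrame is created rather than modified in place.
    return [{**row, "HV Ratio": row["High"] / row["Volume"]} for row in rows]

# Made-up sample rows for illustration only.
rows = [
    {"Date": "2012-01-03", "High": 61.0, "Volume": 12200000},
    {"Date": "2012-01-04", "High": 60.0, "Volume": 8000000},
]
with_ratio = add_hv_ratio(rows)
for row in with_ratio:
    print(row["Date"], row["HV Ratio"])
```

Because Volume is in the millions while High is in the tens, the ratio is a very small number, which is why scientific notation often shows up in the output.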

What day had the Peak High in Price?

What is the mean of the Close column?

What are the max and min of the Volume column?

How many days was the Close lower than 60 dollars?

What percentage of the time was the High greater than 80 dollars?
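
The two threshold questions above reduce to a filtered count and a filtered count divided by the total. A plain-Python sketch, using made-up values and the hypothetical helpers `count_below` and `percent_above`:

```python
def count_below(values, threshold):
    """Number of values strictly lower than the threshold."""
    return sum(1 for v in values if v < threshold)

def percent_above(values, threshold):
    """Percentage of values strictly greater than the threshold."""
    return 100.0 * sum(1 for v in values if v > threshold) / len(values)

# Made-up sample values for illustration only.
closes = [59.5, 60.0, 61.0, 58.0]
highs = [79.0, 81.5, 80.0, 85.0]

days_close_below_60 = count_below(closes, 60)
pct_high_above_80 = percent_above(highs, 80)
print(days_close_below_60, pct_high_above_80)
```

Note that both comparisons are strict, so a Close of exactly 60 or a High of exactly 80 is not counted.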

What is the Pearson correlation between High and Volume?
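
For reference, the Pearson correlation is the covariance of the two columns divided by the product of their standard deviations. The sketch below computes it directly in plain Python; `pearson` is a hypothetical helper and the inputs are made-up numbers chosen to be perfectly negatively correlated.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

# Made-up values: y falls by 2 whenever x rises by 1.
highs = [1.0, 2.0, 3.0, 4.0]
volumes = [8.0, 6.0, 4.0, 2.0]
r = pearson(highs, volumes)
print(r)
```

The normalising factors (n or n - 1) cancel between numerator and denominator, so they are left out.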

What is the max High per year?

What is the average Close for each Calendar Month?
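
Both questions are group-by aggregations: extract the year or month from the date, then reduce within each group. A plain-Python sketch, assuming dates are stored as YYYY-MM-DD strings (an assumption about the file format) and using made-up rows:

```python
from collections import defaultdict
from datetime import date

# Made-up (date, High, Close) rows for illustration only.
rows = [
    ("2012-01-03", 61.0, 60.0),
    ("2012-02-01", 62.5, 62.0),
    ("2013-01-02", 69.5, 69.0),
    ("2013-01-03", 70.0, 71.0),
]

max_high = {}               # year -> maximum High seen so far
closes = defaultdict(list)  # month number -> list of Close values

for day, high, close in rows:
    d = date.fromisoformat(day)
    max_high[d.year] = max(high, max_high.get(d.year, high))
    closes[d.month].append(close)

# Average Close per calendar month, pooled across all years.
avg_close = {month: sum(vals) / len(vals) for month, vals in closes.items()}

print(max_high)
print(avg_close)
```

Grouping by key is the step that requires moving records between partitions in Spark, which is where shuffle traffic typically comes from.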

Use the Spark web UI to view the execution plan of task no. 15. Report how
much data gets shuffled for this task.

Shuffled data = 371.0 B
