Assignment I (DataFrame)
Analysis of Stocks Data
Load the Walmart stock CSV file and have Spark infer the data types.
What are the column names?
What does the Schema look like?
Print out the first 5 rows.
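A minimal sketch of the loading and inspection tasks. In PySpark they are usually written as `df = spark.read.csv("walmart_stock.csv", header=True, inferSchema=True)`, followed by `df.columns`, `df.printSchema()` and `df.head(5)` (the file name is an assumption). Because type inference is the interesting step, the plain-Python code below imitates it in a simplified way. This is not Spark's actual algorithm: each column becomes `integer` if every value parses as an int, otherwise `double` if every value parses as a float, otherwise `string`. Unlike Spark, it does not detect dates. The rows are made up and are not real Walmart prices.

```python
import csv
import io

# Made-up rows in the Walmart CSV layout; NOT real Walmart prices.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2012-01-03,60.00,61.00,59.50,60.50,12000000,52.00
2012-01-04,60.50,62.00,60.00,61.50,10000000,53.00
2012-02-01,61.50,63.00,61.00,62.00,9000000,53.50
2012-02-02,62.00,62.50,58.00,58.50,15000000,50.50
2013-01-02,70.00,81.00,69.00,80.00,8000000,69.00
2013-01-03,80.00,82.00,79.00,81.50,7000000,70.50
"""


def infer_type(values):
    """Return the narrowest type name that parses every value (simplified)."""
    for type_name, parse in (("integer", int), ("double", float)):
        try:
            for value in values:
                parse(value)
        except ValueError:
            continue  # some value failed to parse; try the next, wider type
        return type_name
    return "string"


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
columns = reader.fieldnames  # the column names, taken from the header row
schema = {col: infer_type([row[col] for row in rows]) for col in columns}

print(columns)
print("root")
for col in columns:  # printSchema()-style listing
    print(f" |-- {col}: {schema[col]}")
for row in rows[:5]:  # the first five rows
    print(row)
```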
Use describe() to learn about the DataFrame.
Format the numbers to show just two decimal places.
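`df.describe()` reports count, mean, stddev, min and max for each column, and `pyspark.sql.functions.format_number(col, 2)` can then render values with two decimal places (it returns strings). The plain-Python sketch below reproduces that combination on made-up Close and Volume values. It uses the sample standard deviation, which is what Spark's `stddev` computes, formats with thousands separators in a style similar to `format_number`, and leaves the count as a whole number.

```python
import statistics

# Made-up values; NOT real Walmart data.
data = {
    "Close": [60.50, 61.50, 62.00, 58.50, 80.00, 81.50],
    "Volume": [12000000, 10000000, 9000000, 15000000, 8000000, 7000000],
}


def describe(values):
    """The five statistics that describe() reports for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = {col: describe(values) for col, values in data.items()}

# Render every statistic except the count with exactly two decimal places.
formatted = {
    col: {
        stat: str(value) if stat == "count" else f"{value:,.2f}"
        for stat, value in stats.items()
    }
    for col, stats in summary.items()
}

for col, stats in formatted.items():
    print(col, stats)
```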
Create a new DataFrame with a column called HV Ratio, the ratio of the High price
to the volume of stock traded that day.
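In PySpark the new column can be added with `df.withColumn("HV Ratio", df["High"] / df["Volume"])`. The sketch below shows the same per-row arithmetic in plain Python on made-up rows; returning `None` for a zero-volume day is a choice made for this sketch only.

```python
# Made-up (Date, High, Volume) rows; NOT real Walmart data.
rows = [
    ("2012-01-03", 61.00, 12000000),
    ("2012-01-04", 62.00, 10000000),
    ("2012-02-01", 63.00, 9000000),
]

# HV Ratio = High price divided by the number of shares traded that day.
with_ratio = [
    {"Date": date, "High": high, "Volume": volume,
     "HV Ratio": high / volume if volume else None}
    for date, high, volume in rows
]

for row in with_ratio:
    print(row["Date"], row["HV Ratio"])
```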
What day had the peak High price?
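Finding the peak day means taking the row with the largest High and reading its Date; in PySpark one way is `df.orderBy(df["High"].desc()).head(1)[0]["Date"]`. A plain-Python version on made-up rows:

```python
# Made-up (Date, High) rows; NOT real Walmart data.
rows = [
    ("2012-01-03", 61.00),
    ("2012-02-01", 63.00),
    ("2013-01-02", 81.00),
    ("2013-01-03", 82.00),
]

# max() with a key returns the first row holding the largest High.
peak_date, peak_high = max(rows, key=lambda row: row[1])
print(peak_date, peak_high)
```

If several days tie for the peak, `max()` returns the first one in input order.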
What is the mean of the Close column?
What are the max and min of the Volume column?
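Both questions are single aggregations over one column. In PySpark they can be written with `mean`, `max` and `min` from `pyspark.sql.functions`, for example `df.select(mean("Close")).show()` and `df.select(max("Volume"), min("Volume")).show()` (importing these names shadows Python's built-in `max` and `min`). The plain-Python equivalent on made-up values:

```python
import statistics

# Made-up values; NOT real Walmart data.
close = [60.50, 61.50, 62.00, 58.50, 80.00, 81.50]
volume = [12000000, 10000000, 9000000, 15000000, 8000000, 7000000]

mean_close = statistics.mean(close)
max_volume, min_volume = max(volume), min(volume)
print(round(mean_close, 2), max_volume, min_volume)
```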
On how many days was the Close lower than 60 dollars?
What percentage of the time was the High greater than 80 dollars?
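Both questions filter on a threshold and count the matching days; the percentage divides that count by the total number of days. In PySpark: `df.filter(df["Close"] < 60).count()` and `df.filter(df["High"] > 80).count() / df.count() * 100`. The same logic on made-up values:

```python
# Made-up values; NOT real Walmart data.
close = [60.50, 61.50, 62.00, 58.50, 80.00, 81.50]
high = [61.00, 62.00, 63.00, 62.50, 81.00, 82.00]

# Count days whose Close is below 60 dollars.
days_below_60 = sum(1 for c in close if c < 60)

# Share of days whose High exceeds 80 dollars, as a percentage.
pct_high_above_80 = 100 * sum(1 for h in high if h > 80) / len(high)

print(days_below_60, f"{pct_high_above_80:.2f}%")
```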
What is the Pearson correlation between High and Volume?
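The Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), so it always lies between -1 and 1. In PySpark it is available as `pyspark.sql.functions.corr`, for example `df.select(corr("High", "Volume")).show()`. A direct plain-Python implementation of the formula, on made-up values:

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/(n-1) factors cancel, so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up values; NOT real Walmart data.
high = [61.00, 62.00, 63.00, 62.50, 81.00, 82.00]
volume = [12000000, 10000000, 9000000, 15000000, 8000000, 7000000]

r = pearson(high, volume)
print(round(r, 4))
```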
What is the max High per year?
What is the average Close for each Calendar Month?
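Both questions are group-by aggregations. In PySpark they can be written with `year` and `month` from `pyspark.sql.functions`, for example `df.groupBy(year("Date").alias("Year")).max("High")` and `df.groupBy(month("Date").alias("Month")).mean("Close").orderBy("Month")`; a grouped aggregation like this typically requires Spark to shuffle data between stages. The sketch below reads "calendar month" as the month of the year pooled across all years (all Januaries together), which is an interpretation, and it uses made-up rows.

```python
from collections import defaultdict
from statistics import mean

# Made-up (Date, High, Close) rows; NOT real Walmart data.
rows = [
    ("2012-01-03", 61.00, 60.50),
    ("2012-01-04", 62.00, 61.50),
    ("2012-02-01", 63.00, 62.00),
    ("2012-02-02", 62.50, 58.50),
    ("2013-01-02", 81.00, 80.00),
    ("2013-01-03", 82.00, 81.50),
]

max_high_by_year = {}
closes_by_month = defaultdict(list)
for date, high, close in rows:
    year, month = int(date[:4]), int(date[5:7])  # dates are YYYY-MM-DD
    max_high_by_year[year] = max(high, max_high_by_year.get(year, high))
    closes_by_month[month].append(close)  # pools the same month across years

avg_close_by_month = {m: mean(c) for m, c in sorted(closes_by_month.items())}
print(max_high_by_year)
print(avg_close_by_month)
```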
Use the Spark web UI to view the execution plan of task 15. How much data is
shuffled for this task?
Total bytes shuffled: 960 B
There were 5 jobs with 8 stages in total, as shown in the picture. Four of the jobs had 960 B of shuffle read
divided between them, and the first job had 960 B of shuffle write, so the read total matches the write.
HDFS
Only one block was present.
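One block is what the block arithmetic predicts for a small file. This assumes the block count refers to the uploaded CSV and that the default block size (`dfs.blocksize`, 128 MB since Hadoop 2.x) was in effect: a file occupies ceil(size / block size) blocks. A sketch with a hypothetical file size; on a real cluster, `hdfs fsck <path> -files -blocks` lists the actual blocks.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, Hadoop's default dfs.blocksize


def hdfs_block_count(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Number of HDFS blocks a file of the given size occupies."""
    return -(-file_size_bytes // block_size)  # integer ceiling division


print(hdfs_block_count(100 * 1024))  # a hypothetical 100 KB CSV
```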