Assignment I (Dataframe) : Analysis of Stocks Data
To print the schema, the RDD was first converted to a DataFrame, since RDDs do not carry
a schema.
Since describe() is not available on RDDs, we compute all the statistics manually. We also use
tabulate to print the table in a readable format. Note: tabulate must be installed with pip.
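The manual computation can be sketched in plain Python, so that it runs without a Spark cluster. This is only an illustration under stated assumptions: the helper name summarize and the sample High prices are hypothetical, and the standard deviation uses the sample (n - 1) form, which is assumed here to match what describe() would report on a DataFrame.

```python
import math

def summarize(values):
    """Return count, mean, sample stddev, min and max of a non-empty list."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 denominator); undefined for a single value.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else float("nan")
    return {
        "count": n,
        "mean": mean,
        "stddev": math.sqrt(variance),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical High prices, made up for illustration only.
highs = [10.0, 12.0, 14.0]
stats = summarize(highs)

# Print a simple two-column table (tabulate could be used here instead).
for name, value in stats.items():
    print(f"{name:<8}{value:>10}")
```

On an RDD, the same function could be applied to the collected values of one column, or each statistic could be computed with a separate RDD action.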
Format the numbers to just show up to two decimal places.
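One way to do the two-decimal formatting is shown below in plain Python. It is a sketch, not the assignment's own solution: the helper name format_stats and the sample values are hypothetical. In PySpark the same effect is commonly obtained with format_number from pyspark.sql.functions.

```python
def format_stats(stats):
    """Return a copy of stats with every value rendered to two decimal places."""
    return {name: f"{value:.2f}" for name, value in stats.items()}

# Hypothetical statistics, made up for illustration only.
raw = {"mean": 12.3456, "max": 90.0}
formatted = format_stats(raw)
print(formatted)
```

The values become strings after formatting, so this step belongs at the very end, after all numeric work is done.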
Create a new DataFrame with a column called HV Ratio that is the ratio of
the High Price to the volume of stock traded for a day.
What day had the Peak High in Price?
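Finding the peak day amounts to selecting the row with the largest High value and reading its date. A minimal plain-Python sketch follows; the field names Date and High, the helper peak_high_day and the sample rows are hypothetical. In PySpark this is usually done by ordering the DataFrame by High in descending order and taking the first row.

```python
def peak_high_day(rows):
    """Return the Date of the row with the largest High value."""
    best = max(rows, key=lambda row: row["High"])
    return best["Date"]

# Hypothetical rows, made up for illustration only.
rows = [
    {"Date": "2012-01-03", "High": 50.0},
    {"Date": "2012-01-04", "High": 75.5},
    {"Date": "2012-01-05", "High": 61.2},
]
peak_day = peak_high_day(rows)
print(peak_day)
```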
Use the Spark web UI to view the execution plan of task no. 15. Report how
much data gets shuffled for this task.