Syed Zeppelin Assignment
The Client who has given you this data would like a Zeppelin notebook returned with the
following breakdown:
worldsales.printSchema()
3. Filter the dataframe to show units sold greater than 8000 and unit cost greater than
500 ("&&" operator can be used for multiple "AND" conditions)
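A minimal sketch of the filter in step 3, assuming the `worldsales` dataframe from this notebook is already loaded. The column names `Units_Sold` and `Unit_Cost` are assumptions that follow the underscore style of `Total_Profit`; check them against the `printSchema()` output before running.

```scala
import org.apache.spark.sql.functions.col

// Assumption: the columns are named Units_Sold and Unit_Cost;
// adjust to whatever worldsales.printSchema() actually reports.
val highVolume = worldsales.filter(
  col("Units_Sold") > 8000 && col("Unit_Cost") > 500 // both conditions must hold
)

highVolume.show()
```

Chaining two `.filter(...)` calls would return the same rows; `&&` simply keeps both conditions in one expression.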
grouping.coalesce(1).write.csv("/tmp/grouped")
6. Save this new subset dataframe as a csv file into HDFS – make sure it is saved as a
single file in HDFS
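`coalesce(1)` makes Spark write a directory that holds one part file rather than a bare file. One way to confirm this after the write is to list the output directory through the Hadoop `FileSystem` API. This is only a sketch; it assumes the `/tmp/grouped` path used in this notebook and the `spark` session that Zeppelin provides.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumption: /tmp/grouped is the directory written by the notebook's write.csv call.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Collect the names of the data files inside the output directory.
val partFiles = fs.listStatus(new Path("/tmp/grouped"))
  .map(_.getPath.getName)
  .filter(_.startsWith("part-"))

// With coalesce(1) exactly one part file is expected
// (usually alongside a _SUCCESS marker, which the filter above skips).
println(s"part files: ${partFiles.mkString(", ")}")
```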
worldsales.createOrReplaceTempView("SalesnView")
7. Create two views using the “createOrReplaceTempView” command
grouping.createOrReplaceTempView("RegionView")
10. Using SQL, select all from the “Regionview” view and show the result in a line graph.
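Step 10 could be sketched from a Scala paragraph as follows, assuming `RegionView` has been registered as in this notebook. Spark resolves view names case-insensitively by default, so `Regionview` and `RegionView` refer to the same view. `z` is the ZeppelinContext object available in Zeppelin's Spark paragraphs.

```scala
// Assumption: RegionView was registered earlier with createOrReplaceTempView.
val regionRows = spark.sql("select * from RegionView")

// z.show renders the dataframe in Zeppelin's table/chart display;
// the line graph is then chosen from the chart toolbar of the paragraph output.
z.show(regionRows)
```

The same query can be run directly in a `%sql` paragraph, which shows the chart toolbar without the `z.show` call.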
12. Using SQL, select the region and the sum of total_profit from the “Salesview” view,
group by region, and display the result in a bar chart
select Region, Sum(Total_Profit) from SalesnView group by Region
13. Using SQL, select from the “Salesview” view the total profit as profit, the total
revenue as revenue and the total cost as cost, grouped by region
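A possible query for step 13, written as a sketch rather than the expected answer. It uses the `SalesnView` name registered earlier in this notebook. It assumes that "total" means a per-region sum, and that the revenue and cost columns are named `Total_Revenue` and `Total_Cost` in the same style as `Total_Profit`; both names are assumptions to verify against the schema.

```scala
// Assumption: Total_Revenue and Total_Cost exist alongside Total_Profit;
// adjust the column names to match worldsales.printSchema().
val totals = spark.sql("""
  select Region,
         sum(Total_Profit)  as profit,
         sum(Total_Revenue) as revenue,
         sum(Total_Cost)    as cost
  from SalesnView
  group by Region
""")

// Display through the ZeppelinContext so a chart type can be picked in the UI.
z.show(totals)
```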