
Zeppelin-Spark Assignment

The client who has given you this data would like a Zeppelin notebook returned with the
following breakdown:

1. Load data into a Spark dataframe

val worldsales = (spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/worldsales.csv"))

2. Print the dataframe schema

worldsales.printSchema()
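Assuming the CSV carries the columns referenced later in this assignment, the printed schema would look roughly like this (an illustration, not verified output; the real file may have more columns):

root
 |-- Region: string (nullable = true)
 |-- Units_Sold: integer (nullable = true)
 |-- Unit_Cost: double (nullable = true)
 |-- Total_Revenue: double (nullable = true)
 |-- Total_Cost: double (nullable = true)
 |-- Total_Profit: double (nullable = true)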
3. Filter the dataframe to show units sold greater than 8000 and unit cost greater than
500 ("&&" operator can be used for multiple "AND" conditions)

val filtered = (worldsales
  .select("Units_Sold", "Unit_Cost")
  .filter($"Units_Sold" > 8000 && $"Unit_Cost" > 500))
filtered.show()

4. Aggregate the dataframe via group by “Region” and count

// count and desc come from spark's SQL functions; the import is a no-op
// if the notebook already has them in scope
import org.apache.spark.sql.functions.{count, desc}

val grouping = (worldsales
  .select("Region")
  .groupBy("Region")
  .agg(count("Region").alias("RegionCount"))
  .orderBy(desc("RegionCount")))
5. Create a separate dataframe with the above group by results (the grouping dataframe created in step 4 holds these results)

6. Save this new subset dataframe as a csv file into HDFS – make sure it is saved as a single file in HDFS (coalesce(1) collapses the output to one partition, so Spark writes a single part file)

grouping.coalesce(1).write.csv("/tmp/grouped")
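Note that if the notebook is re-run, this write fails because /tmp/grouped already exists (Spark's default save mode is error-if-exists). Adding a save mode and a header row avoids that; an optional refinement, not required by the brief:

grouping.coalesce(1)
  .write
  .mode("overwrite")          // replace any previous output on re-runs
  .option("header", "true")   // keep the column names in the CSV
  .csv("/tmp/grouped")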

7. Create two views using the “createOrReplaceTempView” command

8. View on “SalesView” from the first dataframe

worldsales.createOrReplaceTempView("SalesView")

9. View on “RegionView” from the second dataframe

grouping.createOrReplaceTempView("RegionView")

10. Using SQL, select all from the “RegionView” view and show it in a line graph.

select * from RegionView
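In Zeppelin, each of these queries goes in its own paragraph prefixed with the %sql interpreter binding; the line, bar, or pie chart is then picked from the display toolbar under the result. For example:

%sql
select * from RegionView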


11. Using SQL, from the “SalesView” view, select the region and the sum of units sold, grouped by region

select Region, Sum(Units_Sold) from SalesView group by Region
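The same aggregation can be cross-checked with the DataFrame API directly, if a non-SQL sanity check is wanted; a minimal sketch:

import org.apache.spark.sql.functions.sum
worldsales.groupBy("Region").agg(sum("Units_Sold").alias("Units_Sold")).show()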

12. Using SQL, select from the “SalesView” view the region and the sum of Total_Profit, grouped by region, and display it in a bar chart

select Region, Sum(Total_Profit) from SalesView group by Region

13. Using SQL, select from the “SalesView” view the total profit as Profit, the total revenue as Revenue, and the total cost as Cost, grouped by region

select Region, Sum(Total_Profit) as Profit, Sum(Total_Revenue) as Revenue, Sum(Total_Cost) as Cost from SalesView group by Region
14. The client is in the process of opening a new store and is looking for the best location to do so – they need to see the average profit in each region as a percentage (pie chart) compared to other regions.

select Region, Avg(Total_Profit) as Avg_Profit from SalesView group by Region
