0% found this document useful (0 votes)
54 views68 pages

4 Data-Visualization

Data visualization is the presentation of data in a graphical format. It allows humans to more easily recognize patterns and trends. Visualizations can explain complex ideas simply and tell a story faster than text alone. The sample data shows sales information that can be analyzed through visualization. Graphs are created from the data, allowing questions about order methods, product lines, and revenue to be answered more easily.

Uploaded by

Pratyush Jain
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views68 pages

4 Data-Visualization

Data visualization is the presentation of data in a graphical format. It allows humans to more easily recognize patterns and trends. Visualizations can explain complex ideas simply and tell a story faster than text alone. The sample data shows sales information that can be analyzed through visualization. Graphs are created from the data, allowing questions about order methods, product lines, and revenue to be answered more easily.

Uploaded by

Pratyush Jain
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 68

D ata

Science
Data
Visualization
Agenda

01 Understanding Data Visualization 02 Base Graphics in R

03 Visualization with GGPLOT2


Data Visualization
Data Visualization

Data visualization is basically presentation of data with graphics. It's a way to summarize your findings and display it in a
form that facilitates interpretation and can help in identifying patterns or trends
Importance of Data Visualization
Importance of Data Visualization

A picture is worth a 1000 words. Humans are easily attracted to visuals and most of us prefer to understand a particular
scenario through pictures instead of text

It is faster to recognize a result than to read a paragraph. When done well, visualizations explain complex ideas simply.
Oftentimes charts tell the story much faster than a prose description or a table presentation
Importance of Data Visualization
Below is the sample data: Sales Products
Order method Product line Product type Product Year Revenue Quantity Gross margin
Retailer country Retailer type
type
Personal
United States Telephone Golf Shop Accessories Navigation Trail Master 2012 1095 3 0.34767123

Camping
United States Telephone Department Store Equipment Sleeping Bags Hibernator 2012 160103.2 1160 0.3769019

Camping Hibernator Self -


United States Telephone Department Store Equipment Sleeping Bags Inflating Mat 2012 66514.28 556 0.5440107

After looking at the data United States Telephone Department Store


Camping
Equipment Sleeping Bags Hibernator Pad 2012 16205.73 411 0.51382196

Camping
United States Telephone Department Store Equipment Sleeping Bags Hibernator Pillow 2012 33520.42 2475 0.46298346

Is it possible to answer the following questions?


Camping TrailChef Water
United States Fax Outdoors Shop Equipment Cooking Gear Bag 2013 19418.52 3102 0.53194888

Camping
United States Fax Outdoors Shop Equipment Cooking Gear TrailChef Cook Set 2013 42304.32 794 0.34365616

• Which order method type has highest revenue Camping TrailChef Single
United States Fax Outdoors Shop Equipment Cooking Gear Flame 2013 52266.32 824 0.26880025

? United States Telephone Department Store


Camping
Packs
Canyon Mule
2012 235660.36 676 0.38805542
Equipment Journey Backpack

• Which product line has highest quantity ? United States Telephone Department Store
Camping
Equipment Packs
Canyon Mule
Cooler 2012 53822.16 1652 0.50890117

Personal
United States Web Golf Shop Accessories Watches Venue 2014 73949 1013 0.42858294

Personal
United States Web Golf Shop Accessories Watches Infinity 2014 157890.2 665 0.45986527

Personal
United States Web Golf Shop Accessories Watches Lux 2014 67265.2 396 0.48654044
Importance of Data Visualization

After displaying the data in the form of graphs . Lets try answering the questions

400

E-mail
Fax 350

300
Sales visit Telephone
250

Quantity
E-mail 200
Fax Sales
visit 150

Telephone
100
Web
50

Web 0
Camping Golf Equipment Mountaineering Outdoor Personal
Equipment Equipment Protection Accessories
Product line
Importance of Data Visualization

After displaying the data in the form of graphs . Lets try answering the questions

E-mail
Which order method type has highest revenue ?
Fax

Sales visit
Telephone

E-mail
Fax
Web order method type has highest revenue
Sales visit
among all the order method type.
Telephone
Web Moreover looking at the pie chart you can also
tell that which order method type has lowest
revenue : Email.
Web

Web
Importance of Data Visualization

After displaying the data in the form of graphs . Lets try answering the questions

400

350 Which product line has highest quantity ?

300

250
Quantity

200

150
From the graph it can be easily seen that
100
Personal Accessories has highest number
50
among all the product line.
0
Camping Golf Equipment Mountaineering Outdoor Personal
Equipment Equipment Protection Accessories
Product line
Base Graphics in R
Base Graphics in R

Base Graphics helps in making simple graphs


Bar Plot
Bar Plot

Bar Plots are suitable for showing comparison between cumulative totals across several groups
Bar Plot

Making a simple Bar-plot

plot(customer_churn$Dependent
s)
Bar Plot

Adding color

plot(customer_churn$Dependents, col="coral")
Bar Plot

Adding x-axis label & title

plot(customer_churn$Dependents,
col="coral",xlab="Dependents",
main="Distribution of Dependents")
Bar Plot

Bar-plot for ‘PhoneService’ column

plot(customer_churn$PhoneService,col="aquamarine4"
)
Bar Plot

Bar-plot for ‘Contract’ column

plot(customer_churn$Contract,col="palegreen4",
xlab="Contract",
main="Distribution of Contract")
Histogram
Histogram

Histogram is basically a plot that breaks the data into bins (or breaks) and shows frequency distribution of these bins
Histogram

Making a simple histogram

hist(customer_churn$tenure)
Histogram

Adding Color

hist(customer_churn$tenure,col="olivedrab")
Histogram

Change the number of bins

hist(customer_churn$tenure,
col="olivedrab",
breaks=30)
Grammar of Graphics
Grammar of Graphics

Every form of communication needs to have grammar. Since, visualization is also a form of communication, it needs to
have a foundation of grammar

I am John Am John I
Components of Grammar of Graphics

Element Description

Data The data-set for which we would want to plot a graph

Aesthetics The metrics onto which we plot our data

Geometry Visual elements to plot the data

Facet Groups by which we divide the data


Visualization with ggplot2
Visualization with ggplot2

ggplot2 is a system for declaratively creating graphics,


based on the Grammar of Graphics. You provide the
data, tell ggplot2 how to map variables to aesthetics,
what graphical primitives to use, and it takes care of
the details
geom_hist()
geom_hist()

geom_hist() function helps in making histograms with ggplot2


geom_hist()

Build a histogram for ‘tenure’


column

ggplot(data=customer_churn,
aes(x=tenure))+geom_histogram()
geom_hist()

Change the number of bins

ggplot(data = customer_churn,
aes(x=tenure))+geom_histogram(bins = 50)
geom_hist()

Add fill color and border color

ggplot(data = customer_churn, aes(x=tenure))+


geom_histogram(fill="palegreen4", col="green")
geom_hist()

Assign ‘Partner’ to fill aesthetic

ggplot(data = customer_churn, aes(x=tenure,


fill=Partner))+geom_histogram()
geom_hist()

Set Position to be ‘identity’

ggplot(data = customer_churn,
aes(x=tenure, fill=Partner))+
geom_histogram(position = "identity")
geom_bar()
geom_bar()

geom_bar() function helps in making bar-plots with ggplot2


geom_bar()

Build a bar-plot for ‘Dependents’ column

ggplot(data = customer_churn,
aes(x=Dependents))+geom_bar()
geom_bar()

Add fill
color

ggplot(data = customer_churn, aes(x=Dependents))


+geom_bar(fill="chocolate")
geom_bar()

Assigning ‘DeviceProtection’ column to


fill aesthetic

ggplot(data = customer_churn,
aes(x=Dependents,fill=DeviceProtection))
+geom_bar()
geom_bar()

Set the position to ‘dodge’

ggplot(data = customer_churn,
aes(x=Dependents,fill=DeviceProtection))
+geom_bar(position=‘dodge’)
geom_point()
geom_point()

geom_point() function helps in making scatterplots with ggplot2. A scatter plot helps in understanding how does one
variable change w.r.t another variable. It is used for two continuous values
geom_point()

Scatter-plot between ‘TotalCharges’


& ‘tenure’

ggplot(data = customer_churn,
aes(y=TotalCharges,x=tenure))+geom_point(
)
geom_point()

Adding color

ggplot(data = customer_churn,
aes(y=TotalCharges,x=tenure)) +
geom_point(col="slateblue3")
geom_point()

Map ‘Partner’ to col aesthetic

ggplot(data = customer_churn,
aes(y=TotalCharges,x=tenure, col=Partner)) +
geom_point()
geom_point()

Map ‘InternetService’ to col


aesthetic

ggplot(data = customer_churn,
aes(y=TotalCharges,x=tenure,
col=InternetService)) +
geom_point()
geom_boxplot()
geom_boxplot()

geom_boxplot() function helps in making boxplots with ggplot2. Box Plot shows 5 statistically significant numbers- the minimum, the
25th percentile, the median, the 75th percentile and the maximum. It is thus useful for visualizing the spread of the data and deriving
inferences accordingly
geom_boxplot()

Box-plot between ‘MonthlyCharges’ &


‘InternetService’

ggplot(data =
customer_churn,aes(y=MonthlyCharges,x=In
ternetService))+geom_boxplot()
geom_boxplot()

Add fill color

ggplot(data =
customer_churn,aes(y=MonthlyCharges,x=Inte
rnetService))+geom_boxplot(fill="violetred4")
Faceting the data
Faceting the data

facet_grid() is used to facet the data. Faceting is used when the plot is too chaotic and some variables have to be grouped into different
facets to have a better visualization
facet_grid()

Initial Graph After Faceting

ggplot(data = customer_churn, ggplot(data = customer_churn,


aes(x=tenure,fill=InternetService)) aes(x=tenure,fill=InternetService))+
+ geom_histogram() geom_histogram()+ facet_grid(~InternetService)
facet_grid()

Initial Graph After Faceting

ggplot(data = customer_churn, ggplot(data = customer_churn,


aes(y=TotalCharges,x=tenure, col=Contract))+ aes(y=TotalCharges,x=tenure, col=Contract))+
geom_point()+geom_smooth() geom_point()+geom_smooth()+facet_grid(~Contract)
Theme Layer
Theme Layer

Theme layer is used to add themes to our plots


Theme Layer

Initial Graph After Adding Title

ggplot(data = customer_churn,aes(x=tenure))+
geom_histogram(fill="tomato3", g1+labs(title = "Distribution of tenure")->g2
col="mediumaquamarine") -> g1
Theme Layer

After Adding Title After Adding Panel Background

g2+theme(panel.background =
g1+labs(title = "Distribution of tenure")->g2
element_rect(fill = "olivedrab3"))->g3
Theme Layer

After Adding Panel Background After Adding Plot Background

g2+theme(panel.background = g3+theme(plot.background =
element_rect(fill = "olivedrab3"))->g3 element_rect(fill = "palegreen4"))->g4
Theme Layer

After Adding Plot Background After Changing Title

g3+theme(plot.background = g4+theme(plot.title = element_text(hjust = 0.5, face="bold",


element_rect(fill = "palegreen4"))->g4 colour = "peachpuff"))
Quiz
Quiz

Which of these is the correct code to make a bar-plot for the ‘TechSupport’ column. The color of the bars
should be ‘blue’ & the title of the plot should be ‘Distribution of Tech Support’

1. plot(customer_churn$TechSupport, col="blue", main="Distribution of Tech Support")

2. plot(customer_churn$TechSupport, fill="blue", main="Distribution of Tech Support")

3. plot(customer_churn$TechSupport, col="blue", title="Distribution of Tech Support")

4. plot(customer_churn$TechSupport, color="blue", title="Distribution of Tech Support")


Quiz

Which of these is the correct code to make a histogram for the ‘tenure’ column. The fill color of the bins
should be ‘azure’ & the number of bins should be 87

1. ggplot(data = customer_churn,aes(x=tenure,col='azure'))+geom_histogram(bins=87)

2. ggplot(data = customer_churn,aes(x=tenure))+geom_histogram(col="azure",bins=87)

3. ggplot(data = customer_churn,aes(x=tenure))+geom_histogram(fill="azure",bins=87)

4. ggplot(data = customer_churn,aes(x=tenure,fill='azure'))+geom_histogram(bins=87)
Quiz

Which of these is the correct code to make a bar-plot for the ‘OnlineBackup’ column. The color of the bars should
be determined by the ‘PhoneService’ column

1. ggplot(data = customer_churn,aes(fill=OnlineBackup,x=PhoneService))+geom_bar()

2. ggplot(data = customer_churn,aes(y=OnlineBackup,fill=PhoneService))+geom_bar()

3. ggplot(data = customer_churn,aes(x=OnlineBackup))+geom_bar(fill=PhoneService)

4. ggplot(data = customer_churn,aes(x=OnlineBackup,fill=PhoneService))+geom_bar()
Quiz

To which of these geometries can you add the facet_grid()?

1. geom_bar()

2. geom_histogram()

3. geom_point()

4. All of the above


Thank
You

You might also like