
TEB2043 Introduction To Data Science: Descriptive Analytics & Visualization DR Shuhaida Mohamed Shuhidan JAN 2025

This document outlines an introductory course on Data Science focusing on Descriptive Analytics and Visualization. It covers key concepts such as descriptive statistics, including measures of central tendency and variability, as well as practical implementation using R for data analysis and visualization techniques. The course also introduces various visualization methods, including scatter plots, histograms, and bar charts, along with case studies for practical understanding.


TEB2043

Introduction to Data Science


Descriptive Analytics & Visualization

Dr Shuhaida Mohamed Shuhidan


JAN 2025
Credit: Ts Dr Nurul Aida Osman
Learning Outcomes

At the end of this session, you will be able to:


• Explain descriptive analytics
• Implement R code for descriptive statistics
• Implement R code for visualization

I. Descriptive Analytics
Descriptive Analytics

• The examination of data or content, usually manually performed, to answer the question “What
happened?” (or What is happening?), characterized by traditional business intelligence (BI) and
visualizations such as pie charts, bar charts, line graphs, tables, or generated narratives. (gartner.com)

• The interpretation of historical data to better understand changes that have occurred in a business.
Descriptive analytics describes the use of a range of historic data to draw comparisons. Most commonly
reported financial metrics are a product of descriptive analytics, for example, year-over-year pricing
changes, month-over-month sales growth, the number of users, or the total revenue per subscriber. These
measures all describe what has occurred in a business during a set period. (investopedia.com)

Descriptive Statistics

• Data are described in measurements e.g., measures of central tendency and measures of variability or
dispersion.
• Central tendency - represents the whole set of data by a single value. It gives us the location of central
points.
o Mean
o Mode
o Median
• Variability/dispersion - the spread of the data, i.e., how widely the values are distributed around the centre.
o Range
o Variance
o Standard deviation

Descriptive Statistics

• Load the built-in iris dataset (it ships with R, so no extra library is needed)

data <- iris
• The head function shows the first six rows of the data
head(data)

• The tail function shows the last six rows
tail(data)
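
Both functions accept an n argument when six rows are too many or too few. The sketch below is illustrative only; the variable names first_rows, last_rows and dims are not from the slides.

```r
data <- iris

# First three rows instead of the default six
first_rows <- head(data, n = 3)

# Last three rows
last_rows <- tail(data, n = 3)

# dim() reports the number of rows and columns in one call
dims <- dim(data)

print(first_rows)
print(last_rows)
print(dims)   # 150 rows, 5 columns
```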

Descriptive Statistics AL2 Think Pair Share
(Quiz #2 - Part of 2%)

• The str function is used to describe the structure of the data


str(data)

Quiz #2
Define Factor and discuss the meaning of “Factor w/ 3 levels” in the output of str(data).
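
As a starting point for the discussion (not a substitute for your own definition), the Species column can be inspected directly. This is a minimal sketch; the variable names are illustrative.

```r
data <- iris

# The class of the column that str() reports as a Factor
species_class <- class(data$Species)

# The distinct categories stored in the factor
species_levels <- levels(data$Species)

# How many categories there are
n_levels <- nlevels(data$Species)

# How many rows fall into each category
species_counts <- table(data$Species)

print(species_class)
print(species_levels)
print(n_levels)
print(species_counts)
```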

Descriptive Statistics

• The minimum and maximum values of the data can be determined using the min and max functions
respectively.
min(data$Sepal.Length) #this produces 4.3
max(data$Sepal.Length) #this produces 7.9
• Alternatively, both values can be obtained at once with the range function.
range(data$Sepal.Length)
• The above code produces 4.3 7.9, from which either value can be extracted by index.
range(data$Sepal.Length)[1] #this produces 4.3
range(data$Sepal.Length)[2] #this produces 7.9
• Equivalently, the result can be stored in a variable first:
range_val <- range(data$Sepal.Length)
range_val[1]
range_val[2]
• To obtain the range as a single value (maximum minus minimum), use the min and max functions:
the_range <- max(data$Sepal.Length)-min(data$Sepal.Length)
the_range #this produces 3.6
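
The same single value can also be obtained by combining range with diff, which subtracts the first element from the second. This is just an alternative sketch of the calculation above.

```r
data <- iris

# range() returns c(min, max); diff() returns max - min
the_range <- diff(range(data$Sepal.Length))
print(the_range)   # 3.6
```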
Descriptive Statistics AL3 In-Class Teams
(Quiz #2 - Part of 2%)

• The mean and median of data can be determined with the mean and median functions.
mean(data$Sepal.Length) #this produces 5.843333
median(data$Sepal.Length) #this produces 5.8

Quiz #2:
What about the mode? Investigate how the mode can be calculated in R.
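
One possible approach, offered as a sketch rather than the expected quiz answer: base R's mode function reports how a value is stored (e.g., "numeric"), not the most frequent value, so the statistical mode has to be computed by counting. The helper name stat_mode below is hypothetical.

```r
# Returns the most frequent value(s) in x; ties return every tied value
stat_mode <- function(x) {
  x <- x[!is.na(x)]                    # ignore missing values
  ux <- unique(x)                      # distinct values
  counts <- tabulate(match(x, ux))     # how often each distinct value occurs
  ux[counts == max(counts)]
}

single_mode <- stat_mode(c(1, 2, 2, 3))
tied_modes <- stat_mode(c("a", "b", "b", "a"))

print(single_mode)                     # 2
print(tied_modes)                      # "a" "b"
print(stat_mode(iris$Sepal.Length))    # most frequent sepal length
print(mode(iris$Sepal.Length))         # "numeric": the storage mode, not the statistic
```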

Descriptive Statistics

• Standard deviation and variance can be determined with the sd and var functions respectively.
sd(data$Sepal.Length) #this produces 0.8280661
var(data$Sepal.Length) #this produces 0.6856935

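• Variance (σ²) is the average of the squared differences from the mean.

• Standard deviation (σ) is the square root of the variance.

• Note: var and sd in R compute the sample versions, which divide the sum of squared differences by n − 1 rather than n.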
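
To see where the two numbers come from, the calculation can be reproduced step by step. R's var and sd use the sample formula (divisor n − 1); the sketch below also computes the divisor-n version for comparison. Variable names are illustrative.

```r
x <- iris$Sepal.Length
n <- length(x)

# Squared differences from the mean
sq_diff <- (x - mean(x))^2

# Sample variance: divide by n - 1 (this is what var() returns)
sample_var <- sum(sq_diff) / (n - 1)

# Population variance: divide by n
pop_var <- sum(sq_diff) / n

# Standard deviation is the square root of the variance
sample_sd <- sqrt(sample_var)

print(c(sample_var, var(x)))   # the two values agree
print(c(sample_sd, sd(x)))     # the two values agree
print(pop_var)                 # slightly smaller than the sample variance
```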

[Figure: normal distribution curve. Image source: mathisfun.com]

Descriptive Statistics
• We have seen how the summary function is used to generate useful descriptive statistics.

summary(data)
summary(data$Sepal.Length)

• The summary function can also be applied to each group separately using the by function:

by(data, data$Species, summary)
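
When only one statistic per group is needed, tapply and aggregate are common alternatives to by. The following is a sketch of that idea, not part of the original slides.

```r
data <- iris

# Mean sepal length for each species, returned as a named vector
mean_by_species <- tapply(data$Sepal.Length, data$Species, mean)
print(mean_by_species)
#     setosa versicolor  virginica
#      5.006      5.936      6.588

# Median sepal length for each species, returned as a data frame
median_by_species <- aggregate(Sepal.Length ~ Species, data = data, FUN = median)
print(median_by_species)
```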

Descriptive Statistics

• Data can be divided into quartiles.


• First quartile (lower quartile) → the value that cuts off the first 25% of the data when it is sorted in
ascending order.
• Third quartile (upper quartile) → the value that cuts off the first 75% when it is sorted in ascending order.
• Assume that a vector A contains 9 values:
A<-c(170.2, 181.5, 188.9, 163.9, 166.4, 163.7, 160.4, 175.8, 181.5)
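• Using the quantile function:
quantile(A)
• Let’s check the 1st and 3rd quartiles by hand on the sorted values (n = 9):
sort(A)
o 1st quartile: 0.25 × 9 = 2.25, rounded up to 3, so the 3rd sorted value = 163.9
o 3rd quartile: 0.75 × 9 = 6.75, rounded up to 7, so the 7th sorted value = 181.5
• This rounding rule agrees with quantile(A) for this vector. R's default method interpolates between neighbouring values, so the two approaches can differ for other data.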
Descriptive Statistics

• The quantile function can be used for a specific quartile:

quantile(A,0.25)
quantile(A,0.75)
• Other quantiles (percentiles) can also be determined using the quantile function:
quantile(A,0.4)
quantile(A,0.8)
*find out how R determines the results for the above code
• There is a specific function known as IQR that calculates the interquartile range (i.e., the difference between
the 3rd and the 1st quartiles):
IQR(A)
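
As a hint for the exercise above: by default, quantile uses its type = 7 method, which interpolates linearly between the two nearest sorted values. The sketch below reproduces that calculation by hand; the helper name quantile_type7 is hypothetical and covers only this default method.

```r
A <- c(170.2, 181.5, 188.9, 163.9, 166.4, 163.7, 160.4, 175.8, 181.5)

# Hand-written version of the default (type 7) quantile rule
quantile_type7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1                   # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])     # interpolate between the two neighbours
}

q40 <- quantile_type7(A, 0.4)   # position 4.2: between the 4th and 5th sorted values
q80 <- quantile_type7(A, 0.8)   # position 7.4: between the 7th and 8th sorted values
print(c(q40, q80))
print(quantile(A, c(0.4, 0.8)))   # built-in result for comparison

# The interquartile range follows from the same rule
iqr_by_hand <- quantile_type7(A, 0.75) - quantile_type7(A, 0.25)
print(c(iqr_by_hand, IQR(A)))
```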

Descriptive Statistics

• Other commonly used descriptive statistics:


o Counting the number of rows
nrow(data)
nrow(data['Sepal.Length'])
o Counting the number of columns
ncol(data)
o Counting the number of NA (missing) values
sum(is.na(data$Sepal.Length))
o Counting the number of negative values
sum(data$Sepal.Length<0)
o Counting the number of unique text-based values (non-numeric)
B<-c(rep("Yellow",2),rep("Red",3),rep("Yellow",3),rep("Black",3))
factor(B) #lists the values with their 3 levels: Black Red Yellow
nlevels(factor(B)) #this produces 3
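
To see how often each text value occurs, and not only which values exist, the table function can be used. This is a small sketch using the same vector B.

```r
B <- c(rep("Yellow", 2), rep("Red", 3), rep("Yellow", 3), rep("Black", 3))

# Frequency of each distinct value, sorted alphabetically
counts <- table(B)
print(counts)
# B
#  Black    Red Yellow
#      3      3      5

# Number of distinct values
n_unique <- length(unique(B))
print(n_unique)   # 3
```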
II. Visualization
Scatter Plot

• Other than iris, there are a number of other built-in datasets provided by R:
data()
• Import mtcars data:
data <- mtcars
• View the mtcars data:
data
• To draw a scatter plot, define the x and y values and pass them to the plot function:
x<- -10:10
y<-x*x
plot(x,y,xlab='x',ylab='y',col='red')

Scatter Plot

• Let’s plot mpg (y) vs wt (x)


x<-data$wt
y<-data$mpg
plot(x,y,xlab='wt',ylab='mpg',col='green')

Histogram

hist(data$mpg,col="green")
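
A histogram groups the values into bins and draws one bar per bin. The sketch below makes the binning explicit; the chosen break points (10 to 35 in steps of 5) and the axis labels are assumptions for illustration.

```r
data <- mtcars

# Compute the bins without drawing, to inspect what hist() counts
h <- hist(data$mpg, breaks = seq(10, 35, by = 5), plot = FALSE)
print(h$breaks)   # bin edges
print(h$counts)   # number of cars in each bin

# Draw the histogram with the same bins and descriptive labels
hist(data$mpg, breaks = seq(10, 35, by = 5), col = "green",
     xlab = "mpg", main = "Distribution of mpg")
```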

Bar Chart
val=data$mpg
carnames=row.names(data)
barplot(val,ylab='mpg',main="Car - MPG",names.arg= carnames,
cex.names=0.6,las=2,col="blue")

More on Scatter Plot
my<-read.csv("C:/Users/nurulaida/OneDrive - Universiti Teknologi PETRONAS/R/IDS/covid_my.csv") #adjust the path to where covid_my.csv is stored on your machine
my

x<-seq_len(nrow(my)) #one x position per row of the file
y<-my$Confirmed
plot(x,y,pch=16,col='blue',ylab='Confirmed cases',main="Covid-19 Confirmed Cases in Malaysia")
text(x,y,labels=my$State,pos=4,cex=0.5)

More on Bar Chart
val=my$Deaths
name_st=my$State
barplot(val,ylab='Deaths',main="Covid-19 Deaths in Malaysia",names.arg=
name_st, cex.names=0.6,las=2,col="orange")

Pie Chart

lbl=my$State
val2=my$Confirmed
pie(val2,lbl,cex=0.5)

3D Pie Chart
library(plotrix)
val2=my$Confirmed
lbl=my$State

pie3D(val2,
col = hcl.colors(length(val2), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5)

Exploded 3D Pie Chart
library(plotrix)
val2=my$Confirmed
lbl=my$State

pie3D(val2,
col = hcl.colors(length(val2), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5,explode=0.2)

Exploded 3D Pie Chart – States with More Than 300,000 Confirmed Cases
val2=my[my$Confirmed>300000,]
val3=val2$Confirmed
lbl=paste(val2$State,val2$Confirmed,sep=",")

pie3D(val3,
col = hcl.colors(length(val3), "Spectral"),
border = "white",
labels=lbl,labelcex=0.5,explode=0.2)

Case Study 1 – Stacked Bar Chart AL5 Reflection
(Quiz #2 - Part of 2%)

val2=my[my$Confirmed>300000,]
tbl1=val2[c("Confirmed","Population")]
legendval=c("Confirmed","Population")
colors=c("green","orange")
row.names(tbl1)=val2$State
tbl2=t(tbl1)
options(scipen=999)
barplot(as.matrix(tbl2),col=colors,cex.names=0.8,las=2, cex.axis=0.8)
legend("topright", legendval, cex=0.8, fill=colors)
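
The case study depends on covid_my.csv, which is not bundled with R. To try the same technique without that file, the sketch below uses the built-in VADeaths matrix instead (death rates per 1000 in Virginia in 1940); the dataset swap, titles and colours are assumptions for illustration. As in the case study, barplot draws one bar per column and stacks the rows inside it, which is why the data frame above is transposed with t before plotting.

```r
# VADeaths is a built-in 5 x 4 matrix: rows are age groups, columns are population groups
tbl <- VADeaths
colors <- hcl.colors(nrow(tbl), "Spectral")

# Each column becomes one bar; the rows are stacked inside it
barplot(tbl, col = colors, cex.names = 0.8, las = 2, cex.axis = 0.8,
        main = "Death rates in Virginia (1940)", ylab = "Deaths per 1000")
legend("topright", legend = rownames(tbl), cex = 0.8, fill = colors)

# The height of each stacked bar equals its column total
bar_totals <- colSums(tbl)
print(bar_totals)
```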

Case Study 2 – Geomap AL5 Reflection
(Quiz #2 - Part of 2%)

library(rworldmap) #to get a Malaysia map
library(tidyverse)
library(tidygeocoder)

mydat<-read.csv("C:/Users/nurulaida.osman/OneDrive - Universiti Teknologi PETRONAS/R/IDS/covid_my.csv") #adjust the path to your own copy of covid_my.csv
global <- map_data("world") #get map (map_data requires the maps package to be installed)
ggplot() +
  geom_polygon(data = global %>% filter(region == "Malaysia"),
               aes(x = long, y = lat, group = group),
               fill = "lightskyblue1") +
  coord_fixed(1.3) +
  geom_point(data = mydat, aes(x = Long, y = Lat), color = "red") +
  geom_text(data = mydat, label = paste(mydat$State, mydat$Confirmed, sep = ","),
            aes(x = Long, y = Lat),
            nudge_x = 0.25, nudge_y = 0.25,
            color = "black", size = 1.5) +
  theme_void()
Summary
You have learned…
✓ Descriptive analytics
✓ Descriptive statistics that are important for descriptive analytics (and data preparation)
✓ Visualization

More variations of visualization can be produced using…


➢ ggplot2
➢ plotly
➢ leaflet

Next…
❖ More on data cleaning
❖ Feature settings