College of Computing & Mathematics
Department of Mathematics
STAT 214 - LAB (R-Studio)
Descriptive Statistics
Semester 241 (2024-2025)
What will we learn?
0. Required packages / Library
1. Frequency Table & contingency Table for categorical variable
2. Graphs and charts for categorical data:
➢ Bar chart
➢ pie chart
3. Frequency Table for numerical variable
4. Graphs and charts for numerical data
➢ Stem and leaf
➢ Histogram
5. Computing Numerical Measures
➢ Computing measures for a variable in the data set & the Box-plot
➢ Computing measures using the function “summary”
0. Required packages / Library
Before starting, install the packages carData and tigerstats once with install.packages() (stats ships with R), then load the libraries:
❖ library(carData)
❖ library(stats)
❖ library(tigerstats)
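A minimal sketch of a check to run before the lab, assuming the packages are obtained from CRAN. It only reports which of the packages above are missing and does not install anything; the names `needed` and `missing_pkgs` are illustrative.

```r
# Packages used in this lab (stats ships with R, so it is not listed).
needed <- c("carData", "tigerstats")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  # Build the install command the student should run once.
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```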
1. Frequency Tables:
Categorical Variable
Example 1 (Frequency Table for categorical variable):
- Load the built-in data file “States” (from the carData package).
- Find the frequency table for “region”.
Use the command:
data.frame(table(States$region))
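The same idea can be extended with relative frequencies. The sketch below uses a small made-up vector `region` (hypothetical data, not the States file) so that it runs without any package; with table(States$region) the first column would be named Var1 instead.

```r
# Hypothetical region labels, used only for illustration.
region <- c("North", "South", "South", "West", "North",
            "South", "East", "West", "South", "North")

freq <- data.frame(table(region))            # columns: region, Freq
freq$RelFreq <- freq$Freq / sum(freq$Freq)   # relative frequency
freq$Percent <- 100 * freq$RelFreq           # same information in percent
freq
```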
Example 2 (2-way table):
The data in the file “survey” represent the responses to two questions asked in a survey of 40 college students
majoring in business:
1. “What is your gender?” male = M; female = F
2. “What is your major?” Accounting = A; computer information system = C; marketing = M
- Load the data file “survey”.
- Find the 2-way table for Gender vs Major.
Answer: Use the command xtabs
xtabs(~Gender+Major,data=survey)
To find the overall proportions, the row percentages or the column percentages, use the commands below (rowPerc and colPerc come from the tigerstats package):
proportions(xtabs(~Gender+Major,data=survey))
rowPerc(xtabs(~Gender+Major,data=survey))
colPerc(xtabs(~Gender+Major,data=survey))
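For readers without the “survey” file or tigerstats, here is a sketch using base R only. The data frame `survey_demo` is made up (8 rows, not the 40 students of the example), and prop.table() with a `margin` argument is used as a rough base-R counterpart of rowPerc and colPerc; the tigerstats output may be laid out differently.

```r
# Hypothetical mini survey, for illustration only.
survey_demo <- data.frame(
  Gender = c("M", "F", "F", "M", "F", "M", "F", "M"),
  Major  = c("A", "C", "M", "A", "A", "C", "M", "M")
)

tab <- xtabs(~ Gender + Major, data = survey_demo)   # 2-way table of counts

overall <- prop.table(tab)                    # each cell / grand total
row_pct <- 100 * prop.table(tab, margin = 1)  # each row sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)  # each column sums to 100

addmargins(tab)   # counts with row and column totals
round(row_pct, 1)
round(col_pct, 1)
```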
2. Graphs and charts for categorical data:
Bar and Pie charts
Example 3: (Graphs and charts for categorical data: Bar chart)
- Load the builtin data file “States”.
- Draw the Bar-chart for “region”.
Use the command:
barplot(table(States$region))
Also try:
barplot(table(States$region),ylim = c(0,10),xlab = "region",ylab = "freq")
Example 4:(Graphs and charts for categorical data: Pie chart )
Draw the Pie-chart for “region”.
Use the command:
pie(table(States$region))
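A pie chart is often easier to read when each slice shows its percentage. The following sketch assumes a small named vector of made-up counts (`counts` is hypothetical, not the States data) and builds the labels by hand before calling pie() and barplot().

```r
# Hypothetical category counts, for illustration only.
counts <- c(East = 1, North = 3, South = 4, West = 2)

# Percentage of each category, rounded to one decimal place.
pct <- round(100 * counts / sum(counts), 1)
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels, main = "Share by region")
barplot(counts, xlab = "region", ylab = "freq",
        ylim = c(0, max(counts) + 1))   # leave head room above the tallest bar
```

With real data, `counts` would be replaced by a table such as table(States$region).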
Exercise:
- Load the built-in data file “UN”.
- Draw the Bar and Pie charts for “region”.
3. Frequency Tables :
Numerical Variable
Example 6 : (intervals closed from the right)
- Find the frequency table for “pay” with starting interval (20,25].
Use the commands:
pay_intervals=factor(cut(States$pay,breaks = 20+5*(0:5), right = TRUE))
frqtab=data.frame(table(pay_intervals))
frqtab
Note: a factor is a vector object used to specify a discrete classification (grouping)
of the components of other vectors of the same length.
cut() defines the intervals (20,25], (25,30], ..., (40,45] (the last break is 45 since the max is 43).
Also try the command with breaks = 20+5*(0:7):
pay_intervals=factor(cut(States$pay,breaks = 20+5*(0:7)))
frqtab=data.frame(table(pay_intervals))
Example 8 : (intervals closed from the left)
- Find the frequency table for “pay” with starting interval [20,25).
Pay_intervals2=factor(cut(States$pay,breaks = 20+5*(0:5), right = FALSE))
frqtab=data.frame(table(Pay_intervals2))
frqtab
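The effect of `right = TRUE` versus `right = FALSE` is easiest to see on values that fall exactly on a break. The sketch below uses a short made-up vector `x` (hypothetical values, not the pay data) and also shows `include.lowest`, an argument of cut() that the examples above do not need because no pay value equals a boundary break.

```r
# Hypothetical values; 20, 25 and 30 sit exactly on the breaks.
x <- c(20, 22, 25, 25, 28, 30)
brks <- c(20, 25, 30)

right_closed <- cut(x, breaks = brks, right = TRUE)   # (20,25], (25,30]
left_closed  <- cut(x, breaks = brks, right = FALSE)  # [20,25), [25,30)

# 25 goes to (20,25] in the first case and to [25,30) in the second.
# 20 is NA when right = TRUE, and 30 is NA when right = FALSE.
data.frame(x, right_closed, left_closed)

# include.lowest = TRUE closes the first interval on both sides: [20,25].
with_lowest <- cut(x, breaks = brks, right = TRUE, include.lowest = TRUE)
table(with_lowest)
```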
4. Graphs and charts for
Numerical data
Example 9: Graphs and charts for Numerical data (Stem & leaf plot)
Draw the stem & leaf for “pay”.
Use the command:
stem(States$pay,scale=0.4)
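The `scale` argument of stem() controls the length of the plot. As a quick way to see its effect, the sketch below uses the built-in iris data (chosen here only because it needs no extra package) and prints the plot at three scales.

```r
x <- iris$Sepal.Width

stem(x)                # default, scale = 1
stem(x, scale = 2)     # a longer plot: stems are split into more rows
stem(x, scale = 0.5)   # a more compact plot
```

Comparing the three printouts shows how the same data are regrouped when the plot length changes.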
Example 10: Graphs and charts for Numerical data (Histogram)
- Load the built-in data file “States”.
- Draw the histogram for “pay”.
Use the command:
hist(States$pay)
Customize the histogram :
hist(States$pay, breaks = 18+5*(0:7)) # first interval starts at 18, class width (CW) = 5
hist(States$pay, breaks = 20+6*(0:7)) # first interval starts at 20, class width (CW) = 6
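A histogram is a picture of a frequency table, so the bar heights can be checked against cut(). The sketch below assumes the built-in iris data and a hypothetical set of breaks `brks`; `plot = FALSE` makes hist() return the counts without drawing.

```r
x <- iris$Sepal.Width
brks <- 2 + 0.5 * (0:5)   # 2.0, 2.5, ..., 4.5 (class width 0.5)

# hist() uses right-closed intervals and includes the lowest break by default.
h <- hist(x, breaks = brks, plot = FALSE)

# The same intervals built with cut().
tab <- table(cut(x, breaks = brks, right = TRUE, include.lowest = TRUE))

data.frame(interval    = names(tab),
           table_count = as.vector(tab),
           hist_count  = h$counts)
```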
5. Computing Numerical Measures
for a variable in a data set & the
box-plot
Example 11: (Computing Numerical Measures)
1- Load the builtin data file “iris”
2- For the variable “Sepal.Width”
1. Find the length.
length(iris$Sepal.Width) Answer: [1] 150
2. Find the mean.
mean(iris$Sepal.Width) Answer: [1] 3.057333
3. Find the variance.
var(iris$Sepal.Width) Answer: [1] 0.1899794
4. Find the Minimum, Quartiles, and Maximum.
quantile(iris$Sepal.Width, probs = c(0,0.25,0.5,0.75,1))
Answer:
0% 25% 50% 75% 100%
2.0 2.8 3.0 3.3 4.4
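To connect these commands with the formulas, the sketch below recomputes the sample variance (divisor n - 1), the standard deviation, and the interquartile range by hand and compares them with var(), sd() and IQR(). The names `manual_var`, `manual_sd` and `manual_iqr` are illustrative.

```r
x <- iris$Sepal.Width
n <- length(x)

# Sample variance: sum of squared deviations divided by (n - 1).
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance.
manual_sd <- sqrt(manual_var)

# IQR = Q3 - Q1, using the same quantile() default as above.
q <- quantile(x, probs = c(0.25, 0.75))
manual_iqr <- unname(q[2] - q[1])

c(var = manual_var, sd = manual_sd, iqr = manual_iqr)

all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
all.equal(manual_iqr, IQR(x))
```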
Exercise:
1- For the variable Petal.Length, find the mean, std, Q1, Q2, Q3 and P91.5.
Hint: Use the following commands:
mean(iris$Petal.Length)
var(iris$Petal.Length) # then take the square root to get the std
quantile(iris$Petal.Length, probs = c(0.25, 0.5, 0.75, 0.915))
2- Find the IQR. Use the command: IQR(iris$Petal.Length)
Example 12: (Box plot)
Draw the Box-plot for “pay”.
Use the command:
boxplot(States$pay,horizontal = TRUE)
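The numbers behind a box-plot can be inspected with boxplot.stats(). The sketch below assumes the built-in iris data and the default 1.5 x (hinge spread) rule for flagging outliers; the hinges it uses can differ slightly from the quartiles returned by quantile().

```r
x <- iris$Sepal.Width

bs <- boxplot.stats(x)   # default coef = 1.5
bs$stats                 # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                   # points drawn individually as outliers

# Recompute the fences by hand from the hinges.
hinges <- bs$stats[c(2, 4)]
spread <- diff(hinges)
fences <- c(hinges[1] - 1.5 * spread, hinges[2] + 1.5 * spread)
manual_out <- x[x < fences[1] | x > fences[2]]

sort(manual_out)
```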
Exercise: Graphs and charts for Numerical data
- Load the built-in data file “UN”.
- Draw the histogram (starting with (1,1.5]) for “fertility”.
- Draw the stem & leaf for “fertility”.
- Draw the Box-plot for “fertility”.
Try the commands:
hist(UN$fertility, breaks = 1+.5*(0:12))
stem(UN$fertility)
boxplot(UN$fertility,horizontal = TRUE)
Output of the stem & leaf command:
> stem(UN$fertility)
The decimal point is at the |
1 | 11233344444444
1 | 5555555555555555555566666666777777788888888899999999999
2 | 0000000001111111112222222222333333333444444444
2 | 555555666666678999
3 | 00111122222233
3 | 55678888889
4 | 001222334444
4 | 5566777799
5 | 0001134
5 | 557899
6 | 00133
6 | 9
Computing Numerical
Measures using the function
“summary”
Example 13 (Computing Numerical Measures) :
- For the variable “Sepal.Width” you can compute the Mean, Median, Minimum,
Maximum, and Quartiles using the summary function:
summary(iris$Sepal.Width)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 2.800 3.000 3.057 3.300 4.400
** Controlling the number of digits
summary(iris$Sepal.Width, digits=3)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 2.80 3.00 3.06 3.30 4.40
*** To compute the measures for all variables in “iris”, try the command
summary(iris, digits=3)
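summary() does not report the standard deviation or the IQR. One possible way to get them for every numeric variable at once is sketched below; the helper names `num_cols` and `measures` are illustrative.

```r
# Keep only the numeric columns (Species is a factor and is dropped).
num_cols <- iris[sapply(iris, is.numeric)]

# One column per variable, one row per measure.
measures <- sapply(num_cols, function(v) {
  c(mean = mean(v), sd = sd(v), IQR = IQR(v))
})

round(measures, 3)
```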
THANKS!