
STAT 214-T241-Lab 2

The document outlines the curriculum for the STAT 214 lab course focusing on descriptive statistics using R-Studio for the semester 241 (2024-2025). Key topics include creating frequency tables and graphs for both categorical and numerical data, computing numerical measures, and utilizing various R commands for data analysis. Examples and exercises are provided to reinforce learning, including the use of built-in datasets like 'States' and 'iris'.

Uploaded by binladn40499
Copyright © All Rights Reserved

College of Computing & Mathematics

Department of Mathematics

STAT 214 - LAB (R-Studio)

Descriptive Statistics
Semester 241 (2024-2025)
What will we learn?
0. Required packages / Library
1. Frequency Table & Contingency Table for Categorical Variables
2. Graphs and charts for categorical data:
➢ Bar chart
➢ Pie chart
3. Frequency Table for Numerical Variables
4. Graphs and charts for numerical data
➢ Stem-and-leaf plot
➢ Histogram
5. Computing Numerical Measures
➢ Computing measures for a variable in the data set & the box plot
➢ Computing measures using the function “summary”
0. Required packages / Library
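Before starting, install the packages carData and tigerstats (for example with install.packages(c("carData", "tigerstats"))) and load the following libraries (stats ships with R and needs no installation):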

❖ library(carData)
❖ library(stats)
❖ library(tigerstats)
1. Frequency Tables: Categorical Variable
Example 1 (Frequency Table for categorical variable):
- Load the built-in data file “States” (from the carData package).

- Find the frequency table for “region”.

Use the command:


data.frame(table(States$region))
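The same table() / data.frame() pattern works for any factor. As a minimal sketch, assuming only base R and the built-in iris data (used here in place of States so it runs without carData), the frequency table can be extended with a relative-frequency column; the column name Rel.Freq is our own choice, not something the handout defines.

```r
# Frequency table for a categorical variable (base R only)
tab <- table(Species = iris$Species)  # naming the argument names the first column
frqtab <- data.frame(tab)             # columns: Species, Freq

# Relative frequency = count / total number of observations
frqtab$Rel.Freq <- frqtab$Freq / sum(frqtab$Freq)
print(frqtab)
```

Naming the argument inside table() replaces the default column name Var1 that data.frame(table(States$region)) produces.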
Example 2 (2-Way table ):
The data in the file “survey” represent the responses to two questions asked in a survey of 40 college students majoring in business:
1. “What is your gender?” male = M; female = F
2. “What is your major?” accounting = A; computer information systems = C; marketing = M

- Load the data file “survey”.


- Find the 2-way table for Gender vs Major.
Answer: Use the command xtabs:
xtabs(~Gender+Major,data=survey)
To find the overall proportions, the row percentages, or the column percentages, use the commands below (rowPerc() and colPerc() come from the tigerstats package):
proportions(xtabs(~Gender+Major,data=survey))
rowPerc(xtabs(~Gender+Major,data=survey))
colPerc(xtabs(~Gender+Major,data=survey))
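The survey file itself is not reproduced in this handout, so the sketch below builds a small hypothetical data frame, survey_demo, with the same Gender and Major coding; its six rows are invented for illustration only. It sticks to base R: prop.table() gives the overall proportions (proportions() is the same function under a newer name), and prop.table() with margin = 1 or margin = 2, multiplied by 100, gives row or column percentages comparable to what rowPerc() and colPerc() display.

```r
# Hypothetical data: NOT the real survey file, just six made-up students
survey_demo <- data.frame(
  Gender = c("M", "F", "F", "M", "F", "M"),
  Major  = c("A", "C", "M", "A", "A", "C")
)

# Two-way (contingency) table of counts
tab <- xtabs(~ Gender + Major, data = survey_demo)
print(tab)

# Overall proportions: each cell divided by the grand total
print(prop.table(tab))

# Row percentages: each row sums to 100
row_pct <- 100 * prop.table(tab, margin = 1)
# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)
print(round(row_pct, 1))
print(round(col_pct, 1))
```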
2. Graphs and charts for categorical data:
Bar and Pie charts
Example 3: (Graphs and charts for categorical data: Bar chart)
- Load the built-in data file “States”.

- Draw the bar chart for “region”.


Use the command:
barplot(table(States$region))

Also try (ylim sets the range of the y-axis; xlab and ylab label the axes):
barplot(table(States$region), ylim = c(0, 10), xlab = "region", ylab = "freq")
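A minimal sketch of the same barplot() arguments, assuming the built-in iris data in place of States so that it runs without carData. Sorting the table first orders the bars by frequency, and barplot() invisibly returns the bar midpoints, which can be passed to text() to print each count above its bar. The plot is sent to a temporary PDF file only so the sketch also runs without a screen; in R-Studio you can drop the pdf() and dev.off() lines.

```r
# Draw into a temporary PDF so the sketch also runs in a non-interactive session
pdf(tempfile(fileext = ".pdf"))

tab <- sort(table(iris$Species), decreasing = TRUE)  # counts, largest first
mids <- barplot(tab,
                ylim = c(0, 60),                     # leave room above the tallest bar
                xlab = "Species", ylab = "freq",
                main = "Bar chart of Species")

# Write each count just above its bar, using the returned midpoints
text(x = mids, y = as.vector(tab), labels = as.vector(tab), pos = 3)

invisible(dev.off())
```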
Example 4: (Graphs and charts for categorical data: Pie chart)
- Draw the pie chart for “region”.
Use the command:
pie(table(States$region))
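Pie slices are easier to read when each label shows its share. This is a sketch under the same assumptions as before (built-in iris data instead of States, output written to a temporary PDF): the labels argument of pie() receives strings built from the category names and their percentages.

```r
pdf(tempfile(fileext = ".pdf"))

tab <- table(iris$Species)

# Label each slice with its category name and percentage share (1 decimal place)
pct  <- round(100 * prop.table(tab), 1)
lbls <- paste0(names(tab), " (", pct, "%)")
pie(tab, labels = lbls, main = "Pie chart of Species")

invisible(dev.off())
```

With 50 flowers of each species, every label reads "... (33.3%)".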

Exercise:
- Load the built-in data file “UN”.
- Draw the bar and pie charts for “region”.
3. Frequency Tables: Numerical Variable
Example 6 (intervals closed on the right):

- Find the frequency table for “pay” with starting interval (20,25].
Use the commands:
pay_intervals=factor(cut(States$pay,breaks = 20+5*(0:5), right = TRUE))
frqtab=data.frame(table(pay_intervals))
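frqtab
A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length.
cut() defines the intervals (20,25], (25,30], ..., (40,45]; the last break is 45 because the maximum pay is 43. Since cut() already returns a factor, wrapping it in factor() is optional.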
Try the command with breaks = 20+5*(0:7); the extra intervals (45,50] and (50,55] then appear in the table with frequency 0:
pay_intervals=factor(cut(States$pay,breaks = 20+5*(0:7)))
frqtab=data.frame(table(pay_intervals))
frqtab
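A frequency table for a numerical variable is often extended with relative and cumulative frequencies. The sketch below assumes the built-in iris variable Sepal.Length (range 4.3 to 7.9) instead of pay, with classes of width 0.5 closed on the right; the column names Rel.Freq and Cum.Freq are our own labels.

```r
# Classes (4,4.5], (4.5,5], ..., (7.5,8] cover the range 4.3 to 7.9
sl_intervals <- cut(iris$Sepal.Length, breaks = 4 + 0.5 * (0:8), right = TRUE)

frqtab <- data.frame(table(sl_intervals))          # columns: sl_intervals, Freq
frqtab$Rel.Freq <- frqtab$Freq / sum(frqtab$Freq)  # proportion in each class
frqtab$Cum.Freq <- cumsum(frqtab$Freq)             # running total of the counts
print(frqtab)
```

The last cumulative frequency equals the number of observations, which is a quick check that no value fell outside the breaks.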
Example 8 (intervals closed on the left):
- Find the frequency table for “pay” with starting interval [20,25).
Use the commands:
pay_intervals2=factor(cut(States$pay,breaks = 20+5*(0:5), right = FALSE))
frqtab=data.frame(table(pay_intervals2))
frqtab
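The difference between Example 6 and Example 8 only matters for values that sit exactly on a break. A small made-up vector with values equal to 25 and 30 makes this visible: with right = TRUE a boundary value joins the class on its left, with right = FALSE it joins the class on its right.

```r
# Made-up values; 25 and 30 sit exactly on break points
x <- c(20.5, 25, 27, 30, 33)
brk <- c(20, 25, 30, 35)

# right = TRUE: intervals (20,25], (25,30], (30,35]
closed_right <- table(cut(x, breaks = brk, right = TRUE))
# right = FALSE: intervals [20,25), [25,30), [30,35)
closed_left <- table(cut(x, breaks = brk, right = FALSE))

print(closed_right)  # counts 2, 2, 1
print(closed_left)   # counts 1, 2, 2
```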
4. Graphs and charts for Numerical Data
Example 9: Graphs and charts for numerical data (stem-and-leaf plot)

Draw the stem-and-leaf plot for “pay”.


Use the command (the argument scale controls the length of the plot):
stem(States$pay, scale = 0.4)
Example 10: Graphs and charts for numerical data (histogram)
- Load the built-in data file “States”.

- Draw the histogram for “pay”.


Use the command:
hist(States$pay)

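Customize the histogram (by default hist() closes each class on the right, and the breaks must cover the whole range of the data):


hist(States$pay, breaks = 18+5*(0:7)) # first interval starts at 18, class width (CW) = 5
hist(States$pay, breaks = 20+6*(0:7)) # first interval starts at 20, class width (CW) = 6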
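hist() can also return its class counts without drawing anything, which makes it easy to compare a histogram with a frequency table. This sketch assumes the built-in iris variable Sepal.Width (range 2.0 to 4.4) and breaks 2, 2.5, ..., 4.5. Because hist() closes classes on the right and includes the lowest break, its counts should match cut() called with right = TRUE and include.lowest = TRUE.

```r
x <- iris$Sepal.Width        # values range from 2.0 to 4.4
b <- 2 + 0.5 * (0:5)         # breaks 2, 2.5, ..., 4.5 (class width 0.5)

# plot = FALSE computes the histogram without drawing it
h <- hist(x, breaks = b, plot = FALSE)
print(data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts))

# Same classes built with cut(): closed on the right, lowest break included
cut_counts <- as.vector(table(cut(x, breaks = b, right = TRUE, include.lowest = TRUE)))
print(cut_counts)
```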
5. Computing Numerical Measures for a Variable in a Data Set & the Box-Plot
Example 11: (Computing Numerical Measures)
1- Load the built-in data file “iris”.

2- For the variable “Sepal.Width”:


1. Find the length (number of observations).
length(iris$Sepal.Width)
Answer: [1] 150
2. Find the mean.
mean(iris$Sepal.Width)
Answer: [1] 3.057333
3. Find the variance.
var(iris$Sepal.Width)
Answer: [1] 0.1899794
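var() computes the sample variance s² = Σ(xᵢ − x̄)² / (n − 1), and the standard deviation is its square root. The sketch below recomputes both by hand for Sepal.Width and compares them with var() and sd(); the names var_by_hand and sd_by_hand are only illustrative.

```r
x <- iris$Sepal.Width
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_by_hand <- sum((x - mean(x))^2) / (n - 1)
# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)

print(c(var_by_hand = var_by_hand, var = var(x)))
print(c(sd_by_hand = sd_by_hand, sd = sd(x)))
```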
Example 11 (continued): (Computing Numerical Measures)
4. Find the Minimum, Quartiles, and Maximum.
quantile(iris$Sepal.Width, probs = c(0,0.25,0.5,0.75,1))
Answer:
0% 25% 50% 75% 100%
2.0 2.8 3.0 3.3 4.4
Exercise:
1- For the variable “Petal.Length”, find the mean, the standard deviation, Q1, Q2, Q3, and P91.5 (the 91.5th percentile).
Hint: Use the following commands:
mean(iris$Petal.Length)
var(iris$Petal.Length)   # take the square root to get the std, or use sd(iris$Petal.Length)
quantile(iris$Petal.Length, probs = c(0.25, 0.5, 0.75, 0.915))

2- Find the IQR. Use the command: IQR(iris$Petal.Length)
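A sketch of the exercise's quantile part, assuming R's default percentile rule (type 7 in quantile()). Textbooks sometimes use a different rule, so hand-computed percentiles can differ slightly from R's. It also checks that IQR() is simply Q3 − Q1 under the same rule.

```r
x <- iris$Petal.Length

# Q1, Q2 (median), Q3 and the 91.5th percentile, default rule (type 7)
q <- quantile(x, probs = c(0.25, 0.5, 0.75, 0.915))
print(q)

# The interquartile range is Q3 - Q1
iqr_by_hand <- unname(q["75%"] - q["25%"])
print(c(iqr_by_hand = iqr_by_hand, IQR = IQR(x)))
```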


Example 12: (Box plot)

Draw the Box-plot for “pay”.


Use the command:
boxplot(States$pay,horizontal = TRUE)
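The numbers behind a box plot can be inspected with boxplot.stats(). This sketch assumes the built-in iris variable Sepal.Width instead of pay. Note that the box edges are Tukey's hinges (computed with fivenum()), which can differ slightly from the quartiles returned by quantile(), and that points more than 1.5 box lengths beyond the hinges are drawn individually as outliers.

```r
x <- iris$Sepal.Width

# boxplot.stats() returns the numbers that boxplot() draws
bs <- boxplot.stats(x)
print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # points beyond the whiskers, drawn individually as outliers

# Length of the box (upper hinge minus lower hinge); whiskers reach at most
# 1.5 box lengths beyond the hinges (coef = 1.5 by default)
box_length <- bs$stats[4] - bs$stats[2]
```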
Exercise: Graphs and charts for Numerical data
- Load the built-in data file “UN”.
- Draw the histogram for “fertility” (starting with the interval (1,1.5]).
- Draw the stem & leaf for “fertility”.
- Draw the Box-plot for “fertility”.
Try the commands:
hist(UN$fertility, breaks = 1+.5*(0:12))
stem(UN$fertility)
boxplot(UN$fertility, horizontal = TRUE)

Output of the stem-and-leaf command:
> stem(UN$fertility)

  The decimal point is at the |

  1 | 11233344444444
  1 | 5555555555555555555566666666777777788888888899999999999
  2 | 0000000001111111112222222222333333333444444444
  2 | 555555666666678999
  3 | 00111122222233
  3 | 55678888889
  4 | 001222334444
  4 | 5566777799
  5 | 0001134
  5 | 557899
  6 | 00133
  6 | 9
Computing Numerical Measures using the function “summary”
Example 13 (Computing Numerical Measures):
- For the variable “Sepal.Width” you can compute the Mean, Median, Minimum,
Maximum, and Quartiles using the summary function:
summary(iris$Sepal.Width)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 2.800 3.000 3.057 3.300 4.400
** Controlling the number of digits (digits sets how many significant digits are printed)
summary(iris$Sepal.Width, digits=3)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 2.80 3.00 3.06 3.30 4.40

*** To compute the measures for all variables in “iris”, try the command below (for the factor Species, summary() reports the count in each level instead):
summary(iris, digits=3)
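The result of summary() for a numeric variable is a named numeric vector, so a single measure can be pulled out by name, and sapply() can apply summary() to every numeric column at once. A minimal sketch, assuming the built-in iris data; num_cols and summ are illustrative names.

```r
# summary() returns a named numeric vector; extract single measures by name
s <- summary(iris$Sepal.Width)
print(s[["Mean"]])
print(s[["3rd Qu."]])

# Apply summary() to every numeric column: one column per variable, one row per measure
num_cols <- iris[sapply(iris, is.numeric)]
summ <- sapply(num_cols, summary)
print(round(summ, 3))
```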
THANKS!
