Stat 153 Unit 2b
STATS 153
What We Have Learnt
- Explain what is meant by descriptive statistics.
- Population
- Discrete variable
- Sample
- Continuous variable
4. Measures of Dispersion
5. Measures of Position
6. Measures of Shape
Consider the following data:
4, 3, 5, 3, 2, 75, 4
- Outliers
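A minimal R sketch of why the value 75 stands out in the data above. It compares the mean with the median and flags outliers with the 1.5 x IQR rule used by `boxplot.stats()`; that rule is an assumption made here for illustration and is not stated on the slide.

```r
# Data from the slide; 75 lies far from the other values.
x <- c(4, 3, 5, 3, 2, 75, 4)

mean_x   <- mean(x)     # pulled upwards by the extreme value
median_x <- median(x)   # barely affected by the extreme value

# Assumption: an outlier is a point beyond 1.5 x IQR from the hinges,
# which is the default criterion of boxplot.stats().
outliers <- boxplot.stats(x)$out

print(c(mean = mean_x, median = median_x))
print(outliers)
```

The gap between the mean and the median is a quick hint that an extreme value is present.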
Tabular Representation of Data
The data gathered from a survey or experiment are usually
summarised or organised numerically in tabular form using a
frequency distribution table.
- The distribution is said to be ungrouped if it shows the
distinct observations and their corresponding numbers of
occurrences, called frequencies.
- If the number of distinct observations is too large, they are
put into groups, called classes or categories.
- The number of classes is usually chosen between 5 and
20, inclusive.
- A rule of thumb is to choose the smallest k such that
$2^k \geq n$, where k is the number of classes and n is the
number of observations.
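The rule $2^k \geq n$ can be sketched in R as follows. The helper name `classes_2k` is hypothetical, and n = 50 is only an illustrative sample size.

```r
# Smallest k such that 2^k >= n (hypothetical helper for illustration).
classes_2k <- function(n) {
  k <- 1
  while (2^k < n) {
    k <- k + 1
  }
  k
}

k50 <- classes_2k(50)   # 2^5 = 32 < 50 <= 64 = 2^6
print(k50)
```

The result should still be checked against the usual range of 5 to 20 classes.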
Terminologies
- Class Interval
- Class Limits
- Class Boundaries
- Class Width
Class Limit, Boundary, Interval, Width and Midpoint
Class Limits
Class limits are the smallest and largest observations (data,
events, etc.) that can fall in each class. Hence, each class has
two limits: a lower limit and an upper limit.
Class        Frequency
300 - 399    13
400 - 499    20
500 - 599    7
600 - 699    3
700 - 799    12
800 - 899    8
900 - 999    7
What are the lower and upper class limits for the first three
classes?
Class Boundary
Class boundaries are the midpoints between the upper class
limit of a class and the lower class limit of the next class in the
sequence. Each class has an upper and a lower class boundary.
Class        Frequency
300 - 399    13
400 - 499    20
500 - 599    7
Using the table above, find the class width for the first
class, 300 - 399.
Class width = upper class boundary - lower class boundary
- Upper class boundary = (399 + 400) / 2 = 399.5
- Lower class boundary = 299.5 (by the same rule, half a unit
below the lower limit 300)
- Class width = 399.5 - 299.5 = 100
Using the table above, find the class midpoint for the first
class, 300 - 399.
- Upper class boundary = 399.5
- Lower class boundary = 299.5
- Class midpoint = (299.5 + 399.5) / 2 = 349.5
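The boundary, width and midpoint calculations for the three classes in the table can be sketched in R. The variable names are illustrative, and the sketch assumes that the gap between adjacent classes (1 unit here) is the same throughout.

```r
# Class limits of the first three classes in the table.
lower_limit <- c(300, 400, 500)
upper_limit <- c(399, 499, 599)

# Gap between the upper limit of one class and the lower limit of the next.
gap <- lower_limit[2] - upper_limit[1]

# Boundaries sit halfway across that gap.
lower_boundary <- lower_limit - gap / 2
upper_boundary <- upper_limit + gap / 2

width    <- upper_boundary - lower_boundary
midpoint <- (lower_boundary + upper_boundary) / 2

print(data.frame(lower_limit, upper_limit,
                 lower_boundary, upper_boundary, width, midpoint))
```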
Constructing a Frequency Distribution Table
A frequency distribution lists each category of data and the
number of occurrences of each category in table form. For
ungrouped data:
- List the data values (scores) in the first column, from the
lowest value to the highest or vice versa;
- Create a second column in which each occurrence of a value
is recorded with a tally mark. This column is known as the
tally of the scores;
- Create a third column in which the frequency (the tally
count) is entered.
Twenty-five students were given a blood test to determine their
blood type. The data set is as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
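As a sketch, the ungrouped frequency table for the blood-type data can be produced in R with `table()`, which does the tallying automatically. The variable names are illustrative.

```r
# Blood types of the 25 students, entered row by row.
blood <- c("A",  "B", "B", "AB", "O",
           "O",  "O", "B", "AB", "B",
           "B",  "B", "O", "A",  "O",
           "A",  "O", "O", "O",  "AB",
           "AB", "A", "O", "B",  "A")

# table() counts the occurrences of each distinct value.
freq <- table(blood)

freq_table <- data.frame(blood_type = names(freq),
                         frequency  = as.vector(freq))
print(freq_table)
```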
Body masses (kg) of 22 patients:
60, 45, 72, 55, 42, 65, 54, 68, 74, 50, 78, 70, 58, 48, 67, 64,
68, 52, 60, 58, 75, 83
Example 1
The data given below are the number of children per family
sampled from a community.
0 1 4 4 3 1 2 3 1 2
2 4 3 0 2 5 0 2 2 1
3 2 1 1 3 2 3 4 5 2
1 0 5 4 2 0 3 5 1 2
4 3 0 2 5 1 1 2 2 4
Sturges Approximation Rule
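The number of classes K and the class width C for a frequency
distribution table are given by
$K = 1 + 3.322 \log_{10} n$, where n is the total number of observations,
$C = \dfrac{\text{Range}}{K}$.
The lower and upper boundaries of the first class are
$LB_1 = (\text{minimum observation or less}) - \tfrac{1}{2}(\text{smallest unit of measurement})$
$UB_1 = LB_1 + C$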
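A minimal R sketch of Sturges' rule, using the ward B ages from this unit as sample input. Two choices here are assumptions and not part of the rule as stated: K is rounded up to a whole number, and the class width is rounded up to the next whole unit of measurement so that the classes cover the largest observation.

```r
# Ages of the ward B patients (unit of measurement: 1 year).
ages <- c(17, 11, 21, 22, 30, 33, 37, 14, 19, 17, 23, 27, 28, 24, 45, 38,
          40, 33, 34, 30, 33, 29, 29, 30, 32, 33, 32, 26, 24, 25)

n    <- length(ages)
unit <- 1

# Number of classes (assumption: round up to a whole number).
K <- ceiling(1 + 3.322 * log10(n))

# Class width = Range / K (assumption: round up to the next whole unit).
C <- ceiling(diff(range(ages)) / K / unit) * unit

# Lower boundary of the first class, then all class boundaries.
LB1    <- min(ages) - unit / 2
breaks <- LB1 + C * (0:K)

# Frequency of each class.
freq <- table(cut(ages, breaks = breaks))
print(freq)
```

Because the boundaries end in .5, no observation can fall exactly on a boundary, so every age belongs to exactly one class.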
The following is the age distribution of patients in ward B at
Komfo Anokye Teaching Hospital
17, 11, 21, 22, 30, 33, 37, 14, 19, 17, 23, 27, 28, 24, 45, 38,
40, 33, 34, 30, 33, 29, 29, 30, 32, 33, 32, 26, 24, 25. Use
$\text{Relative Frequency} = \dfrac{\text{frequency}}{\text{total frequency}}$
$\text{Relative Cumulative Frequency} = \dfrac{\text{cumulative frequency}}{\text{total frequency}}$
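The two ratios can be sketched in R using the number-of-children data from Example 1 as sample input. The variable names are illustrative.

```r
# Number of children per family (Example 1), entered row by row.
children <- c(0, 1, 4, 4, 3, 1, 2, 3, 1, 2,
              2, 4, 3, 0, 2, 5, 0, 2, 2, 1,
              3, 2, 1, 1, 3, 2, 3, 4, 5, 2,
              1, 0, 5, 4, 2, 0, 3, 5, 1, 2,
              4, 3, 0, 2, 5, 1, 1, 2, 2, 4)

freq  <- as.vector(table(children))   # frequency of each distinct value
total <- sum(freq)                    # total frequency

rel_freq     <- freq / total          # relative frequency
cum_freq     <- cumsum(freq)          # cumulative frequency
rel_cum_freq <- cum_freq / total      # relative cumulative frequency

print(data.frame(children = sort(unique(children)),
                 freq, rel_freq, cum_freq, rel_cum_freq))
```

The relative frequencies sum to 1, and the last relative cumulative frequency is always 1.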
Example 2
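Example 2 - Solution
- By Sturges' Rule, the required number of classes is
$K = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$
- The lower boundary of the first class is
$LB_1 = 0.2 - \tfrac{1}{2}(0.1) = 0.15$
- The classes in the frequency table have width
$C = 0.85 - 0.15 = 0.7$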
Example 2 - Solution
The frequency distribution is shown below.
Emission rate (lb/million)   Frequency   Relative frequency
0.15 - 0.85                  13          0.26
0.85 - 1.55                  13          0.26
1.55 - 2.25                  10          0.20
2.25 - 2.95                  6           0.12
2.95 - 3.65                  2           0.04
3.65 - 4.35                  4           0.08
4.35 - 5.05                  2           0.04
Total                        50
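- The approximate proportion of states whose emission rate is
between 0.9 and 2.2 lb/million is
$\dfrac{13 + 10}{50} = \dfrac{23}{50} = 0.46$
- The approximate proportion of states in the last two classes
(frequencies 4 and 2, that is, 3.65 to 5.05 lb/million) is
$\dfrac{4 + 2}{50} = \dfrac{6}{50} = 0.12$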
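These proportions can be reproduced from the class frequencies in R. The sketch assumes that the range 0.9 to 2.2 is approximated by the second and third classes, and that the sum 4 + 2 refers to the last two classes of the table.

```r
# Class frequencies from the Example 2 table, in class order.
freq  <- c(13, 13, 10, 6, 2, 4, 2)
total <- sum(freq)

# Classes 0.85-1.55 and 1.55-2.25 approximate "between 0.9 and 2.2".
prop_mid <- sum(freq[2:3]) / total

# Last two classes, 3.65-4.35 and 4.35-5.05.
prop_top <- sum(freq[6:7]) / total

print(c(prop_mid = prop_mid, prop_top = prop_top))
```

The answers are approximate because the class boundaries do not coincide exactly with the requested limits.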
Graphical Representation of Data
Example 1
Cumulative Frequency Curve (or Ogive)
The cumulative frequency distribution shows the number of
observations that fall above or below a specified value of the
variable.
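A sketch of a "less than" ogive in R, using the class boundaries and frequencies of the Example 2 table as sample input. Starting the curve at the lower boundary of the first class with cumulative frequency 0 is a common convention and is assumed here.

```r
# Upper class boundaries and frequencies from the Example 2 table.
upper_boundary <- c(0.85, 1.55, 2.25, 2.95, 3.65, 4.35, 5.05)
freq           <- c(13, 13, 10, 6, 2, 4, 2)

# Number of observations below each upper boundary.
cum_freq <- cumsum(freq)

# Assumption: anchor the curve at the first lower boundary with count 0.
x <- c(0.15, upper_boundary)
y <- c(0, cum_freq)

plot(x, y, type = "o", pch = 19,
     xlab = "Emission rate (lb/million)",
     ylab = "Cumulative frequency",
     main = "Less-than ogive")
```

Reading the curve at a given emission rate gives the approximate number of observations below that rate.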
Assignment
5.9 10.5 9.9 14.4 16.5 12.7 11.6 7.9 10.9 13.4
8.6 3.8 11.7 12.5 9.1 9.1 12.3 11.5 7.4 8.8
11.5 13.6 11.5 10.9 12.9 11.2 15.0 12.7 10.1 14.7
9.9 11.4 6.2 8.3 8.1 10.5 8.4 11.2 10.4 9.8
- Group the data into six classes and obtain a relative
frequency distribution.
- Draw a histogram for the distribution.
Exploratory Data Analysis (EDA):
The EDA techniques (diagrams) discussed in this section are
dot plots, box plots and stem-and-leaf plots.
- Dotplots: A dot plot is a plot that displays a dot for
each value in a data set along a number line. If there are
multiple occurrences of a specific value, then the dots will
be stacked vertically.
- Boxplots: Boxplots are useful for revealing the centre
and spread of the data, as well as any outliers.
- Stem-and-Leaf Plots: A stem-and-leaf plot is
another way to represent quantitative data graphically. It
is extremely useful for summarizing reasonably sized data sets.
Example
143 158 136 127 132 132 126 138 119 104 113
90 126 123 121 133 104 99 112 120 107 139
122 137 112 121 140 134 133 123 150 115 141
Solution
Stem-and-Leaf Diagram
- Step 1: Determine the smallest and largest numbers in
the data. For the example above these are 90 and 158.
- Step 2: Identify the stems. For any number, the stem is
formed by the digits to the left of the last digit. For
example, 90 has a stem of 9 and 158 has a stem of 15.
- Step 3: Draw a vertical line and list the stems to
the left of the line.
- Step 4: Fill in the leaves. The leaf is the last digit of the
number; for example, the leaf of 90 is 0.
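The four steps can be sketched by hand in R, splitting each number into a stem (all digits except the last) and a leaf (the last digit). This is an illustrative construction with made-up variable names, separate from R's built-in `stem()` function.

```r
# The 33 observations of the example.
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

stems  <- p %/% 10   # integer division: digits left of the last digit
leaves <- p %% 10    # remainder: the last digit

# Group the leaves by stem and sort them within each stem.
display <- sapply(split(leaves, stems),
                  function(l) paste(sort(l), collapse = " "))

# Print one line per stem: stem | leaves.
for (s in names(display)) {
  cat(sprintf("%2s | %s\n", s, display[[s]]))
}
```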
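Solution
Stem-and-Leaf Diagram
For our example, we arrange the stems vertically and record each
leaf against its corresponding stem.
The R commands for the diagram are shown below (the stem grouping
that R prints depends on the value of scale):
> p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
+        90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
+        122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)
> stem(p, scale = 1)
Stem | Leaves
   9 | 0 9
  10 | 4 4 7
  11 | 2 2 3 5 9
  12 | 0 1 1 2 3 3 6 6 7
  13 | 2 2 3 3 4 6 7 8 9
  14 | 0 1 3
  15 | 0 8
Key: 9 | 0 represents 90.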
BoxPlot
- Box plots (also called box-and-whisker plots or
box-whisker plots) give a good graphical image of the
concentration of the data.
- They also show how far the extreme values are from most
of the data.
- A box plot is constructed from five values: the minimum
value, the first quartile, the median, the third quartile,
and the maximum value.
- We use these values to compare how close other data
values are to them.
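The five values behind a box plot can be obtained in R with `fivenum()`, sketched here on the 33 observations of the stem-and-leaf example. Note that `fivenum()` uses Tukey's hinges for the quartiles, which may differ slightly from other quartile definitions; treat the choice as an assumption.

```r
# The 33 observations of the stem-and-leaf example.
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

# Minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(p)
names(five) <- c("min", "Q1", "median", "Q3", "max")
print(five)

# The box plot drawn from these values.
boxplot(p, horizontal = TRUE, main = "Box plot of the example data")
```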
In R, the box plot diagram is given by:
> boxplot(p)
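Dotplot
install.packages("plyr")   # one-off installation of the plyr package
library(plyr)
y <- count(p)              # distinct values (x) and their frequencies (freq)
y
plot(y, pch = 19, xlim = c(90, 160), ylim = c(0, 20),
     main = "Dotplot of Data")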
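As an alternative sketch that needs no extra package, base R's `stripchart()` with `method = "stack"` draws one dot per observation and stacks repeated values vertically, which matches the definition of a dot plot given earlier. This is a swapped-in technique, not the code from the slide.

```r
# The 33 observations of the stem-and-leaf example.
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

# One dot per observation; repeated values are stacked.
stripchart(p, method = "stack", pch = 19, offset = 0.5,
           xlab = "Value", main = "Dotplot of Data")

# Height of the tallest stack (largest number of repeats of one value).
tallest <- max(table(p))
print(tallest)
```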
Assignment 2
The data below show the results of a study of the total
number of COVID-19 infections in 50 districts in Ghana in
2021.
Graphical Representation of Qualitative Data:
Simple Bar Charts-Example
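library(readxl)
# Read the sheet that holds the number-of-children data
example3 <- read_excel("Desktop/Lect.NotesSem1, 2022/SMO153/example3.xls", "Sheet2")
View(example3)
U <- table(example3)   # frequency of each distinct value
U
# Vertical bar chart
barplot(U, xlab = "No. of children", ylab = "Frequency", col = 0)
# Horizontal bar chart (the axis labels swap when horiz = TRUE)
barplot(U, xlab = "Frequency", ylab = "No. of children", col = 0, horiz = TRUE)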
Multiple Bar Charts - Example
The number of females who opted to study programmes in Social
Sciences, Engineering and Science at KNUST for the period
2016–2020 is given in the table below. The data are displayed
in the multiple bar charts, also shown below.
The yearly death rate (per 1000) of males and females from a
disease in a community over a period of 6 years is given as
follows:
Pie Charts:
- A pie chart is a circular diagram showing the fraction of the
data that falls in each category.
- The total number of observations is represented by the whole
pie, which is drawn as a circle.
- The pie is then cut into slices (sectors), where each slice
represents a category of the data.
- The size of a slice is proportional to the relative frequency
of its category.
- The angle of a slice (sector) at the centre of the pie (circle) is
given by the product: relative frequency × 360°.
Example
Consider the responses regarding the relief provided by a
pain-killing drug.
Response       Frequency   Relative frequency   Angle of sector
Excellent      30          0.20                 0.20 × 360° = 72°
Satisfactory   66          0.44                 0.44 × 360° = 158.4°
Fair           36          0.24                 0.24 × 360° = 86.4°
Poor           18          0.12                 0.12 × 360° = 43.2°
Total          150         1.00                 360°
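The sector angles can be computed directly from the frequencies in R, as in the sketch below. The variable names are illustrative.

```r
# Responses and their frequencies from the example.
response <- c("Excellent", "Satisfactory", "Fair", "Poor")
freq     <- c(30, 66, 36, 18)

rel_freq <- freq / sum(freq)   # relative frequency of each response
angle    <- rel_freq * 360     # sector angle in degrees

print(data.frame(response, freq, rel_freq, angle))
```

The angles must add up to 360 degrees, which is a quick check on the arithmetic.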
[Figure: 3.png]
Consumers spend their incomes on a vast array of goods and
services. The data below provide a summary of how the
average consumer dollar is spent.
Category         Percentage of income
Medical Care     5
Clothing         15
Entertainment    5
Housing          40
Food             20
Transportation   15
Solution
Category         Percentage of income   Angle of sector
Medical Care     5                      0.05 × 360° = 18°
Clothing         15                     0.15 × 360° = 54°
Entertainment    5                      0.05 × 360° = 18°
Housing          40                     0.40 × 360° = 144°
Food             20                     0.20 × 360° = 72°
Transportation   15                     0.15 × 360° = 54°
The required pie chart
[Figure: 4.png]
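A pie chart of the consumer spending data can be sketched in R with `pie()`. The labels and title below are illustrative choices, not taken from the original figure.

```r
# Spending categories and their percentages of income.
category <- c("Medical Care", "Clothing", "Entertainment",
              "Housing", "Food", "Transportation")
percent  <- c(5, 15, 5, 40, 20, 15)

# Sector angle of each category in degrees.
angle <- percent / 100 * 360

# Label each slice with its category and percentage.
slice_labels <- paste0(category, " (", percent, "%)")

pie(percent, labels = slice_labels,
    main = "How the average consumer dollar is spent")
```

`pie()` scales the slices from the values it is given, so the percentages can be passed in directly.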
For example, the graph below is a time series plot of
deaths from a strange disease for the period 1970-1990.
[Figure: 5.PNG]