
Stat 153 Unit 2b

The document covers fundamental concepts in statistics, including descriptive and inferential statistics, and key terminologies such as population, sample, and various types of variables. It details methods for organizing data through tabular and graphical representations, as well as measures of central tendency, dispersion, and position. Additionally, it introduces exploratory data analysis techniques and provides examples for constructing frequency distribution tables and visualizing data.

Uploaded by

amanfoblessed286

Department of Statistics and Actuarial Science

STATS 153

What We Have Learnt
- Explain what is meant by descriptive statistics.
- Explain what is meant by inferential statistics.
- Define the following terms:
  - Population
  - Qualitative and quantitative variables
  - Discrete variable
  - Sample
  - Continuous variable
  - Parameter and statistic


UNIT 2-DESCRIPTIVE STATISTICS

1. Tabular Representation of Data

2. Graphical Representation of Data

3. Measures of Central Tendency

4. Measures of Dispersion

5. Measures of Position

6. Measures of Shape

Consider the following data:

4, 3, 5, 3, 2, 75, 4

- Outliers: the value 75 lies far away from the other observations,
  which all fall between 2 and 5.

Tabular Representation of Data
The data gathered from a survey or experiment are usually
summarised or organised numerically in tabular form using a
frequency distribution table.
- The distribution is said to be ungrouped if it shows the
  distinct observations and their corresponding numbers of
  occurrences, called frequencies.
- If the number of distinct observations is too large, they are
  put into groups, called classes or categories.
- The number of classes is usually chosen between 5 and
  20, inclusive.
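- A common guide is to choose the smallest number of classes $k$
  such that $2^k \ge n$, where $n$ is the number of observations.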
Terminologies
- Class Interval
- Class Limits
- Class Boundaries
- Class Width
- Class Mark (Class Midpoint)

Class Limit, Boundary, Interval, Width and Midpoint
Class Limits
Class limits are the smallest and largest observations (data,
events, etc.) in each class. Hence, each class has two limits: a
lower limit and an upper limit.

Class        Frequency
300 - 399    13
400 - 499    20
500 - 599    7
600 - 699    3
700 - 799    12
800 - 899    8
900 - 999    7

What are the lower and upper class limits for the first three
classes?
Class Boundary
Class boundaries are the midpoints between the upper class
limit of a class and the lower class limit of the next class in the
sequence. Each class has a lower and an upper class boundary.

Class        Frequency
300 - 399    13
400 - 499    20
500 - 599    7

For the first class, 300 - 399:
- The lower class boundary is the midpoint between 299
  and 300, that is, 299.5.
- The upper class boundary is the midpoint between 399
  and 400, that is, 399.5.
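The boundary calculation can be sketched in R as follows. This is only an illustration for the three classes in the table; the variable names are chosen here and do not come from the notes.

```r
# Class limits of the three classes in the table
lower_limit <- c(300, 400, 500)
upper_limit <- c(399, 499, 599)

# Gap between the upper limit of one class and the lower limit of the next
gap <- lower_limit[2] - upper_limit[1]   # 1 for these classes

# Each boundary lies half a gap outside the corresponding limit
lower_boundary <- lower_limit - gap / 2
upper_boundary <- upper_limit + gap / 2

data.frame(lower_limit, upper_limit, lower_boundary, upper_boundary)
```

For the first class this gives 299.5 and 399.5, in agreement with the values worked out by hand.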
Class Interval and Width
A class interval is the range of values covered by a class, for
example 300 - 399. The class width (or size) is the difference
between the upper and lower class boundaries of any class.

Class        Frequency
300 - 399    13
400 - 499    20

Using the table above, find the class width for the first
class, 300 - 399.
Class width = upper class boundary - lower class boundary
- Upper class boundary = 399.5
- Lower class boundary = 299.5

Therefore, the class width = 399.5 - 299.5 = 100.
Class Midpoint
The class midpoint is found by adding the upper and lower class
boundaries of any class and dividing the result by 2.

Class        Frequency
300 - 399    13
400 - 499    20

Using the table above, find the class midpoint for the first
class, 300 - 399.
- Upper class boundary = 399.5
- Lower class boundary = 299.5

Therefore, the class midpoint = (399.5 + 299.5)/2 = 349.5.
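As a check on the arithmetic, the width and midpoint of every class in the seven-class table (300 - 399 up to 900 - 999) can be computed in one step. This is a minimal sketch; the variable names are illustrative.

```r
# Class limits of the seven classes 300-399, ..., 900-999
lower_limit <- seq(300, 900, by = 100)
upper_limit <- lower_limit + 99

# Boundaries lie 0.5 outside the limits because the limits are whole numbers
lower_boundary <- lower_limit - 0.5
upper_boundary <- upper_limit + 0.5

class_width    <- upper_boundary - lower_boundary        # 100 for every class
class_midpoint <- (lower_boundary + upper_boundary) / 2  # 349.5 for the first class

data.frame(lower_limit, upper_limit, class_width, class_midpoint)
```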

Constructing a Frequency Distribution Table
A frequency distribution lists each category of data and the
number of occurrences of each category in table form. For
ungrouped data:
- List the values of the data (scores) in the first column, from
  the lowest value to the highest or vice versa.
- Create a second column, known as the tally of the scores, and
  record a tally mark for each occurrence of a value.
- Create a third column in which the frequency of each value
  (the total of its tally marks) is entered.

Twenty-five students were given a blood test to determine their
blood type. The data set is as follows:
A   B   B   AB  O
O   O   B   AB  B
B   B   O   A   O
A   O   O   O   AB
AB  A   O   B   A

Construct a frequency distribution for the data.
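One way to check a hand-made tally is to let R count the categories. The sketch below uses the base R function table(); the order of the blood types in levels is a choice made here, not something fixed by the notes.

```r
# Blood types of the 25 students, entered row by row
blood <- c("A",  "B", "B", "AB", "O",
           "O",  "O", "B", "AB", "B",
           "B",  "B", "O", "A",  "O",
           "A",  "O", "O", "O",  "AB",
           "AB", "A", "O", "B",  "A")

# Frequency of each blood type (A = 5, B = 7, AB = 4, O = 9)
freq <- table(factor(blood, levels = c("A", "B", "AB", "O")))
freq
```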

Body masses (kg) of 22 patients:

60, 45, 72, 55, 42, 65, 54, 68, 74, 50, 78, 70, 58, 48, 67, 64,
68, 52, 60, 58, 75, 83

Create a grouped frequency distribution for the data above.

Example 1

The data given below are the number of children per family
sampled from a community.
0 1 4 4 3 1 2 3 1 2
2 4 3 0 2 5 0 2 2 1
3 2 1 1 3 2 3 4 5 2
1 0 5 4 2 0 3 5 1 2
4 3 0 2 5 1 1 2 2 4

Sturges' Approximation Rule
The number of classes $K$ and the class width $C$ for a frequency
distribution table are given by

$$K = 1 + 3.322 \log_{10} n$$

$$C = \frac{\text{Range}}{K}$$

where $n$ is the total number of observations. The lower and upper
boundaries of the first class are

$$LB_1 = (\text{minimum observation or a lesser value}) - \frac{1}{2}(\text{the smallest unit of measurement})$$

$$UB_1 = LB_1 + C$$

The following is the age distribution of patients in Ward B at
Komfo Anokye Teaching Hospital:

17, 11, 21, 22, 30, 33, 37, 14, 19, 17, 23, 27, 28, 24, 45, 38,
40, 33, 34, 30, 33, 29, 29, 30, 32, 33, 32, 26, 24, 25.

Use Sturges' Approximation Rule to construct a frequency
distribution table for the above data.
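The steps of the rule can be sketched in R for the age data. Two assumptions are made here that the notes do not state explicitly: K and C are rounded up with ceiling(), and the smallest unit of measurement is taken as 1 year.

```r
ages <- c(17, 11, 21, 22, 30, 33, 37, 14, 19, 17, 23, 27, 28, 24, 45, 38,
          40, 33, 34, 30, 33, 29, 29, 30, 32, 33, 32, 26, 24, 25)

n <- length(ages)                      # 30 observations
K <- ceiling(1 + 3.322 * log10(n))     # 5.91, rounded up to 6 classes
C <- ceiling(diff(range(ages)) / K)    # range 34, so 34 / 6 = 5.67, rounded up to 6

LB1    <- min(ages) - 0.5 * 1          # first lower boundary: 11 - 0.5 = 10.5
breaks <- LB1 + C * 0:K                # boundaries 10.5, 16.5, ..., 46.5

# Frequency of each class
freq <- table(cut(ages, breaks = breaks))
freq
```

With a different rounding convention the number of classes or the width may change, so the table produced here is one acceptable answer rather than the only one.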

$$\text{Relative Frequency} = \frac{\text{frequency}}{\text{total frequency}}$$

$$\text{Relative Cumulative Frequency} = \frac{\text{cumulative frequency}}{\text{total frequency}}$$
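Both formulas can be illustrated with the number-of-children data from Example 1. The sketch below uses base R only; the column names of the resulting table are chosen here for readability.

```r
# Number of children per family (Example 1), entered row by row
children <- c(0, 1, 4, 4, 3, 1, 2, 3, 1, 2,
              2, 4, 3, 0, 2, 5, 0, 2, 2, 1,
              3, 2, 1, 1, 3, 2, 3, 4, 5, 2,
              1, 0, 5, 4, 2, 0, 3, 5, 1, 2,
              4, 3, 0, 2, 5, 1, 1, 2, 2, 4)

freq         <- table(children)        # frequency of each distinct value
rel_freq     <- freq / sum(freq)       # frequency / total frequency
cum_freq     <- cumsum(freq)           # running total of the frequencies
rel_cum_freq <- cum_freq / sum(freq)   # cumulative frequency / total frequency

data.frame(children            = names(freq),
           frequency           = as.vector(freq),
           relative            = as.vector(rel_freq),
           cumulative          = as.vector(cum_freq),
           relative_cumulative = as.vector(rel_cum_freq))
```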

Example 2

The data below are the average sulphur dioxide (SO2) emission
rates (in lb/million Btu) from utility and industrial boilers in
50 states.
2.3 2.7 1.5 1.7 0.3 0.6 4.2 0.9 1.2 0.4
0.5 2.2 4.5 3.8 1.2 0.2 1.0 0.7 0.3 1.4
0.7 3.6 1.0 0.7 1.7 0.5 0.2 0.6 2.5 2.7
1.5 1.4 2.9 1.0 3.4 2.1 0.9 1.9 1.0 1.7
1.8 0.6 1.7 2.9 1.8 1.4 3.7 5.0 3.8 2.1

- Summarise the data by constructing a grouped relative
  frequency distribution.
- Find the approximate proportion of states with the
  following sulphur dioxide emission rates:
  - between 0.90 and 2.20 lb/million Btu
  - at least 3.65 lb/million Btu

Example 2 - Solution
- By Sturges' rule, the required number of classes and the class
  width are

$$K = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$$

$$C = \frac{\text{Range}}{K} = \frac{5.0 - 0.2}{7} = 0.686 \approx 0.7$$

- The boundaries of the first class are

$$LB_1 = 0.2 - \frac{1}{2}(0.1) = 0.15$$

$$UB_1 = LB_1 + C = 0.15 + 0.7 = 0.85$$

The frequency distribution is shown below.

Emission rate        Frequency   Relative
(lb/million Btu)                 Frequency
0.15 - 0.85          13          0.26
0.85 - 1.55          13          0.26
1.55 - 2.25          10          0.20
2.25 - 2.95          6           0.12
2.95 - 3.65          2           0.04
3.65 - 4.35          4           0.08
4.35 - 5.05          2           0.04
Total                50          1.00
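The table can be reproduced in R with cut() and table(). This is a sketch that takes the class boundaries from the solution (first boundary 0.15, width 0.7); the variable names are illustrative.

```r
# SO2 emission rates for the 50 states, entered row by row
so2 <- c(2.3, 2.7, 1.5, 1.7, 0.3, 0.6, 4.2, 0.9, 1.2, 0.4,
         0.5, 2.2, 4.5, 3.8, 1.2, 0.2, 1.0, 0.7, 0.3, 1.4,
         0.7, 3.6, 1.0, 0.7, 1.7, 0.5, 0.2, 0.6, 2.5, 2.7,
         1.5, 1.4, 2.9, 1.0, 3.4, 2.1, 0.9, 1.9, 1.0, 1.7,
         1.8, 0.6, 1.7, 2.9, 1.8, 1.4, 3.7, 5.0, 3.8, 2.1)

# Class boundaries 0.15, 0.85, ..., 5.05 (seven classes of width 0.7)
breaks <- 0.15 + 0.7 * 0:7

freq     <- table(cut(so2, breaks = breaks))   # class frequencies
rel_freq <- freq / length(so2)                 # relative frequencies

data.frame(class              = names(freq),
           frequency          = as.vector(freq),
           relative_frequency = as.vector(rel_freq))
```

No observation falls exactly on a boundary, so it does not matter whether the intervals are closed on the left or on the right.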

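- The approximate proportion of states whose emission rate is
  - between 0.9 and 2.2 lb/million Btu (classes 0.85 - 1.55 and
    1.55 - 2.25):

$$\frac{13 + 10}{50} = \frac{23}{50} = 0.46$$

  - at least 3.65 lb/million Btu (classes 3.65 - 4.35 and
    4.35 - 5.05):

$$\frac{4 + 2}{50} = \frac{6}{50} = 0.12$$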
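The same proportions can be read off the grouped table in R. This is a minimal sketch that starts from the class frequencies in the solution; the class labels used as names are chosen here.

```r
# Class frequencies from the grouped distribution of Example 2
freq <- c("0.15-0.85" = 13, "0.85-1.55" = 13, "1.55-2.25" = 10,
          "2.25-2.95" = 6,  "2.95-3.65" = 2,  "3.65-4.35" = 4,
          "4.35-5.05" = 2)

# Classes that approximately cover 0.9 to 2.2
p_between <- sum(freq[c("0.85-1.55", "1.55-2.25")]) / sum(freq)

# Classes at or above 3.65
p_at_least <- sum(freq[c("3.65-4.35", "4.35-5.05")]) / sum(freq)

c(between = p_between, at_least = p_at_least)
```

The first proportion is approximate because the class boundaries 0.85 and 2.25 do not coincide exactly with 0.9 and 2.2.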

Graphical Representation of Data
- Data can be further summarised using graphs or charts
  for stronger visual impact.
- The graphs are categorised according to whether the data are
  quantitative or qualitative.

Quantitative Data                     Qualitative Data
Histogram                             Bar Chart
Frequency Polygon                     Pie Chart
Cumulative Frequency Curve (Ogive)    Line Chart

- Histogram: Plotting each class frequency as a rectangle
  over the corresponding class boundaries gives a histogram.
- Frequency Polygon: Plot the class frequencies against the
  corresponding class midpoints and join the points with
  straight lines.
- Ogive (Cumulative Frequency Curve): A smooth curve joining
  the points whose co-ordinates are the upper class boundaries
  and the cumulative frequencies produces the cumulative
  frequency curve.
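A histogram and a frequency polygon can be drawn from a grouped table with base R graphics. The sketch below reuses the grouped emission-rate distribution from Example 2 as its input; drawing the rectangles with rect() is one possible approach, not the only one.

```r
# Class boundaries and frequencies from Example 2
b    <- 0.15 + 0.7 * 0:7
freq <- c(13, 13, 10, 6, 2, 4, 2)

lower <- head(b, -1)          # lower boundary of each class
upper <- tail(b, -1)          # upper boundary of each class
mids  <- (lower + upper) / 2  # class midpoints

# Empty plotting region covering all classes
plot(range(b), c(0, max(freq)), type = "n",
     xlab = "Emission rate (lb/million Btu)", ylab = "Frequency",
     main = "Histogram and frequency polygon")

# Histogram: one rectangle per class, drawn over its boundaries
rect(lower, 0, upper, freq, col = "grey90")

# Frequency polygon: frequencies against midpoints, joined by straight lines
lines(mids, freq, type = "b", pch = 19)
```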

Example 1

As an illustrative example, we consider the distribution given
below:

Cumulative Frequency Curve (or Ogive)
The cumulative frequency distribution shows the number of
observations that fall above or below a specified value of
observation.

Class Boundary    Frequency   Cumulative Frequency   Relative Cum. Frequency
79.5 - 84.5       5           5 + 0 = 5              5/80 = 0.0625
84.5 - 89.5       10          5 + 10 = 15            15/80 = 0.1875
89.5 - 94.5       15          15 + 15 = 30           30/80 = 0.3750
94.5 - 99.5       26          30 + 26 = 56           56/80 = 0.7000
99.5 - 104.5      13          56 + 13 = 69           69/80 = 0.8625
104.5 - 109.5     7           69 + 7 = 76            76/80 = 0.9500
109.5 - 114.5     4           76 + 4 = 80            80/80 = 1.0000
Total             80
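The cumulative columns and the ogive can be produced in R as sketched below. Starting the curve at the first lower boundary (79.5) with a cumulative frequency of 0 is a common convention assumed here, and the points are joined with straight segments rather than a smoothed curve.

```r
# Upper class boundaries and class frequencies from the table
upper <- seq(84.5, 114.5, by = 5)
freq  <- c(5, 10, 15, 26, 13, 7, 4)

cum_freq <- cumsum(freq)           # 5, 15, 30, 56, 69, 76, 80
rel_cum  <- cum_freq / sum(freq)   # relative cumulative frequencies

# Ogive: cumulative frequency against upper class boundary
plot(c(79.5, upper), c(0, cum_freq), type = "o", pch = 19,
     xlab = "Upper class boundary", ylab = "Cumulative frequency",
     main = "Cumulative frequency curve (ogive)")
```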

Assignment

Many people experience allergic reactions to insect stings.
These reactions differ from patient to patient not only in
severity but also in time of reaction. The following data
(measured in minutes) are on 40 patients who experienced a
systemic reaction to bee stings.

5.9 10.5 9.9 14.4 16.5 12.7 11.6 7.9 10.9 13.4
8.6 3.8 11.7 12.5 9.1 9.1 12.3 11.5 7.4 8.8
11.5 13.6 11.5 10.9 12.9 11.2 15.0 12.7 10.1 14.7
9.9 11.4 6.2 8.3 8.1 10.5 8.4 11.2 10.4 9.8

- Group the data into six classes and obtain a relative
  frequency distribution.
- Draw a histogram for the distribution.
- Plot the cumulative frequency curve.
- Use it to estimate the percentage of patients who
  experienced a reaction within 10 minutes.

Exploratory Data Analysis (EDA)

Exploratory data analysis is the process of using statistical tools
(such as graphs and numerical measures) to investigate data
sets in order to understand their important characteristics.
The five important characteristics for describing, exploring
and comparing data sets are listed below:
- Centre: a representative or average value that indicates
  where the middle of the data set is located.
- Variation: a measure of the amount by which the data values
  vary among themselves.
- Distribution: the nature or shape of the distribution of
  the data (such as bell-shaped, uniform or skewed).
- Outliers: data values that lie very far away from the vast
  majority of the values.
- Time: changing characteristics of the data over time.

The EDA techniques or diagrams discussed in this section
are dot plots, box plots and stem-and-leaf plots.
- Dot plots: A dot plot displays a dot for each value in a
  data set along a number line. If there are multiple
  occurrences of a specific value, the dots are stacked
  vertically.
- Box plots: Box plots are useful for revealing the centre
  and spread of the data as well as any outliers.
- Stem-and-leaf plots: A stem-and-leaf plot is another way to
  represent quantitative data graphically. It is extremely
  useful in summarising reasonably sized data.

Example

The weights of 33 students in a gym class are given below.
Construct a stem-and-leaf plot, a box plot and a dot plot to
summarise the data.

143 158 136 127 132 132 126 138 119 104 113
90 126 123 121 133 104 99 112 120 107 139
122 137 112 121 140 134 133 123 150 115 141

Solution
Stem-and-Leaf Diagram
- Step 1: Determine the smallest and largest numbers in
  the data. For the example above, these are 90 and 158.
- Step 2: Identify the stems. For any number, the stem consists
  of all the digits except the last one. For example, 90 has a
  stem of 9 and 158 has a stem of 15.
- Step 3: Draw a vertical line and list the stems to the left
  of the line.
- Step 4: Fill in the leaves. The leaf is the last digit of the
  number. For example, the leaf of 90 is 0.

For our example, we arrange the stems vertically and record each
leaf against its corresponding stem.
The R commands and output for the diagram are shown below, where
data stands for the 33 weights listed above.
> p = c(data)
> stem(p, scale = 1)
Stem | Leaves
   9 | 0 9
  10 | 4 4 7
  11 | 2 2 3 5 9
  12 | 0 1 1 2 3 3 6 6 7
  13 | 2 2 3 3 4 6 7 8 9
  14 | 0 1 3
  15 | 0 8
Key: 9 | 0 represents 90.
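The four steps can also be carried out explicitly, which makes the rule "stem = all digits except the last, leaf = last digit" visible. This is only a sketch using integer division; it is not how the built-in stem() function works internally.

```r
# Weights of the 33 students
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

s <- sort(p)

# Stem: all digits except the last (s %/% 10); leaf: the last digit (s %% 10)
leaves <- split(s %% 10, s %/% 10)

# Print one line per stem with its leaves in increasing order
for (st in names(leaves)) {
  cat(sprintf("%2s | %s\n", st, paste(leaves[[st]], collapse = " ")))
}
```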
Box Plot
- Box plots (also called box-and-whisker plots) give a good
  graphical image of the concentration of the data.
- They also show how far the extreme values are from most
  of the data.
- A box plot is constructed from five values: the minimum
  value, the first quartile, the median, the third quartile
  and the maximum value.
- We use these values to compare how close other data
  values are to them.
- To construct a box plot, use a horizontal or vertical
  number line and a rectangular box.
- The smallest and largest data values label the endpoints
  of the axis.
- The first quartile marks one end of the box and the third
  quartile marks the other end of the box.
- Approximately the middle 50 percent of the data fall inside
  the box.

The box plot is produced in R by the command
> boxplot(p)

The five-number summary is given below. These results can also be
obtained with the R command
> summary(p)

Minimum   1st Quartile   Median   3rd Quartile   Maximum
90.0      115.0          126.0    136.0          158.0
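The five values behind the box plot can be extracted directly, as sketched below with the base R function fivenum(). The outlier check uses the 1.5 × IQR fences; that rule is not stated in the notes and is included here as an assumption, being the default that boxplot() uses for its whiskers.

```r
# Weights of the 33 students
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

# Minimum, first quartile, median, third quartile, maximum
five <- fivenum(p)
names(five) <- c("min", "Q1", "median", "Q3", "max")
five

# Assumed outlier rule: values beyond 1.5 * IQR from the quartiles
iqr         <- unname(five["Q3"] - five["Q1"])
lower_fence <- unname(five["Q1"]) - 1.5 * iqr
upper_fence <- unname(five["Q3"]) + 1.5 * iqr
outliers    <- p[p < lower_fence | p > upper_fence]

boxplot(p, horizontal = TRUE, main = "Weights of 33 students")
```

For these data no weight falls outside the fences, so the whiskers extend to the minimum and maximum.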

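Dot Plot
The dot plot can be drawn in R with the commands below.
install.packages("plyr")   # needed only once
library(plyr)
y = count(p)               # each distinct value of p and its frequency
plot(y, pch=19, xlim=c(90,160), ylim=c(0,20), main="Dotplot of Data")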
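As an alternative that needs no extra package, a stacked dot plot can be drawn with the base R function stripchart(). This is a sketch of a different approach from the plyr commands above, not a replacement prescribed by the notes.

```r
# Weights of the 33 students
p <- c(143, 158, 136, 127, 132, 132, 126, 138, 119, 104, 113,
       90, 126, 123, 121, 133, 104, 99, 112, 120, 107, 139,
       122, 137, 112, 121, 140, 134, 133, 123, 150, 115, 141)

# method = "stack" piles repeated values on top of each other
stripchart(p, method = "stack", offset = 0.5, pch = 19,
           xlab = "Weight", main = "Dotplot of Data")

# Height of the tallest stack: the largest frequency of any single value
max(table(p))
```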

Assignment 2

The data below show the results from a study of the total
number of COVID-19 infections in 50 districts in Ghana in
2021.

102 100 96 99 101 102 100 105 97 100


92 103 101 100 99 102 96 100 101 98
107 95 98 100 100 99 97 104 101 103
98 101 100 105 99 101 102 100 87 98
101 103 93 99 101 97 100 102 99 104

Report these results as a stem-and-leaf plot and a histogram.

Graphical Representation of Qualitative Data

The most commonly used graphical representations of
qualitative data are bar charts, pie charts and line (time series)
graphs.
- Bar Charts: A bar chart consists of rectangular bars
  of equal width separated by gaps.
- A bar chart is classified as simple, multiple (compound) or
  component, depending on the sets of data being compared.

Simple Bar Charts - Example

Consider the data given below on the number of children of 30
women at a maternity/birth control workshop. The frequency
distribution is shown below.

No. of Children   Frequency   Relative Frequency
0                 3           0.100
1                 6           0.200
2                 9           0.300
3                 6           0.200
4                 4           0.133
5                 2           0.067
Total             30          1.000

library(readxl)
# Read the data from the second sheet of the Excel file
example3 <- read_excel("Desktop/Lect.NotesSem1, 2022/SMO153/example3.xls", "Sheet2")
View(example3)
U = table(example3)   # frequency of each number of children
U
# Vertical bar chart
barplot(U, xlab="No. of children", ylab="Frequency", col=0)
# Horizontal bar chart: the axis labels swap places
barplot(U, xlab="Frequency", ylab="No. of children", col=0, horiz=TRUE)
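If the Excel file is not available, the same charts can be drawn straight from the frequency table. The sketch below enters the frequencies by hand; the object names are illustrative.

```r
# Frequencies of 0 to 5 children among the 30 women
freq <- c(3, 6, 9, 6, 4, 2)
names(freq) <- 0:5

# Relative frequencies: 0.100, 0.200, 0.300, 0.200, 0.133, 0.067
rel_freq <- freq / sum(freq)
round(rel_freq, 3)

# Simple bar chart, vertical and then horizontal
barplot(freq, xlab = "No. of children", ylab = "Frequency")
barplot(freq, xlab = "Frequency", ylab = "No. of children", horiz = TRUE)
```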
Multiple Bar Charts - Example
The numbers of females who opted to offer programmes in Social
Science, Engineering and Science at KNUST for the period
2016-2020 are given in the table below. The data are also
displayed in a multiple bar chart.

Year   Social Science   Engineering   Science
2016   350              400           300
2017   400              550           250
2018   450              650           250
2019   500              700           200
2020   650              800           150
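A multiple bar chart of this table can be sketched with barplot() and beside = TRUE. The matrix layout (one row per programme, one column per year) and the legend position are choices made here.

```r
# One row per programme, one column per year
enrol <- rbind("Social Science" = c(350, 400, 450, 500, 650),
               "Engineering"    = c(400, 550, 650, 700, 800),
               "Science"        = c(300, 250, 250, 200, 150))
colnames(enrol) <- 2016:2020

# Multiple (compound) bar chart: bars for each year are placed side by side
barplot(enrol, beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topleft"),
        xlab = "Year", ylab = "Number of females")
```

Setting beside = FALSE instead stacks the three programmes in one bar per year, which gives a component bar chart.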

The death rate (per 1000) in a year of males and females from a
disease in a community over a period of 6 years is given as
follows:

Year     1990   1991   1992   1993   1994   1995
Male     20     15     25     20     10     50
Female   30     10     18     24     15     25

Pie Charts
- A pie chart is a circular diagram showing the fractions of a
  given data set that fall in each category.
- The total number of observations is represented by a pie,
  which is drawn as a circle.
- The pie is then cut into slices (sectors), where each slice
  represents a category of the data.
- The size of a slice is proportional to the relative frequency
  of its category.
- The angle of a slice (sector) at the centre of the pie (circle)
  is given by the product: Relative Frequency × 360°.
Example
Consider the responses regarding the relief provided by a
pain-killing drug.

Response       Frequency   Relative Frequency   Angle of Sector
Excellent      30          0.20                 0.20 × 360° = 72°
Satisfactory   66          0.44                 0.44 × 360° = 158.4°
Fair           36          0.24                 0.24 × 360° = 86.4°
Poor           18          0.12                 0.12 × 360° = 43.2°
Total          150         1.00                 360°
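The sector angles can be verified in R, and the chart drawn with the base function pie(). This is a minimal sketch; the slice labels are formatted here for illustration.

```r
# Responses to the pain-killing drug
freq <- c(Excellent = 30, Satisfactory = 66, Fair = 36, Poor = 18)

rel_freq <- freq / sum(freq)   # 0.20, 0.44, 0.24, 0.12
angle    <- rel_freq * 360     # 72, 158.4, 86.4, 43.2 degrees

# Pie chart with the sector angle shown in each label
pie(freq,
    labels = sprintf("%s (%.1f deg)", names(freq), angle),
    main = "Relief provided by the drug")
```

The angles add up to 360 degrees, which is a quick check that no category has been missed.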

Consumers spend their incomes on a vast array of goods and
services. The data below provide a summary of how the
average consumer dollar is spent.

Category         Percentage of income
Medical Care     5
Clothing         15
Entertainment    5
Housing          40
Food             20
Transportation   15

- Summarise the information in the form of a pie chart.
- Which category represents the largest piece of the pie?

Solution

Category         Percentage of income   Angle of sector
Medical Care     5                      0.05 × 360° = 18°
Clothing         15                     0.15 × 360° = 54°
Entertainment    5                      0.05 × 360° = 18°
Housing          40                     0.40 × 360° = 144°
Food             20                     0.20 × 360° = 72°
Transportation   15                     0.15 × 360° = 54°

The required pie chart
[Figure: pie chart of the percentage of income spent on each category]

The largest piece of the pie is "Housing".


Time Series Graph
- Time is an important factor that contributes to variability
  in data.
- Data collected over time can be displayed using a line
  chart (better known as a time series graph).
- A time series graph is useful for describing data over a
  period of time.
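A time series graph can be sketched in R with ts() and plot(). Since these notes do not list the data behind their own time series figure, the sketch reuses the male and female death rates for 1990-1995 tabulated earlier; line types and legend position are choices made here.

```r
# Death rates (per 1000) for 1990-1995, from the earlier table
male   <- c(20, 15, 25, 20, 10, 50)
female <- c(30, 10, 18, 24, 15, 25)

# Yearly time series starting in 1990, one column per group
rates <- ts(cbind(Male = male, Female = female), start = 1990, frequency = 1)

# Both series on the same axes
plot(rates, plot.type = "single", lty = 1:2,
     xlab = "Year", ylab = "Death rate (per 1000)",
     main = "Death rates by year")
legend("topleft", legend = colnames(rates), lty = 1:2)
```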

For example, the graph below represents a time series plot of
deaths from a strange disease for the period 1970-1990.
[Figure: time series plot of deaths from the disease, 1970-1990]