Chapter 2

stats

Uploaded by

gwafilakoseludo

CHAPTER 2

INTRODUCTION TO STATISTICS
• SUBTITLE: SUMMARISING DATA

• LECTURER: DR ADEBAYO

• LEC 03
Unit 3: summarizing data

• OUTLINE

Learning objectives
Introduction
Frequency distributions
Graphical presentation of data
Summarizing data from continuous variables
Summary
Learning objectives
 After studying this lesson you will be able to:

o Identify and define effective tools for summarizing numerical variables.

o Construct and apply appropriate tabular and graphical summaries of data.
INTRODUCTION
 Statistical data are often collected in raw format
for different purposes, e.g. for:

o Routine surveillance
o Experimental studies
o Other research studies

Collected data should be transformed into an easy-to-understand and usable form, i.e. summarized.
INTRODUCTION cont…
The most commonly used methods of summarizing data are:
o Tabulation
o Graphical presentation
o Descriptive summary measures

During data collection, two types of variables with different scales of measurement are used.
Introduction cont...
Two types of variables with their scales of measurement:

o QUANTITATIVE / Numeric
- Types: Discrete, Continuous
- Scales of measurement: Interval, Ratio

o QUALITATIVE / Categorical
- Scales of measurement: Nominal, Ordinal
Frequency distributions

 Frequency table:

o One of the most important methods of summarizing both categorical and numeric data is to tabulate the frequency distribution.

o Frequency (or count) refers to the number of observations that fall in a particular category of a variable.
Frequency distributions cont…
 Frequency table:

o A table that displays all the categories of a variable with their respective counts is called a frequency distribution.

o Thus,
- A frequency distribution is the organization of a data set into contiguous, mutually exclusive intervals so that the number or proportion of observations falling in each interval is clear.
Frequency distributions cont...
 Frequency table:

o A frequency distribution simply tells how often a variable takes on each of its possible values.

o For quantitative variables with many possible values, the possible values are typically grouped into classes, intervals or categories.

o Frequencies of each class/interval/category of a variable are often transformed into proportions called relative frequencies.
Frequency distributions cont...
 Frequency table:

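o The relative frequencies are therefore calculated by dividing the frequency of each category by the total number of observations of a variable.

o Thus the formula:

Relative frequency = f / n, where f is the frequency of the category and n is the total number of observations.
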
Frequency distributions cont...
 Frequency table:

o Relative frequencies can also be converted to percentages by multiplying each proportion by 100.

o Cumulative frequencies are running totals: start with the frequency of the 1st category, add the frequency of the next category, then add the frequency of the category after that to this total, and so on until the last category. The cumulative frequency of the last category equals the total number of observations.
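
As a minimal sketch of the running-total idea, the snippet below accumulates a list of category frequencies. The counts are made-up illustrative values, not data from these slides.

```python
from itertools import accumulate

# Hypothetical category frequencies, in category order (illustrative only).
frequencies = [5, 3, 2]

# Each cumulative frequency is the sum of all frequencies up to and
# including the current category.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative value should always equal the total number of observations, which is a quick check that no category was missed.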
Frequency distributions cont...

 Frequency table:

o Example of Frequency Distribution for Categorical Variable

o Example 1: Suppose there are 60 students in this class. Thirty-five (35) are females and 25 are males. The frequency distribution of gender in this class is shown by the table on the next slide:
Frequency distributions cont...

Table 1: Gender of Biostats class (nominal variable)

Gender       Frequency (f)    Relative frequency (%)
Female       35               58
Male         25               42
Total (n)    60               100

NB: Relative frequency (as a %) = (f/n)*100, rounded here to the nearest whole number
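
The percentages in Table 1 can be checked with a few lines of Python. This is only a sketch of the (f/n)*100 calculation using the counts from Example 1; the variable names are chosen here for illustration.

```python
# Frequencies from Example 1: 35 females and 25 males.
frequencies = {"Female": 35, "Male": 25}
n = sum(frequencies.values())  # total number of observations

# Relative frequency as a percentage: (f / n) * 100.
relative_percent = {gender: f / n * 100 for gender, f in frequencies.items()}

for gender, pct in relative_percent.items():
    print(f"{gender}: {pct:.1f}%")
```

The unrounded values are about 58.3% and 41.7%, which round to the 58 and 42 shown in the table.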


Frequency distributions cont...
 Frequency table:

o Example 2: Suppose 40 students of STA 116 were interviewed about their family marital status and the following results were obtained.

1, 1, 2, 3, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 2, 1, 1, 2, 2, 2, 2,
4, 2, 3, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1 and 4

Where
1 = Married, 2 = Single, 3 = Divorced and 4 = Widowed

Construct the frequency distribution for this data and calculate the percentage of students whose family marital status was Single.
Frequency distributions cont...
 Frequency table:
o Solution

Table 2: Marital Status of the family of students in STA 116

Marital Status    Frequency    Relative frequency (%)    Cumulative frequency
Married           16           40                        16
Single            18           45                        34
Divorced          4            10                        38
Widowed           2            5                         40
Total             40           100

ANS: 45% of the students were in the Single category.
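
As a hedged sketch, the same table can be built from the raw codes of Example 2 with Python's standard library. The variable names are illustrative; only the data and the code labels come from the example.

```python
from collections import Counter
from itertools import accumulate

# Marital status codes from Example 2
# (1 = Married, 2 = Single, 3 = Divorced, 4 = Widowed).
codes = [1, 1, 2, 3, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 2, 1, 1, 2, 2, 2, 2,
         4, 2, 3, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 4]
labels = {1: "Married", 2: "Single", 3: "Divorced", 4: "Widowed"}

counts = Counter(codes)          # frequency of each code
n = len(codes)                   # total number of observations
order = sorted(labels)           # categories in code order

freqs = [counts[c] for c in order]
rel_pct = [f / n * 100 for f in freqs]   # relative frequency in percent
cum = list(accumulate(freqs))            # cumulative frequency

for c, f, pct, cf in zip(order, freqs, rel_pct, cum):
    print(f"{labels[c]:<10}{f:>4}{pct:>7.0f}{cf:>5}")
```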
Frequency distributions cont...
 Frequency Table:

o Numeric/quantitative data can also be summarized by a frequency distribution.

o For Discrete variables

- Summarize data in a frequency table the same way as categorical variables.

- That is, in place of the qualitative categories, list in the frequency table the distinct numerical measurements that appear in the discrete data set and then count their frequencies.
Frequency distributions cont...
 Frequency Table:

o For Discrete variables

- Example 3: Consider Table 3 below, a typical line listing from a hypothetical investigation of an apparent cluster of HIV/AIDS patients in hospital X. Construct a frequency distribution that displays the age data and determine the proportion of patients aged 29 years.

Frequency distributions cont...
- Table 3: Line Listing of HIV/AIDS Cases in Hospital X

ID    Date of Diagnosis    Age (Years)    Sex    HIV/AIDS    Hospitalized    ARV Drugs
01    05/01                74             M      N           Y               N
02    06/01                29             M      Y           N               Y
03    08/01                39             M      Y           Y               N
04    19/01                23             F      N           N               N
05    30/01                39             M      Y           N               Y
06    02/02                23             M      Y           Y               Y
07    03/02                19             M      Y           Y               Y
08    05/02                40             M      Y           N               Y
09    19/02                28             M      N           Y               N
10    22/02                29             F      Y           N               N
11    23/02                23             F      Y           Y               N
12    24/02                40             M      Y           N               Y
13    26/02                49             F      N           N               N
14    26/02                40             F      N           N               N
15    27/02                29             F      Y           Y               N
16    27/02                18             M      N           Y               N
17    27/02                19             M      Y           N               Y
18    28/02                29             F      Y           Y               Y
19    28/02                40             F      Y           Y               Y
20    29/02                40             M      Y           N               N

NB: M=Male, F=Female, N=No, Y=Yes


Frequency distributions cont...
 For Discrete variables

o Solution: To construct a frequency distribution that displays these data:

- First, list the distinct values of age that appear in the data, from the lowest to the highest.

- Then, for each value, record the number of patients of that age.

o Table 4 below displays what the resulting frequency distribution looks like. Notice that the table has a title, each column is clearly labeled, and the total is given in the bottom row.
Frequency distributions cont...
Frequency Table:
o Solution:

Table 4: Distribution of patients by Age

Age (Years)    Number of Patients    Relative frequency (%)    Cumulative frequency
18             1                     5                         1
19             2                     10                        3
23             3                     15                        6
28             1                     5                         7
29             4                     20                        11
39             2                     10                        13
40             5                     25                        18
49             1                     5                         19
74             1                     5                         20
Total          20                    100

ANS: 20% of the patients were aged 29 years.
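
The two solution steps (list the distinct ages in order, then count each one) can be sketched in Python as follows. The ages are copied from the Table 3 line listing in ID order; the variable names are illustrative.

```python
from collections import Counter

# Ages (years) of the 20 patients in the Table 3 line listing, in ID order.
ages = [74, 29, 39, 23, 39, 23, 19, 40, 28, 29,
        23, 40, 49, 40, 29, 18, 19, 29, 40, 40]

counts = Counter(ages)
n = len(ages)

# One row per distinct observed age, from lowest to highest:
# (age, frequency, relative frequency in percent, cumulative frequency).
table = []
running_total = 0
for age in sorted(counts):
    running_total += counts[age]
    table.append((age, counts[age], counts[age] / n * 100, running_total))

for age, f, pct, cf in table:
    print(f"{age:>3}{f:>4}{pct:>6.0f}{cf:>4}")

# Proportion of patients aged 29 years.
proportion_29 = counts[29] / n
print(proportion_29)
```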
Frequency distributions cont...
 Frequency Table:

o Numeric/quantitative data can also be summarized by a frequency distribution.

- For Continuous Variables the data must be grouped into categories (classes) before the table of frequencies can be constructed.
Frequency distributions cont...
 Continuous Variables

o The main steps in the process of grouping data from a continuous variable into classes are:

Step 1: Figure out how many classes (called class intervals) you need. You can use Sturges' Rule. Let k = number of classes, then

k = 1 + 3.322 log(n), where n is the number of observations and log is the logarithm to base 10

N.B: The number found here should be rounded up to the next integer (whole number).
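
A minimal Python sketch of Step 1 follows. The function name `sturges_classes` is made up for illustration, and the logarithm is taken to base 10, which is what the constant 3.322 (approximately 1/log10(2)) implies.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes k = 1 + 3.322 * log10(n), rounded up to an integer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.322 * math.log10(n))

print(sturges_classes(50))
```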
Frequency distributions cont...
 Data from Continuous Variables

o Main steps:

Step 2: Find the minimum and the maximum values. Then find the range by subtracting the minimum value from the maximum value.
Range = Maximum value – Minimum value

Step 3: Divide the range from Step 2 by the number of classes you chose in Step 1. Round the result up to the next whole number. This number is called the Class Width.

Class Width is defined as the difference between the lower class limits (or the upper class limits) of two successive classes.
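
Steps 2 and 3 can be sketched as a small helper. The function name `class_width` and the sample values are hypothetical, chosen only to show the arithmetic.

```python
import math

def class_width(values, k):
    """Range divided by the number of classes, rounded up to a whole number."""
    data_range = max(values) - min(values)   # Step 2: range
    return math.ceil(data_range / k)         # Step 3: class width

# Illustrative values only: the range is 30 - 12 = 18, and 18 / 4 = 4.5.
sample = [12, 15, 21, 30, 27]
print(class_width(sample, 4))
```

When the division is exact, rounding up leaves the width unchanged, so it is worth checking that the last class still reaches the maximum value.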
Frequency distributions cont...
 Data from Continuous Variables

o Main steps:

Step 4: Begin with the minimum value in the data as the first lower class limit, then add the class width from Step 3 to get the next lower class limit.

Step 5: Repeat Step 4 (i.e. keep adding the class width to the previous lower class limit) until you have created the number of classes you chose in Step 1.

Step 6: Write down the upper class limits: subtract 1 from the class width, then add that value to each lower class limit (for data recorded as whole numbers).
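
Steps 4 to 6 can be sketched as below, assuming whole-number data. The function name `class_limits` is illustrative, and the inputs 17, 3 and 7 are the minimum, class width and number of classes from the temperature example worked later in this unit.

```python
def class_limits(minimum, width, k):
    """Return k (lower, upper) class limits for whole-number data."""
    limits = []
    for i in range(k):
        lower = minimum + i * width      # Steps 4-5: keep adding the class width
        upper = lower + (width - 1)      # Step 6: lower limit + (width - 1)
        limits.append((lower, upper))
    return limits

for lower, upper in class_limits(17, 3, 7):
    print(f"{lower} - {upper}")
```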
Frequency distributions cont...
 Continuous Variables

o Main Steps in grouping Data from continuous variables:

Step 7: Count the number of observations in the data that belong to each class interval. The count in each class is the class frequency.

Step 8: Calculate the relative frequency of each class by dividing the class frequency by the total number of observations in the data.

Step 9: Determine the class marks and class boundaries. The class mark is the number in the middle of a class, i.e. (lower class limit + upper class limit) / 2. A class boundary is the number midway between the upper class limit of one class and the lower class limit of the next class.
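
The Step 9 definitions can be sketched as two small helpers. The function names are made up, and the boundary helper assumes at least two classes with the same gap between every pair of consecutive classes.

```python
def class_marks(limits):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in limits]

def class_boundaries(limits):
    """Boundaries halfway between one class's upper limit and the next lower limit."""
    # Assumption: the gap between consecutive classes is constant.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]

limits = [(17, 19), (20, 22), (23, 25)]
print(class_marks(limits))
print(class_boundaries(limits))
```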
Frequency distributions cont...
 Frequency Table
o Example of a Frequency Distribution of a Continuous Variable

Example: Consider the following IQ scores of statistics students. Construct the grouped frequency distribution for this data.

118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
136, 138, 141, 142, 149, 150, 154, 119, 122, 132,
145, 120, 151, 135, 152, 131, 121, 154, 151, 136,
144, 139, 123, 137, 147, 149, 119, 131, 142
Frequency distributions cont...
 Frequency Table
o Example of a Frequency Distribution of a Continuous Variable

Solution: Work it out


Frequency distributions cont...
 Frequency Table
o Example of a Frequency Distribution of a Continuous Variable

Example: Consider the following daily maximum temperatures in a city for 50 days. Construct the frequency distribution for this data.

28, 28, 31, 29, 35, 33, 28, 31, 34, 29, 25, 27, 29, 33, 30, 31, 32, 26, 26, 21,
21, 20, 22, 24, 28, 30, 34, 33, 35, 29, 23, 21, 20, 19, 19, 18, 19, 17, 20,
19, 18, 18, 19, 27, 17, 18, 20, 21, 18, and 19
Frequency distributions cont...
 Solution

Step 1: k = 1 + 3.322 log(n), n = 50
= 1 + 3.322 log(50)
= 1 + 5.64398
= 6.64398
≈ 7 (rounded up to the next whole number)

Step 2: Minimum = 17 and Maximum = 35


Range = Maximum value – Minimum value
= 35 – 17
= 18
Frequency distributions cont...
 Solution

Step 3: Class Width = Range / k

= 18 / 7

= 2.57143

≈ 3 (rounded up to the next whole number)
Frequency distributions cont...
 Solution

Steps 4 – 9: Grouped Frequency Distribution

Class intervals    Tally marks        Class frequency    Relative frequency (%)    Class mark    Class boundary
17 – 19            ||||| ||||| |||    13                 26                        18            16.5 – 19.5
20 – 22            ||||| ||||         9                  18                        21            19.5 – 22.5
23 – 25            |||                3                  6                         24            22.5 – 25.5
26 – 28            ||||| |||          8                  16                        27            25.5 – 28.5
29 – 31            ||||| ||||         9                  18                        30            28.5 – 31.5
32 – 34            ||||| |            6                  12                        33            31.5 – 34.5
35 – 37            ||                 2                  4                         36            34.5 – 37.5
Totals                                50                 100
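
As a rough end-to-end sketch, Steps 1 to 9 can be run on the 50 temperatures listed in the example. The class frequencies are computed directly from those values as listed. The variable names are illustrative, and the code assumes whole-number data.

```python
import math

# The 50 daily maximum temperatures from the example, in the order listed.
temps = [28, 28, 31, 29, 35, 33, 28, 31, 34, 29, 25, 27, 29, 33, 30, 31, 32,
         26, 26, 21, 21, 20, 22, 24, 28, 30, 34, 33, 35, 29, 23, 21, 20, 19,
         19, 18, 19, 17, 20, 19, 18, 18, 19, 27, 17, 18, 20, 21, 18, 19]

n = len(temps)
k = math.ceil(1 + 3.322 * math.log10(n))            # Step 1: Sturges' rule
width = math.ceil((max(temps) - min(temps)) / k)    # Steps 2-3: class width

rows = []
for i in range(k):
    lower = min(temps) + i * width                  # Steps 4-5: lower limits
    upper = lower + width - 1                       # Step 6: upper limits
    freq = sum(lower <= t <= upper for t in temps)  # Step 7: class frequency
    rel_pct = freq / n * 100                        # Step 8: relative frequency (%)
    mark = (lower + upper) / 2                      # Step 9: class mark
    boundary = (lower - 0.5, upper + 0.5)           # Step 9: class boundary
    rows.append((lower, upper, freq, rel_pct, mark, boundary))

for lower, upper, freq, rel_pct, mark, boundary in rows:
    print(f"{lower}-{upper}  f={freq}  {rel_pct:.0f}%  mark={mark}  "
          f"boundary={boundary[0]}-{boundary[1]}")
```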