The document outlines the course content for STA 101: Descriptive Statistics at Adamawa State University, covering definitions, scope, and roles of statistics, methods of data collection, and measures of central tendency. It emphasizes the importance of statistics across various disciplines and discusses the nature of statistical data, including qualitative and quantitative types. Additionally, it details methods for collecting data, such as interviews and questionnaires, and introduces key statistical terms and concepts.
Sta 101 Research Methods
ADAMAWA STATE UNIVERSITY, MUBI
FACULTY OF SCIENCE
DEPARTMENT OF MATHEMATICS
Course Code: STA 101
Course Title: Descriptive Statistics
Course Unit: Two (2)
Course Content:
*Definition of statistics. *The scope and role of statistics. *Nature of statistical data. *Methods of data collection. *Measures of central tendency (mean, median and mode). *Tabular and graphical summary of data. *Diagrams and charts. *Graph of frequency distribution. *Ogive and applications. *Fundamentals of probability. *Pascal's triangle and binomial expansion. *The Gaussian curve and some of its properties. *Use of Normal Tables. *Elements of Regression and Correlation Analysis. *Rank Correlation.
Department of Mathematics, STA 101 Lecture Notes for 2023/2024 Academic Session
INTRODUCTION
The subject of statistics, as it seems, is not a new discipline; it is as old as human society itself. It has been used right from the existence of life on this earth, although the sphere of its utility was very much restricted.
Statistics is a word that is used in everyday life. It deals basically with the estimation of model parameters from data and with the testing of hypotheses about their values.
However, one may ask what statistics is, how it works, how it helps to solve certain practical problems, and so on.
The word statistics can be used with two distinct meanings:
i. It can refer to facts and figures which can be put into numerical form.
ii. On the other hand, it can refer to statistical methods.
Statistics Defined:
Statistics has been defined differently by different authors from time to time, and the reasons for the variation in the definitions are as follows:
Firstly, in modern times the field of statistics has widened considerably. In ancient times it was confined only to the affairs of state, but now it embraces almost every sphere of human activity. Hence a number of definitions which were limited to a very narrow field of enquiry have had to be replaced by more comprehensive and exhaustive definitions.
Secondly, statistics has been defined in two ways: some authors define it as statistical data (i.e. numerical statements of fact), while others define it as statistical methods (i.e. the complete body of principles and techniques used in collecting such data).
> Statistics can be defined as an area of science concerned with the design of experiments, the analysis of data and the making of inferences about a population from the information contained in a sample.
> Statistics is the science of making effective use of numerical data relating to groups of individuals or experiments. It deals with all aspects, including not only the collection, analysis and interpretation of such data, but also the planning of the collection of data in terms of the design of surveys and experiments.
> Statistics is a branch of science which deals with scientific methods of collecting, organising, summarising, presenting and analysing data in order to draw valid and logical conclusions.
SCOPE AND ROLE OF STATISTICS
Statistics has a major role to play in all stages of the scientific method. This is because it is involved with the definition and evaluation of hypotheses through the collection and analysis of data.
Statistics is almost unique among the major disciplines, in that the professional skills of a statistician can be applied in fields as diverse as medicine, natural sciences, agriculture and forestry, education, technology, industry, communications, insurance, marketing and management, as well as various aspects of government.
Therefore statistics is a subject that appeals not only to mathematics students with an interest in probability based on real data and in the modelling of real-life situations, but also to students studying any field of application who have a reasonable background in mathematics. In industry, statisticians are becoming increasingly involved in such areas as process control, quality assurance, industrial experimentation and product reliability. The design and analysis of clinical trials is a major statistical area, particularly in the screening of the safety of new drugs. Government generally is the major employer of statisticians, as statistics is heavily used in everyday life and by all government agencies. Opportunities in statistics are in fact limitless and by no means less important. In short, life would have been something different without statistics.
The role of statistics, therefore, is to act as a tool for the analysis of data arising from experiments or investigations in all fields of human endeavour.
NATURE OF STATISTICS
Statistics involves the collection, presentation, analysis and interpretation of numerical data. The facts dealt with must be capable of numerical expression. The statistician is concerned with developing and using procedures for design, analysis and inference making that provide the best decision at minimum cost.
All problems involving the use of statistical methods can be classified as belonging to
either descriptive or inferential statistics.
Assignment No.
Discuss briefly the importance of statistics in the following disciplines:
(1) Mathematics (9) Sociology
(2) Economics (10) Medical sciences
(3) Business and management (11) War
(4) Planning (12) Social sciences
(5) Accountancy and auditing (13) Physical sciences
(6) Astronomy (14) Psychology and education
(7) Industry (15) Insurance
(8) Biology (16) Education
NATURE OF STATISTICAL DATA
Statistical data are facts or figures collected from units of experiments. The data (facts or figures) can be either qualitative or quantitative.
> Qualitative data are facts or information that cannot be presented in numerical form, e.g. eye colour, gender, marital status, educational status, etc.
> Quantitative data are facts or information that can easily be presented in numerical form, e.g. age, height, weight, etc.
Sources of statistical data
Statistical data can be obtained from either primary or secondary sources, depending on the method and purpose of collecting the data. The distinction between primary and secondary data is a matter of degree or relativity only. The same set of data may be secondary in the hands of one user and primary in the hands of another. In general, data are primary to the source that collects and processes them for the first time and secondary to all sources that later use them.
Primary Data
Primary data are data expressly collected for a specific purpose, e.g. the data relating to mortality (death rates) and fertility (birth rates) in Nigeria collected by the National Population Commission, the facts and figures relating to traffic flow collected by the FRSC, and information collected by interview, observation, etc.
One great advantage of primary data is that the exact information required is obtained.
Primary data can be collected through any of the following sources:
Questionnaire
Interview
Observation
Experiment
Secondary Data
Secondary data are data collected for some other purpose, frequently for administrative reasons, e.g. primary data reproduced by the UN or a statistics office, or details of imports and exports compiled by the customs and excise department.
Sources of Secondary Data
1. Books and journals
2. Report registers
3. Survey
4. Maps, photographs and satellite
5. Textbooks
6. Newspapers
METHODS OF DATA COLLECTION
Statistical data can be collected in the following ways:
A. Interview: This is an instrument used to elicit information from the respondent through verbal interaction between the interviewer and the interviewee (respondent). It involves a one-to-one chat between the researcher and the respondent.
Advantages of interview:
i. It gives an opportunity for the interviewer and the respondent to have a face-to-face interaction.
ii. The respondent can respond freely, the way he likes.
iii. Information which the respondent would not want to commit to writing is obtained.
iv. The recorded information is relatively reliable because it is recorded by the interviewer himself.
v. It is very useful for subjects who cannot fill in a questionnaire, e.g. illiterate or uneducated people.
Disadvantages of interview:
i. An interview consumes time and is expensive to conduct.
ii. Subjective information derived from an unstructured interview is sometimes difficult to analyse.
B. Questionnaire: This is a sequence of questions set out in written form in order to collect data on a specific subject. The questionnaire can be administered directly to the respondents, or it can be mailed to the respondents to be filled in and then mailed back to the researcher. There are three types of questionnaire:
1. The structured or closed-form questionnaire
2. The unstructured (open-form) questionnaire
3. The pictorial-form questionnaire
Advantages of questionnaire:
i. It is economical in terms of time.
ii. It can be used to elicit information on non-cognitive constructs like creativity, anxiety, kindness, etc.
iii. A greater percentage of people can be reached at a time.
iv. It can be administered to a variety of people.
Disadvantages of questionnaire:
i. Negative or incorrect answers can be given if the questions are too lengthy or if they touch on the respondent's personal life.
ii. There may be a low percentage return of the questionnaire, especially when the mode of administration is not on the spot.
iii. An unclear questionnaire may lead to misunderstanding or wrong responses.
Guidelines for constructing a good questionnaire
1. Think of the attribute you want to measure from the respondents.
2. Construct enough items to actually measure the attribute you want to measure from the respondents.
3. Give enough instructions on how to complete the questionnaire.
4. Make sure the language of the questionnaire is clear, precise and unambiguous.
5. Avoid repetition of items in the questionnaire.
6. Avoid words that embarrass the respondent.
7. The questionnaire should be neither too short nor too long.
8. Consider the method of analysis before constructing the questionnaire.
C. Direct field measurement: This is a technique in which the researcher directly measures the phenomena under study.
D. Observation (participant or non-participant): Observation is a technique that involves watching people, events, situations or phenomena and obtaining first-hand information relating to particular aspects of such people, events, situations or phenomena.
Exercise: Distinguish between primary and secondary data and discuss the various methods of collecting primary data.
SOME STATISTICAL TERMS
Raw data: data collected in its original form.
Frequency: the number of times a value or group of values occurs.
Frequency distribution: the organisation of raw data in table form with classes and frequencies.
Categorical frequency distribution: a frequency distribution in which the data are only nominal or ordinal.
Ungrouped frequency distribution: a frequency distribution of numerical data in which the raw data are not grouped.
Grouped frequency distribution: a frequency distribution in which several numbers are grouped into one class.
Class limits: separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there is a gap between the upper limit of one class and the lower limit of the next.
Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes, or between the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class mark (mid-point): the number in the middle of the class. It is obtained by adding the upper and lower class limits and dividing by two. It can also be found by adding the lower and upper class boundaries and dividing by two.
Cumulative frequency: the number of values less than the upper class boundary of the current class. This is a running total of the frequencies.
Relative frequency: the frequency divided by the total frequency. This gives the proportion (or, multiplied by 100, the percentage) of values falling in that class.
Cumulative relative frequency (relative cumulative frequency): the running total of the relative frequencies, or the cumulative frequency divided by the total frequency. It gives the proportion of the values which are less than the upper class boundary.
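The definitions of class boundaries, class width, class mark and the cumulative and relative frequencies can be checked with a short Python sketch. It uses the classes and frequencies of Exercise 2 in these notes; reading the fourth class as 25-29 and taking whole-number data (so that half a unit is 0.5) are assumptions made for illustration, and the variable names are not part of the course material.

```python
from itertools import accumulate

# Class limits and frequencies taken from Exercise 2 of these notes
# (assumption: the fourth class is read as 25-29).
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
frequencies = [2, 4, 3, 2, 1, 2]

HALF_UNIT = 0.5  # assumption: data recorded to the nearest whole number

# Boundaries: subtract half a unit from the lower limit, add it to the upper.
boundaries = [(lo - HALF_UNIT, hi + HALF_UNIT) for lo, hi in class_limits]
# Width: upper boundary minus lower boundary.
widths = [upper - lower for lower, upper in boundaries]
# Class mark: average of the two class limits.
class_marks = [(lo + hi) / 2 for lo, hi in class_limits]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))               # running total
relative = [f / total for f in frequencies]              # proportions
cumulative_relative = [cf / total for cf in cumulative]  # ends at 1.0

for row in zip(class_limits, boundaries, class_marks, frequencies, cumulative):
    print(row)
```

The first printed row is `((10, 14), (9.5, 14.5), 12.0, 2, 2)`, which agrees with the definitions: the width is 14.5 - 9.5 = 5, not 14 - 10 = 4.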
Histogram: a graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can be the class boundaries, the class marks or the class limits.
Frequency polygon: a line graph. The frequency is placed along the vertical axis and the class mid-points along the horizontal axis; these points are connected with lines.
Ogive: a frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or the relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and ends at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pie chart: a graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency multiplied by 360 degrees.
Pictograph: a graph that uses pictures to represent data.
MEASURE OF CENTRAL TENDENCY (AVERAGES)
A measure of central tendency, or measure of location, is important in determining averages of numerical values; it is the value to be expected at a typical or middle data point. The following are the three measures of central tendency in common use:
i. Arithmetic mean
ii. Median
iii. Mode
We shall briefly discuss each of these measures in this section.
The Arithmetic mean (simple mean)
The arithmetic mean is the best known and most reliable measure of central tendency. It is the arithmetic average of a group of scores, obtained by adding all the scores in a distribution and then dividing the sum of the scores by N (the total number of scores).
The arithmetic mean of a set of data or observations is their sum divided by the number of observations, e.g. the arithmetic mean ($\bar{x}$) of the observations $x_1, x_2, \dots, x_n$ is given by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean in most cases is not an actual data value.
In the case of a frequency distribution $x_i \mid f_i$, $i = 1, 2, \dots, n$, where $f_i$ is the frequency of the variable $x_i$,
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i$$
In the case of a grouped or continuous frequency distribution, $x$ is taken as the mid-point of the corresponding class.
Remark: the symbol $\sum$ is called sigma; it is a Greek letter used in mathematics to denote the sum of values.
Example 1
The ages in years of a random sample of six school children are 3, 8, 5, 12, 14 and 12. Find the average age of the school children.
Solution
$$\bar{x} = \frac{3 + 8 + 5 + 12 + 14 + 12}{6} = \frac{54}{6} = 9 \text{ years}$$
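As a quick check of Example 1, the same arithmetic can be written in a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
# Ages (in years) of the six school children from Example 1.
ages = [3, 8, 5, 12, 14, 12]

# Arithmetic mean: sum of the observations divided by their number.
mean_age = sum(ages) / len(ages)
print(mean_age)  # 9.0
```

Python's standard `statistics.mean` function performs the same computation and could be used instead of the explicit division.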
Example 2
a) Find the arithmetic mean of the following frequency distribution:
x: 1 2 3 4 5 6 7
f: 5 9 12 17 14 10 6
b) Calculate the arithmetic mean of the marks from the following table:
Marks: 0-10 10-20 20-30 30-40 40-50 50-60
No. of students: 12 18 27 20 17 6
Solution
a) Computing the arithmetic mean using an ungrouped frequency distribution:
x      f     fx
1      5     5
2      9     18
3      12    36
4      17    68
5      14    70
6      10    60
7      6     42
Total  73    299
$$\bar{x} = \frac{1}{N}\sum f x = \frac{299}{73} \approx 4.10$$
b) Computing the arithmetic mean using a grouped frequency distribution:
Marks   No. of students (f)   Mid-point (x)   fx
0-10    12                    5               60
10-20   18                    15              270
20-30   27                    25              675
30-40   20                    35              700
40-50   17                    45              765
50-60   6                     55              330
Total   100                                   2800
$$\bar{x} = \frac{1}{N}\sum f x = \frac{2800}{100} = 28$$
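The frequency-weighted mean used in Example 2 can be sketched in Python as follows. The function name `frequency_mean` is hypothetical, introduced only for this illustration; the data are the mid-points and frequencies of part (b).

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: sum(f * x) divided by sum(f)."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


# Example 2(b): class mid-points and number of students in each class.
midpoints = [5, 15, 25, 35, 45, 55]
students = [12, 18, 27, 20, 17, 6]

print(frequency_mean(midpoints, students))  # 28.0
```

For a grouped distribution the class mid-points stand in for the individual values, so the result is an approximation to the mean of the raw marks.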
Note:
If the values of x and f are large, calculating the mean using the above formula is time-consuming and tedious. The calculation can be reduced to a large extent by taking the deviations of the given values from an arbitrary point A, as explained below.
Let $d_i = x_i - A$. Then
$$f_i d_i = f_i (x_i - A) = f_i x_i - A f_i$$
Summing both sides over i from 1 to n, we get
$$\sum_{i=1}^{n} f_i d_i = \sum_{i=1}^{n} f_i x_i - A \sum_{i=1}^{n} f_i = \sum_{i=1}^{n} f_i x_i - A N$$
$$\Rightarrow \frac{1}{N}\sum_{i=1}^{n} f_i d_i = \frac{1}{N}\sum_{i=1}^{n} f_i x_i - A = \bar{x} - A,$$
where $\bar{x}$ is the arithmetic mean of the distribution, i.e.
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$
The above formula is much more convenient to apply.
Exercise 1
Below are the scores for 25 students on a 4-point quiz. Find the mean of the scores.
Exercise 2
Given the data as classified in the table below, calculate the arithmetic mean.
Class   Frequency (f)
10-14   2
15-19   4
20-24   3
25-29   2
30-34   1
35-39   2
Exercise 3
Use the frequency distribution of marks given in the table below to find the mean mark of the students.
Mark | Class mark | Frequency
50-54 | 52 | 1
55-59 | 57 | 3
60-64 | 62 | 5
65-69 | 67 | 7
70-74 | 72 | 8
75-79 | 77 | 10
80-84 | 82 | 6
85-89 | 87 | 4
90-94 | 92 | 5
95-99 | 97 | 1
Example 3
Find the mean of the above data using an assumed mean. Choosing 77 as the assumed mean, we complete the table as follows:
Class mark (x) | Deviation from assumed mean (d) | Frequency (f) | fd
52 | -25 | 1 | -25
57 | -20 | 3 | -60
62 | -15 | 5 | -75
67 | -10 | 7 | -70
72 | -5 | 8 | -40
77 | 0 | 10 | 0
82 | 5 | 6 | 30
87 | 10 | 4 | 40
92 | 15 | 5 | 75
97 | 20 | 1 | 20
Total | | 50 | -105
$$\bar{x} = A + \frac{1}{N}\sum f d = 77 + \frac{-105}{50} = 77 - 2.1 = 74.9$$
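The assumed-mean method of Example 3 can be sketched in Python and compared with the direct formula. The function name `assumed_mean` is hypothetical; the class marks and frequencies are those of the Example 3 table, and the choice A = 77 follows the example.

```python
def assumed_mean(class_marks, frequencies, A):
    """Mean via deviations d = x - A from an assumed mean A."""
    n = sum(frequencies)
    sum_fd = sum(f * (x - A) for x, f in zip(class_marks, frequencies))
    return A + sum_fd / n


class_marks = [52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
frequencies = [1, 3, 5, 7, 8, 10, 6, 4, 5, 1]

shortcut = assumed_mean(class_marks, frequencies, A=77)
direct = sum(f * x for x, f in zip(class_marks, frequencies)) / sum(frequencies)

print(round(shortcut, 2))  # 74.9
print(round(direct, 2))    # 74.9
```

Any value of A gives the same mean; choosing A near the centre of the data simply keeps the products f·d small when the work is done by hand.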
Merits and Demerits of Arithmetic mean
Merits
1. It is rigidly defined.
2. It is easy to understand and compute.
3. It is based on all the observations.
4. It is amenable to algebraic treatment.
5. Among all the averages, the arithmetic mean is least affected by fluctuations of sampling (i.e. the arithmetic mean is a stable average).
Demerits
1. It cannot be determined by inspection, nor can it be located graphically.
2. It cannot be used when dealing with qualitative characteristics which cannot be measured quantitatively, such as intelligence, honesty, beauty, etc.
3. The arithmetic mean cannot be obtained when there are missing values.
4. It is affected by extreme values.
5. The arithmetic mean may lead to wrong conclusions if the details of the data from which it is computed are not given.
6. It cannot be calculated if the extreme class is open, e.g. below 10 or above 70.
THE MEDIAN (MD)
The median of a distribution is the value of the variable which divides it into two equal parts, i.e. it is the value such that the number of observations above it is equal to the number of observations below it.
- When a data set is ordered, it is called a data array.
- The median is defined to be the midpoint of the data array.
Median of ungrouped data
In the case of ungrouped data, if the number of observations is odd, the median is the middle value after the values are arranged in ascending or descending order of magnitude.
In the case of an even number of observations, there are two middle terms, and the median is obtained by taking the arithmetic mean of the two middle terms.
If the observations of an ungrouped data set are arranged in increasing or decreasing order of magnitude, the value which divides the ordered observations into two equal parts is called the median of the data. It is denoted by MD.
Example:
The median of the values 25, 20, 15, 35, 18 is 20, and the median of 8, 20, 50, 25, 15, 30 is $\frac{1}{2}(20 + 25) = 22.5$.
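The odd and even cases for ungrouped data can be written as a short Python function. This is a minimal sketch assuming a non-empty list of numbers; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Median of ungrouped data: the middle value of the ordered array,
    or the mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([25, 20, 15, 35, 18]))     # 20
print(median([8, 20, 50, 25, 15, 30]))  # 22.5
```

Sorting first is essential: the middle position is only meaningful in the ordered data array.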
Median of ungrouped frequency distribution
In the case of a discrete frequency distribution, the median is obtained by considering the cumulative frequencies, using the following steps:
i. Find $N/2$, where $N = \sum f_i$.
ii. Find the (less than) cumulative frequency (c.f.) just greater than $N/2$.
iii. The corresponding value of x is the median.
Example:
Obtain the median for the following frequency distribution:
x: 1 2 3 4 5 6 7 8 9
f: 8 10 11 16 20 25 15 9 6
Solution:
Computation of the median
x      f       c.f.
1      8       8
2      10      18
3      11      29
4      16      45
5      20      65
6      25      90
7      15      105
8      9       114
9      6       120
Total  N = 120
Here, N = 120, so N/2 = 60.
The cumulative frequency just greater than N/2 is 65, and the value of x corresponding to 65 is 5.
Therefore, the median is 5.
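The three steps for a discrete frequency distribution can be sketched in Python. The function name `discrete_median` is hypothetical, and the sketch assumes the values of x are listed in ascending order with positive frequencies.

```python
from itertools import accumulate


def discrete_median(values, frequencies):
    """Return the x whose cumulative frequency is the first to exceed N/2.

    Assumes `values` is in ascending order.
    """
    half = sum(frequencies) / 2
    for x, cf in zip(values, accumulate(frequencies)):
        if cf > half:
            return x
    return None  # only reached when there are no frequencies


x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f_values = [8, 10, 11, 16, 20, 25, 15, 9, 6]

print(discrete_median(x_values, f_values))  # 5
```

Here N/2 = 60 and the running totals are 8, 18, 29, 45, 65, ..., so the loop stops at x = 5, as in the worked solution.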
Exercise:
A supermarket recorded the number of items sold per week over a one-year period. The data are given below:
No. of items sold   Frequency
1                   4
2                   9
3                   6
4                   2
5                   3
Median for a Grouped Frequency Distribution:
In the case of a continuous frequency distribution, the class corresponding to the c.f. just greater than N/2 is called the median class, and the value of the median is obtained by the following formula:
$$\text{Median (MD)} = l_m + \frac{(N/2) - cf}{f} \times w$$
where N is the sum of the frequencies,
cf is the cumulative frequency of the class preceding the median class,
f is the frequency of the median class,
w is the width of the median class,
$l_m$ is the lower boundary of the median class.
Example:
Given the data in the table below, find the median of the distribution.
Class        Frequency
15.5-20.5    3
20.5-25.5    5
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2
Solution: Form a cumulative frequency distribution table of the given data.
Class        Frequency   Cumulative frequency
15.5-20.5    3           3
20.5-25.5    5           8
25.5-30.5    4           12
30.5-35.5    3           15
35.5-40.5    2           17
N/2 = 17/2 = 8.5 ≈ 9
Hence, the class that contains the 9th value is the median class, i.e. the median class is the class interval (25.5 - 30.5).
Then N = 17, cf = 8, f = 4, w = 30.5 - 25.5 = 5, $l_m$ = 25.5.
$$\text{Median (MD)} = l_m + \frac{(N/2) - cf}{f} \times w = 25.5 + \frac{8.5 - 8}{4} \times 5 = 25.5 + 0.625 = 26.125$$
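The grouped-median formula can be sketched in Python using the class boundaries and frequencies of the example above. The function name `grouped_median` is hypothetical; the sketch assumes the classes are given in ascending order as (lower boundary, upper boundary) pairs.

```python
def grouped_median(boundaries, frequencies):
    """Median of a grouped distribution: l_m + ((N/2 - cf) / f) * w.

    `boundaries` is a list of (lower, upper) class boundaries in ascending order.
    """
    half = sum(frequencies) / 2
    cf_before = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cf_before + f > half:  # first c.f. just greater than N/2
            width = upper - lower
            return lower + (half - cf_before) / f * width
        cf_before += f
    return None  # only reached when there are no frequencies


classes = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
counts = [3, 5, 4, 3, 2]

print(grouped_median(classes, counts))  # 26.125
```

The formula interpolates linearly inside the median class, which amounts to assuming the observations are spread evenly across that class.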
Exercise:
The following are marks obtained by 17 students in STA 101 examinations. Find the median mark of the students.
Marks
10-20
20-30
30-40
40-50
50-60
Merits and Demerits of Median
Merits
1. It is rigidly defined.
2. It is easily understood and easy to compute. In some cases it can be located by mere inspection.
3. It is not at all affected by extreme values.
4. The median can be calculated for distributions with open-end classes.
Demerits
1. In the case of an even number of observations, the median cannot be determined exactly, but is estimated by taking the mean of the two middle terms.
2. It is not based on all the observations (the median is insensitive).
3. It is not amenable to algebraic treatment.
4. When compared to the mean, the median is affected much by fluctuations of sampling.
MODE
The mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. Thus, in the case of a discrete frequency distribution, the mode is the value of x corresponding to the maximum frequency. For example, in the following frequency distribution:
x: 1 2 3 4 5 6 7 8
f: 4 9 16 25 22 15 7 3
The value of x corresponding to the maximum frequency (25) is 4. Hence the mode of the given frequency distribution is 4.
However, for a small data set, where arrangement can easily be done, the data can be arranged in ascending order of magnitude and the mode can easily be obtained by mere inspection of the arranged data.
Note:
i. For grouped data, the mode is the most commonly observed category (class).
ii. A data set can have more than one mode (e.g. bimodal).
iii. A data set is said to have no mode if all values occur with equal frequency.
Examples
1. To find the mode of the following data set:
8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11.
Ordering the data set in ascending order of magnitude gives
6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14.
Therefore, the mode of the data set is 8 because it appears with the highest frequency of occurrence.
2. Six strains of bacteria were tested to see how long they could remain alive outside their normal environmental conditions. The time in minutes is given below. Find the mode. Data set: 2, 3, 5, 7, 8, 10.
Here, there is no mode since each data value occurs with a frequency of one.
3. Find the mode of the data 18, 18, 18, 20, 22, 24, 24, 24, 26, 26.
Here, two values share the highest frequency, namely 18 and 24; hence the data are bimodal (two modes).
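The three cases in the examples above (one mode, no mode, two modes) can be sketched in Python by counting occurrences. The function name `modes` is hypothetical, and returning an empty list for "no mode" is a convention chosen for this illustration; a non-empty data set is assumed.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means 'no mode'.

    Following the notes, a data set in which every value occurs equally
    often (and there is more than one distinct value) has no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]))  # [8]
print(modes([2, 3, 5, 7, 8, 10]))                                            # []
print(modes([18, 18, 18, 20, 22, 24, 24, 24, 26, 26]))                       # [18, 24]
```

The standard library's `statistics.multimode` uses a different convention: when all values are equally frequent it returns every value rather than an empty list.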
In the case of a large data set, we cannot easily pick the mode by inspection as illustrated in the above cases, so the mode can be computed using the following formula.
Mode of a grouped frequency distribution:
$$\text{Mode (Mo)} = l_m + \frac{w(f_1 - f_0)}{2f_1 - f_0 - f_2}$$
where $l_m$ is the lower boundary of the modal class, w is the width of the modal class, $f_1$ is the frequency of the modal class, and $f_0$ and $f_2$ are the frequencies of the classes preceding and following the modal class.
> When the frequency distribution is symmetrical, the mean, median and mode coincide.
> When the frequency distribution is skewed, the mean, median and mode do not coincide.
> If the frequency distribution is positively skewed (skewed to the right), the mean is greater than the median and the median is greater than the mode, i.e. (mean > median > mode).
> If the distribution is negatively skewed (skewed to the left), the mode is greater than the median and the median is greater than the mean, i.e. (mode > median > mean).
NOTE: the following empirical relation holds approximately between the mean, median and mode of a moderately skewed distribution:
mode = mean - 3(mean - median) = mean - 3 mean + 3 median
mean - mode = 3(mean - median) = 3 mean - 3 median
mean - median = 1/3 (mean - mode)
These can also be stated as:
mode = mean - 3 mean + 3 median, or mode = 3 median - 2 mean.
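The relation mode = 3 median - 2 mean can be tried out with a small Python sketch. The function name `empirical_mode` and the summary values below are hypothetical, chosen only to illustrate a positively skewed case; the relation is an empirical approximation, not an exact identity.

```python
def empirical_mode(mean, median):
    """Estimate the mode from the relation mode = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


# Hypothetical summary values for a positively skewed distribution.
mean, median = 30.0, 28.0
mode = empirical_mode(mean, median)

print(mode)                  # 24.0
print(mean > median > mode)  # True
```

When the mean and median are equal, the estimate returns the same value again, matching the statement that the three averages coincide for a symmetrical distribution.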
ORGANISATION OF STATISTICAL DATA
Having collected and edited the data, the next thing is to organise it, i.e. to present it in a readily comprehensible, condensed form which will highlight the important characteristics of the data, facilitate comparison and render it suitable for processing (statistical analysis) and interpretation.
The presentation of data can be broadly classified into two:
(i) Tabular presentation
(ii) Diagrammatic or graphic presentation.
Tabular Presentation of Data
Tabulation and classification are devices for presenting statistical data in a neat, concise, systematic, readily comprehensible and intelligible form.
> When data are collected in original form, they are called raw data.
> When raw data are organised into a frequency distribution, the frequency is the number of values in a specific class of the distribution.
A frequency distribution is the organisation of raw data in tabular form, using classes and frequencies.
Classes or Types of Frequency Distributions
1. Categorical frequency distributions - can be used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
Examples are political affiliation, religious affiliation, blood type, etc.
Blood type frequency distribution
Class   Frequency   Percent
A       5           20
B       7           28
O       9           36
AB      4           16
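A categorical frequency distribution like the blood type table can be built in Python by counting the raw observations. The list of 25 observations below is a hypothetical expansion that reproduces the frequencies in the table; only the counts come from the notes.

```python
from collections import Counter

# Hypothetical raw data consistent with the table: 5 A, 7 B, 9 O, 4 AB.
observations = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

counts = Counter(observations)  # frequency of each category
total = sum(counts.values())    # 25 observations in all
percents = {blood_type: 100 * f / total for blood_type, f in counts.items()}

for blood_type, f in counts.items():
    print(blood_type, f, percents[blood_type])
```

The percent column is the relative frequency multiplied by 100, so the four percentages add up to 100.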
2. Ungrouped frequency distributions - used for data that can be enumerated and when the range of values in the data set is not large.
Examples: number of miles travelled from home to campus, number of girls in a 4-child family, etc.
Number of miles travelled (example)
Class   Frequency
5       24
10      16
15      10
3. Grouped frequency distributions - used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.
Some Basic Principles for Forming a Grouped Frequency Distribution
The following guidelines may be used for a good classification of frequency data.
- Types of classes: The classes should be clearly defined and should not lead to any ambiguity. They should be exhaustive and mutually exclusive (i.e. non-overlapping).
- Number of classes: The choice of the number of classes, or of the class intervals into which a frequency distribution can be divided, depends primarily on:
i. the total frequency (i.e. the total number of observations)
ii. the nature of the data, i.e. the size or magnitude of the values of the variable
iii. the accuracy aimed at
iv. the ease of computation
Terms associated with a grouped frequency distribution
- Class limits represent the smallest and largest data values that can be included in a class. E.g. in the lifetime of boat batteries example, the values 24 and 30 of the first class are the class limits.
- The lower class limit here is 24 and the upper class limit is 30.
- The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
- The class width of a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
Guidelines for constructing a frequency distribution
- There should be between 5 and 20 classes.
- The class width should be an odd number.
- The classes should be mutually exclusive.
- The classes must be continuous.
- The classes must be exhaustive.
- The classes must be equal in width.
Procedure for constructing a grouped frequency distribution
- Find the highest and lowest values.
- Find the range (i.e. highest value - lowest value).
- Select the number of classes desired.
- Find the width by dividing the range by the number of classes and rounding up.
- Select the starting point (usually the lowest value); add the width repeatedly to get the lower limits.
- Find the upper class limits.
- Find the class boundaries.
- Tally the data and find the cumulative frequencies.
Example: Grouped Frequency Distribution
In a survey of 20 patients who smoked, the following data were obtained. Each value
represents the number of cigarettes the patient smoked per day.
Construct a frequency distribution using six classes,
10 8 6 14
22 1B 7 19
i 9 18. 14
2B 2 15 15,
s [| ae [aa
Solution:
‘Step 1: Identify the highest (H) and lowest (L) values: H= 22 and L= 5.
Step 2: find the range: R= H—
2
v7.
‘Step 3: select the number of classes you desire, say 6.
‘Step 4: find the class width by dividing the range by the number of classes.
Width = 17/6 =
.83. This value is rounded up to 3.
Step 5: Select a starting point for the lowest class limit. For convenience, this value is
chosen to be 5, the lowest value. Adding the class width of 3 repeatedly gives the lower
class limits 5, 8, 11, 14, 17 and 20.
Step 6: The upper class limits will be 7, 10, 13, 16, 19 and 22. For example, the upper
limit for the first class is computed as 8 - 1 = 7, etc.
Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit.
Step 8: Tally the data and then write the corresponding numerical values for the tallies
in the frequency column, and find the cumulative frequencies.
Class limits | Class boundaries | Tally   | Frequency | Cumulative frequency
5-7          | 4.5-7.5          | //      | 2         | 2
8-10         | 7.5-10.5         | ///     | 3         | 5
11-13        | 10.5-13.5        | ///// / | 6         | 11
14-16        | 13.5-16.5        | /////   | 5         | 16
17-19        | 16.5-19.5        | ///     | 3         | 19
20-22        | 19.5-22.5        | /       | 1         | 20
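The eight steps above can be sketched in Python as a quick check of the hand calculation. This is only an illustrative sketch: the function name `grouped_frequency` is hypothetical, the data list is assumed to be the 20 values of the example as read from its table, and the 0.5 adjustment for the boundaries assumes whole-number data.

```python
import math

def grouped_frequency(data, num_classes):
    """Build a grouped frequency distribution (illustrative helper, not from the notes)."""
    low, high = min(data), max(data)                # Step 1: lowest and highest values
    value_range = high - low                        # Step 2: range
    width = math.ceil(value_range / num_classes)    # Step 4: width, rounded up
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = low + k * width                     # Step 5: lower class limit
        upper = lower + width - 1                   # Step 6: upper limit (whole numbers assumed)
        freq = sum(1 for v in data if lower <= v <= upper)  # Step 8: tally
        cumulative += freq
        # Step 7: boundaries are the limits moved outwards by 0.5
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))
    return rows

# Assumed data: number of cigarettes smoked per day by the 20 patients.
cigarettes = [10, 8, 6, 14, 22, 13, 17, 19, 11, 9,
              18, 14, 13, 12, 15, 15, 5, 11, 16, 11]

table = grouped_frequency(cigarettes, 6)
for lower, upper, lb, ub, freq, cum in table:
    print(f"{lower:>2}-{upper:<2}  {lb:>4}-{ub:<4}  {freq:>2}  {cum:>2}")
```

One caution about this sketch: if the range divides exactly by the number of classes, the highest value would fall just outside the last class, so the width or the number of classes would need to be increased in that case.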
Graphical Presentation of Statistical Data
The three most commonly used graphs in statistical analysis are:
1. The histogram.
2. The frequency polygon.
3. Cumulative frequency graph or ogive.
The Histogram: A bar graph that represents a frequency distribution of a quantitative
variable. A histogram is made up of the following components:
1. A title, which identifies the population or sample of concern.
2. A vertical scale, which identifies the frequencies in the various classes.
3. A horizontal scale, which identifies the variable x. Values for the class
boundaries or class midpoints may be labelled along the x-axis. Use whichever
method of labelling the axis best presents the variable.
Example:
Draw a frequency histogram of the annual salaries for resort-club managers.
Annual Salary ($1000) | 15-25 | 25-35 | 35-45 | 45-55 | 55-65
Number of Managers    |       |       |       |       |
Solution:
[Figure: Frequency histogram. Vertical axis: Number of Managers. Horizontal axis: Annual Salary ($1000), with classes 15-25, 25-35, 35-45, 45-55 and 55-65.]
The Frequency Polygon
A frequency polygon is another device for the graphic presentation of a frequency
distribution (continuous or discrete).
In the case of a discrete frequency distribution, the frequency polygon is obtained by
plotting the frequencies on the vertical axis (Y-axis) against the corresponding values
of the variable on the horizontal axis (X-axis) and joining the points so obtained by
straight lines.
Example: The following data show the number of accidents sustained by 313 drivers
of a company over a period of 5 years. Use the data to draw a frequency polygon.
[Table: No. of accidents against No. of drivers (313 drivers in total).]
[Figure: Frequency polygon of the number of drivers against the number of accidents.]
A frequency polygon for a frequency distribution having equal class intervals is formed
by plotting (as points) class frequencies above the mid-points of the classes to which
they relate and joining these points using straight lines.
Note: The mid-point of a class is defined as that point lying mid-way between the two
class boundaries. It is calculated as
$$\text{mid-point} = \frac{\text{lower class boundary} + \text{upper class boundary}}{2}.$$
Example:
The data below give the frequency distribution of the weekly wages (in naira) of 100
workers in a factory. Use the data to draw the histogram and frequency polygon of
the distribution.
Weekly wages (₦)  | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64
Number of workers | 4     | 5     | 12    | 23    | 31    | 10    | 8     | 5     | 2
Solution:
All the classes are of equal width, i.e. 5, but they are not continuous. The
distribution is therefore converted into a continuous frequency distribution as below:
Weekly wages (₦)  | 19.5-24.5 | 24.5-29.5 | 29.5-34.5 | 34.5-39.5 | 39.5-44.5 | 44.5-49.5 | 49.5-54.5 | 54.5-59.5 | 59.5-64.5
Number of workers | 4         | 5         | 12        | 23        | 31        | 10        | 8         | 5         | 2
This is obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper
limit of each class interval. The frequency polygon is obtained by joining the mid-points of
the tops of the rectangles by straight lines, and extending it both ways to the mid-points of
the classes 14.5-19.5 and 64.5-69.5 on the X-axis.
[Figure: Histogram and frequency polygon of the weekly wages of the 100 workers.]
We should note that the plotted points for the frequency polygon are just the centres
of the top of the bars of the histogram.
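As a small check on the plotting points, the class boundaries and mid-points for the wage example can be computed as follows. This is a sketch only: the variable names are illustrative, and the frequency for the class 30-34 is taken as 12 on the assumption that the nine frequencies must total the 100 workers stated in the example.

```python
# Class limits of the weekly-wage example (classes of width 5).
limits = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44),
          (45, 49), (50, 54), (55, 59), (60, 64)]
# Frequencies; the value 12 for 30-34 is an assumption so that the total is 100.
workers = [4, 5, 12, 23, 31, 10, 8, 5, 2]

# Boundaries: subtract 0.5 from each lower limit, add 0.5 to each upper limit.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]
# Mid-point: the point mid-way between the two class boundaries.
midpoints = [(lb + ub) / 2 for lb, ub in boundaries]

# Close the polygon with one empty class on each side (14.5-19.5 and 64.5-69.5).
width = boundaries[0][1] - boundaries[0][0]
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
polygon_y = [0] + workers + [0]

for x, y in zip(polygon_x, polygon_y):
    print(x, y)
```

The pairs printed are the points that would be joined by straight lines; the first and last pairs have zero frequency, which is what brings the polygon down to the X-axis at both ends.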
Cumulative Frequency Graph or Ogive: A line graph of a cumulative frequency or
cumulative relative frequency distribution. An ogive has the following components:
1. A title, which identifies the population or sample.
2. A vertical scale, which identifies either the cumulative frequencies or the
cumulative relative frequencies.
3. A horizontal scale, which identifies the upper class boundaries. Until the upper
boundary of a class has been reached, you cannot be sure you have
accumulated all the data in that class. Therefore, the horizontal scale for an
ogive is always based on the upper class boundaries.
Example:
Construct a cumulative frequency graph (ogive) for this frequency distribution.
Marks | Frequency
50-54 |
55-59 |
60-64 |
65-69 |
70-74 |
75-79 | 10
80-84 |
85-89 |
90-94 |
95-99 |
Solution: To obtain a cumulative frequency distribution, the absolute frequencies are
added successively as shown below:
Marks         | Cumulative frequency
Less than 55  | 1
Less than 60  | 4
Less than 65  | 9
Less than 70  | 16
Less than 75  | 24
Less than 80  | 34
Less than 85  | 40
Less than 90  | 45
Less than 95  | 45
Less than 100 | 50
Cumulative frequency table
A graph of the cumulative frequency distribution for any set of data is called an ogive
or cumulative frequency curve. To obtain an ogive, the cumulative frequency of each
class is plotted against the upper boundary of that class and the points are joined by a
smooth curve. This curve is very useful for reading off the percentage of observations
below or above a given value.
Thus, the corresponding ogive of the cumulative frequency distribution in the
cumulative frequency table given above is shown below:
[Figure: Ogive graph.]
Note: Every ogive starts on the left with a relative frequency of zero at the lower class
boundary of the first class and ends on the right with a cumulative relative frequency
of 100% at the upper class boundary of the last class.
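The construction of an ogive can also be sketched numerically. The example below reuses the class boundaries and frequencies of the cigarette example from the grouped frequency distribution. The helper `percent_below` is a hypothetical name, and it assumes straight-line interpolation between the plotted points, so a smooth hand-drawn curve would give slightly different readings.

```python
from itertools import accumulate

# Boundaries and frequencies from the cigarette example.
lower_boundary_first = 4.5
upper_boundaries = [7.5, 10.5, 13.5, 16.5, 19.5, 22.5]
frequencies = [2, 3, 6, 5, 3, 1]

cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Ogive points: start at zero on the lower boundary of the first class,
# then plot each cumulative value against the upper boundary of its class.
xs = [lower_boundary_first] + upper_boundaries
ys = [0.0] + [100 * c / total for c in cumulative]  # cumulative relative frequency (%)

def percent_below(value):
    """Read the ogive at `value` by linear interpolation (an assumption)."""
    if value <= xs[0]:
        return 0.0
    if value >= xs[-1]:
        return 100.0
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)

print(list(zip(xs, ys)))
print(percent_below(12))   # estimated percentage smoking fewer than 12 per day
```

The percentage above a given value is obtained by subtracting the reading from 100.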
Pie graph (Diagram): A pie graph is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the distribution.
Steps for Construction of Pie Diagram
1. Express each of the component values as a percentage of the
respective total.
2. Since the angle at the centre of the circle is 360°, the total
magnitude of the various components is taken to be equal to 360°
and each component part is to be expressed proportionately in
degrees.
3. Draw a circle of appropriate radius using an appropriate scale
depending on the space available.
4. Having drawn the circle, draw any radius (preferably horizontal).
Now, with the radius as the base line, draw an angle at the centre
(with the help of a protractor) equal to the degrees represented by
the first component. The sector so obtained represents the
proportion of the first component.
5. Different sectors representing the various component parts are
distinguished from one another by using different shades,
dottings, colours, etc., or labels either inside the sector or outside
the circle.
Remarks: The degrees represented by the various component parts of a given
magnitude can be obtained as follows:
$$\text{Degrees of any component part} = \frac{\text{Component value}}{\text{Total value}} \times 360^\circ$$
Example: Draw a pie diagram to represent the following data of proposed expenditure
by a State Government for the year 1997-1998.
Items                           | Proposed expenditure (in millions)
Agriculture & Rural Development | 4,200
Industries & Urban Development  | 1,500
Health & Education              | 1,000
Miscellaneous                   | 500
Solution:
Calculations for Pie Chart
Items (1)                       | Proposed expenditure (2) | Angle at the centre (3) = (2)/7,200 × 360°
Agriculture & Rural Development | 4,200                    | 4,200/7,200 × 360° = 210°
Industries & Urban Development  | 1,500                    | 1,500/7,200 × 360° = 75°
Health & Education              | 1,000                    | 1,000/7,200 × 360° = 50°
Miscellaneous                   | 500                      | 500/7,200 × 360° = 25°
Total                           | 7,200                    | 360°
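The angle calculation in the remark above can be checked with a few lines of Python. This is a minimal sketch: `pie_angles` is a hypothetical helper name, and the Health & Education figure is assumed to be 1,000 so that the four items add up to the stated total of 7,200.

```python
def pie_angles(values):
    """Return the angle at the centre (in degrees) for each component value."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}

expenditure = {
    "Agriculture & Rural Development": 4200,
    "Industries & Urban Development": 1500,
    "Health & Education": 1000,   # assumed value so that the total is 7,200
    "Miscellaneous": 500,
}

angles = pie_angles(expenditure)
for item, angle in angles.items():
    print(f"{item}: {angle:.0f} degrees")
```

A useful sanity check before drawing is that the angles add up to 360 degrees.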
Pie diagram representing proposed expenditures by the state government is as given below.
[Figure: Pie diagram titled "Proposed Expenditure", with sectors for Agriculture & Rural Development, Industries & Urban Development, Health & Education, and Miscellaneous.]
THE BINOMIAL THEOREM
The binomial theorem is an algebraic method of expanding a binomial expression
$(y + x)^n$.
Theorem: If x and y are quantities and n is a positive integer, then we can expand
$(y + x)^n$ in the form:
$$(y+x)^n = \sum_{r=0}^{n} \binom{n}{r} y^{n-r} x^{r} = y^n + \binom{n}{1} y^{n-1}x + \binom{n}{2} y^{n-2}x^2 + \cdots + x^n, \qquad \text{where } \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
We can demonstrate this result easily with two examples: n = 2 and n = 3.
Example 1: For n = 2, the theorem gives:
$$(y+x)^2 = y^2 + 2yx + x^2$$
Example 2: For n = 3, the theorem gives:
$$(y+x)^3 = y^3 + \binom{3}{1} y^2x + \binom{3}{2} yx^2 + x^3 = y^3 + 3y^2x + 3yx^2 + x^3$$
which is also verified by evaluating
$$(y+x)^3 = (y+x)(y+x)(y+x) = y^3 + 3y^2x + 3yx^2 + x^3.$$
We are interested in the particular form of the theorem given by
y = 1 - p and x = p:
$$(1-p+p)^n = \sum_{r=0}^{n} \binom{n}{r} p^{r} (1-p)^{n-r} = 1 \qquad (\text{since } (1-p+p)^n = 1^n = 1).$$
There are numerous applications and identities concerned with the theorem, but we
limit ourselves to only a few of these that are directly applicable to the scope of this
course.
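The special case with y = 1 - p and x = p can be checked numerically. The sketch below is illustrative only: the function name `binomial_terms` is hypothetical, and n = 3 with p = 0.25 are arbitrary values chosen for the demonstration.

```python
from math import comb

def binomial_terms(n, p):
    """Terms C(n, r) * p**r * (1 - p)**(n - r) for r = 0, 1, ..., n."""
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

terms = binomial_terms(3, 0.25)
print(terms)        # the four terms of (0.75 + 0.25)**3
print(sum(terms))   # should equal 1, up to rounding error
```

Whatever values of n and p (between 0 and 1) are tried, the terms should add up to 1, which is the property stated in the equation above.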
Pascal's Triangle
The coefficients of the binomial expansion (i.e., the numbers pre-multiplying the x and
y terms) form an interesting and useful pattern when looked at in isolation.
We will expand $(x+y)^n$ for a few values of n beginning with n = 0.
$$\begin{aligned}
(x+y)^0 &= 1 \\
(x+y)^1 &= 1x + 1y \\
(x+y)^2 &= 1x^2 + 2xy + 1y^2 \\
(x+y)^3 &= 1x^3 + 3x^2y + 3xy^2 + 1y^3 \\
(x+y)^4 &= 1x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + 1y^4
\end{aligned}$$
Writing the above coefficients in the form of a triangle gives the pattern shown below.
Notice that any two adjacent values added together give the value immediately below
them in the following row. This particular characteristic of the number triangle enables us
immediately to write down the next row of the triangle, and the one after that, and so
on.
        1
      1   1
    1   2   1
  1   3   3   1
1   4   6   4   1
Noting that each row always begins and ends with a 1, we have the next row as:
1, 5 (= 1+4), 10 (= 4+6), 10 (= 6+4), 5 (= 4+1) and 1.
The next row is 1, 6, 15, 20, 15, 6 and 1. We can, of course, extend this process
indefinitely. Notice, in particular, that each row of coefficients forms a symmetric
pattern, so that, for instance,
$$(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3 = y^3 + 3y^2x + 3yx^2 + x^3 = (y+x)^3,$$
as we would expect.
The ease with which we can generate binomial coefficients using Pascal's Triangle
enables us to write down binomial expansions fairly rapidly. That is, the row 1, 5, 10,
10, 5 and 1 represents the coefficients in the expansion of $(x+y)^5$, i.e.,
$$(x+y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5$$
or, alternatively:
$$(y+x)^5 = y^5 + 5y^4x + 10y^3x^2 + 10y^2x^3 + 5yx^4 + x^5$$
For a value of n as large as 20, say, Pascal's Triangle would grow rather large and in
this case we could revert to the coefficients given in the theorem, namely $\binom{n}{r}$.
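The row-by-row rule for Pascal's Triangle, and the direct route for large n, can both be sketched in Python. This is only an illustration: `pascal_rows` is a hypothetical helper name, and `math.comb` from the standard library is used here for the coefficient "n choose r".

```python
from math import comb

def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's Triangle, built by adding adjacent values."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # Each inner value is the sum of the two adjacent values above it.
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])   # every row begins and ends with 1
    return rows

rows = pascal_rows(6)
for row in rows:
    print(row)

# For large n, compute a single coefficient directly instead of building the triangle.
print(comb(20, 10))
```

Each row produced this way should agree with the coefficients from the theorem, which is a convenient cross-check on a hand-written triangle.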
NORMAL PROBABILITY DISTRIBUTIONS
The normal probability distribution is considered the single most important probability
distribution. An unlimited number of continuous random variables have either a
normal or an approximately normal distribution. The normal probability distribution
has a continuous random variable and it uses two functions: one function to determine
the ordinates (y values) of the graph picturing the distribution and a second to
determine the probabilities. The formula below expresses the ordinate (y value) that
corresponds to each abscissa (x value).
$$y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \qquad \text{for all real } x$$
Note: Each different pair of values for the mean, μ, and standard deviation, σ, will
result in a different normal probability distribution function.
When a graph of all such points is drawn, the normal (bell-shaped) curve will appear
as shown in this figure below:
[Figure: The normal (bell-shaped) probability curve.]
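To see how the ordinates of the bell-shaped curve are obtained, the sketch below evaluates the usual normal density, y = 1/(σ√(2π)) · exp(-½((x - μ)/σ)²), at a few abscissas. The function name `normal_ordinate` and the two (mean, standard deviation) pairs are illustrative assumptions, not values from these notes.

```python
import math

def normal_ordinate(x, mu, sigma):
    """Ordinate (y value) of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Two different (mu, sigma) pairs give two different curves.
for mu, sigma in [(0, 1), (50, 10)]:
    xs = [mu + k * sigma for k in range(-3, 4)]   # from 3 sigma below to 3 sigma above the mean
    print([round(normal_ordinate(x, mu, sigma), 4) for x in xs])
```

In each printed list the largest ordinate sits at the mean and the values are symmetric about it, which is the bell shape described above.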