HAWASSA UNIVERSITY

Department of Statistics
College of Natural and Computational Science.
MSc. in Bio-Statistics

Introduction to Statistics

By:
Addisu T.(M.Sc)

E-mail:- [email protected]
Phone:- +251 18 17 36 76

Nationality:- Ethiopian
November 24, 2023

Introduction to Statistics Set BY Addisu T. 5/1/2023 1 / 105


Outline

1 Chapter One
Introduction

2 Chapter Two
Organization and Presentation of data

3 Chapter Three
Measures of central tendency

4 Chapter Four
Measures of dispersion (Variation)

CH-1

Chapter One
Definition and classifications of statistics

INTRODUCTION
1. Introduction
1.1. Definition and classifications of statistics
Definition
a. In the plural sense: statistics are numerical figures or facts, that is,
the raw data themselves, such as statistics of births,
statistics of deaths, statistics of students, etc.

b. In the singular sense: statistics is the science which deals with
the collection, organization, presentation, analysis and
interpretation of data.
The word "statistics" is derived from the Latin for "state",
indicating the historical importance of governmental data
gathering.
Nowadays, statistics is used in almost every field of study, such as
natural science, social science, engineering, medicine, agriculture,
etc.
Classifications of Statistics:
Based on how data are used, statistics can be classified into two:
1 Descriptive statistics consists of the collection, organization,
summarization and presentation of data.
The statistician tries to describe a situation and presents the data
in some meaningful form, such as charts, graphs, or tables.

2 Inferential statistics consists of generalizing from samples to
populations, performing estimations and hypothesis tests,
determining relationships among variables, and making predictions.
Inferential statistics uses probability, i.e., the chance of an event
occurring.
1.2. Stages in statistical investigation:
i. Collection of data: the process of measuring, gathering and assembling
the raw data upon which the statistical investigation is to be
based. Careful planning is essential before collecting the data.
ii. Organization of data: summarization of data in some meaningful
way, e.g. in table form.
iii. Presentation of data: the process of re-organization,
classification, compilation and summarization of data to present it
in a meaningful form, using tables, diagrams or graphs.
iv. Analysis of data: the process of extracting relevant information
from the summarized data, mainly through the use of elementary
mathematical operations, e.g. computing the mean, variance, median or mode.
v. Interpretation of data: the final step is drawing conclusions from
the data collected. A valid conclusion must be drawn on the basis
of the analysis. A high degree of skill and experience is necessary for
the interpretation.
Statistical techniques based on probability theory are required.

1.3. Definitions of some terms
1. A (statistical) population: is the complete set or totality of
objects/subjects that possess some common characteristics which
are being studied.
• The population from which information is actually collected is called the
Sample or Study Population.
• The population to which generalization is made is called the Target
or Source or Reference Population.
Examples:
Population of all males between the ages of 15 and 18.
Population of trees under specified climatic conditions
Population of animals fed a certain type of diet
Population of farms having a certain type of natural fertility
Population of households, etc

There are two ways of investigation: Census and sample survey.
2. Census: a complete enumeration of the population.
3. Sample: A sample from a population is the set of measurements
that are actually collected in the course of an investigation.
Examples:
Monthly production data of a certain factory in the past 10 years.
Small portion of a finite population.
4. Parameter: a characteristic or measure obtained from a
population, e.g. µ, σ.
5. Statistic: a characteristic or measure obtained from a sample,
e.g. X̄, s.
6. Sampling: The process or method of sample selection from the
population.
7. Sample size: The number of elements or observation to be
included in the sample.
8. Data: the values (measurements or observations) that the
variables can assume. They can be obtained by measuring,
counting or observing.
9. Variable: an attribute or characteristic that can take
different values. It can be:
1 Qualitative variables: nonnumeric variables that cannot be
measured numerically.
Examples: gender, religious affiliation, state of birth etc.
2 Quantitative variables: numerical variables that can be
measured.
Examples: temperature, blood pressure, weight at birth, height,
number of children in a family, diameter of trees in a rain forest etc.
Quantitative variables can be further classified into two groups:
1 Discrete variables assume values that can be counted.
2 Continuous variables can assume an infinite number of values
between any two specific values.

1.4. Application, Uses and Limitation of Statistics
i. Applications of statistics:
Statistics is applied in almost all fields of human endeavor.
For example:
agriculture and agricultural research, the sports world, social science
and social research, industry (especially quality control), biological
science, education, engineering, administration and governmental
policy making.
ii. Uses of statistics:
It helps to present data in a definite and precise form
It helps to reduce data
It helps to forecast the future
It helps to identify the relationship between two or more variables
It helps to formulate and test hypotheses
It helps in policy formulation and decision making
iii. Limitations of statistics:
Statistics is not suitable for the study of qualitative data
It deals only with aggregates of facts, not with individual data
Statistical laws are not exact
Statistics can be easily misused
Statistics is only one of the methods of studying a problem

Level of Measurement (Measurement Scale)
Based on the measurement scale, data can be classified into four levels:
Nominal, Ordinal, Interval and Ratio.
(LOWEST) Nominal - Ordinal - Interval - Ratio (HIGHEST)

i. Nominal Scale
"Nominal" comes from the Latin word for "name". This is a scale for grouping
individuals into different categories.
In this scale, one category is simply different from another.
Arithmetic operations and order comparisons are not meaningful.
Statistical analysis is based on frequency counts, such as percentages and
the mode.
Example: gender, species, ethnicity, nation, religion, locality,
party affiliation, red, black, short, pass, fail etc.
ii. Ordinal Scale
Classifies data into categories that can be ranked or ordered;
however, precise differences between the ranks do not exist.
Along with the counting operations of the nominal scale, this scale supports statistics
based on percentiles, quartiles and the median.
Relational operations (greater than, less than) are meaningful, but arithmetic operations are not.
Example: social class, severity of a behavior disorder
iii. Interval Scale
The distance between any two adjacent scale values is fixed and equal
It allows comparison of the difference between two objects
Meaningful addition and subtraction of scale values are possible
The zero point and the unit of measurement are arbitrary
In addition to the statistical techniques applied to nominal and
ordinal data, the arithmetic mean and standard deviation are
used.
Example: Temperature (Fahrenheit or Celsius)
iv. Ratio Scale
Possesses all the properties of the nominal, ordinal and interval scales
It has an absolute zero point
It is meaningful to calculate ratios of scale values.
All statistical techniques can be applied.
Examples: income, age, weight, height, blood pressure and so on.
Categorical variables can be nominal or ordinal.
They can be placed into one of two (dichotomous) or more
(polychotomous) categories. Examples of dichotomous categorical
variables: Male / Female, Pregnant / Not pregnant, Smoker / Non-smoker,
Married / Single.
However, many classifications require more than two categories,
for example Married / Single / Divorced / Separated / Widowed;
Blood group: A / B / AB / O; Religion: Hindu / Christian / Muslim
etc. There is no ordering of the categories. These are examples of
the nominal scale.
But often there is a natural order, as with the stages of
cancer and social class. Example: degree of smoking can be
divided into non-smokers / ex-smokers / light smokers / heavy
smokers. This is an example of the ordinal scale.
In ordinal scales, the categories bear an ordered relationship to
one another.
Numerical variables can be interval or ratio.
They are expressed as integers, fractions or decimals, in which
equal distances exist between successive intervals. Age, systolic and
diastolic blood pressure, and height are examples of continuous
variables.
Numerical variables can be further divided into discrete and
continuous. A discrete numerical variable can take only isolated
values over a range; the values differ by a fixed amount, and no
intermediate values are possible.
Examples of discrete numerical variables are number of children,
number of ectopic heart beats etc.
The difference between discrete numerical and ordered categorical
(ordinal) data can be seen in the following example.
Stages of breast cancer: I II III IV (Ordinal data)
Number of children: 0 1 2 3+ (Discrete numerical)
We cannot say that stage IV is twice as bad as stage II, or that the
difference between stages I and II is equivalent to that between
stages III and IV.
In contrast, 3 children are three times as many as 1, and a difference
of one means the same throughout the range of values.
If the values of the measurement can take any number in a range, the
data are said to be continuous.
The difference between any two possible data values can be very
small. Common examples include height, weight, temperature etc.
Continuous data can be reduced to several categories.

1.6. Data collection
Not every aggregate of numbers can be called statistical data. We say an
aggregate of numbers is statistical data when the numbers are
Comparable
Meaningful and
Collected for a well-defined objective
There are two sources of data:

i. Primary Data
Data measured or collected first hand by the investigator or the user, directly
from the source.

ii. Secondary Data
Data gathered or compiled from published and unpublished
sources or files, such as journals, reports, government publications,
and publications of professionals and research organizations. E.g.
CSA: Census, DHS
Methods of collecting primary data
1. Survey method
The investigator makes personal contact with the informants, either
directly or indirectly, and collects the data (telephone interview,
questionnaires, focus group discussion, door-to-door survey,
new product registration). The collected information is more
reliable/accurate.
2. Experimental method
Determines whether, and in what manner, variables are related to each
other, e.g. to study the effect of fertilizer on a crop.
3. Observation method
The investigator observes the overall nature of the event and collects
the required data.
Devices used include automatic recorders, motion pictures etc.
Example: an individual doing research on the growth of plants or the behavior of
bats keenly observes and finds out the required information.
It gives more accurate results and supplementary information, but it is costly
and time consuming.
Secondary data sources
Official publications of Government
Publications of research institutions
Professional bodies
Economic, trade and scientific journals
When the source is secondary data, check:
The type and objective of the situation.
Whether the purpose for which the data were collected is compatible with
the present problem.
Whether the nature and classification of the data are appropriate to our
problem.
That there are no biases and misreporting in the published data.
Note: Data which are primary for one user may be secondary for
another.

CH-2

Chapter Two
Organization and Presentation of data

Organization and Presentation of data
2. Methods of data organization and presentation
Having collected and edited the data, the next important step is
to organize it, that is, to present it in a readily comprehensible, condensed form that
helps in drawing inferences from it.
The presentation of data is broadly classified into
Tabular presentation
Diagrammatic and Graphic presentation.
Classification & Tabulation
The process of arranging data into classes or categories according
to similarities is called classification.
Classification is a preliminary step that prepares the ground for
proper presentation of data.
Tabulation/organization is concerned with the systematic
arrangement of data using classes and frequencies.
Thus classification is the first step in tabulation.
Raw data: recorded information in its original collected form,
whether counts or measurements.
Frequency: the number of values in a specific class of the
distribution.
Frequency distribution: the organization of raw data in table
form using classes and frequencies.
Types of frequency distributions:
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
Categorical frequency distribution
Used for data that can be placed in specific categories, such as
nominal or ordinal data, e.g. marital status.

Example: a social worker collected the following data on marital
status for 25 persons (M=married, S=single, W=widowed,
D=divorced):
M S D W D S S M M M W D S M M W D D S S S W W D D

class   frequency   percentage
M       6           24
S       7           28
W       5           20
D       7           28
Total   25          100
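The tally in the table above can be checked with a short Python sketch. It assumes each letter of the data string is one person's marital status; the variable names are illustrative only.

```python
from collections import Counter

# Marital-status codes from the example (M=married, S=single, W=widowed, D=divorced)
codes = list("MSDWDSSMMMWDSMMWDDSSSWWDD")

counts = Counter(codes)   # frequency of each class
n = len(codes)            # total number of persons

# Print class, frequency and percentage in the same order as the table
print(f"{'class':<6}{'frequency':>10}{'percentage':>12}")
for cls in "MSWD":
    freq = counts[cls]
    print(f"{cls:<6}{freq:>10}{100 * freq / n:>12.0f}")
print(f"{'Total':<6}{n:>10}{100:>12}")
```

The same pattern works for any nominal or ordinal variable: count each category, then divide by the total to get the percentage.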
Class activity: 25 people were given a blood test to
determine their blood types. Construct an appropriate frequency
distribution for the following data:
A B B AB O O O B AB B B B O A O A O O O AB AB
A O A B
Ungrouped frequency distribution
A table of all the potential raw score values that could occur in
the data, along with the number of times each actually occurred.
It is often constructed for a small set of data on a discrete variable.
Constructing an ungrouped frequency distribution:
First find the smallest and largest raw scores in the collected data.
Arrange the data in order of magnitude and count the frequency.
To facilitate counting, one may include a column of tallies.

Example: In a survey of 40 families in a village, the number of children per
family was recorded and the following data obtained.
1 0 3 2 1 5 6 2 2 1 0 3 4 2 1 6 3 2 1 5
3 3 2 4 2 2 3 0 2 1 4 5 3 3 4 4 1 2 4 5
Represent the data in the form of a discrete frequency
distribution. Frequency distribution of the number of children
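A minimal Python sketch of the construction steps for an ungrouped frequency distribution, applied to the survey data. It assumes each digit in the data is one family's number of children; the tally marks are only a display aid.

```python
from collections import Counter

# Number of children in each of the 40 surveyed families
children = [1, 0, 3, 2, 1, 5, 6, 2, 2, 1, 0, 3, 4, 2, 1, 6, 3, 2, 1, 5,
            3, 3, 2, 4, 2, 2, 3, 0, 2, 1, 4, 5, 3, 3, 4, 4, 1, 2, 4, 5]

tally = Counter(children)

# List every value from the smallest to the largest raw score,
# including any value that happens to have frequency 0
distribution = {x: tally.get(x, 0)
                for x in range(min(children), max(children) + 1)}

for value, freq in distribution.items():
    print(f"{value:>2}  {'|' * freq:<10}  {freq}")
print("Total", sum(distribution.values()))
```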

Grouped Frequency Distribution
When the range of the data is large, the data must be grouped
into classes that are more than one unit in width.
Class limits: separate one class in a grouped frequency
distribution from another. The limits could actually appear in the
data, and there are gaps between the upper limit of one class and the
lower limit of the next.
Class boundaries: separate one class in a grouped frequency
distribution from another. The boundaries have one more decimal
place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the
lower boundary of the next class.


Class width: the difference between the upper and lower class
boundaries of any class.
Class mark (midpoint): the average of the lower and
upper class limits, or the average of the upper and lower class
boundaries.
Cumulative frequency: the number of observations less
than/more than or equal to a specific value.
Relative frequency: the frequency divided by the total
frequency.
Steps for constructing a Grouped Frequency Distribution
i. Find the unit of measurement (U): the smallest positive difference between
any two data values.
ii. Find the largest and smallest values.
iii. Compute the range: R = Maximum - Minimum.
iv. Select the number of classes desired, usually between 5 and 20, or
use Sturges' rule K = 1 + 3.32 log n (logarithm to base 10), where K is the number of classes
desired and n is the total number of observations.
v. Find the class width by dividing the range by the number of
classes and rounding up:
\[ W = \frac{R}{K} \]
vi. Pick a suitable starting point less than or equal to the minimum
value. The starting point is called the lower limit of the first class.
Continue to add the class width to this lower limit to get the rest
of the lower limits.
vii. To find the upper limit of the first class, subtract U from the lower
limit of the second class. Then continue to add the class width to
this upper limit to find the rest of the upper limits.

viii. Find the boundaries by subtracting U/2 units from the lower
limits and adding U/2 units to the upper limits.
The boundaries are also half-way between the upper limit of one
class and the lower limit of the next class.
ix. Find the frequencies.
Example: Frame a grouped frequency distribution for
the observations of 30 families.
30, 20, 14, 52, 24, 33, 56, 45, 24, 37, 21, 43, 25, 11, 33, 19, 26, 31, 42, 34, 37, 41, 15, 47,
38, 26, 44, 28, 12, 21
Solution: U = 1. The largest data value is 56 and the smallest data value is
11, so the range is R = 56 - 11 = 45.
K = 1 + 3.32 log(30) = 5.9, rounded up to 6.
W = R/K = 45/6 = 7.5, rounded up to 8.

Less than cumulative frequency (LCF): number of items less than or equal to
the upper boundary of the class interval.
More than cumulative frequency (MCF): number of items more than or equal to
the lower boundary of the class interval.
Data in ascending order: 11, 12, 14, 15, 19, 20, 21, 21, 24, 24, 25, 26, 26, 28,
30, 31, 33, 33, 34, 37, 37, 38, 41, 42, 43, 44, 45, 47, 52, 56
The grouped frequency distribution is then:

Class limit   Class boundary   Class mark   Frequency   R.F     LCF   MCF
11-18         10.5-18.5        14.5         4           0.133   4     30
19-26         18.5-26.5        22.5         9           0.300   13    26
27-34         26.5-34.5        30.5         6           0.200   19    17
35-42         34.5-42.5        38.5         5           0.167   24    11
43-50         42.5-50.5        46.5         4           0.133   28    6
51-58         50.5-58.5        54.5         2           0.067   30    2
Total                                       30          1.000
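The construction steps can be sketched in Python for the 30 observations listed in ascending order. This is only an illustration under the choices made in the example (U = 1, Sturges' rule with a base-10 logarithm, starting point equal to the minimum value); the dictionary keys are hypothetical names.

```python
import math

# The 30 observations from the example, in ascending order
data = [11, 12, 14, 15, 19, 20, 21, 21, 24, 24, 25, 26, 26, 28, 30,
        31, 33, 33, 34, 37, 37, 38, 41, 42, 43, 44, 45, 47, 52, 56]

unit = 1                                   # U: unit of measurement
n = len(data)
data_range = max(data) - min(data)         # R = 56 - 11 = 45
k = math.ceil(1 + 3.32 * math.log10(n))    # Sturges' rule, rounded up
width = math.ceil(data_range / k)          # W = R / K, rounded up

rows = []
cumulative = 0
lower = min(data)                          # starting point: the minimum value
for _ in range(k):
    upper = lower + width - unit           # upper class limit
    freq = sum(lower <= x <= upper for x in data)
    cumulative += freq
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - unit / 2, upper + unit / 2),
        "mark": (lower + upper) / 2,
        "freq": freq,
        "rel_freq": round(freq / n, 3),
        "lcf": cumulative,                 # less-than cumulative frequency
        "mcf": n - cumulative + freq,      # more-than cumulative frequency
    })
    lower += width                         # next lower class limit

for row in rows:
    print(row)
```

Each printed row corresponds to one class of the grouped frequency distribution; the relative frequencies add up to 1 apart from rounding.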

Diagrammatic & Graphic presentation of data
Techniques for presenting data in visual displays using geometric figures
and pictures. Diagrams are appropriate for presenting categorical
data.
Importance:
They are more attractive, facilitate comparison and are easily
understood.
The most commonly used diagrammatic presentations are
bar charts, pictograms and pie charts.
1. Bar chart: a series of equally spaced bars of equal width,
where the height of each bar represents the magnitude or frequency
of observations in each group.
Bars can be drawn either vertically or horizontally.
There are different types of bar charts. The most common are:
simple bar chart, component or subdivided bar chart, multiple
bar chart and percentage bar chart.

1. Simple bar charts
Are used to display data on one variable or one way variable.
They are thick lines (narrow rectangles) having the same breadth.
The magnitude of a quantity is represented by the height /length
of the bar.
Example: The following data represent sale by product, 1957- 1959 of
a given company for three products A, B, C.

2. Component Bar chart
When there is a desire to show how a total (or aggregate) is
divided into its component parts, we use a component bar chart.
The bars represent the total value of a variable, with each total broken
into its component parts; different colors or designs are used
for identification.
Example: Draw a component bar chart to represent the sales by
product from 1957 to 1959.

3. Multiple Bar chart

These are used to display data on more than one variable.


They are used for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by
product from 1957 to 1959.

4. Percentage Bar chart

Here the percentages corresponding to the actual values of the
components are calculated:
\[ \text{percentage} = \frac{\text{Actual value}}{\text{Total actual value}} \times 100 \]
It is a bar chart representing the percentages of the components of a
component bar chart.
The total percentage is 100, so the bars are of equal height.
Example: Sales by Product 1957-1959

Pictogram
In this diagram, we represent data by means of pictures or
symbols. We decide on a suitable picture to represent a
definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data
(number of students in a certain school for four years).

Pie Chart

A pie chart is a circle that is divided into sections or wedges
according to the percentage of frequencies in each category of the
distribution.
Since there are 360° in a circle, the frequency for each class must
be converted into a proportional part of the circle. This conversion
is done by using the formula
\[ \text{Degrees} = \frac{f}{n} \times 360^{\circ} \]
where f = frequency for each class and n = Σf. The degrees
should sum to 360°.
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and
label it with its name and corresponding percentage (or degrees).
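Steps 1 and 2 can be illustrated with a short Python sketch. As an assumption for illustration, it reuses the marital-status frequencies from the categorical frequency distribution example (M = 6, S = 7, W = 5, D = 7); the drawing itself (Step 3) is left out.

```python
# Frequencies reused from the marital-status example (assumed data for illustration)
freq = {"M": 6, "S": 7, "W": 5, "D": 7}
n = sum(freq.values())                       # n = sum of f

# Step 1: percentage of each class
percent = {cls: f * 100 / n for cls, f in freq.items()}

# Step 2: degrees of each class, Degrees = (f / n) * 360
degrees = {cls: f * 360 / n for cls, f in freq.items()}

for cls in freq:
    print(f"{cls}: {percent[cls]:.0f}%  {degrees[cls]:.1f} degrees")
print("Total degrees:", round(sum(degrees.values()), 1))
```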
Example : Draw a suitable diagram to represent the following
population in a town.

Solution:

Pie chart for population in a town:

Graphical presentation of data
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.
Procedure for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative
frequencies and label it on the Y axis.
Represent the class boundaries for the histogram or ogive, and the
midpoints for the frequency polygon, on the X axis.
Plot the points.
Draw the bars or lines to connect the points.

Histogram: a graph which displays the data by using vertical bars
of various heights to represent frequencies. Class boundaries are
placed along the horizontal axis. Class marks and class limits are
sometimes used as the quantity on the X axis.
Frequency Polygon: a line graph. The frequency is placed along
the vertical axis and the class midpoints are placed along the
horizontal axis. It is customary to add the next lower and
higher class intervals with a corresponding frequency of zero, in order to
make it a complete polygon.
Example: drawing a histogram
Score Counts
30-39 1
40-49 0
50-59 9
60-69 15
70-79 7
80-89 1
90-99 1

Frequency polygon
Example: Create a frequency polygon using the data below. First find the midpoint of each class.

Score     Class boundaries   Midpoint   Frequency
100-104   99.5-104.5         102        2
105-109   104.5-109.5        107        8
110-114   109.5-114.5        112        18
115-119   114.5-119.5        117        13
120-124   119.5-124.5        122        7
125-129   124.5-129.5        127        1
130-134   129.5-134.5        132        1

Ogive (cumulative frequency polygon):
A graph showing the cumulative frequency (less than or more than
type) plotted against the upper or lower class boundaries respectively.
That is, class boundaries are plotted along the horizontal axis and
the corresponding cumulative frequencies are plotted along the
vertical axis. The points are joined by a freehand curve.
Less than Ogive

More than Ogive

Score             Cumulative frequency
More than 99.5    50
More than 104.5   48
More than 109.5   40
More than 114.5   22
More than 119.5   9
More than 124.5   2
More than 129.5   1
More than 134.5   0
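The cumulative frequencies plotted in the two ogives can be derived from the class frequencies of the score data (2, 8, 18, 13, 7, 1, 1). The Python sketch below is one way to do this; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and class frequencies of the score data
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freqs = [2, 8, 18, 13, 7, 1, 1]
n = sum(freqs)

# Less-than type: number of observations below each boundary
less_than = [0] + list(accumulate(freqs))

# More-than type: number of observations above each boundary
more_than = [n - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>6}  less than: {lt:>2}  more than: {mt:>2}")
```

Plotting `less_than` or `more_than` against `boundaries` gives the points of the corresponding ogive.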

CH-3

Chapter Three
Measures of central tendency

3. Measures of central tendency
To make comparison between groups of numbers it is good to have
a single value that is considered to be a good representative of
each group. This single value is called the average of the group.
Averages are also referred to as measures of central tendency.
Objectives:
1 To comprehend (understand) the data easily.
2 To facilitate comparison.
3 To make further statistical analysis.
Types of measures of central tendency
The various measures of central tendency are numbers that tell us
something about the location of a distribution's 'center'.
1 Arithmetic Mean (A.M)
2 Median
3 Mode
4 Geometric mean (G.M)
5 Harmonic mean (H.M)
6 Percentiles.
a. Arithmetic mean
The mean is the simple arithmetic average of the observations.
It is calculated by dividing the total of all the observations by the
number of observations. For ungrouped data, if X represents the
observations and n the number of observations, then the mean,
denoted by µ or X̄, is given by
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n} \]
In the case of a grouped distribution, the mean is calculated assuming
that each observation in a class interval is equal to the midpoint of
that class interval.

Arithmetic Mean for raw data
Eg.1 Calculation of mean Serum Albumin levels of 24 pre-school
children
2.90 3.75 3.66 3.57 3.45 3.76 3.73 3.43
3.55 3.84 3.69 3.72 3.30 3.77 3.88 3.62
2.98 3.76 3.68 3.67 3.38 3.76 3.71 3.43

Total number of observations (n) = 24
Total of all the values, i.e. Σx = 85.99
Therefore, mean x̄ = Σx/n = 85.99/24 = 3.58 g%.

Some Properties of the Summation Notation
i. \( \sum_{i=1}^{n} C = nC \), where C is a constant number.
ii. \( \sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i \), where b is a constant number.
iii. \( \sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i \), where a and b are constant
numbers.
Arithmetic Mean for Ungrouped data
In a survey of 40 families in a village, the number of children per
family was recorded and the following data obtained.
1 0 3 2 1 5 6 2 2 1 0 3 4 2 1 6 3 2 1 5
3 3 2 4 2 2 3 0 2 1 4 5 3 3 4 4 1 2 4 5

No. of children (Xi)   Frequency (fi)   Xi * fi
0                      3                0
1                      7                7
2                      10               20
3                      8                24
4                      6                24
5                      4                20
6                      2                12
Total                  40               107

\[ \bar{X} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} X_i f_i}{\sum_{i=1}^{k} f_i} = \frac{107}{40} = 2.68 \approx 3 \]
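The frequency-weighted mean can be verified with a few lines of Python. The sketch uses the values and frequencies from the table; the names `values` and `freqs` are illustrative.

```python
# Number of children (Xi) and the number of families with that many children (fi)
values = [0, 1, 2, 3, 4, 5, 6]
freqs = [3, 7, 10, 8, 6, 4, 2]

total = sum(x * f for x, f in zip(values, freqs))   # sum of Xi * fi
n = sum(freqs)                                      # sum of fi
mean = total / n

print(total, n, round(mean, 3))
```

Because the number of children is a discrete variable, the slide rounds the result to a whole number when reporting it.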
iv. \( \sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i \)
Arithmetic mean for grouped data
For grouped data the mean is given by
\[ \bar{X} = \frac{\sum_{i=1}^{k} cm_i f_i}{\sum_{i=1}^{k} f_i} \]
where
k is the number of classes, f_i is the frequency of the i-th class,
cm_i is the midpoint of the i-th class interval and
n is the total number of observations (n = Σf).
In the case of a grouped distribution, the mean is calculated assuming
that each observation in a class interval is equal to the midpoint of
that class interval.
This is because, once the data are grouped, one does not know the
actual values, so it is assumed that the midpoint of the class interval
is their value.
Eg.2 Calculation of the mean of 50 observations.

Scores    Midpoint (cmi)   Frequency (fi)   cmi * fi
100-104   102              2                204
105-109   107              8                856
110-114   112              18               2016
115-119   117              13               1521
120-124   122              7                854
125-129   127              1                127
130-134   132              1                132
Total                      50               5710

\( \sum_{i=1}^{7} f_i = 2 + 8 + 18 + 13 + 7 + 1 + 1 = 50 \)
\( \sum_{i=1}^{7} f_i \, cm_i = 204 + 856 + 2016 + 1521 + 854 + 127 + 132 = 5710 \)

The mean of the 50 observations is 5710/50 = 114.20
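The grouped-data calculation can be sketched in Python as follows. The function name `grouped_mean` is hypothetical; it simply applies the midpoint assumption described for grouped distributions.

```python
def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower limit, upper limit, frequency)."""
    weighted_sum = 0.0
    n = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2     # class mark stands in for every value in the class
        weighted_sum += midpoint * freq
        n += freq
    return weighted_sum / n

# Score classes and frequencies from Eg.2
scores = [(100, 104, 2), (105, 109, 8), (110, 114, 18), (115, 119, 13),
          (120, 124, 7), (125, 129, 1), (130, 134, 1)]

mean_score = grouped_mean(scores)
print(mean_score)
```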

Class work

Marks    Frequency (f)   Class midpoint (X)   f*X
0-20     4
20-40    5
40-60    2
60-80    5
80-100   4
Total    20

\[ \bar{X} = \frac{\sum fX}{n} = ? \]
where n = Σf = 20

Properties of mean
i. Uniqueness: for a given set of observations, there is one and
only one mean.
ii. Simplicity: it is easily understood and easy to calculate.
iii. Let the mean of n observations be µ. If a constant "k" is added to or
subtracted from each observation, the mean of the new observations is the old
mean plus or minus "k"; if each observation is multiplied by "k",
the mean of the new observations is the old mean multiplied by "k".
iv. The mean is computed from every observation. Extreme values therefore
have an influence on the mean and can sometimes distort it.

If a wrong figure has been used in calculating the mean, we can correct the mean
if we know the correct figure that should have been used. Let
Xwr denote the wrong figure used in calculating the mean,
Xc be the correct figure that should have been used, and
X̄wr be the wrong mean calculated using Xwr. Then the correct
mean, X̄correct, is given by
\[ \bar{X}_{correct} = \frac{n\bar{X}_{wr} + X_c - X_{wr}}{n} \]
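A small Python sketch of the correction formula. The data are hypothetical (a true value of 18 recorded as 81), chosen only to show that the formula gives the same result as recomputing the mean from the corrected list.

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Correct the mean of n observations after one figure was recorded wrongly."""
    return (n * wrong_mean + correct_value - wrong_value) / n

# Hypothetical data: the true value 18 was recorded as 81
recorded = [12, 15, 11, 81, 14]
wrong_mean = sum(recorded) / len(recorded)      # mean computed with the wrong figure

fixed = corrected_mean(len(recorded), wrong_mean, wrong_value=81, correct_value=18)

# Cross-check against the mean recomputed from the corrected list
actual = [12, 15, 11, 18, 14]
print(fixed, sum(actual) / len(actual))
```

The formula is useful when only the wrong mean and n are available, so the full list cannot be recomputed.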
Combined mean
If X̄1 is the mean of n1 observations,
X̄2 is the mean of n2 observations,
...
X̄k is the mean of nk observations,
then the mean of all the observations in all groups, often called the
combined mean, is given by
\[ \bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i} \]


Example: In a class there are 30 female and 70 male students. If the
females' average mark is 60 and the males' average mark is 72, find
the mean of the entire class.
Solution: $\bar{X}_1 = 60$, $n_1 = 30$ and $\bar{X}_2 = 72$, $n_2 = 70$

$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} = \frac{30 \times 60 + 70 \times 72}{30 + 70} = \frac{6840}{100} = 68.40$
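As a quick check of the combined-mean formula, the following Python sketch weights each group mean by its group size. The function name `combined_mean` is an assumption made for illustration; the numbers are those of the female/male example.

```python
def combined_mean(group_means, group_sizes):
    """Combined mean: sum(mean_i * n_i) / sum(n_i)."""
    if len(group_means) != len(group_sizes):
        raise ValueError("each group mean needs a matching group size")
    total_size = sum(group_sizes)
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_size


# 30 females with mean 60 and 70 males with mean 72
print(combined_mean([60, 72], [30, 70]))
```

Note that the result is pulled towards the mean of the larger group, which is why simply averaging 60 and 72 would give a different (and wrong) answer.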
Class work: Last year there were three sections taking the CDA course
at Hawassa University. At the end of the semester, the three
sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in the sections, respectively. Find the mean mark
for all the students.



Merits of Arithmetic Mean

Arithmetic mean is rigidly defined by a mathematical formula, so that
its value is always definite.
It is calculated based on all observations.
Arithmetic mean is simple to calculate and easy to understand.

Arithmetic mean is also capable of further algebraic treatment.

It affords a good standard of comparison.



Drawbacks of Arithmetic Mean
It is highly affected by extreme (abnormal) observations in the
series.
For instance, the monthly incomes of three boys are 37 birr, 53
birr and 48 birr and that of their father is 1026 birr. The average
income of these four people is (37 + 53 + 48 + 1026)/4 = 291 birr,
which is not at all a representative figure.
It can be a number which does not exist in the series.
It sometimes gives results which appear almost absurd. For
example, we can get an average of '3.6 children' per family.
It gives greater importance to bigger items of a series and lesser
importance to smaller items. That means it is an upward-biased
measure.
It can't be calculated for open-ended classes.



Median
The median is the midpoint of the data array.
Median is the middle-most item, when all observations are
arranged in order of magnitude.
It is the point, above and below which fall exactly 50 percent of
observations.
Median is a measure of location.
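When there is an even number of observations, the median is the
average of the middle pair of observations.
We shall denote the median of X1, X2, ..., Xn by Xmd.
For raw data arranged in order of magnitude, the median is obtained by
$X_{md} = X_{(n+1)/2}$ if n is odd, and
$X_{md} = \frac{1}{2}\left(X_{n/2} + X_{(n+2)/2}\right)$ if n is even,
where the subscript gives the position of the item in the ordered data.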


Example.1 The number of children with asthma during a specific year
in 7 local districts is shown. Find the median.
253, 125, 328, 417, 201, 70, 90
Solution:
• Sort in increasing order:
70, 90, 125, 201, 253, 328, 417
Since the number of observations is odd (n = 7), the median is the
$\left(\frac{n+1}{2}\right)$th $= \left(\frac{7+1}{2}\right)$th $= 4$th item, which is 201.


Example.2 Six customers purchased these numbers of magazines: 1,
7, 3, 2, 3, 4. Find the median.
Solution:
• Sort in increasing order: 1, 2, 3, 3, 4, 7. The number of items is 6,
which is even, so we use the formula $\frac{1}{2}\left(X_{n/2} + X_{(n+2)/2}\right)$ to obtain the
median value of the observations.
$X_{md} = \frac{3\text{rd item} + 4\text{th item}}{2} = \frac{3 + 3}{2} = 3$
Or we can calculate it as
3rd item + ½(4th item - 3rd item) = 3 + 0.5(3 - 3) = 3 + 0 = 3
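The odd and even cases for raw data can be sketched in Python as follows. The function name `median_raw` is hypothetical; the two data sets are the asthma and magazine examples.

```python
def median_raw(values):
    """Median of raw data: middle item (odd n) or mean of the middle pair (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th item, which is index n // 2 when counting from 0
        return ordered[middle]
    # even n: average of the (n / 2)th and (n / 2 + 1)th items
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_raw([253, 125, 328, 417, 201, 70, 90]))  # asthma example
print(median_raw([1, 7, 3, 2, 3, 4]))                 # magazine example
```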



Median for Grouped Data
First we have to find the median class: the class interval whose
less-than cumulative frequency first includes the (n/2)th observation.
Then the formula for calculating the median for a grouped
distribution is

$X_{md} = LCB + \left(\frac{n/2 - F}{f}\right) w$

where
LCB - the lower class boundary of the median class,
n - the total number of observations,
F - the number of observations up to the median class (the
less-than cumulative frequency of the class preceding the median class),
f - the frequency in the median class, and
w - the width of the interval of the median class.

Example.2: Calculation of the median of 50 families.

Score      Class boundaries   Frequency   LCF
100-104    99.5-104.5         2           2
105-109    104.5-109.5        8           10
110-114    109.5-114.5        18          28
115-119    114.5-119.5        13          41
120-124    119.5-124.5        7           48
125-129    124.5-129.5        1           49
130-134    129.5-134.5        1           50
Total                         50
The median class is 110-114, because it is the first class whose LCF (28)
reaches n/2 = 25. n = 50, f = 18, F = 10, w = 5
$median = LCB + \left(\frac{n/2 - F}{f}\right) w = 109.5 + \left(\frac{50/2 - 10}{18}\right) 5 = 113.67$
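The search for the median class and the interpolation inside it can be sketched in Python. This is a minimal illustration assuming equal class widths and at least one observation; the function name `grouped_median` is made up here.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: LCB + ((n/2 - F) / f) * w."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table has no observations")
    cumulative_before = 0  # F: cumulative frequency below the current class
    for lcb, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative_before + f >= n / 2:
            # first class whose less-than cumulative frequency reaches n/2
            return lcb + (n / 2 - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("median class not found")


lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]
print(round(grouped_median(lower_boundaries, frequencies, 5), 2))
```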


Class work
marks frequency(f) LCF
0-20 4
20-40 5
40-60 2
60-80 5
80-100 4
total 20



Properties of median

i. Uniqueness: There is one and only one median for a given set of
observations.
ii. Simplicity: It is easy to calculate and easily understood.
iii. It is not drastically affected by extreme values, as the
mean is. For example:
2, 8, 6, 4, 5, 2, 4, 5, 6, 8: median = 5
2, 80, 6, 4, 5, 2, 4, 5, 6, 80: median = 5
2, 0.8, 6, 4, 5, 0.8, 2, 4, 5, 6: median = 4
0.2, 8, 6, 4, 5, 0.2, 4, 5, 6, 8: median = 5

Merits of median
Median is a positional average and hence it is not influenced by
extreme values.
Median is rigidly defined, so that its value is always definite.
Median can be calculated even in case of open-ended intervals.
It gives the best result in a study of those phenomena which are
incapable of direct quantitative measurement. Ex: intelligence

Demerits of median
It is not capable of further algebraic treatment.
It is not a good representative of the data if the number of items
(data) is small.
The arrangement of items in order of magnitude is sometimes a very
tedious process if the number of items is very large.
Mode for Ungrouped Data

Mode is the most frequently occurring observation, i.e., the value
around which the observations tend to be most heavily concentrated.
For a group of values such as 31, 33, 38, 43, 45 there is no mode,
whereas some distributions have two modes.
Example.1 Calculation of mode Serum Albumin levels of 24
pre-school children
2.90 2.98 3.30 3.38 3.43 3.43 3.43 3.45 3.55 3.57
3.61 3.62 3.66 3.68 3.69 3.71 3.72 3.73 3.75
3.76 3.76 3.76 3.77 3.84 3.88
The modal serum albumin levels are 3.43 and 3.76; both occur 3
times. The data set is said to be bimodal.



Mode for Grouped Data

$mode = \hat{X} = LCB_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$

LCBm = lower class boundary of the modal class,
w = width of the interval of the modal class,
∆1 = frequency in the modal class - frequency in the preceding
class, or ∆1 = fm - fp,
∆2 = frequency in the modal class - frequency in the succeeding
class, or ∆2 = fm - fs.

Example.2: Calculation of the mode of this grouped data.

Score      Class boundaries   Frequency
100-104    99.5-104.5         2
105-109    104.5-109.5        8
110-114    109.5-114.5        18
115-119    114.5-119.5        13
120-124    119.5-124.5        7
125-129    124.5-129.5        1
130-134    129.5-134.5        1

The highest frequency is 18 = fm, so the modal class is 110-114.
LCBm = 109.5 and w = 5
∆1 = fm - fp = 18 - 8 = 10
∆2 = fm - fs = 18 - 13 = 5
$\hat{X} = LCB_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 109.5 + \frac{10 \times 5}{10 + 5} = 109.5 + \frac{50}{15} = 112.83$
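A small Python sketch of the grouped-mode formula is given below. The function name `grouped_mode` is hypothetical. Two assumptions that the slides do not state: when the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and when several classes share the highest frequency, the first one is used.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data: LCB_m + (d1 / (d1 + d2)) * w."""
    # index of the modal class (first class with the highest frequency)
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])
    f_m = frequencies[m]
    # assumption: a missing neighbouring class counts as frequency 0
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = f_m - f_prev
    d2 = f_m - f_next
    if d1 + d2 == 0:
        raise ValueError("mode is not defined for this table")
    return lower_boundaries[m] + d1 / (d1 + d2) * width


lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]
print(round(grouped_mode(lower_boundaries, frequencies, 5), 2))
```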

Merits of mode

Mode is not affected by extreme values.


Mode can be calculated even in the case of open-end intervals.
And it is not necessary to know all observations.

Demerits of mode

Mode may not exist in the series and if it exists it may not be a
unique value.
It does not fulfill most of the requirements of a good measure of
central tendency
It may be unrepresentative in many cases.

Review

Mean, median and mode are measures of central tendency;
however, different kinds of distributions influence them differently.
The median is known as a measure of location.
We need not know all the values to calculate the median; if the
smallest value is made even smaller or the largest value even
larger, it would not change the value of the median.
Thus the median does not use all the information in the data and
so it can be shown to be less efficient than the mean, which does
use all values of the data.
A major disadvantage of the mean is that it is sensitive to
outlying points.


2, 4, 2, 0, 40, 2, 4, 3, 6: Σx = 63, n = 9, Mean = 7, Median = 3,
Mode = 2.
2, 4, 2, 0, 2, 4, 3, 6: Σx = 23, n = 8, Mean = 2.88, Median = 2.5,
Mode = 2.
5, 5, 6, 6, 6, 8, 104: the mean is 20, which is higher than almost all
the values; the median is 6, which does represent the central tendency.
2, 4, 6, 8, 100: the mean is 24 and the median is 6.
2, 4, 6, 8, 10: the median is 6 and the mean is also 6, same as the
median.
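The effect of a single outlying value on the three averages can be reproduced with Python's standard `statistics` module. The snippet is only an illustration using the first two data sets of this review; the variable names are made up.

```python
import statistics

with_outlier = [2, 4, 2, 0, 40, 2, 4, 3, 6]
without_outlier = [2, 4, 2, 0, 2, 4, 3, 6]

for label, data in (("with 40", with_outlier), ("without 40", without_outlier)):
    # mean reacts strongly to the outlier; median and mode barely move
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

Dropping the value 40 moves the mean from 7 to about 2.88, while the median only moves from 3 to 2.5 and the mode stays at 2.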



Quantiles
Quantiles are values which divide a data set, arranged in order of
magnitude, into a certain number of equal parts. They are positional
averages (measures of non-central tendency).
Commonly used quantiles are quartiles, deciles and percentiles.
I. Quartiles:

Quartiles are values which divide the data set into four equal parts,
denoted by Q1, Q2 and Q3.
The first quartile Q1 is also called the lower quartile and the third
quartile Q3 is the upper quartile.
The second quartile Q2 is the median.

Quartiles for ungrouped data:

Let Qj be the jth quartile value for j = 1, 2, 3. Then
$Q_j = \left(\frac{j}{4}(n + 1)\right)$th item, for j = 1, 2, 3

Quartiles for grouped data:

$Q_j = L_{Q_j} + \left(\frac{j \cdot \frac{n}{4} - F_{Q_j}}{f_{Q_j}}\right) w$, j = 1, 2, 3, where
Qj = the jth quartile which is to be worked out,
LQj = lower class boundary of the jth quartile class,
FQj = sum of frequencies of all classes lower than the jth quartile
class,
fQj = frequency of the jth quartile class, and w = class width.

The jth quartile class is the class with the smallest cumulative
frequency greater than or equal to $j \cdot \frac{n}{4}$.
It can be located by counting the frequencies beginning from the
lowest class.



II. Deciles:
Deciles are values dividing the data into ten equal parts, denoted by
D1, D2, ..., D9. The fifth decile D5 is the median.
Deciles for ungrouped data:

Let Dj be the jth decile value for j = 1, 2, ..., 9. Then
$D_j = \left(\frac{j}{10}(n + 1)\right)$th item, for j = 1, 2, ..., 9
Deciles for grouped data:

$D_j = L_{D_j} + \left(\frac{j \cdot \frac{n}{10} - F_{D_j}}{f_{D_j}}\right) w$, j = 1, 2, ..., 9
The symbols are defined in the same way as in the case of quartiles.
The jth decile class is the class with the smallest cumulative
frequency greater than or equal to $j \cdot \frac{n}{10}$.
It can be located by counting the frequencies beginning from the
lowest class.
III. Percentiles:
Percentiles are values which divide the data into one hundred equal
parts, denoted by P1, P2, ..., P99. The fiftieth percentile P50 is the
median.
Percentiles for ungrouped data:

Let Pj be the jth percentile value for j = 1, 2, ..., 99. Then
$P_j = \left(\frac{j}{100}(n + 1)\right)$th item, for j = 1, 2, ..., 99
Percentiles for grouped data:

$P_j = L_{P_j} + \left(\frac{j \cdot \frac{n}{100} - F_{P_j}}{f_{P_j}}\right) w$, j = 1, 2, ..., 99
The symbols are defined in the same way as in the case of quartiles.
The jth percentile class is the class with the smallest cumulative
frequency greater than or equal to $j \cdot \frac{n}{100}$.
It can be located by counting the frequencies beginning from the
lowest class.
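The ungrouped-data rule "(j/k)(n + 1)th item" is the same for quartiles (k = 4), deciles (k = 10) and percentiles (k = 100), so one Python function can cover all three. This is a sketch under an assumption the slides do not spell out: when the position is not a whole number, the value is found by linear interpolation between the two neighbouring items, as was done for the median of an even number of observations. The function name `quantile_item` is made up.

```python
def quantile_item(values, j, parts):
    """jth quantile of raw data using the (j / parts) * (n + 1)th item rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantiles of an empty data set are undefined")
    position = j * (n + 1) / parts   # 1-based position in the ordered data
    whole = int(position)            # item just below (or at) the position
    fraction = position - whole
    if whole < 1:                    # position falls before the first item
        return ordered[0]
    if whole >= n:                   # position falls at or after the last item
        return ordered[-1]
    # assumption: interpolate linearly between neighbouring items
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


asthma = [253, 125, 328, 417, 201, 70, 90]
print([quantile_item(asthma, j, 4) for j in (1, 2, 3)])  # Q1, Q2, Q3
print(quantile_item(asthma, 5, 10))                      # D5, the median
print(quantile_item(asthma, 50, 100))                    # P50, the median
```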

Chapter Four
Measures of dispersion (Variation)


A measure of central tendency alone does not adequately describe a
set of observations unless all the observations are the same. So we
need some additional information, such as:

1. The extent to which the items in a particular distribution are
scattered around the central value, i.e. a measure of dispersion.
2. The direction of the scatter, whether more items are attracted
towards higher or lower values, i.e. a measure of skewness.
3. The extent to which the distribution is more peaked or more
flat-topped than the normal distribution, i.e. a measure of kurtosis.

The degree to which numerical data tend to spread about an
average value is called the dispersion or variation of the data.


Variation (dispersion) is the scatter or spread of observations
(values) in a distribution.
Measures of variation are statistical measures, which provide ways
of measuring the extent to which the data are dispersed or spread
out.
Measures of variation are needed for the following basic
objectives.
To judge the reliability of a measure of central tendency
To compare two or more sets of data with regard to their variability
To control variability itself like in quality control, body
temperature, etc
To make further statistical analysis or to facilitate the use of other
statistical measures.



Properties of a good measure of dispersion

A good measure of dispersion should:
be rigidly defined by a mathematical formula,
be simple to understand and easy to calculate,
be unique,
be based on all observations in the series,
not be affected by extreme values existing in the series,
have the sampling stability property, and
be capable of further algebraic treatment as well as further
statistical analysis.

Absolute and Relative Measures of Dispersion
i. Absolute measures of dispersion are expressed in the same unit
of measurement in which the original data are given.
These values may be used to compare the variation in two
distributions provided that the variables are in the same units and
of the same average size.
In case the two sets of data are expressed in different units,
however, such as quintals of sugar versus tons of sugarcane or if
the average sizes are very different such as manager's salary versus
worker's salary, the absolute measures of dispersion are not
comparable.
In such cases measures of relative dispersion should be used.
ii. A measure of relative dispersion is the ratio of a measure of
absolute dispersion to an appropriate measure of central tendency.
It is also called coefficient of dispersion because the word
coefficient represents a pure number (that is independent of any
unit of measurement).
Note: the value of a relative dispersion is a unitless quantity.
Types of Measures of Dispersion

1. The Range and Relative Range

Range (R) is the difference between the largest and the smallest
observations in a data set.
R = Xmax - Xmin ... for ungrouped data
R = CMlast - CMfirst ... for grouped data, where CMlast and CMfirst are
the class marks of the last class and of the first class respectively.
A relative range (RR), also known as the coefficient of range, is
given by

$RR = \frac{X_{max} - X_{min}}{X_{max} + X_{min}} = \frac{R}{X_{max} + X_{min}}$ ... for ungrouped data

$RR = \frac{CM_{last} - CM_{first}}{CM_{last} + CM_{first}} = \frac{R}{CM_{last} + CM_{first}}$ ... for grouped data



Properties of Range and Relative Range
Range and relative range are easy to calculate and simple to
understand.
Neither can be computed for grouped data with open-ended
classes.
They do not tell us anything about the distribution of values in the
series.
Ex. The distribution of the maximum loads supported by a certain
number of cables:

Maximum load (in kilo-Newton)   Number of cables   CM_i
93 - 97                         2                  95
98 - 102                        5                  100
103 - 107                       12                 105
108 - 112                       17                 110
113 - 117                       14                 115
118 - 122                       6                  120
123 - 127                       3                  125
128 - 132                       1                  130

R = CMlast - CMfirst = 130 - 95 = 35 and RR = 35/(130 + 95) = 35/225 ≈ 0.16
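The range and the relative range for the cable data can be verified with a short Python sketch. The function name is hypothetical; for grouped data the smallest and largest values are taken to be the class marks of the first and last classes (95 and 130 here), as in the formulas for grouped data.

```python
def range_and_relative_range(smallest, largest):
    """Return (R, RR) where R = max - min and RR = R / (max + min)."""
    r = largest - smallest
    rr = r / (largest + smallest)
    return r, rr


# class marks of the first (93 - 97) and last (128 - 132) classes
r, rr = range_and_relative_range(95, 130)
print(r, round(rr, 2))
```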
Variance, SD and Coefficient of Variation

1. The Variance
The variance is the arithmetic mean of the squares of the deviations
of the observations from their arithmetic mean.
Population variance for ungrouped data:
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{1}{N}\left(\sum X_i^2 - \frac{(\sum X_i)^2}{N}\right)$
Population variance for grouped data:
$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left(\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right)$

Sample variance ($S^2$) for ungrouped data:
$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{1}{n - 1}\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)$
Sample variance ($S^2$) for grouped data:
$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1} = \frac{1}{n - 1}\left(\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{n}\right)$
where $\bar{X}$ is the sample arithmetic mean, $X_i$ is the class mark of
the ith class, $f_i$ is the frequency of the ith class and $\sum f_i = n$.
2. The Standard Deviation
Standard deviation is the positive square root of the variance.



Example 1
8, 5, 7, 6, 9
$\bar{X} = \frac{\sum X_i}{n} = \frac{8 + 5 + 7 + 6 + 9}{5} = \frac{35}{5} = 7$
$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{(8-7)^2 + (5-7)^2 + (7-7)^2 + (6-7)^2 + (9-7)^2}{5 - 1} = \frac{(1)^2 + (-2)^2 + (0)^2 + (-1)^2 + (2)^2}{4} = \frac{1 + 4 + 0 + 1 + 4}{4} = \frac{10}{4} = 2.5$
OR we can calculate using the shortcut formula:

X          X^2
5          25
6          36
7          49
8          64
9          81
ΣX = 35    ΣX^2 = 255

$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{255 - 35^2/5}{4} = \frac{255 - 245}{4} = \frac{10}{4} = 2.5$
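Both routes to the sample variance, the definition and the shortcut formula, can be compared in Python. The function names are made up for illustration; the data are the five values of this example.

```python
import math


def sample_variance_definition(values):
    """S^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """S^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


data = [8, 5, 7, 6, 9]
variance = sample_variance_definition(data)
print(variance, sample_variance_shortcut(data))  # both routes agree
print(round(math.sqrt(variance), 2))             # standard deviation
```

Both functions need at least two observations, because the divisor is n - 1.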
Example 2: Variance and SD for grouped data
Ex. The distribution of the maximum loads supported by a certain
number of cables.
$\bar{X} = \frac{\sum fX}{\sum f} = \frac{6655}{60} = 110.92$

Max load (kN)   f    CM_i (X)   f·X    (X - X̄)^2   f·(X - X̄)^2
93 - 97         2    95         190    253.45      506.90
98 - 102        5    100        500    119.25      596.25
103 - 107       12   105        1260   35.05       420.60
108 - 112       17   110        1870   0.85        14.45
113 - 117       14   115        1610   16.65       233.10
118 - 122       6    120        720    82.45       494.70
123 - 127       3    125        375    198.25      594.75
128 - 132       1    130        130    364.05      364.05
Total           60              6655   1070.00     3224.80

$S^2 = \frac{\sum f (X - \bar{X})^2}{n - 1} = \frac{3224.80}{59} \approx 54.66$ and $S = \sqrt{54.66} \approx 7.39$
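The grouped-data variance for the cable example can be reproduced with the Python sketch below. The function name `grouped_sample_variance` is hypothetical. Because the code keeps the mean unrounded, its sum of squared deviations differs slightly from a hand calculation that rounds the mean to 110.92 first.

```python
import math


def grouped_sample_variance(class_marks, frequencies):
    """Return (mean, S^2) with S^2 = sum(f * (x - mean)^2) / (n - 1)."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    squared_deviations = sum(
        f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks)
    )
    return mean, squared_deviations / (n - 1)


class_marks = [95, 100, 105, 110, 115, 120, 125, 130]
frequencies = [2, 5, 12, 17, 14, 6, 3, 1]

mean, variance = grouped_sample_variance(class_marks, frequencies)
print(round(mean, 2), round(variance, 2), round(math.sqrt(variance), 2))
```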

3. Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The
corresponding relative measure is known as the coefficient of
variation (CV).
Coefficient of variation is used in problems where we want to
compare the variability of two or more different series.
Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent:
$CV = \frac{S}{\bar{X}} \times 100$
A distribution having a smaller coefficient of variation is said to be
less variable, or more consistent, more uniform or more
homogeneous.
Example: Last semester, the students of the Biology and Chemistry
Departments took the Stat 273 course. At the end of the semester, the
following information was recorded.

Department            Biology   Chemistry
Mean score            79        64
Standard deviation    23        11

Compare the relative dispersion of the two departments' scores.
Solution:
$CV_{Biology} = \frac{23}{79} \times 100 \approx 29.11\%$ and $CV_{Chemistry} = \frac{11}{64} \times 100 \approx 17.19\%$
Interpretation: Since the CV of Biology Department students is
greater than that of Chemistry Department students, we can say
that there is more dispersion relative to the mean in the
distribution of Biology students' scores compared with that of
Chemistry students.
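The comparison of the two departments can be sketched in Python. The function name `coefficient_of_variation` is made up; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(standard_deviation, mean):
    """CV = (S / mean) * 100, expressed in percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return standard_deviation / mean * 100


cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)
print(round(cv_biology, 2), round(cv_chemistry, 2))
print("Biology is more variable" if cv_biology > cv_chemistry
      else "Chemistry is more variable")
```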
Properties of the Variance and the Standard Deviation

Properties of the Variance

It removes most of the demerits or drawbacks of the measures of
dispersion discussed so far.
Its unit is the square of the unit of measurement of the values. For
example, if the variable is measured in kg, the unit of variance is
kg^2.
It is calculated based on all the observations/data in the series.
It gives more weight to extreme values and less to those which are
near to the mean.


Properties of the Standard Deviation

It is considered to be the best measure of dispersion.
Demerit: If the values of two series have different units of
measurement, then we cannot compare their variability just by
comparing the values of their respective standard deviations.
It is calculated based on all the observations. Standard deviation
is capable of further algebraic treatment.
Standard deviation is neither easy to calculate nor easy to
understand.
Similar to the variance, standard deviation gives more weight to
extreme values and less to those which are near to the mean.

The Standard Scores (Z-Scores)
A standard score is a measure that describes the relative position
of a single score in the entire distribution of scores in terms of the
mean and standard deviation.
It also gives us the number of standard deviations a particular
observation lies above or below the mean.
Population standard score:
$Z = \frac{X - \mu}{\sigma}$
Sample standard score:
$Z = \frac{X - \bar{X}}{S}$


Example: Two sections were given an exam in a course. The
average score was 72 with a standard deviation of 6 for section 1 and
85 with a standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who
performed better relative to his/her group?
Solution:
$Z_A = \frac{84 - 72}{6} = 2$ and $Z_B = \frac{90 - 85}{5} = 1$
From these two standard scores, we can conclude that student A
has performed better relative to his/her section because
his/her score is two standard deviations above the mean score of
section 1, while the score of student B is only one standard
deviation above the mean score of section 2.
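The two standard scores in this example can be computed with a short Python sketch. The function name `z_score` is hypothetical; the scores, means and standard deviations are those stated in the example.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / standard_deviation


z_a = z_score(84, 72, 6)  # student A, section 1
z_b = z_score(90, 85, 5)  # student B, section 2
print(z_a, z_b)
print("A performed better" if z_a > z_b else "B performed better")
```

Although B has the higher raw score, A has the higher standard score, which is the point of the comparison.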



Moment, Skewness and Kurtosis / Measures of Shape

1. Moments
1. The rth moment is defined as:
$\overline{X^r} = \frac{\sum X_i^r}{n}$, for ungrouped data, r = 1, 2, ...
$\overline{X^r} = \frac{\sum f_i X_i^r}{n}$, for grouped data, r = 1, 2, ...
If r = 1, it is the simple arithmetic mean; this is called the first
moment.
2. The rth moment about the mean (the rth central moment),
denoted by Mr, is defined as:
$M_r = \frac{\sum (X_i - \bar{X})^r}{n}$, for ungrouped data, r = 1, 2, ...
$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$, for grouped data, r = 1, 2, ...
If r = 2, it is the population variance; this is called the second
central moment. If we assume n - 1 ≈ n, it is also the sample variance.


3. The rth moment about any number A, denoted by $M'_r$, is defined
as:
$M'_r = \frac{\sum (X_i - A)^r}{n}$, for ungrouped data, r = 1, 2, ...
$M'_r = \frac{\sum f_i (X_i - A)^r}{n}$, for grouped data, r = 1, 2, ...
Remarks (for the central moments):
1. M0 = 1
2. M1 = 0
3. M2 ≈ σ^2
Example:
1. Find the first two moments for the following set of numbers: 2, 3, 7
2. Find the first three central moments of the numbers in problem 1
3. Find the first three moments about the number 3 of the numbers in
problem 1
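One way to check answers to these three problems is the Python sketch below. It relies on the fact that a moment about the origin is a moment about A = 0 and a central moment is a moment about A = mean, so a single helper covers all three definitions. The function name `moment_about` is made up for illustration.

```python
def moment_about(values, r, a=0.0):
    """rth moment of raw data about the number a: sum((x - a)^r) / n."""
    n = len(values)
    return sum((x - a) ** r for x in values) / n


numbers = [2, 3, 7]
mean = moment_about(numbers, 1)  # the first moment is the arithmetic mean

# 1. first two moments (about the origin)
print([moment_about(numbers, r) for r in (1, 2)])
# 2. first three central moments (about the mean)
print([moment_about(numbers, r, mean) for r in (1, 2, 3)])
# 3. first three moments about the number 3
print([moment_about(numbers, r, 3) for r in (1, 2, 3)])
```

The first central moment always comes out as 0, which agrees with the remark M1 = 0.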



Measures of Skewness and Kurtosis (Shape)
Skewness is the degree of asymmetry or departure from symmetry
of a distribution.
A skewed frequency distribution is one that is unsymmetrical.
Skewness is concerned with the shape of the curve not size.
If the frequency curve (smoothed frequency polygon) of a
distribution has a longer tail to the right of the central maximum
than to the left, the distribution is said to be skewed to the right
or said to have positive skewness.
If it has a longer tail to the left of the central maximum than to
the right, it is said to be skewed to the left or said to have
negative skewness.
For moderately skewed distribution, the following relation holds
among the three commonly used measures of central tendency.
Mean - Mode = 3 *(Mean - Median)
Skewness
It is the measure of the direction and degree of asymmetry,
denoted by δ3.
There are various measures of skewness.
1. The Pearsonian coefficient of skewness

$\delta_3 = \frac{mean - mode}{SD} = \frac{\bar{X} - \hat{X}}{S} = \frac{3(\bar{X} - X_{md})}{S}$

2. The moment coefficient of skewness

$\delta_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{(S^2)^{3/2}} = \frac{M_3}{S^3}$

Note: The shape of the curve is determined by the value of δ3.
If δ3 = 0 then the distribution is symmetric.
If δ3 > 0 then the distribution is positively skewed.
If δ3 < 0 then the distribution is negatively skewed.

Remarks:
In a positively skewed distribution, smaller observations are more
frequent than larger observations, i.e. the majority of the
observations have a value below the average, and the distribution has
a long tail in the positive direction.
In a negatively skewed distribution, smaller observations are less
frequent than larger observations, i.e. the majority of the
observations have a value above the average.



Example 1.

1. Suppose the mean, the mode, and the standard deviation of a
certain distribution are 32, 30.5 and 10 respectively. What is the
shape of the curve representing the distribution?
Solution: $\bar{X} = 32$, $\hat{X} = 30.5$ and S = 10, so we can calculate δ3
using the Pearsonian method:
$\delta_3 = \frac{mean - mode}{SD} = \frac{\bar{X} - \hat{X}}{S} = \frac{32 - 30.5}{10} = \frac{1.5}{10} = 0.15$
Since this is greater than 0, the distribution is right
skewed.
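The Pearsonian coefficient and the sign-based reading of the shape can be sketched in Python. All function names here are made up for illustration; the median-based form and the "probable mode" helper follow the relation Mean - Mode = 3(Mean - Median) for moderately skewed distributions.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / SD."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Median form: 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd


def probable_mode(mean, median):
    """Mode implied by Mean - Mode = 3(Mean - Median): 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


def describe_shape(coefficient):
    if coefficient > 0:
        return "positively skewed"
    if coefficient < 0:
        return "negatively skewed"
    return "symmetric"


coefficient = pearson_skewness_mode(32, 30.5, 10)
print(round(coefficient, 2), describe_shape(coefficient))
```

When only the mean, the median and the standard deviation are known, `pearson_skewness_median` and `probable_mode` can be used instead of the mode-based form.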



Example 2.
2. For a moderately skewed frequency distribution, the mean is 10
and the median is 8.5. If the coefficient of variation is 20%, find
the Pearsonian coefficient of skewness and the probable mode of
the distribution.
Solution:
mean = 10, median = 8.5 and CV = 20%. From the definition of the
coefficient of variation, $CV = \frac{S}{\bar{X}} \times 100 = 20\%$, so

$S = \frac{20 \times \bar{X}}{100} = \frac{20 \times 10}{100} = \frac{200}{100} = 2$

Mean - Mode = 3 × (Mean - Median); solving for the mode we get
mode = 3 × median - 2 × mean = 3 × 8.5 - 2 × 10 = 25.5 - 20 = 5.5
mode = 5.5

∴ $\delta_3 = \frac{3(mean - median)}{S} = \frac{3(10 - 8.5)}{2} = \frac{4.5}{2} = 2.25$, so the distribution is positively skewed.
Kurtosis

Kurtosis is the degree of peakedness of a distribution, usually
taken relative to a normal distribution.
The peakedness of a distribution can be classified into three types:
1. Leptokurtic: - A distribution having a relatively high peak.
- A large number of observations have the same values.
2. Mesokurtic: - Normal peak.
- The curve is properly peaked.
3. Platykurtic: - Flat-topped.
- A large number of observations have low frequency and are spread
in the middle interval.



Measures of kurtosis:

It is a measure of peakedness, denoted by δ4 and
given by:

$\delta_4 = \frac{M_4}{M_2^2} = \frac{M_4}{S^4}$

Note: The peakedness depends on the value of δ4.
1. If δ4 > 3 then the curve is leptokurtic.
2. If δ4 = 3 then the curve is mesokurtic.
3. If δ4 < 3 then the curve is platykurtic.



Example:
1. If the first four central moments of a distribution are:
M1 = 0, M2 = 16, M3 = -60 and M4 = 162
a. Compute a measure of skewness.
b. Compute a measure of kurtosis and give your interpretation.
Solution:
a. $\delta_3 = \frac{M_3}{M_2^{3/2}} = \frac{-60}{16^{3/2}} = \frac{-60}{64} = -0.94$
∴ δ3 < 0 indicates that the distribution is negatively skewed.
b. $\delta_4 = \frac{M_4}{M_2^2} = \frac{162}{16^2} = \frac{162}{256} = 0.63$
∴ δ4 < 3 indicates that the curve is platykurtic.
2. If the standard deviation of a symmetric distribution is 10, what
should be the value of the fourth moment so that the distribution
is mesokurtic?
Solution: S = 10, δ3 = 0 and δ4 = 3
$\delta_4 = \frac{M_4}{M_2^2} = \frac{M_4}{S^4} = \frac{M_4}{10^4} = \frac{M_4}{10{,}000} = 3$
⇒ M4 = 3 × 10,000 = 30,000
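The moment-based measures of shape can be sketched in Python using the central moments of problem 1. The function names are made up for illustration; the classification thresholds are the ones stated for δ4.

```python
def moment_skewness(m3, m2):
    """Moment coefficient of skewness: M3 / M2^(3/2)."""
    return m3 / m2 ** 1.5


def moment_kurtosis(m4, m2):
    """Moment coefficient of kurtosis: M4 / M2^2."""
    return m4 / m2 ** 2


def classify_kurtosis(coefficient):
    if coefficient > 3:
        return "leptokurtic"
    if coefficient < 3:
        return "platykurtic"
    return "mesokurtic"


# Problem 1: M2 = 16, M3 = -60, M4 = 162
print(round(moment_skewness(-60, 16), 2))
kurtosis = moment_kurtosis(162, 16)
print(round(kurtosis, 2), classify_kurtosis(kurtosis))

# Problem 2: S = 10, so M2 = S^2 = 100; a mesokurtic curve needs M4 = 3 * M2^2
required_m4 = 3 * 100 ** 2
print(required_m4, classify_kurtosis(moment_kurtosis(required_m4, 100)))
```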


