
Probability and Statistics For Engineers

The document provides an overview of probability and statistics tailored for engineers, covering essential concepts such as data, populations, samples, and variables. It discusses the definitions, classifications, applications, and limitations of statistics, emphasizing its significance in various fields including scientific research, quality control, and decision-making. Additionally, it outlines methods for data collection, organization, and presentation, as well as different scales of measurement and types of data.

Uploaded by

zerishyero
PROBABILITY AND STATISTICS FOR

ENGINEERS

HARAMAYA UNIVERSITY
COLLEGE OF COMPUTING AND INFORMATICS
DEPARTMENT OF STATISTICS

MILLION WESENU(ASSIST.PROF.)

PROBABILITY AND STATISTICS


@ 2025
CHAPTER ONE
OUTLINE
 Some Basic Terms
 What is Statistics?
 Type of Statistics
 Application of Statistics
 Functions of Statistics
 Limitation of Statistics
 Variable and its Characteristics
 Scales of Measurements
 Methods of Data collection and presentation
Some Basic Terms
 Data: a collection of observed values representing one or more
characteristics of some objects.
 Population: the totality of subjects or things possessing certain
common characteristics that we are interested in studying. It is the collection of
all the units under investigation.
 Sample: a set of elements selected from a population by statistical
methods for the purpose of investigation, with the aim of estimating the
characteristics of the population.
 Variable: any characteristic that can assume different values.
 Sampling: a statistical process in which one selects and examines a
sample instead of considering the whole population.
 Statistic: a summary measure computed from sample observations that
describes a characteristic of the sample.
 Parameter: a summary measure that describes a given characteristic of
the population.
What is Statistics?
Statistics has become an integral part of our daily lives.
The term ‘statistics’ is derived from the Latin word
status, meaning state, and historically statistics referred
to the display of facts and figures relating to the
demography of states or countries.
Generally, it can be defined in two senses:
A. Plural sense (as statistical data)
B. Singular sense (as statistical methods).
PLURAL SENSE
Statistics are a collection of facts (figures).
For example: statistics on sales, employment or
unemployment, accidents, weather, deaths, education,
etc.
In this sense the word Statistics serves simply as data.
But not all numerical data are statistics.
In order for numerical data to be identified as
statistics, they must possess certain identifiable
characteristics.
Some of These Characteristics Are Described As Follows:
I. Statistics are aggregate of facts.
Single or isolated facts or figures cannot be called statistics, as they
cannot be compared or related to other figures within the same
framework (there is no baseline for comparison).
Example: A customer officer at a bank earns 28,000 birr per month.
This would not be considered statistics!
However, suppose the statement is: the average salary of a customer officer at
a bank is 28,000 birr per month.
This would be considered statistics, since the average salary has
been computed from many related figures, namely the monthly salaries of
customer officers.
II. Statistics, Generally, Are Not The Outcome Of A Single
Cause But Affected By Multiple Causes.
There are a number of forces working together that affect
the facts and figures.
For example: The dropout rate of female students in HU was 10%
over the last year.
 What are the factors that influence the drop out?
III. Statistics are numerically expressed
 All statistics are stated in numerical figures only.
 Qualitative statements cannot be called statistics.
 E.g. Helen is very tall.
 Ethiopia is a developing country.
IV. Statistical Data Are Collected In A Systematic Manner For
Predetermined Purpose.
The purpose and objective of collecting pertinent data must be clearly
defined, decided upon and determined prior to data collection.
 Also the procedures for collecting data should be predetermined and
well planned.
 These would facilitate the collection of proper and relevant data.

V. Statistics are enumerated or estimated according to a reasonable standard
of accuracy.
There are basically two ways of collecting data:
1. Actual counting or measuring, which is the most accurate way.
2. Estimation, which is used in situations where actual
counting or measuring is not feasible or where it involves
prohibitive costs.
SINGULAR SENSE:
Statistics is the science that deals with the methods of data collection,
organization, presentation, analysis and interpretation.
 It refers to the subject area that is concerned with extracting
relevant information from available data with the aim of making
sound decisions.
 According to this meaning, statistics is concerned with the
development and application of methods and techniques for
collecting, organizing, presenting, analyzing and interpreting
statistical data.
The five statistical stages of investigation are:
 Data collection
 Data organization
 Data presentation
 Data analysis
 Interpretation of results.
Classification of Statistics
Based on the scope of the decision, statistics can be classified into two:
Descriptive and Inferential Statistics.
1. Descriptive Statistics
Descriptive Statistics refers to the procedures used to organize and
summarize masses of data.
It is concerned with describing or summarizing the most
important features of the data.
It deals only with the characteristics of the collected data, without
going beyond it.
The methodology of descriptive statistics includes the methods of
organizing, presenting data and calculations of certain indicators
of data like Measures of Central Tendency and Measures of
Dispersion which summarize some important features of the data.
E.g. The average age of the students in this class is 21 years.
2. Inferential Statistics
Inferential Statistics includes the methods used to find out something about a
population, based on the sample.
It is concerned with drawing statistically valid conclusions
about the characteristics of the population based on information
obtained from sample.
In this form of statistical analysis, inferential statistics is linked
with probability theory in order to generalize the results of the
sample to the population.
Performing hypothesis testing, determining relationships
between variables and making predictions are also inferential
statistics.
Examples:
At least 5% of the killings reported last year in city X were due to terrorists.
Applications of Statistics
Today, Statistics is applied in almost all fields of human endeavor.
 In Scientific Research: Statistics is used as a tool in scientific
research. Statistical formulas and concepts are applied to data that
result from an experiment.
 In Quality Control: Statistical methods help to check whether a product
satisfies a given standard (especially in industry).
 For Decision Making: Statistics helps to enhance the power of decision
making in the face of uncertainty by providing sufficient information.
 In Public Health and Medicine: Statistical methods are used for the
computation and interpretation of birth and death rates.
 In Economics: for modeling functional relationships between or among
variables.
 In Natural and Social Sciences, Business, Planning, Behavioral
Sciences, etc.
 In Computer Science and Engineering, Data Science, Machine
Learning and AI, etc.
Statistics Has Significant Role In All Other Sciences!
Here is another appl,
CONT’D
 Computer scientists want people who can formulate key
questions in order to help surface the insights needed amid the
daily chaos.
 Statisticians want to manage big data, which is one of the greatest
challenges for today’s statisticians.
Uses of Statistics
I. Condenses and summarizes masses of data and presents
facts in numerical and definite form
II. Facilitates comparison: statistical devices such as averages,
percentages, ratios, etc. are used for this purpose.
III. Formulating and testing hypothesis: For instance,
hypothesis like whether a new medicine is effective in curing
a disease, whether there is an association between variables
can be tested using statistical tools.
IV. Forecasting: Statistical methods help in studying past data
and predicting future trends.
V. It also lends precision to the facts, supports policy
making, enlarges knowledge and measures
uncertainty.
What Are The Differences Between Application and Uses In Statistics?
Application
 The application of statistics is the use of statistical methods to solve
problems.
 The application is how the service or product works in different
fields and how it is used to produce the benefits.
 The application of statistics is integral to solving complex
problems and making informed decisions in a data-driven world.
 It refers to specific uses or implementation.
Uses
 The uses of statistics are the various sectors where
statistics is applied.
 The use describes the benefits gained from a service or product.
 'Use' is a more general term referring to the act
of making use of something.
Limitations of Statistics
I. It cannot deal with a single observation; rather, it deals with
aggregates of facts.
II. Statistical methods are not applicable to qualitative
characteristics; i.e. statistics deals with quantitative characteristics
and so has limited applicability.
III. Statistical results are true on average, i.e. for the
majority of cases. Laws of statistics are not universally
true like the laws of physics, chemistry and
mathematics.
IV. Statistics are liable to be misused or misinterpreted. This
may be due to incomplete information, inadequate and faulty
procedures during data collection and sample selection, and
mainly ignorance (lack of knowledge).
CONT’D
V. Requires expertise: Statistical methods are most dangerous
when used by people without expertise.
VI. Validity depends on assumptions: Statistical results are only
valid under certain assumptions.
VII. Less exact than natural sciences: Statistics is less exact than
natural sciences like chemistry and physics.
VIII. Mean limitations: The mean cannot be calculated for categorical
data, and it is influenced by outliers and skewed distributions.
IX. It does not depict the entire story of a phenomenon: Every
phenomenon is due to many causes, but not all of these causes
can be expressed in terms of data, so we may not reach the
correct conclusions.
Variable and its Characteristics
Variable is any phenomenon or an attribute that can assume
different values.
 The most important single distinguishing feature of a variable
is that it varies; that is, it can take on different values.
 Based on the values that variables assume, variables can be
classified as Qualitative and Quantitative variables.
I. Qualitative variables: A qualitative variable has values that
are intrinsically non-numerical (categorical).
Example: Gender, Religion, Color of automobile, etc.
II. Quantitative variables: A quantitative variable has
values that are intrinsically numerical.
Example: Height, Family size, Weight, etc.
Quantitative Variables (Discrete And Continuous).
Discrete variable: takes whole number values and consists of
distinct recognizable individual elements that can be counted.
 It is a variable that assumes a finite or countable number of
possible values. These values are obtained by counting (0, 1,
2, ...).
E.g. Family size, number of children in a family, number of cars at a traffic
light.
 Continuous variable: takes any value including decimals. Such
a variable can theoretically assume an infinite number of
possible values. These values are obtained by measuring.
Example: Height, Weight, Time, Temperature, etc.
Scales of Measurement /levels of measurement

The level of measurement is one way in which variables can be classified.
 Broadly, this relates to the level of information content implicit in the
set of values and how each value may be interpreted (mathematically)
relative to other values of the variable.
 It dictates how the variable can be used and
interpreted in statistical analysis.
The main purposes of using levels of measurement are:
 It shows the information contained in the value of a variable.
 It also shows what mathematical operations and what statistical
analyses are permissible on the values of the variable.
The Four Types of Scales of Measurement
Nominal scale variables are those qualitative variables which show
the category of individuals.
 They reflect classification into categories (names of groups) where
there is no particular order or qualitative difference between the labels.
 Numbers may be assigned to the variables simply for coding purposes.
 It is not possible to compare individuals based on the numbers assigned
to them.
 The only mathematical operation permissible on these variables is
counting.
These variables have:
 mutually exclusive (non-overlapping) and exhaustive categories;
 no ranking or order between (among) the values of the variable.
 Example: Gender (Male, Female), Political Affiliation (Labour,
Conservative, Liberal), Ethnicity (White, Black, Asian, Other), etc.
CONT’D
Ordinal scale variables are also qualitative variables,
whose values can be ordered and ranked.
 Ranking and counting are the only mathematical operations that can
be done on the values of the variables.
 But there is no precise difference between the values
(categories) of the variable.
 Example: Academic Rank (BSc, MSc, PhD), Grade Scores (A,
B, C, D, F), Strength (Very Weak, Weak, Strong, Very Strong),
Health Status (Very Sick, Sick, Cured), Economic Status
(Lower Class, Middle Class, Higher Class), etc.
CONT’D
Interval scale variables are those quantitative variables for which
a value of zero does not show absence of
the characteristic, i.e. there is no true zero.
Zero does not mean empty; values lower than zero are possible.
For example, for temperature measured in degrees Celsius, the
difference between 5° and 10° is treated the same as the difference
between 10° and 15°.
However, we cannot say that 20° is twice as hot as 10°, i.e. the
ratio between two different values has no quantitative meaning.
This is because there is no absolute zero on the Celsius scale; 0°
does not imply ‘no heat’. Other examples: SAT score, exam score, IQ,
etc.
 All mathematical operations are allowed except division (ratios are not meaningful)!
CONT’D
Ratio scale variables are those quantitative variables for which
a value of zero shows absence of the
characteristic, i.e. there is a true zero.
Zero indicates absence of the characteristic.
 All mathematical operations are allowed on the
values of the variables.
Example: weight, height, time, income, consumption, salary,
expenditure, price, etc.
TYPES OF DATA
There are two types of data based on the source:
Primary data are data collected for the first time, either through
direct observation or by enquiring of individuals. The term refers to data
collected either by the researcher or under the direct supervision and instruction
of the researcher.
Secondary data are data obtained from published or unpublished
sources like newspapers, journals, official records, etc.
Based on the role of time, data can be classified as Cross-
sectional and Time series.
Cross-sectional Data: is a set of observations taken at a
point of time.
Time series Data: is a set of observations collected for a
sequence of time usually at equal intervals.
CONT’D
Based on the values of the variable, data can be classified as
Qualitative data and Quantitative data.
Qualitative data are measures of quality or type and may be
represented by a name, symbol or number code.
 They are non-numeric, conceptual and interpretation based.
Quantitative data are information regarding quantities, and are
measurable as numeric variables (e.g. how much, how many, how
often).
 They can be counted, measured and expressed using numbers.
A research design based on qualitative data is Qualitative
Research, whereas one based on quantitative data is Quantitative Research.
Data Collection
The first and foremost task in statistical investigation is data
collection.
Before data collection, four important points should be
considered.
I. The purpose of data collection (why we need to collect
data),
II. The data to be collected (what kind of data to be collected),
III. The source of data (where we can get the data) and
IV. the methods of data collection (how can we collect this
data).
These steps are called the why, what, where and how of the data
collection.
Methods of Data Collection
Primary data are collected from primary sources and secondary
data from secondary sources.
 Primary data can be collected through experimental methods
in laboratory in natural sciences and through survey method in
social sciences.
 The survey methods of data collection are:-
A. personal interview,
B. telephone interview,
C. mailed questionnaire and
D. personal observation.
Data Organization and presentation
In order to describe situations, draw conclusions or make
inferences about the population, or even to describe the sample, the
collected data must be organized in some meaningful way.
 The most convenient way of organizing data is to construct a
frequency distribution.
 Frequency distribution is the organization of raw data in table
form, using classes and frequencies.
Basic terms in Frequency distribution: -
Class: is a description of a group of similar numbers in a data set.
Frequency: is the number of times a variable value is repeated.
Class frequency: the number of observations belonging to a
certain class.
TYPE OF FREQUENCY DISTRIBUTION
There are three types of frequency distributions;
1. Categorical ,
2. Ungrouped (discrete or frequency array) and
3. Grouped (continuous) frequency distributions.
Categorical FD:-a FD in which the data is qualitative i.e. either
nominal or ordinal.
Each category of the variable represents a single class and the
number of times each category repeats represents the frequency of
that class (category).
It is the best way to organize qualitative data in qualitative research
methods!
EXAMPLE FOR CATEGORICAL FD
The blood type of 24 students is given below;
A B B AB O A
O O B AB B A
B B O A O AB
A O O O AB O
Solution:-
Class (Blood type)    Frequency (number of students)
A                     5
B                     6
AB                    4
O                     9
Total                 24
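The tally above can be reproduced with a short Python sketch. It is only an illustration: the variable names are assumptions, and `collections.Counter` is one of several ways to count categories.

```python
from collections import Counter

# Blood types of the 24 students, copied row by row from the example.
blood_types = (
    "A B B AB O A "
    "O O B AB B A "
    "B B O A O AB "
    "A O O O AB O"
).split()

# Each distinct category is one class; its count is the class frequency.
frequency = Counter(blood_types)

print("Class (Blood type)  Frequency")
for blood_type in ["A", "B", "AB", "O"]:
    print(f"{blood_type:<19} {frequency[blood_type]}")
print(f"{'Total':<19} {sum(frequency.values())}")
```

The classes are printed in the order used in the table rather than in order of frequency, so the output can be compared line by line with the solution.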
Ungrouped FD (Frequency Array)
A FD of numerical data (quantitative) in which each
value of a variable represents a single class (i.e. the
values of the variable are not grouped) and the number
of times each value repeats represents the frequency of
that class.
Eg:-Number of children for 21 families.
2 3 5 4 3 3 2
3 1 0 4 3 2 2
1 1 1 4 2 2 2
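A possible way to build the frequency array for this example is sketched below in Python. The names `children` and `frequency_array` are illustrative assumptions; each distinct value of the variable is treated as its own class.

```python
from collections import Counter

# Number of children for the 21 families, copied row by row from the example.
children = [
    2, 3, 5, 4, 3, 3, 2,
    3, 1, 0, 4, 3, 2, 2,
    1, 1, 1, 4, 2, 2, 2,
]

# Sort by value so the classes appear in increasing order.
frequency_array = sorted(Counter(children).items())

print("Number of children  Frequency")
for value, count in frequency_array:
    print(f"{value:<19} {count}")
print(f"{'Total':<19} {len(children)}")
```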
Grouped (Continuous) FD
 A frequency distribution of numerical data in which several values of a
variable are grouped into one class.
 The number of observations belonging to the class is the
frequency of the class.
 It is used for continuous variables measured on either a ratio or an interval scale.
Basic Terms In Grouped FD
Class Limits(CL): the lowest and highest values that can be
included in a class are called class limits.
The lowest values are called lower class limits(LCL) and the
highest values are called upper class limits(UCL).
Class Boundaries: are class limits when there is no gap
between the UCL of the first class and the LCL of the second
class.
The lowest values are called lower class boundaries(LCB) and the
highest values are called upper class boundaries (UCB).
Class Width: the difference between UCB and LCB of a class.
It is also the difference between the lower limits of two
consecutive classes or it is the difference between upper limits of
two consecutive classes.
CONT’D
Class Mark: the midpoint of a class, i.e. halfway between the class limits or the class boundaries.
Relative Frequency (RF) or Absolute Frequency (AF)
A frequency distribution is a summary table in which the original data are condensed into groups
and their frequencies; this is called an AF distribution.
 But if a researcher would like to know the proportion or percentage
of cases in each group, instead of simply the number of cases, s/he
can do so by constructing a relative frequency distribution table.
 The RF distribution is formed by dividing the frequency of
each class of the frequency distribution by the total number of
observations.
 Percentage frequency (PF) distribution = RF × 100.
 The RFs are particularly helpful when comparing two or more
frequency distributions in which the numbers of cases under
investigation are not equal.
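As a small illustration, the relative and percentage frequencies for the blood-type table from the categorical FD example can be computed as follows. This is a sketch in Python; the dictionary names are assumptions chosen for readability.

```python
# Absolute frequencies taken from the blood-type table (categorical FD example).
absolute = {"A": 5, "B": 6, "AB": 4, "O": 9}
total = sum(absolute.values())

# RF = class frequency / total number of observations; PF = RF * 100.
relative = {group: count / total for group, count in absolute.items()}
percentage = {group: rf * 100 for group, rf in relative.items()}

for group in absolute:
    print(
        f"{group:<3} AF={absolute[group]:>2}  "
        f"RF={relative[group]:.3f}  PF={percentage[group]:.1f}%"
    )
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a quick check that no class was missed.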
Cumulative Frequency
The above RF/PF distributions do not tell us directly the total
number (percentage) of units that lie below or above
specified values of the classes.
A cumulative frequency distribution displays the total
number of observations below (or above) a certain value.
When the interest of the investigator focuses on the number of
items below a specified value, this specified value is taken as the
upper boundary of the class, and the result is known as a less than
cumulative frequency distribution (LCF).
Similarly, when the interest lies in finding the number of cases
above a specified value, this value is taken as the lower
boundary of the specified class, and the result is known as a more than
cumulative frequency distribution (MCF).
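The two cumulative distributions can be sketched in Python using the number-of-children data from the ungrouped FD example. One assumption is made here that the slides do not state: because the values are whole numbers, each class x is given the boundaries x - 0.5 and x + 0.5.

```python
from collections import Counter
from itertools import accumulate

# Number of children for the 21 families (ungrouped FD example).
children = [
    2, 3, 5, 4, 3, 3, 2,
    3, 1, 0, 4, 3, 2, 2,
    1, 1, 1, 4, 2, 2, 2,
]

counts = Counter(children)
values = sorted(counts)
frequencies = [counts[v] for v in values]

# Assumed class boundaries for whole-number data: x - 0.5 and x + 0.5.
lower_boundaries = [v - 0.5 for v in values]
upper_boundaries = [v + 0.5 for v in values]

# LCF: running total from the first class up to each upper boundary.
less_than_cf = list(accumulate(frequencies))
# MCF: running total from the last class down to each lower boundary.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for ucb, lcf in zip(upper_boundaries, less_than_cf):
    print(f"less than {ucb}: {lcf}")
for lcb, mcf in zip(lower_boundaries, more_than_cf):
    print(f"more than {lcb}: {mcf}")
```

The last LCF value and the first MCF value both equal the total number of observations, which is a convenient consistency check.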
Construction of Grouped FD
EXAMPLE
Consider the marks (out of 40) of 50 students:
16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22
12 22 29 18 22 28 25 7 17 22 28 19 23 23 22 3 19 13 31 23
28 24 9 20 33 30 23 20 8 21 24
Construct grouped frequency distribution.
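One way to carry out this construction is sketched below in Python. The slides do not give the rule for choosing the number of classes in this section, so the sketch assumes Sturges' rule, K = 1 + 3.322 log10(N), and rounds the class width W up to a whole number; other reasonable choices of K and W give a different but equally valid table.

```python
import math

marks = [
    16, 21, 26, 24, 11, 17, 25, 26, 13, 27, 24, 26, 3, 27, 23, 24, 15, 22, 22,
    12, 22, 29, 18, 22, 28, 25, 7, 17, 22, 28, 19, 23, 23, 22, 3, 19, 13, 31, 23,
    28, 24, 9, 20, 33, 30, 23, 20, 8, 21, 24,
]

n = len(marks)
data_range = max(marks) - min(marks)

# Assumption: Sturges' rule for the number of classes, rounded to an integer.
k = round(1 + 3.322 * math.log10(n))
# Class width, rounded up so that the classes cover the whole range.
w = math.ceil(data_range / k)

classes = []
lower = min(marks)
for _ in range(k):
    upper = lower + w - 1  # marks are whole numbers, so the unit of measurement is 1
    classes.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "mark": (lower + upper) / 2,
        "frequency": sum(lower <= m <= upper for m in marks),
    })
    lower += w

frequencies = [c["frequency"] for c in classes]

print(f"N={n}, range={data_range}, K={k}, W={w}")
for c in classes:
    lo, hi = c["limits"]
    print(f"{lo:>2}-{hi:<2}  boundaries={c['boundaries']}  "
          f"mark={c['mark']}  f={c['frequency']}")
```

Starting the first class at the smallest observation is also a choice made for this sketch; the lower limit of the first class only has to be at or below the minimum.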
Properties of Classes (Class Boundaries)
i. Complete and non-overlapping:
Complete: the classes should include every value in the data set.
Non-overlapping: no value should belong to two classes.
ii. Clear and properly set: The class width (W) and the number of classes (K) should be calculated
properly, and W should be the same for all classes.
iii. Standardized: Classes should follow a logical and chronological
(increasing) order. The number of classes should be between 5
and 20, i.e. 5≤K≤20. K depends on the number of observations (N): the larger the N, the larger the
K.
iv. Continuous: Even if there are no values in a class, the class
must be included in the frequency distribution.
Advantages And Disadvantages Of Frequency Distributions
Advantages
 It condenses a large mass of data into a comparatively small table.
 It attracts the attention of even a layman and gives him an insight into
the nature of the distribution.
 It helps further statistical analysis, such as measures of central tendency, scatter
and symmetry of the data.
Disadvantages
 The identity of the observations is lost. We know only the
number of observations in a class and do not know what the
values are.
 Because the selection of the class width and the lower class limit
of the first class are to a certain extent arbitrary, different
frequency distributions may be constructed for the same data
and hence may give contradictory impressions.
Data Presentation:-Graphic Display of Data
Bar Chart:
o It is the simplest and most commonly used diagrammatic
representation of a frequency distribution.
o It is the most common presentation for nominal, categorical or
discrete data.
o It uses a series of separated and equally spaced bars.
o The heights of the bars represent the frequency or relative
frequency of the classes.
o But the width of the bars has no meaning; however, all the bars
should be the same width to avoid distortion. And also the bars
are separated by constant distance.
The Three Types of Bar Chart
i. Simple Bar Chart: is a diagram in which categories of a variable are
marked on the X axis and the frequencies of the categories are
marked on the Y axis.
 It is applicable for discrete variables, that is, for data given
according to some period, places and timings.
 These periods and timings are represented on the base line (X
axis) at regular interval and the corresponding frequencies are
represented on the Y-axis.
 The width of the bar represents nothing (it is meaningless), but it
should be equal for all bars.
 Each bar is separated by an equal space.
 It can also represent some magnitude (on the Y axis) over time,
space, groups, etc (on the X axis).
CONT’D
ii. Component Bar Chart
Component Bar Chart: is used when there is a desire to show how a total or
aggregate is divided into its component parts.
 The bars represent the total value of a variable, with each total
broken into its component parts, and different colors are used
for identification.
 In this type of diagram, a bar is subdivided into parts in
proportion to the size of the subdivision.
 These subdivided rectangles are shaded differently by lines, dots
and colors so that it is very easy to compare the
components.
 For making meaningful comparisons, the components of the
attributes are reduced to percentages.
 In that case each attribute will have 100 as its maximum value.
This sort of component bar chart is known as a percentage bar chart.
Cont’d
iii. Multiple Bar Chart
Multiple Bar Chart: is used to display data on more than one
variable. In the multiple bar diagram two or more sets of
interrelated data are represented.
CONT’D
Pie-chart
A pie chart is popularly used in practice to show the percentage
breakdown of data.
 It is a circle representing a set of data by dividing the circle
into sectors proportional to the number of items in the
categories, or it is a circle representing the total, cut into slices
in proportion to the size of the parts that make up the total.
 It gives the proportional sizes of different data groups as slices
of a pie or a circle.
CONT’D

Histogram
Histogram is the most common graphical presentation of a frequency
distribution for numerical data.
It uses a series of adjacent bars in which the width of each bar represents
the class width and the heights represent the frequency or RF of the class.
It is used for grouped data in which the class boundaries are marked on the
X axis and the frequencies are marked along the Y axis.
Example: Construct A Histogram To The
Following Grouped Data.
Frequency Polygon and Ogive curve
Frequency Polygon is a graph that consists of line segments
connecting the intersection of the class marks and the frequencies
of a continuous frequency distribution.
 It can also be constructed from histogram by joining the mid-
points of each bar.
Cumulative Frequency Curves (Ogive): As there are two
cumulative frequency distributions, there are two ogive
(pronounced “oh-jive”) curves.
 The less than ogive is a line
graph joining the intersection points of the upper class
boundaries and their corresponding less than cumulative
frequencies.
 The more than ogive is a
line graph joining the intersection points of the lower class
boundaries and their corresponding more than cumulative
frequencies.
Cumulative Frequency
CHAPTER -2
SUMMARIZING OF DATA
2.1. Measures of Central Tendency
A single value which can be considered as typical or representative
of a set of observations, and around which the observations can be
considered as centered, is called an ‘Average’ (or average value or
center of location).
Since such typical values tend to lie centrally within a set of
observations when arranged according to magnitude, averages are
called Measures of Central Tendency.
 Objectives of Measures of Central Tendency (MCT)
1. To condense a mass of data into one single value.
2. To facilitate comparison. Statistical devices like averages,
percentages and ratios are used for this purpose.
Types of Measure of Central Tendency
There are many types of measures of central tendency, each possessing particular
properties and each being typical in some unique way. The most frequently
encountered ones are
I. Computed averages: Mean (Arithmetic Mean, Geometric Mean and
Harmonic Mean)
II. Positional averages: Median and Quantiles (Quartiles, Deciles,
Percentiles)
III. Mode
Desirable properties of good Measures of central tendency
A measure of central tendency is good or satisfactory if it possesses
the following characteristics.
1. It should be calculated based on all observations.
2. It should not be affected by extreme values. It should be as close
to the maximum number of observed values as possible.
The uses of Summation Notation
.
The Mean and its properties
a. Arithmetic mean(AM)
Simple Arithmetic Mean: the sum of all observations divided by the total number
of observations, $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$.
Weighted Arithmetic Mean:
 While calculating the simple arithmetic mean we gave
equal importance to all values.
 But there are cases where the relative importance is not the
same for all items.
 When this is the case, it is necessary to assign them weights (i.e.
relative importance) and then calculate a weighted arithmetic
mean.
 Let X1,X2,…,Xn be the values and W1,W2,…,Wn be the
corresponding weights; then the weighted arithmetic mean,
denoted by $\bar{X}_w$, is given by
$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$
Properties Of Arithmetic Mean
The algebraic sum of the deviations of each value from the arithmetic mean
is zero. That is, $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
The sum of the squares of the deviations from the mean is less
than the sum of the squares of the deviations about any other
score in the distribution. That is, $\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2$, $A \ne \bar{X}$.
If a constant C is added to or subtracted from each value in
a distribution, then the new mean will be
$\bar{X}_{new} = \bar{X} \pm C$, respectively.
If each value of a distribution is multiplied by a constant C, the
new mean will be the original mean multiplied by C.
Arithmetic mean is affected by extreme values.
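These properties can be checked numerically. The Python sketch below uses the values 1, 2, 3, 4, 5 as an illustrative data set; the constant C = 10 and the comparison point A = 2 are arbitrary choices made only for this demonstration.

```python
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in data)


def sum_of_squares_about(a):
    """Sum of squared deviations of the data about the point a."""
    return sum((x - a) ** 2 for x in data)


# Property 2: squared deviations are smallest about the mean.
ss_about_mean = sum_of_squares_about(mean)
ss_about_other = sum_of_squares_about(2)  # A = 2 is an arbitrary other point

# Properties 3 and 4: adding or multiplying by a constant C.
C = 10
shifted_mean = sum(x + C for x in data) / len(data)
scaled_mean = sum(x * C for x in data) / len(data)

print(sum_of_deviations)
print(ss_about_mean, ss_about_other)
print(shifted_mean, mean + C)
print(scaled_mean, mean * C)
```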
EXAMPLES
1. Find the arithmetic mean of A) 1, 2, 3, 4, 5. B) 1, 2,
3, 4, 100. Is there a great difference between the mean
of A and that of B?
2. A teacher attaches 2 to Quiz, 3 to Mid-term and 5 for
Final exam. If a student gets 90, 50 and 60 for Quiz,
Mid-term and Final-exam respectively, what is his/her
average academic performance
3. The mean weight of 50 women workers in a factory is
48 kg. The mean weight of 75 men working in the
same factory is 58 kg. Find the mean weight of all
workers in the factory.
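The three examples can be worked with a few lines of Python. This is a sketch only: the helper names `arithmetic_mean` and `weighted_mean` are assumptions, and example 3 is treated as a weighted mean in which the group sizes act as weights.

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 1: one extreme value (100) pulls the mean far from the other values.
mean_a = arithmetic_mean([1, 2, 3, 4, 5])
mean_b = arithmetic_mean([1, 2, 3, 4, 100])

# Example 2: Quiz, Mid-term and Final scores with weights 2, 3 and 5.
student_average = weighted_mean([90, 50, 60], [2, 3, 5])

# Example 3: group means weighted by group sizes (50 women, 75 men).
combined_mean_weight = weighted_mean([48, 58], [50, 75])

print(mean_a, mean_b)
print(student_average)
print(combined_mean_weight)
```

Example 1 illustrates the last property listed above: replacing 5 by 100 moves the mean from 3 to 22, although four of the five values are unchanged.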
Geometric Mean
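The geometric mean of n positive values X1,X2,…,Xn is the nth root of their product: $GM = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$.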
CONT’D
If the variable values are measured as ratios, proportions or
percentages, and some values are large in magnitude while others are
small, then the geometric mean is a better representative of the
data than the simple average.
 In a “geometric series”, the most meaningful average is the
geometric mean.
The disadvantage of GM is that it cannot be calculated if one or
more observations are zero or negative. It is also affected by
extreme values, but not to the same extent as AM.
That is, it is less affected by extreme values than AM.
EXAMPLES
1. Find the geometric mean of A) 1, 2, 3, 4, 5. B) 1, 2, 3, 4, 100. Is

there a great difference between the GM of A and that of B?

2. The price of a commodity increased by 5% from 1989 to 1990,

8% from 1990 to 1991 and by 77% from 1991 to 1992. Find the

average price increase.


3. A machine depreciated by 10% each in the first two years and
by 40% in the third year. Find out the average rate of
depreciation.
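The examples above can be sketched in Python as follows. This sketch assumes that percentage changes are averaged through their growth factors (1.05 for a 5% increase, 0.90 for a 10% depreciation); the function name geometric_mean is introduced here for illustration.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))

# Example 1: one extreme value changes the GM far less than the AM.
gm_a = geometric_mean([1, 2, 3, 4, 5])
gm_b = geometric_mean([1, 2, 3, 4, 100])

# Example 2: average yearly price increase from growth factors.
avg_increase = geometric_mean([1.05, 1.08, 1.77]) - 1

# Example 3: average yearly depreciation from remaining-value factors.
avg_depreciation = 1 - geometric_mean([0.90, 0.90, 0.60])

print(round(gm_a, 3), round(gm_b, 3))
print(round(avg_increase * 100, 2), "% average increase")
print(round(avg_depreciation * 100, 2), "% average depreciation")
```

Under these assumptions the average price increase is about 26.1% per year and the average depreciation about 21.4% per year.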
HARMONIC MEAN
Harmonic mean is another specialized average, useful for
averaging variables expressed as rates per unit of time, such as
speed or number of units produced per day.
• It is the reciprocal of the arithmetic mean of the reciprocals of the numbers.
For raw data (individual series): $HM = \dfrac{n}{\sum_{i=1}^{n} 1/X_i}$
CONT’D
EXAMPLES
1. A driver traveled 400 km per day for three days at a speed of
60, 50 and 40 kilometers per hour. Find the average speed of
the driver.
2. A student reads the first 100 pages of a book at a rate of 5
pages per hour, the next 100 pages at a rate of 8 pages per
hour. What is the student’s average reading speed?
3. Suppose a train moves 100 km with a speed of 40 km per
hour, then 150 km with a speed of 50 km per hour and the
next 135 km with a speed of 45 km per hour. Calculate the
average speed of the train.
4. In a factory a mechanic takes 15 days to fabricate a machine,
the second mechanic takes 18 days, the third takes 30 days
and the fourth takes 90 days. Find the average number of
days taken by the workers to fabricate the machine.
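A minimal Python sketch of the four examples above. It assumes the simple harmonic mean when every rate applies to the same amount of work (equal distances or equal page counts) and a weighted harmonic mean, with distances as weights, for Example 3; the function name harmonic_mean is introduced here for illustration.

```python
def harmonic_mean(values, weights=None):
    """Sum of the weights divided by the sum of weight/value.

    With no weights this is the simple harmonic mean n / sum(1/x).
    """
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

# Example 1: the same distance (400 km) covered at 60, 50 and 40 km/h.
driver_speed = harmonic_mean([60, 50, 40])

# Example 2: 100 pages at 5 pages/hour, then 100 pages at 8 pages/hour.
reading_speed = harmonic_mean([5, 8])

# Example 3: different distances, so each speed is weighted by its distance.
train_speed = harmonic_mean([40, 50, 45], weights=[100, 150, 135])

# Example 4: days per machine for four mechanics.
fabrication_days = harmonic_mean([15, 18, 30, 90])

print(round(driver_speed, 2))      # about 48.65 km/h
print(round(reading_speed, 2))     # about 6.15 pages/hour
print(round(train_speed, 2))       # about 45.29 km/h
print(round(fabrication_days, 2))  # about 24 days
```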
Median And Other Measures Of Position
Median is the halfway point in a data set.
• It divides a data set into two equal parts such that half of the
values are less than the median and half are greater than the
median.
• Graphically, the median is the point of intersection of the less-than and
more-than cumulative frequency curves.
The median of a set of n observations X1, X2, …, Xn arranged in
ascending order of magnitude is the middle value if n is odd, or the
arithmetic mean of the two middle values if n is even.
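HOW TO COMPUTE THE MEDIAN?
For raw data sorted in ascending order: $\tilde{X} = X_{((n+1)/2)}$ if n is odd, and $\tilde{X} = \tfrac{1}{2}\left(X_{(n/2)} + X_{(n/2+1)}\right)$ if n is even.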
Examples For Raw Data
1. Find the median of the following data sets.
a. 180, 201, 220, 191, 219, 209 and 220.
b. 62, 63, 64, 65, 66, 66, 68 and 78.
c. 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3.
2. Given that three of five values, five of four and six
of two values. Find median values?
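The three data sets in Example 1 can be checked with Python's standard library, as in the short sketch below. For data set c, every observation equals 3, so the exact number of repetitions does not affect the median.

```python
from statistics import median

data_a = [180, 201, 220, 191, 219, 209, 220]   # n = 7 (odd)
data_b = [62, 63, 64, 65, 66, 66, 68, 78]      # n = 8 (even)
data_c = [3] * 19                              # every observation is 3

# median() sorts the data internally, so the input order does not matter.
median_a = median(data_a)   # middle (4th) value of the sorted data
median_b = median(data_b)   # mean of the 4th and 5th sorted values
median_c = median(data_c)

print(median_a, median_b, median_c)   # 209 65.5 3
```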
OTHER POSITIONAL MEASURES
The median divides a given data set into two
equal parts; there are also measures that divide a given data set
into more than two equal parts. These measures are collectively
known as quantiles. Quantiles include quartiles, deciles and
percentiles.
Quartiles: are values that divide a data set into four equal
parts. These values are denoted by Q1, Q2 and Q3 such that
25% of the data fall below Q1, 50% below Q2 and 75% below
Q3.
Deciles: are values that divide the data into ten equal parts.
These values are denoted by D1, D2, …, D9 such that 10% of
the data fall below D1, 20% below D2, …, 90% below D9.
• Percentiles: are values that divide a data set into 100 equal
parts. These values are denoted by P1, P2, …, P99.
04/27/2025
METHODS OF CALCULATING QUANTILES
FOR GROUPED FREQUENCY DISTRIBUTION

EXAMPLES
1. Given the data: 420, 430, 435, 438, 441, 449, 490, 500, 510 and 515,
find:
a) All the quartiles
b) The 1st and 7th deciles
c) The 40th and 75th percentiles
2. Calculate all quartiles, the 5th and 8th deciles, the 30th and 90th percentiles for the
students score data below.
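Example 1 can be sketched in Python as follows. The calculation formulas did not survive on the slides, so this sketch assumes one common textbook convention for raw data: the k-th quantile sits at position k(n+1)/parts in the ordered data, with linear interpolation between neighbouring values. Other conventions give slightly different answers. The function name quantile is introduced here for illustration.

```python
def quantile(data, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts.

    Assumed convention: 1-based position k*(n+1)/parts in the ordered
    data, with linear interpolation between adjacent values.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / parts
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [420, 430, 435, 438, 441, 449, 490, 500, 510, 515]

q1, q2, q3 = (quantile(data, k, 4) for k in (1, 2, 3))
d1, d7 = quantile(data, 1, 10), quantile(data, 7, 10)
p40, p75 = quantile(data, 40, 100), quantile(data, 75, 100)

print(q1, q2, q3)                  # quartiles
print(round(d1, 2), round(d7, 2))  # 1st and 7th deciles
print(round(p40, 2), p75)          # 40th and 75th percentiles
```

Under this convention Q2 equals the median and P75 equals Q3, which gives a quick consistency check.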

MODE AND ITS PROPERTIES
The mode is the most frequently occurring
value in a set of observations,
that is, the value with the highest frequency.
A data set may have one mode (unimodal), two
modes (bimodal), more than two modes (multimodal) or no mode.
It is a good measure of central tendency for qualitative variables.
Ungrouped (individual series): Arrange the data in
ascending order and take the value appearing most
frequently (the most frequent value).
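The rule for ungrouped data can be sketched in Python as below. The helper find_modes is a hypothetical name introduced for illustration; it treats a data set in which every distinct value occurs equally often as having no mode, which is one common reading of "no mode". The small lists are illustrative.

```python
from statistics import multimode

def find_modes(data):
    """Return the list of modes, or an empty list when there is no mode."""
    modes = multimode(data)          # all values with the highest frequency
    distinct = set(data)
    # If every distinct value is equally frequent, no value stands out.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return []
    return modes

print(find_modes([180, 201, 220, 191, 219, 209, 220]))  # [220]  unimodal
print(find_modes([1, 1, 2, 2, 3]))                      # [1, 2] bimodal
print(find_modes([1, 2, 3, 4, 5]))                      # []     no mode
print(find_modes(["red", "blue", "red"]))               # ['red'] qualitative data
```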

CONT---
Grouped (continuous) series: In a frequency distribution,
the mode is located in the class with highest frequency
and that class is the modal class.

Properties of Mode
It is simple to calculate and easy to determine.
It is not based on all observations.
The mode can be used for both qualitative and
quantitative data types.
Mode is not affected by extreme values.
It can be calculated for distributions with open-ended classes.
Chapter 3
Measures Of Variation/Dispersion
In the chapter on measures of central tendency (MCT), you saw that:
• The median is a positional average and has nothing to do
with the variability of the observations in a data set.
• The mode is the most frequently occurring value, independent of the
other values in the data set.
• This leads us to conclude that an MCT is not enough
to give a clear idea about the data unless all
observations are the same.
• Moreover, two or more data sets may have the same
mean or median and yet be quite different. So an
MCT alone does not provide enough information about
the nature of the data.
CONT…
• For this reason, measures of variation are
used to determine the extent to which the
values are scattered around the measures of central tendency.
• Thus a measure of dispersion tells us the extent to
which the values of a variable vary about the
measure of central tendency.
• Therefore, measures of dispersion deal with the
variability of a data set whose observations
differ in either size or units.
OBJECTIVES OF MEASURES OF VARIATION
• To have an idea about the reliability of the measure of
central tendency.
• To compare two or more sets of data with regard to their
variability.
• To provide information about the structure of the data.
• To pave the way for the use of other statistical measures.
Types of Measures of Variation
There are two types of measures of variation.
1. Absolute measures of variation: A measure of variation is said to be absolute
when it shows the actual amount of variation of the items from a
measure of central tendency and is expressed in the concrete units
in which the data have been expressed.
• For example: range, variance, mean deviation, quartile
deviation, standard deviation, etc.
CONT…
2. Relative measures of variation: A relative measure is the quotient obtained by
dividing the absolute measure by the quantity about which the
absolute deviation has been computed.
• A relative measure of variation is a pure (unitless) number and is used for
making comparisons between different distributions.
• For instance: coefficient of range, coefficient of variation,
coefficient of mean deviation, standard score, coefficient of quartile
deviation, etc.
Range And Relative Range
Range is the simplest and crudest measure of
dispersion. It is defined as the difference between
the largest and the smallest values in the data.
Range hardly satisfies any property of a good measure
of dispersion, as it is based on the two extreme values only,
ignoring the others.
It is not amenable to further algebraic treatment.
Range for raw (ungrouped) data: R = maximum − minimum, or R = L − S,
where L is the largest and S the smallest value.
Range… Cont…
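Grouped data: $R = UCL_{last} - LCL_{first}$, or $CM_{last} - CM_{first}$, or $UCB_{last} - LCB_{first}$, or
$R = W \times K$.
Range has concrete units.
The relative measure of range is the coefficient of range (CR): $CR = \dfrac{L - S}{L + S}$, where L is the largest and S the smallest value.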
Quartile Deviation and Coefficient of Quartile Deviation
Quartile deviation is sometimes known as the semi-interquartile range
(SIR). The interquartile range is $Q_3 - Q_1$.
Thus, $QD = \dfrac{Q_3 - Q_1}{2}$.
The corresponding relative measure of variation, the coefficient of
quartile deviation, is:
$CQD = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
Mean Deviation and Coefficient of Mean Deviation
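Mean deviation is a better measure than range and
quartile deviation because it is based on all the observations.
Mean deviation is the arithmetic mean of the absolute
values of the deviations from some measure of central
tendency, usually the mean or the median of a
distribution: $MD(A) = \dfrac{1}{n}\sum_{i=1}^{n}|X_i - A|$, where A is the mean or the median.
Hence we have mean deviation about the mean and mean
deviation about the median.
Because the signed deviations from the mean always sum to zero (a property
of the arithmetic mean), the absolute values are essential; it is therefore
more precise to say mean absolute deviation instead of mean deviation.
The corresponding relative measure, the coefficient of mean deviation, is $CMD = \dfrac{MD(A)}{A}$.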
VARIANCE AND COEFFICIENT OF VARIATION
The variance and standard deviation are the most
widely used measures of dispersion, and both measure the
average dispersion of the observations around the mean.
The variance of a data set is the sum of the squares of the
deviations of each observation from the mean, divided
by the total number of observations in the data set.
The positive square root of the variance is called the standard
deviation.
• For a population of N observations with mean $\mu$, the variance and standard deviation can be
computed as: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$ and $\sigma = \sqrt{\sigma^2}$.
SAMPLE VARIANCE AND STANDARD DEVIATION
For a sample of n elements, the sample variance and
standard deviation, denoted by $S^2$ and $S$ respectively,
are calculated using the formulae:
$S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ and $S = \sqrt{S^2}$.
Disadvantages Of Variance
The variation of the data is exaggerated because the deviation
of each value from the mean is squared.
• Variance gives more weight to the extreme values than
to those which are near the mean.
• The unit of variance is the square of the unit of measurement
of the variable values, which can lead to wrong interpretations.
Standard Deviation
Standard deviation is the positive square root of the variance.
• Standard deviation is considered to be the best measure of
dispersion because its unit of measurement is the same as
that of the data, and the exaggeration introduced by the variance is
removed by taking its square root.
• If the standard deviation of the data is small, the values are
concentrated near the mean, and if it is large, the values are
scattered away from the mean.
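The difference between the population and sample versions can be seen in a short Python sketch. The data set is the one from Example 1 of this chapter's exercises; treating the same seven values first as a whole population and then as a sample is an assumption made purely for comparison.

```python
from statistics import pvariance, pstdev, variance, stdev

data = [20, 28, 40, 12, 30, 15, 50]

# Population formulas: divide the sum of squared deviations by N.
population_variance = pvariance(data)
population_sd = pstdev(data)

# Sample formulas: divide the sum of squared deviations by n - 1.
sample_variance = variance(data)
sample_sd = stdev(data)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```

The variance is in squared units of the data, while both standard deviations are back in the original units, which is why the standard deviation is easier to interpret.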

INTERPRETATION OF THE STANDARD DEVIATION
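If the data are a sample and the distribution is approximately normal
(bell-shaped), then the following conclusions can be reached:
• Approximately 68% of the data values fall within one standard deviation of the mean.
• Approximately 95% of the data values fall within two standard deviations of the mean.
• Approximately 99.7% of the data values fall within three standard deviations of the mean.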
EMPIRICAL RELATIONSHIP BETWEEN QD, MD AND SD
For approximately normal distributions, the following relationship holds approximately:
6QD = 5MD = 4SD
• QD = 5MD/6 or QD = 2SD/3
MD = 6QD/5 or MD = 4SD/5
SD = 3QD/2 = 1.5QD or SD = 5MD/4 = 1.25MD
NOTE
If there are two or more distributions of different
variables (having different units of measurement), their
variability cannot be compared by comparing the values
of the standard deviation.
COEFFICIENT OF VARIATION (CV)
The coefficient of variation is used when:
• The groups have different units of measurement.
• The magnitudes of the data differ between the groups.
Method of calculation:
It is a relative measure of standard deviation.
The coefficient of variation is the ratio of the standard
deviation to the mean, expressed as a percentage: $CV = \dfrac{S}{\bar{X}} \times 100\%$.
CONT….
It is used for comparing the variability of two or more distributions.
The distribution having the smaller CV is said to be:
less variable
more consistent
more uniform
More reliable
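A comparison by CV might look like the Python sketch below. Both data sets are hypothetical values invented for illustration (they are not from the slides), and the sample standard deviation is assumed in the numerator.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 2), round(cv_weights, 2))

# The standard deviations are in different units (cm and kg) and cannot be
# compared directly; the unitless CVs can.
more_consistent = "heights" if cv_heights < cv_weights else "weights"
print(more_consistent)
```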

EXAMPLE
1. Calculate the R, CR, QD, CQD, MD(mean),
MD(median) and CMD for the following
data: 20, 28, 40, 12, 30, 15, 50.
2. Calculate the R,CR, QD,CQD, MD and CMD for the
following data.
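Example 1 can be worked through with the Python sketch below. The quartile positions assume the k(n+1)/4 convention for raw data with linear interpolation, which is an assumption because the slide formulas did not survive; the helper names quartile and mean_deviation are introduced here for illustration.

```python
from statistics import mean, median

def quartile(ordered, k):
    """k-th quartile at 1-based position k*(n+1)/4 (assumed convention)."""
    n = len(ordered)
    position = k * (n + 1) / 4
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

data = sorted([20, 28, 40, 12, 30, 15, 50])
largest, smallest = data[-1], data[0]

data_range = largest - smallest                        # R
coeff_range = data_range / (largest + smallest)        # CR
q1, q3 = quartile(data, 1), quartile(data, 3)
quartile_deviation = (q3 - q1) / 2                     # QD
coeff_qd = (q3 - q1) / (q3 + q1)                       # CQD
md_mean = mean_deviation(data, mean(data))             # MD about the mean
md_median = mean_deviation(data, median(data))         # MD about the median
cmd_mean = md_mean / mean(data)                        # CMD about the mean
cmd_median = md_median / median(data)                  # CMD about the median

print(data_range, round(coeff_range, 4))
print(quartile_deviation, round(coeff_qd, 4))
print(round(md_mean, 3), round(md_median, 3))
print(round(cmd_mean, 4), round(cmd_median, 4))
```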

SOLUTION FOR EXAMPLE 2
Mean = 25.64, Median = 26.1, Mode = 27.64
Q1 = 20, Q3 = 31.07