Module 1

The document outlines a course on Probability and Statistics taught by Dr. Mohit Kumar at VIT Chennai, covering various modules including introduction to statistics, random variables, correlation and regression, probability distributions, and hypothesis testing. It emphasizes the importance of statistical methods in decision-making and quality improvement, particularly in industrial contexts. Additionally, it discusses key concepts such as data, types of statistics, variability, and experimental design.

BMAT202L: Probability and Statistics

Dr. Mohit Kumar

VIT Chennai
Learning Objective

▶ Module 1: Introduction to Statistics

▶ Module 2: Random Variables
▶ Module 3: Correlation and Regression
▶ Module 4: Probability Distributions
▶ Module 5: Hypothesis Testing-I
▶ Module 6: Hypothesis Testing-II
▶ Module 7: Reliability
Module 1: Introduction to Statistics

▶ Statistics and data analysis

▶ Measures of central tendency
▶ Measures of dispersion
▶ Moments
▶ Skewness
▶ Kurtosis
Module 2: Random Variables

▶ Probability mass, distribution and density functions

▶ Joint probability distribution and density functions
▶ Marginal, Conditional distribution and density functions
▶ Mathematical expectation and its properties
▶ Covariance
▶ Moment generating function
Module 3: Correlation and Regression

▶ Rank correlation
▶ Partial and Multiple correlation
▶ Multiple regression
Module 4: Probability Distributions

▶ Binomial distribution
▶ Poisson distribution
▶ Normal distribution
▶ Gamma distribution
▶ Exponential distribution
▶ Weibull distribution
Module 5: Hypothesis Testing-I

▶ Types of errors
▶ Critical region
▶ Procedure for testing of hypothesis
▶ Large sample tests
▶ Z test for single proportion
▶ Difference of proportion
▶ Mean and difference of means
Module 6: Hypothesis Testing-II

▶ Small sample tests

▶ Student’s t-test
▶ F-test
▶ Chi-square test
▶ Goodness of fit
▶ Independence of attributes
▶ Design of experiments
▶ Analysis of variance
▶ One way-Two way-Three way classifications
▶ Completely Randomized Design (CRD)
▶ Randomized Block Design (RBD)
▶ Latin Square Design (LSD)
Module 7: Reliability

▶ Basic concepts
▶ Hazard function
▶ Reliabilities of series and parallel systems
▶ System reliability
▶ Maintainability
▶ Preventive and repair maintenance
▶ Availability
Text and Reference Books

1. Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye: Probability & Statistics for Engineers & Scientists, 2012, 9th Edition, Pearson.

2. Douglas C. Montgomery, George C. Runger: Applied Statistics and Probability for Engineers, 2018, 7th Edition, John Wiley & Sons.

3. Jay L. Devore: Probability and Statistics for Engineering and the Sciences, 2016, 9th Edition, Cengage Learning.

4. Richard A. Johnson: Miller & Freund’s Probability and Statistics for Engineers, 2018, 9th Edition, Pearson.

5. Bilal M. Ayyub, Richard H. McCuen: Probability, Statistics and Reliability for Engineers and Scientists, 2011, 3rd Edition, CRC Press.
Statistics and Data Analysis
Motivation: Why Study Statistics?

▶ In the middle of the 20th century, the Japanese were able to succeed where
others failed, to come up with high-quality products.
▶ Much of the success of the Japanese was due to the use of statistical methods
and statistical thinking among management personnel.
▶ From 1980s till today, our attention is focused on improvement of quality in
industry.
Important Aspects of Statistics

▶ What is data?
▶ plural of “datum” which means “a piece of information”.
▶ facts collected together for reference or analysis.
▶ the quantities, characters, or symbols on which operations are performed
by a computer, which may be stored and transmitted in the form of
electrical signals and recorded on magnetic, optical, or mechanical
recording media.
Important Aspects of Statistics

▶ What is statistics?
▶ plural of “statistic” which means “a fact or piece of data obtained from a
study of a large quantity of numerical data”.
▶ the practice or science of collecting and analysing numerical data in
large quantities, especially for the purpose of inferring proportions in a
whole from those in a representative sample.
Important Aspects of Statistics
▶ Types of Statistics
▶ Descriptive Statistics: A set of brief descriptive coefficients that
summarizes a given data set, which can either be a representation of the
entire population or a sample.
▶ Inferential Statistics:
▶ “inference” meaning “a conclusion reached on the basis of
evidence and reasoning”.
▶ Inferential statistics makes inferences about populations using data
drawn from the population.

▶ Why is statistics so important?

▶ Ability to make informed decisions in the presence of Uncertainty and
Variation.
▶ Uncertainty is the situation which involves imperfect and/or unknown
information.
▶ Variation is a change or slight difference in condition, amount, or level,
typically within certain limits.
Important Aspects of Statistics

▶ Sources of Variation
▶ Inherent
▶ External

▶ Example: An engineer may be concerned with a specific instrument that is
used to measure sulfur monoxide in the air during pollution studies. If the
engineer has doubts about the effectiveness of the instrument, there are two
sources of variation that must be dealt with.
▶ The first is the variation in sulfur monoxide values that are found at the
same locale on the same day.
▶ The second is the variation between values observed and the true amount
of sulfur monoxide that is in the air at the time.
▶ If either of these two sources of variation is exceedingly large (according to
some standard set by the engineer), the instrument may need to be replaced.
▶ If the device for measuring sulfur monoxide always gives the same value and
the value is accurate (i.e., it is correct), no statistical analysis is needed.
Important Aspects of Statistics
▶ Variability in Scientific Data
▶ Variability refers to the extent to which these data points differ from each
other.
▶ Samples or Observations
▶ a small part or quantity intended to show what the whole is like.

▶ Populations
▶ a finite or infinite collection of items under consideration.

▶ Factors
▶ characteristics or quantities associated with the population.

▶ Experimental Design
▶ a design developed by the experimenter by controlling the factors in the
data.
▶ Observational Study
▶ a study of the data with no control on the factors.
The Role of Probability

▶ Concepts in probability form a major component that supplements statistical
methods and helps us gauge the strength of the statistical inference.

What is Probability?
▶ The theory with sound mathematical foundation (Axioms) which involves
dealing with uncertainty and variation.

How will it help to understand data better?

▶ Once the theory is used, it will provide a bridge between “data” and “model”
developed to understand the data better.
How Probability and Statistical Inference Work Together?

▶ Inductive Reasoning: The sample along with inferential statistics allows us
to draw conclusions about the population, with inferential statistics making
clear use of elements of probability.
▶ Deductive Reasoning: Elements in probability allow us to draw conclusions
about characteristics of hypothetical data taken from the population, based on
known features of the population.
Sampling Procedures: Collection of Data

Why Sampling?
▶ It is not always possible to study the entire population.
▶ Time consuming.
▶ Not budget friendly.

What is Sampling?
▶ Sampling is the process of collecting a limited amount of data to form a sample.
▶ How can it be done?
▶ It might seem straightforward, but it is not: any method may tend to produce
a biased sample, that is, one that is not a good representative of the
population under study.
Sampling Methods

Simple Random Sampling

▶ A method of selecting a subset of a population in which each member of the
population has an equal probability of being chosen.
▶ A simple random sample is meant to be an unbiased representation of a group.

Stratified Random Sampling

▶ A method in which the population is divided into a number of different
homogeneous subgroups, called “strata”, and simple random sampling is then
carried out within each stratum.
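
The two sampling methods above can be sketched in Python as follows. This is only an illustration under stated assumptions: the student population, the function names, and the proportional allocation (the same sampling fraction in every stratum) are hypothetical choices, not part of the course material.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(population), k)


def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then sample at random within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Assumption: proportional allocation, at least one member per stratum.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 first-year and 40 second-year students.
students = [("first", i) for i in range(60)] + [("second", i) for i in range(40)]

srs = simple_random_sample(students, 10, seed=1)
strat = stratified_random_sample(students, lambda s: s[0], 0.1, seed=1)
print(len(srs), len(strat))
```

With a 10% fraction, the stratified sample always contains 6 first-year and 4 second-year students, whereas the simple random sample may over- or under-represent either year by chance.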
Experimental Design

Why Design an Experiment?

▶ To control the factors.
▶ To have better understanding of the effect of factors on population.

What is an Experimental Design?

▶ In an experiment, we deliberately change one or more process variables (or
factors) in order to observe the effect the changes have on one or more
response variables.
▶ The (statistical) design of experiments (DOE) is an efficient procedure for
planning experiments so that the data obtained can be analysed to yield valid
and objective conclusions.
Measures of Central Tendency
The Sample Mean: Individual Data
▶ The sample mean is also called the ‘arithmetic mean’ or the
‘average’.
▶ Let $x_1, x_2, \ldots, x_n$ denote the observations.
▶ The sample mean, denoted by $\bar{x}$, is

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}. \]

▶ Example: The intelligence quotients (IQs) of ten students in a
class are 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Find the
mean IQ.
▶ The mean IQ is

\[ \bar{x} = \frac{70 + 120 + 110 + 101 + 88 + 83 + 95 + 98 + 107 + 100}{10} = \frac{972}{10} = 97.2. \]
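
As a quick check of the IQ example, the definition of the sample mean can be written directly in Python. The function name `sample_mean` is an illustrative choice.

```python
def sample_mean(xs):
    """Arithmetic mean: the sum of the observations divided by their number."""
    return sum(xs) / len(xs)


iqs = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mean_iq = sample_mean(iqs)
print(mean_iq)
```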
The Sample Mean: Discrete Frequency Data
▶ Let $f_i$ be the frequency of the value $x_i$, $i = 1, 2, \ldots, n$.
▶ The sample mean or weighted sample mean is

\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \]

where $N = \sum_{i=1}^{n} f_i$.
▶ Example: The frequency distribution of the number of telephone calls
received at an exchange in 245 successive one-minute intervals is

No. of Calls    0    1    2    3    4    5    6    7
Frequency      14   21   25   43   51   40   39   12

Find the mean number of calls per minute at the exchange.
▶ The mean number of calls per minute at the exchange is

\[ \bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{922}{245} \approx 3.763. \]
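
The weighted mean for the telephone-call example can be sketched as below. The function name `weighted_mean` is an illustrative choice; the data are those of the example.

```python
def weighted_mean(values, freqs):
    """Mean of a discrete frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    n_total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n_total


calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]
mean_calls = weighted_mean(calls, freqs)
print(round(mean_calls, 3))
```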
The Sample Mean

▶ Let $x_i$, $i = 1, 2, \ldots, n$, be a sample. If the $x_i$ are large, the
calculation of the sample mean can be simplified considerably by the
following change of origin and scale.
▶ Take $y_i = \frac{x_i - a}{b}$ for all $i = 1, \ldots, n$, where $a$ and $b \neq 0$ are
constants. Therefore, $x_i = a + b y_i$. Then

\[ \bar{x} = a + b\bar{y}. \]

▶ Note: In the case of a grouped or continuous frequency distribution, $x_i$
is taken as the mid-value of the corresponding $i$th class.
The Sample Mean: Grouped Frequency Data

▶ Example: Calculate the sample mean for the following frequency
distribution:

Class Interval   0-8   8-16   16-24   24-32   32-40   40-48
Frequency          8      7      16      24      15       7

▶ Ans: 25.4026.
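
One way to reproduce this answer is to take class mid-values and apply the change of origin and scale $y_i = (x_i - a)/b$. In the sketch below, the choices $a = 28$ (the mid-value of the 24-32 class) and $b = 8$ (the class width) are assumptions made for illustration; any $a$ and any $b \neq 0$ give the same mean.

```python
def grouped_mean_coded(mids, freqs, a, b):
    """Mean via the coded variable y = (x - a) / b, then x_bar = a + b * y_bar."""
    n_total = sum(freqs)
    ys = [(x - a) / b for x in mids]
    y_bar = sum(f * y for f, y in zip(freqs, ys)) / n_total
    return a + b * y_bar


classes = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 40), (40, 48)]
freqs = [8, 7, 16, 24, 15, 7]
mids = [(lo + hi) / 2 for lo, hi in classes]  # 4, 12, 20, 28, 36, 44

x_bar = grouped_mean_coded(mids, freqs, a=28, b=8)
print(round(x_bar, 4))
```

With these choices the coded values are -3, -2, -1, 0, 1, 2, so the hand calculation only involves small integers.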
The Sample Median: Individual Data

▶ Let $x_1, x_2, \ldots, x_n$ be the observations.

▶ Arrange the observations in increasing order of magnitude, say
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Then the sample median, denoted by $\tilde{x}$, is

\[ \tilde{x} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd}, \\[4pt] \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even}. \end{cases} \]

▶ Example: The sample median of the values 8, 4, 7, 6, 2, i.e.,
2, 4, 6, 7, 8 is 6.
▶ Example: The sample median of 10, 15, 30, 70, 40, 80, i.e.,
10, 15, 30, 40, 70, 80 is (30 + 40)/2 = 35.
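
The two cases of the definition translate directly into code. This is a minimal sketch; the function name `sample_median` is illustrative, and the indices are shifted by one because Python lists start at 0.

```python
def sample_median(xs):
    """Median of individual observations, following the odd/even definition."""
    ordered = sorted(xs)
    n = len(ordered)
    if n % 2 == 1:
        # x_((n+1)/2) in 1-based notation is index (n - 1) // 2 in 0-based.
        return ordered[(n - 1) // 2]
    # Average of x_(n/2) and x_(n/2 + 1) in 1-based notation.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([8, 4, 7, 6, 2]))
print(sample_median([10, 15, 30, 70, 40, 80]))
```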
The Sample Median: Discrete Frequency Data

▶ In the case of a discrete frequency distribution, the sample median is
obtained by considering the cumulative frequencies (c.f.). The steps for
calculating the median are as follows:
▶ Let $f_i$ be the frequency of $x_i$, $i = 1, 2, \ldots, n$.
▶ Find $N/2$, where $N = \sum_{i=1}^{n} f_i$.
▶ Find the $x_i$ corresponding to the cumulative frequency just greater than
$N/2$; this value is the median.
▶ Example: Obtain the sample median for the following frequency distribution:

x       1    2    3    4    5    6    7    8    9
f       8   10   11   16   20   25   15    9    6

▶ Cumulative frequencies:

x       1    2    3    4    5    6    7    8    9
f       8   10   11   16   20   25   15    9    6
c.f.    8   18   29   45   65   90  105  114  120

▶ N = 120 and N/2 = 60. The cumulative frequency just greater than N/2 is 65
and the value of x corresponding to 65 is 5. So, the sample median is 5.
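
The steps above can be sketched as a short function. The name `median_discrete` is illustrative; the rule implemented is exactly the one stated, namely the first value whose cumulative frequency exceeds N/2.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Return the value whose cumulative frequency is just greater than N/2."""
    cum_freqs = list(accumulate(freqs))
    half = cum_freqs[-1] / 2  # N / 2
    for x, cf in zip(values, cum_freqs):
        if cf > half:
            return x


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(list(accumulate(fs)))
print(median_discrete(xs, fs))
```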
The Sample Median: Grouped Frequency Data

▶ In the case of a continuous frequency distribution, the class
corresponding to the c.f. just greater than N/2 is called the
median class, and the value of the median is obtained by the
following formula:

\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right), \]

where
▶ $l$ is the lower limit of the median class
▶ $f$ is the frequency of the median class
▶ $h$ is the magnitude (width) of the median class
▶ $c$ is the c.f. of the class preceding the median class
▶ $N = \sum f$
The Sample Median: Grouped Frequency Data

▶ Example: Find the median wage of the following distribution:

Wages (in rupees)   No. of workers
2000-3000                  3
3000-4000                  5
4000-5000                 20
5000-6000                 10
6000-7000                  5

▶ Solution:

Wages (in rupees)   No. of workers   c.f.
2000-3000                  3            3
3000-4000                  5            8
4000-5000                 20           28
5000-6000                 10           38
6000-7000                  5           43
The Sample Median: Grouped Frequency Data

▶ N = 43, N/2 = 21.5.

▶ The cumulative frequency just greater than 21.5 is 28, and the
corresponding class is 4000-5000.
▶ The median class is 4000-5000.
▶ Here l = 4000, h = 1000, f = 20, c = 8. Therefore,

\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 4000 + \frac{1000}{20}(21.5 - 8) = 4675. \]

▶ The median wage is 4675 rupees.
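
The wage example can be reproduced with a small function that locates the median class and applies the formula. This is a sketch assuming equal class widths, as in the example; the function name `median_grouped` is illustrative.

```python
def median_grouped(lower_limits, width, freqs):
    """Median = l + (h / f) * (N/2 - c) for equal-width classes."""
    half = sum(freqs) / 2  # N / 2
    c = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if c + f > half:  # first class whose c.f. exceeds N/2: the median class
            return l + (width / f) * (half - c)
        c += f


lower_limits = [2000, 3000, 4000, 5000, 6000]
workers = [3, 5, 20, 10, 5]
median_wage = median_grouped(lower_limits, 1000, workers)
print(median_wage)
```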

The Trimmed Mean

The trimmed mean can be calculated as follows:

▶ Let the sample size be n, and suppose a p% trimmed mean is desired.
▶ The number of data points to be trimmed is the nearest whole
number to np/100.
▶ Arrange the sample values in increasing order.
▶ Trim an equal number of sample values from each end.
▶ Find the mean of the remaining sample values.
▶ The resulting mean of the remaining sample values is called the
‘p% trimmed mean’.
▶ Example: Find the 5%, 10% and 20% trimmed means of
90, 30, 105, 79, 99, 80, 39, 149, 191, 13, 232, 240, 5, 274, 107, 470.
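
The procedure can be sketched as below. One point is an assumption rather than something the slides state explicitly: here k, the nearest whole number to np/100, is removed from each end of the ordered sample (the usual textbook convention for a p% trimmed mean). The function name `trimmed_mean` is illustrative.

```python
import math


def trimmed_mean(xs, p):
    """p% trimmed mean; assumes k = round(n * p / 100) values are cut per end."""
    n = len(xs)
    k = math.floor(n * p / 100 + 0.5)  # nearest whole number, halves rounded up
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = sorted(xs)[k:n - k]
    return sum(kept) / len(kept)


data = [90, 30, 105, 79, 99, 80, 39, 149, 191, 13, 232, 240, 5, 274, 107, 470]
for p in (5, 10, 20):
    print(p, round(trimmed_mean(data, p), 4))
```

For n = 16 this trims 1, 2 and 3 values per end for p = 5, 10 and 20, so the extreme value 470 is dropped in every case.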
The Geometric Mean
▶ Let $x_1, x_2, \ldots, x_n$ be an observed sample.
▶ The geometric mean $G$ is given as

\[ G = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}. \]

▶ Taking logarithm on both sides gives

\[ \log G = \frac{1}{n}\sum_{i=1}^{n} \log x_i. \]

▶ Taking antilog gives

\[ G = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right). \]

▶ Example: Find the geometric mean of 2, 4, 8, 12, 16, 24.

▶ Ans: 8.158
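
The log/antilog route can be followed literally in code. The sketch below uses natural logarithms, with `math.exp` playing the role of the antilog; the base does not matter as long as the same one is used in both steps. It assumes all observations are positive.

```python
import math


def geometric_mean(xs):
    """G = antilog(mean of log x_i); requires every x_i > 0."""
    mean_log = sum(math.log(x) for x in xs) / len(xs)
    return math.exp(mean_log)


g = geometric_mean([2, 4, 8, 12, 16, 24])
print(round(g, 2))
```

Working with logarithms avoids forming the large product $x_1 x_2 \cdots x_n$ (here 294912) before taking the root.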
The Geometric Mean
▶ Let $f_1, f_2, \ldots, f_n$ be the frequencies of the observations $x_1, x_2, \ldots, x_n$,
respectively. Then the geometric mean is given by

\[ G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \]

where $N = \sum_{i=1}^{n} f_i$.

▶ Taking logarithm on both sides gives

\[ \log G = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i. \]

▶ Taking antilog gives

\[ G = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right). \]

▶ Example: Find the geometric mean for the following distribution:

Marks                0-10   10-20   20-30   30-40   40-50
Number of Students      5       7      15      25       8

▶ Ans: 25.64 marks
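
For the grouped example, a sketch of the calculation takes each $x_i$ as the class mid-value (5, 15, 25, 35, 45), in line with the earlier note on grouped data. The function name is illustrative. Computed in floating point, the result is about 25.6, which agrees with the stated answer up to rounding of the logarithms.

```python
import math


def geometric_mean_freq(values, freqs):
    """G = antilog((1/N) * sum(f_i * log x_i)); requires every x_i > 0."""
    n_total = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total
    return 10 ** mean_log


mids = [5, 15, 25, 35, 45]        # mid-values of 0-10, 10-20, ..., 40-50
students = [5, 7, 15, 25, 8]
g_marks = geometric_mean_freq(mids, students)
print(round(g_marks, 1))
```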
The Harmonic Mean
▶ The harmonic mean $H$ of $n$ observations $x_1, x_2, \ldots, x_n$ is given by

\[ \frac{1}{H} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}. \]

▶ If $f_1, f_2, \ldots, f_n$ are the frequencies of the observations
$x_1, x_2, \ldots, x_n$, respectively, then the harmonic mean $H$ is given by

\[ \frac{1}{H} = \frac{1}{N}\left(\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}\right) = \frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i}, \]

where $N = \sum_{i=1}^{n} f_i$.
▶ Example: A cyclist pedals from his house to his college at a speed of
10 kmph and back from college to his house at 15 kmph. Find the
average speed.
▶ Ans: 12 kmph
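
The cyclist example works because the distance is the same in both directions, so the average speed is the harmonic mean of the two speeds, not the arithmetic mean (12.5). A minimal sketch, with an illustrative function name:

```python
def harmonic_mean(xs):
    """H = n / sum(1 / x_i); requires every x_i to be non-zero."""
    return len(xs) / sum(1 / x for x in xs)


avg_speed = harmonic_mean([10, 15])
print(round(avg_speed, 6))

# Cross-check with total distance / total time for a one-way distance d (km).
d = 30  # hypothetical distance; any positive value gives the same speed
print(2 * d / (d / 10 + d / 15))
```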
The Mode

▶ Mode is defined to be that value which occurs most often.

▶ Example 1: The mode for the set of values 3, 7, 2, 7, 5, 7, 3 is 7, since 7
occurs the maximum number of times.
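
Counting occurrences makes the definition concrete. The sketch below uses `collections.Counter` from the Python standard library; it returns one most frequent value and does not address ties.

```python
from collections import Counter

values = [3, 7, 2, 7, 5, 7, 3]
counts = Counter(values)                  # maps each value to its frequency
mode, mode_count = counts.most_common(1)[0]
print(mode, mode_count)
```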
The Mode