BMAT202L: Probability and Statistics

Dr. Mohit Kumar

VIT Chennai
Learning Objective

▶ Module 1: Introduction to Statistics
▶ Module 2: Random Variables
▶ Module 3: Correlation and Regression
▶ Module 4: Probability Distributions
▶ Module 5: Hypothesis Testing-I
▶ Module 6: Hypothesis Testing-II
▶ Module 7: Reliability
Module 1: Introduction to Statistics

▶ Statistics and data analysis
▶ Measures of central tendency
▶ Measures of dispersion
▶ Moments
▶ Skewness
▶ Kurtosis
Module 2: Random Variables

▶ Probability mass, distribution and density functions
▶ Joint probability distribution and density functions
▶ Marginal, Conditional distribution and density functions
▶ Mathematical expectation and its properties
▶ Covariance
▶ Moment generating function
Module 3: Correlation and Regression

▶ Rank correlation
▶ Partial and Multiple correlation
▶ Multiple regression
Module 4: Probability Distributions

▶ Binomial distribution
▶ Poisson distribution
▶ Normal distribution
▶ Gamma distribution
▶ Exponential distribution
▶ Weibull distribution
Module 5: Hypothesis Testing-I

▶ Types of errors
▶ Critical region
▶ Procedure for testing of hypothesis
▶ Large sample tests
▶ Z test for single proportion
▶ Difference of proportion
▶ Mean and difference of means
Module 6: Hypothesis Testing-II

▶ Small sample tests
▶ Student’s t-test
▶ F-test
▶ Chi-square test
▶ Goodness of fit
▶ Independence of attributes
▶ Design of experiments
▶ Analysis of variance
▶ One way-Two way-Three way classifications
▶ Completely Randomized Design (CRD)
▶ Randomized Block Design (RBD)
▶ Latin Square Design (LSD)
Module 7: Reliability

▶ Basic concepts
▶ Hazard function
▶ Reliabilities of series and parallel systems
▶ System reliability
▶ Maintainability
▶ Preventive and repair maintenance
▶ Availability
Text and Reference Books

1. Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying Ye: Probability & Statistics for Engineers & Scientists, 2012, 9th Edition, Pearson.
2. Douglas C. Montgomery, George C. Runger: Applied Statistics and Probability for Engineers, 2018, 7th Edition, John Wiley & Sons.
3. Jay L. Devore: Probability and Statistics for Engineering and the Sciences, 2016, 9th Edition, Cengage Learning.
4. Richard A. Johnson: Miller & Freund’s Probability and Statistics for Engineers, 2018, 9th Edition, Pearson.
5. Bilal M. Ayyub, Richard H. McCuen: Probability, Statistics and Reliability for Engineers and Scientists, 2011, 3rd Edition, CRC Press.
Statistics and Data Analysis
Motivation: Why Study Statistics?

▶ In the middle of the 20th century, Japanese industry succeeded where others had failed in producing high-quality products.
▶ Much of this success was due to the use of statistical methods and statistical thinking among management personnel.
▶ From the 1980s to the present, attention has been focused on improving quality in industry.
▶ With the boom in data science, statistics has become indispensable.
Important Aspects of Statistics

▶ What is data?
  ▶ The plural of “datum”, which means “a piece of information”.
  ▶ Facts collected together for reference or analysis.
  ▶ The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

▶ What is statistics?
  ▶ The plural of “statistic”, which means “a fact or piece of data obtained from a study of a large quantity of numerical data”.
  ▶ The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
Important Aspects of Statistics

▶ Types of Statistics
  ▶ Descriptive Statistics: A set of brief descriptive coefficients that summarizes a given data set, which may represent either the entire population or a sample.
  ▶ Inferential Statistics:
    ▶ “Inference” means “a conclusion reached on the basis of evidence and reasoning”.
    ▶ Inferential statistics makes inferences about a population using sample data drawn from that population.

▶ Why is statistics so important?
  ▶ It gives us the ability to make informed decisions in the presence of uncertainty and variation.
  ▶ Uncertainty is a situation that involves imperfect and/or unknown information.
  ▶ Variation is a change or slight difference in condition, amount, or level, typically within certain limits.
Important Aspects of Statistics

▶ Sources of Variation
  ▶ Inherent
  ▶ External

▶ Example: An engineer may be concerned with a specific instrument that is used to measure sulfur monoxide in the air during pollution studies. If the engineer has doubts about the effectiveness of the instrument, there are two sources of variation that must be dealt with.
  ▶ The first is the variation in sulfur monoxide values that are found at the same locale on the same day.
  ▶ The second is the variation between the values observed and the true amount of sulfur monoxide that is in the air at the time.
▶ If either of these two sources of variation is exceedingly large (according to some standard set by the engineer), the instrument may need to be replaced.
▶ If the device for measuring sulfur monoxide always gives the same value and the value is accurate (i.e., it is correct), no statistical analysis is needed.
Important Aspects of Statistics

▶ Variability in Scientific Data
  ▶ Variability refers to the extent to which the data points in a data set differ from each other.
▶ Samples or Observations
  ▶ A small part or quantity intended to show what the whole is like.
▶ Populations
  ▶ A finite or infinite collection of items under consideration.
▶ Factors
  ▶ Characteristics or quantities associated with the population.
▶ Experimental Design
  ▶ A design developed by the experimenter by controlling the factors in the data.
▶ Observational Study
  ▶ A study of the data with no control over the factors.
The Role of Probability

▶ Concepts in probability form a major component that supplements statistical methods and helps us gauge the strength of the statistical inference.

What is Probability?
▶ A theory with a sound mathematical foundation (axioms) for dealing with uncertainty and variation.

How will it help to understand data better?
▶ Once applied, the theory provides a bridge between the “data” and the “model” developed to understand the data better.
How Do Probability and Statistical Inference Work Together?

▶ Inductive Reasoning: The sample, along with inferential statistics, allows us to draw conclusions about the population, with inferential statistics making clear use of elements of probability.
▶ Deductive Reasoning: Elements in probability allow us to draw conclusions about characteristics of hypothetical data taken from the population, based on known features of the population.
Sampling Procedures: Collection of Data

Why Sampling?
▶ It is not always possible to study the entire population.
▶ Studying the entire population is time-consuming.
▶ It is not budget-friendly.

What is Sampling?
▶ Sampling is the process of collecting a limited amount of data to form a sample.
▶ How can it be done?
▶ It might seem straightforward, but it is not: any method may tend to produce a biased sample that does not represent the population under study well.
▶ Are there any specific methods?
Sampling Methods

Simple Random Sampling
▶ A method of selecting a subset of a population in which each member of the population has an equal probability of being chosen.
▶ A simple random sample is meant to be an unbiased representation of a group.

Stratified Random Sampling
▶ A method in which the population is divided into a number of different homogeneous subgroups, called “strata”, and simple random sampling is then carried out within each stratum.
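
The two sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student population, the function names, and the choice of proportional allocation (the same sampling fraction in every stratum) are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(population), k)


def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then draw a simple random sample
    inside each stratum (assumption: same sampling fraction per stratum)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 first-year and 40 second-year students.
students = [("year1", i) for i in range(60)] + [("year2", i) for i in range(40)]

srs = simple_random_sample(students, 10, seed=1)
strat = stratified_random_sample(students, lambda s: s[0], 0.1, seed=1)
```

With a 10% fraction, the stratified sample always contains 6 first-year and 4 second-year students, whereas the simple random sample may over- or under-represent either year by chance.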
Experimental Design

Why Design an Experiment?
▶ To control the factors.
▶ To better understand the effect of the factors on the population.

What is an Experimental Design?
▶ In an experiment, we deliberately change one or more process variables (or factors) in order to observe the effect the changes have on one or more response variables.
▶ The (statistical) design of experiments (DOE) is an efficient procedure for planning experiments so that the data obtained can be analysed to yield valid and objective conclusions.
Measures of Central Tendency
The Sample Mean: Individual Data

▶ The sample mean is also called the ‘arithmetic mean’ or the ‘average’.
▶ Let $x_1, x_2, \ldots, x_n$ denote the observations.
▶ The sample mean, denoted by $\bar{x}$, is

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}.
\]

▶ Example: The intelligence quotients (IQs) of ten students in a class are 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Find the mean IQ.
▶ The mean IQ is

\[
\bar{x} = \frac{70 + 120 + 110 + 101 + 88 + 83 + 95 + 98 + 107 + 100}{10} = \frac{972}{10} = 97.2.
\]
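
As a quick check of the IQ example, the definition of the sample mean can be written directly in Python. This is a minimal sketch; the function name `sample_mean` is chosen here for illustration.

```python
def sample_mean(observations):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not observations:
        raise ValueError("the sample mean of an empty sample is undefined")
    return sum(observations) / len(observations)


# IQ scores from the example above.
iqs = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mean_iq = sample_mean(iqs)
print(mean_iq)
```

The standard library function `statistics.mean` computes the same quantity.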
The Sample Mean: Discrete Frequency Data

▶ Let $f_i$ be the frequency of the value $x_i$, $i = 1, 2, \ldots, n$.
▶ The sample mean, or weighted sample mean, is

\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i,
\]

where $N = \sum_{i=1}^{n} f_i$.
▶ Example: The frequency distribution of the number of telephone calls received at an exchange in 245 successive one-minute intervals is as follows:

No. of Calls    0    1    2    3    4    5    6    7
Frequency      14   21   25   43   51   40   39   12

Find the mean number of calls per minute at the exchange.
▶ The mean number of calls per minute at the exchange is

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{922}{245} = 3.763.
\]
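
The frequency-weighted mean of the telephone-call example can be reproduced with a short Python sketch. The function name `frequency_mean` is an illustrative choice, not something defined in the course material.

```python
def frequency_mean(values, freqs):
    """Weighted mean: sum of f_i * x_i divided by N, the sum of the f_i."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Telephone-call data from the example above.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]

mean_calls = frequency_mean(calls, freqs)
print(round(mean_calls, 3))
```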
The Sample Mean

▶ Let $x_i$, $i = 1, 2, \ldots, n$, be a sample. If the $x_i$ are large, the computation of the sample mean can be simplified considerably by the following method.
▶ Take $y_i = \dfrac{x_i - a}{b}$ for all $i = 1, \ldots, n$, where $a$ and $b \neq 0$ are constants. Therefore, $x_i = a + b y_i$. Then

\[
\bar{x} = a + b\bar{y}.
\]

▶ Note: In the case of a grouped or continuous frequency distribution, $x_i$ is taken as the mid-value of the corresponding $i$th class.
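
The change-of-origin-and-scale identity follows from averaging $x_i = a + b y_i$ over the sample: $\bar{x} = \frac{1}{n}\sum (a + b y_i) = a + b\bar{y}$. The Python sketch below checks it numerically. The data and the constants `a` and `b` are hypothetical values picked so that the coded values are small integers.

```python
def coded_mean(xs, a, b):
    """Mean via coded values y_i = (x_i - a) / b, then x_bar = a + b * y_bar."""
    if b == 0:
        raise ValueError("b must be non-zero")
    ys = [(x - a) / b for x in xs]
    y_bar = sum(ys) / len(ys)
    return a + b * y_bar


# Hypothetical large observations; a is an assumed origin, b an assumed scale.
xs = [1010, 1020, 1030, 1040, 1050]
a, b = 1030, 10

via_coding = coded_mean(xs, a, b)   # coded values are -2, -1, 0, 1, 2
direct = sum(xs) / len(xs)          # direct computation for comparison
print(via_coding, direct)
```

Choosing `a` near the centre of the data and `b` equal to a common spacing keeps the coded values small, which is what makes the hand calculation shorter.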
The Sample Mean: Grouped Frequency Data

▶ Example: Calculate the sample mean for the following frequency distribution:

Class Interval   0-8   8-16   16-24   24-32   32-40   40-48
Frequency          8      7      16      24      15       7

▶ Ans: 25.4026.
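
The stated answer can be checked by replacing each class with its mid-value and applying the frequency-weighted mean. The following Python sketch does this; the function name `grouped_mean` is illustrative.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: each class is represented by its mid-value."""
    if len(classes) != len(freqs):
        raise ValueError("classes and freqs must have the same length")
    mids = [(lower + upper) / 2 for lower, upper in classes]
    total = sum(freqs)
    return sum(f * m for m, f in zip(mids, freqs)) / total


# Class intervals and frequencies from the example above.
classes = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 40), (40, 48)]
freqs = [8, 7, 16, 24, 15, 7]

mean_grouped = grouped_mean(classes, freqs)
print(round(mean_grouped, 4))
```

Here the mid-values are 4, 12, 20, 28, 36, 44, so the weighted sum is 1956 and N = 77.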
The Sample Median: Individual Data

▶ Let $x_1, x_2, \ldots, x_n$ be the observations.
▶ Arrange the observations in increasing order of magnitude, say $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Then the sample median, denoted by $\tilde{x}$, is

\[
\tilde{x} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd},\\[4pt]
\dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even}.
\end{cases}
\]

▶ Example: The sample median of the values 8, 4, 7, 6, 2, i.e., 2, 4, 6, 7, 8, is 6.
▶ Example: The sample median of 10, 15, 30, 70, 40, 80, i.e., 10, 15, 30, 40, 70, 80, is (30 + 40)/2 = 35.
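
The odd and even cases of the median formula translate directly into Python once the 1-based order statistics are mapped to 0-based list indices. This is a minimal sketch; the function name `sample_median` is illustrative.

```python
def sample_median(observations):
    """Median of individual data using the odd/even order-statistic rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # x_((n+1)/2) in 1-based notation is index n // 2 in 0-based indexing.
        return ordered[mid]
    # Average of x_(n/2) and x_(n/2 + 1), i.e. indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = sample_median([8, 4, 7, 6, 2])            # odd n
even_case = sample_median([10, 15, 30, 70, 40, 80])  # even n
print(odd_case, even_case)
```

The standard library function `statistics.median` follows the same rule.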
The Sample Median: Discrete Frequency Data

▶ In case of discrete frequency distribution, the sample median is obtained by considering the cumulative frequencies (c.f.). The steps for calculating median are as follows:
  ▶ Let $f_i$ be the frequency of $x_i$, $i = 1, 2, \ldots, n$.
  ▶ Find $N/2$, where $N = \sum_{i=1}^{n} f_i$.
  ▶ Find $x_i$ corresponding to the cumulative frequency just greater than $N/2$, which is the median.
▶ Example: Obtain the sample median for the following frequency distribution:

x       1    2    3    4    5    6    7    8    9
f       8   10   11   16   20   25   15    9    6
c.f.    8   18   29   45   65   90  105  114  120

▶ $N = 120$ and $N/2 = 60$. The cumulative frequency just greater than $N/2$ is 65 and the value of x corresponding to 65 is 5. So, the sample median is 5.
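The cumulative-frequency steps above can be sketched in Python. This is a minimal illustration rather than a library routine; the function name `discrete_median` is hypothetical, and it follows the slide's rule (first c.f. strictly greater than N/2) without any special handling for a c.f. that equals N/2 exactly.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))            # ensure x is in ascending order
    cum = list(accumulate(f for _, f in pairs))   # cumulative frequencies (c.f.)
    half = cum[-1] / 2                            # N / 2
    for (x, _), cf in zip(pairs, cum):
        if cf > half:                             # first c.f. just greater than N/2
            return x

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(discrete_median(x, f))  # 5
```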
The Sample Median: Grouped Frequency Data

▶ In the case of continuous frequency distribution, the class corresponding to the c.f. just greater than $N/2$ is called the median class and the value of median is obtained by the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right),$$

where
  ▶ $l$ is the lower limit of the median class
  ▶ $f$ is the frequency of the median class
  ▶ $h$ is the magnitude of the median class
  ▶ $c$ is the c.f. of the class preceding the median class
  ▶ $N = \sum f$
The Sample Median: Grouped Frequency Data

▶ Example: Find the median wage of the following distribution:

Wages (in rupees)   No. of workers
2000-3000            3
3000-4000            5
4000-5000           20
5000-6000           10
6000-7000            5

▶ Solution:

Wages (in rupees)   No. of workers   c.f.
2000-3000            3                3
3000-4000            5                8
4000-5000           20               28
5000-6000           10               38
6000-7000            5               43
The Sample Median: Grouped Frequency Data

▶ $N = 43$, $N/2 = 21.5$.
▶ Cumulative frequency just greater than 21.5 is 28 and the corresponding class is 4000-5000.
▶ Median class is 4000-5000.
▶ Here $l = 4000$, $h = 1000$, $f = 20$, $c = 8$. Therefore,

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 4000 + \frac{1000}{20}(21.5 - 8) = 4675.$$

▶ The median wage is 4675 rupees.
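The grouped-median formula can be checked numerically with a short Python sketch. The function name `grouped_median` and the representation of classes as `(lower, upper)` pairs are assumptions made here for illustration.

```python
def grouped_median(classes, freqs):
    """Median = l + (h / f) * (N/2 - c) for classes given as (lower, upper) pairs."""
    half = sum(freqs) / 2                 # N / 2
    c = 0                                 # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if c + f > half:                  # first c.f. just greater than N/2: median class
            h = upper - lower             # magnitude of the median class
            return lower + (h / f) * (half - c)
        c += f

wage_classes = [(2000, 3000), (3000, 4000), (4000, 5000), (5000, 6000), (6000, 7000)]
workers = [3, 5, 20, 10, 5]
print(grouped_median(wage_classes, workers))  # 4675.0
```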


The Trimmed Mean

The trimmed mean can be calculated as follows:

▶ Let the sample size be $n$ and suppose the $p\%$ trimmed mean is desired.
▶ The number of data points to be trimmed from each end is the whole number nearest to $np/100$.
▶ Arrange the sample values in increasing order.
▶ Trim that number of sample values from each end.
▶ Find the mean of the remaining sample values.
▶ The resulting mean of the remaining sample values is called the ‘p% trimmed mean’.
▶ Example: Find the 5%, 10% and 20% trimmed means of
90, 30, 105, 79, 99, 80, 39, 149, 191, 13, 232, 240, 5, 274, 107, 470.
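A minimal Python sketch of the trimming steps, applied to the example data. It assumes that the whole number nearest to np/100 is the count removed from each end, and that exact halves round up; the function name `trimmed_mean` is hypothetical.

```python
import math

def trimmed_mean(data, p):
    """p% trimmed mean: drop the nearest whole number to n*p/100 from each end."""
    n = len(data)
    k = math.floor(n * p / 100 + 0.5)   # nearest whole number (halves rounded up: assumption)
    ordered = sorted(data)              # increasing order
    kept = ordered[k:n - k]             # trim k values from each end
    return sum(kept) / len(kept)

sample = [90, 30, 105, 79, 99, 80, 39, 149, 191, 13, 232, 240, 5, 274, 107, 470]
for p in (5, 10, 20):
    print(p, round(trimmed_mean(sample, p), 2))
# 5 123.43
# 10 120.08
# 20 117.1
```

With n = 16 the counts trimmed from each end are 1, 2 and 3 respectively.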
The Geometric Mean
▶ Let $x_1, x_2, \ldots, x_n$ be an observed sample.
▶ The geometric mean $G$ is given as

$$G = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}.$$

▶ Taking logarithm on both sides gives

$$\log G = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$

▶ Taking antilog gives

$$G = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right).$$

▶ Example: Find the geometric mean of 2, 4, 8, 12, 16, 24.
▶ Ans: 8.158
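The log/antilog route can be sketched in Python as follows. Natural logarithms are used here (any base works as long as the antilog uses the same base), and the function name `geometric_mean` is chosen for illustration. On the example data it evaluates to about 8.1586, in line with the stated answer.

```python
import math

def geometric_mean(data):
    """G = antilog of the mean of the logs; all values must be positive."""
    mean_log = sum(math.log(x) for x in data) / len(data)
    return math.exp(mean_log)           # antilog for natural logarithms

g = geometric_mean([2, 4, 8, 12, 16, 24])
print(g)
```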
The Geometric Mean
▶ Let $f_1, f_2, \ldots, f_n$ be the frequencies of the observations $x_1, x_2, \ldots, x_n$, respectively. Then the geometric mean is given by

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N},$$

where $N = \sum_{i=1}^{n} f_i$.

▶ Taking logarithm on both sides gives

$$\log G = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i$$

▶ Taking antilog gives

$$G = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right).$$

▶ Example: Find the geometric mean for the following distribution (for grouped data, take the class midpoints as the $x_i$):

Marks                0-10   10-20   20-30   30-40   40-50
Number of Students   5      7       15      25      8

▶ Ans: 25.64 marks
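A short Python sketch of the frequency-weighted formula on the marks example. It assumes the class midpoints (5, 15, 25, 35, 45) stand in for the observations, which the slide does not state explicitly; the function name is hypothetical. Full-precision arithmetic gives roughly 25.63, so the small gap to 25.64 is presumably log-table rounding.

```python
import math

def grouped_geometric_mean(xs, freqs):
    """G = antilog( (1/N) * sum(f_i * log10(x_i)) )."""
    n_total = sum(freqs)                                        # N
    log_sum = sum(f * math.log10(x) for x, f in zip(xs, freqs))
    return 10 ** (log_sum / n_total)                            # base-10 antilog

mark_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [5, 7, 15, 25, 8]
midpoints = [(lo + hi) / 2 for lo, hi in mark_classes]          # assumption: midpoints as x_i
g_marks = grouped_geometric_mean(midpoints, students)
print(round(g_marks, 2))
```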
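The Harmonic Mean
▶ The harmonic mean $H$ of $n$ observations $x_1, x_2, \ldots, x_n$ is given by

$$\frac{1}{H} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}.$$

▶ If $f_1, f_2, \ldots, f_n$ are the frequencies of the observations $x_1, x_2, \ldots, x_n$, respectively, then the harmonic mean $H$ is given by

$$\frac{1}{H} = \frac{1}{N}\left(\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}\right) = \frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i},$$

where $N = \sum_{i=1}^{n} f_i$.
▶ Example: A cyclist pedals from his house to his college at a speed of 10 kmph and back from college to his house at 15 kmph. Find the average speed.
▶ Ans: The two distances are equal, so the average speed is the harmonic mean of the two speeds: $H = 2 / (1/10 + 1/15) = 12$ kmph.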
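The cyclist example can be cross-checked in Python, both through the harmonic-mean formula and directly as total distance over total time. The function name `harmonic_mean` and the one-way distance of 30 km are assumptions made for illustration; any positive distance gives the same average speed.

```python
def harmonic_mean(values, freqs=None):
    """H = N / sum(f_i / x_i); with no frequencies given, every f_i is 1."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

h_speed = harmonic_mean([10, 15])

# Direct check: average speed = total distance / total time.
d = 30                                  # hypothetical one-way distance in km
total_time = d / 10 + d / 15            # hours out plus hours back
direct_speed = 2 * d / total_time

print(h_speed, direct_speed)            # both are 12 kmph up to floating-point rounding
```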
The Mode

▶ The mode is defined to be the value that occurs most often.
▶ Example 1: The mode of the set of values 3, 7, 2, 7, 5, 7, 3 is 7, since 7 occurs the maximum number of times.
▶ Example 2: Find the mode of the following set of observations:

3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3.

Numbers     3   5   6   7   9
Frequency   3   4   3   3   2

▶ The mode is 5.
▶ Example 3: The data set 1, 1, 2, 4, 4 has two modes, 1 and 4, i.e., its mode is not unique.
▶ A dataset is said to be unimodal if it has one mode.
▶ A dataset with two modes is referred to as bimodal.
▶ A dataset with more than two modes is called multimodal.
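The three examples can be reproduced with a small Python sketch that counts occurrences and returns every value attaining the maximum count. The helper name `modes` is hypothetical; the standard library's `statistics.multimode` offers similar behaviour.

```python
from collections import Counter

def modes(data):
    """Return all values that occur the maximum number of times, in ascending order."""
    counts = Counter(data)                       # value -> frequency
    top = max(counts.values())                   # highest frequency
    return sorted(x for x, c in counts.items() if c == top)

print(modes([3, 7, 2, 7, 5, 7, 3]))                               # [7]
print(modes([3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3]))       # [5]
print(modes([1, 1, 2, 4, 4]))                                     # [1, 4]
```

A result of length one corresponds to a unimodal dataset, length two to a bimodal one, and more than two to a multimodal one.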
The Mode: Grouped Frequency Data
▶ In the case of frequency distribution of a continuous variable, the mode is obtained by the following formula

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - (f_0 + f_2)},$$

where
  $l$ - the lower limit of the modal class (i.e., the class interval with maximum frequency)
  $f_1$ - frequency of the modal class
  $f_0$ - frequency of the pre-modal class
  $f_2$ - frequency of the post-modal class
  $h$ - size of the modal class (upper limit − lower limit).
▶ Example: The heights of 50 students are recorded as

Height (in cm)    125-130   130-135   135-140   140-145   145-150
No. of students   7         14        10        10        9

▶ Here, the maximum frequency is 14 and the corresponding class 130-135 is the modal class. Therefore, $l = 130$, $h = 5$, $f_1 = 14$, $f_0 = 7$, $f_2 = 10$.
▶ Mode $= 130 + \dfrac{5(14 - 7)}{2(14) - (7 + 10)} = 130 + \dfrac{35}{11} \approx 133.18$.
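A hedged Python sketch of the grouped-mode formula on the heights data. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption made here; the slide does not cover a modal class at either end.

```python
def grouped_mode(classes, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - (f0 + f2)) for (lower, upper) classes."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # first class with max frequency
    lower, upper = classes[i]
    h = upper - lower                                    # size of the modal class
    f1 = freqs[i]                                        # modal class frequency
    f0 = freqs[i - 1] if i > 0 else 0                    # pre-modal (0 if absent: assumption)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # post-modal (0 if absent: assumption)
    return lower + h * (f1 - f0) / (2 * f1 - (f0 + f2))

height_classes = [(125, 130), (130, 135), (135, 140), (140, 145), (145, 150)]
n_students = [7, 14, 10, 10, 9]
print(round(grouped_mode(height_classes, n_students), 2))  # 133.18
```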
Empirical Relationship between Mean, Median and Mode
▶ A frequency distribution is said to be symmetrical if

Mean = Median = Mode

▶ A frequency distribution is said to be negatively skewed if

Mean < Median < Mode

▶ A frequency distribution is said to be positively skewed if

Mean > Median > Mode


Empirical Relationship between Mean, Median and Mode

▶ Karl Pearson gave an empirical relation connecting the mean, median and mode of a moderately skewed frequency distribution:

Mean − Mode = 3(Mean − Median)

or, equivalently,

Mode = 3 Median − 2 Mean
Measures of Dispersion
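The Mean Absolute Deviation

▶ Let $x_1, x_2, \ldots, x_n$ denote the observations in a sample.
▶ Then deviations from the mean are

$$x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_n - \bar{x}$$

▶ Sum of Deviations $= \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$, so the plain sum of deviations cannot measure spread.
▶ The Mean Absolute Deviation is given as

$$\text{Mean Absolute Deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$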
Sample Variance and Sample Standard Deviation
▶ The Sample Variance, denoted by $s^2$, is given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right).$$

▶ The Sample Standard Deviation, denoted by $s$, is the positive square root of $s^2$, that is, $s = \sqrt{s^2}$.
▶ Units for Standard Deviation and Variance
  ▶ The sample variance is essentially the average of the squared deviations from the mean (with divisor $n - 1$ rather than $n$).
  ▶ If the data are measured in some unit, then the sample variance is measured in unit$^2$.
  ▶ Observe that the sample standard deviation $s$ is measured in the same unit as the data, as it is the square root of $s^2$.
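The equality of the two forms of the sample variance follows by expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$; the short derivation is:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{align*}
```

Dividing both sides by $n - 1$ gives the second expression for $s^2$.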
Sample Range and Coefficient of Variation

▶ The Sample Range is defined as

$$\text{Range} = x_{\max} - x_{\min},$$

where $x_{\max} = \max\{x_1, \ldots, x_n\}$ and $x_{\min} = \min\{x_1, \ldots, x_n\}$.

▶ The Coefficient of Variation is defined as

$$\text{C.V.} = \frac{s}{\bar{x}} \times 100\%$$

Sample Range and Coefficient of Variation

▶ Example: The delay times (handling, setting, and positioning the tools) for cutting 6 parts on an engine lathe are 0.6, 1.2, 0.9, 1.0, 0.6, and 0.8 minutes. Find the sample mean, the mean absolute deviation, the sample variance, the sample standard deviation, the sample range and the coefficient of variation.
▶ Ans:
  Sample Mean $\bar{x}$ = 0.85 minute,
  Mean Absolute Deviation = 0.18 minute,
  Sample Variance $s^2$ = 0.055 (minute)$^2$,
  Sample Standard Deviation $s$ = 0.23 minute,
  Sample Range = 0.6 minute,
  Coefficient of Variation C.V. = 27.59%.
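The answers for the lathe example can be reproduced with a short Python sketch. The function name `dispersion_summary` and the dictionary keys are hypothetical; the coefficient of variation is computed as 100 · s / x̄, which is the form consistent with the stated 27.59%.

```python
import math

def dispersion_summary(data):
    """Sample mean, MAD, variance (divisor n - 1), s.d., range and C.V. in percent."""
    n = len(data)
    mean = sum(data) / n
    mad = sum(abs(x - mean) for x in data) / n            # mean absolute deviation
    var = sum((x - mean) ** 2 for x in data) / (n - 1)    # sample variance s^2
    sd = math.sqrt(var)                                   # sample standard deviation s
    return {
        "mean": mean,
        "mad": mad,
        "variance": var,
        "sd": sd,
        "range": max(data) - min(data),
        "cv_percent": 100 * sd / mean,
    }

delays = [0.6, 1.2, 0.9, 1.0, 0.6, 0.8]
summary = dispersion_summary(delays)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```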
Quartiles

▶ Quartiles are three values that split sorted data into four parts, each with an equal number of observations.
▶ First quartile: Also known as $Q_1$, or the lower quartile.
  ▶ 25% of the data falls below the first quartile.
▶ Second quartile: Also known as $Q_2$, or the median.
  ▶ 50% of the data falls below the second quartile.
▶ Third quartile: Also known as $Q_3$, or the upper quartile.
  ▶ 75% of the data falls below the third quartile.
Computation of Quartiles: Individual Data
▶ There are several different ways to compute quartiles, but all of them give approximately the same result.
▶ The simplest method when computing by hand is as follows:
  ▶ Let $n$ represent the sample size.
  ▶ Order the sample values from smallest to largest.
  ▶ To find the $i$th quartile, compute the value $i \cdot \dfrac{n + 1}{4}$, for $i = 1, 2, 3$.
  ▶ If this is an integer, then the sample value in that position is the $i$th quartile.
  ▶ If not, then take the average of the sample values on either side of this position.
▶ Example:
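A minimal Python sketch of the hand method described above. The function name `quartile` and both data sets are made up here for illustration and are not taken from the course; the check for fewer than three observations is also an added assumption, since the position rule breaks down for very small samples.

```python
def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) using the position i * (n + 1) / 4."""
    ordered = sorted(data)                 # smallest to largest
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations for this rule")
    pos = i * (n + 1) / 4                  # 1-based position in the ordered sample
    below = int(pos)                       # whole part of the position
    if pos == below:                       # integer position: take that value
        return ordered[below - 1]
    # otherwise average the sample values on either side of the position
    return (ordered[below - 1] + ordered[below]) / 2

odd_sample = [2, 4, 5, 7, 8, 10, 12]        # n = 7: positions 2, 4, 6
even_sample = [1, 3, 5, 7, 9, 11, 13, 15]   # n = 8: positions 2.25, 4.5, 6.75
print([quartile(odd_sample, i) for i in (1, 2, 3)])    # [4, 7, 10]
print([quartile(even_sample, i) for i in (1, 2, 3)])   # [4.0, 8.0, 12.0]
```

Software packages often interpolate instead of averaging, so their quartiles can differ slightly from this rule.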
Computation of Quartiles: Discrete Frequency Data

▶ If observations are given in the form of a discrete frequency distribution (i.e., $x_i$ occurs $f_i$ times, for $i = 1, 2, \ldots, n$), arrange the data in ascending order of the values.
▶ Find the cumulative frequencies.
▶ Let $N = \sum_{i=1}^{n} f_i$ be the sum of frequencies.
▶ To find the $i$th quartile, compute the value $i \cdot \dfrac{N + 1}{4}$, for $i = 1, 2, 3$.
▶ Now look at the cumulative frequency column, find the total that is either equal to this value or the next higher one, and determine the sample value corresponding to it.
▶ Example: Find the quartiles of the data

Marks             10   20   30   40   50   60
No. of Students   4    7    15   8    7    2

▶ Ans: Here $N = 43$ and the cumulative frequencies are 4, 11, 26, 34, 41, 43. The values $i(N+1)/4$ are 11, 22 and 33, so $Q_1 = 20$, $Q_2 = 30$, $Q_3 = 40$.
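The lookup in the cumulative-frequency column can be sketched in Python as follows; the function name `discrete_quartile` is hypothetical.

```python
from itertools import accumulate

def discrete_quartile(values, freqs, i):
    """i-th quartile: first x whose c.f. is equal to or next above i * (N + 1) / 4."""
    pairs = sorted(zip(values, freqs))              # ascending order of the values
    cum = list(accumulate(f for _, f in pairs))     # cumulative frequencies
    target = i * (cum[-1] + 1) / 4                  # i * (N + 1) / 4
    for (x, _), cf in zip(pairs, cum):
        if cf >= target:
            return x

marks = [10, 20, 30, 40, 50, 60]
n_students = [4, 7, 15, 8, 7, 2]
print([discrete_quartile(marks, n_students, i) for i in (1, 2, 3)])  # [20, 30, 40]
```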
Computation of Quartiles: Continuous Frequency Data

In the case of continuous frequency distribution, the classes are arranged in ascending order, and the class corresponding to the cumulative frequency just equal to or greater than $i \cdot N/4$ is called the $Q_i$ class. The value of $Q_i$ is obtained by the following formula:

$$Q_i = l + \frac{h}{f}\left(\frac{i \cdot N}{4} - c\right), \quad i = 1, 2, 3,$$

where
  $l$ is the lower limit of the $Q_i$ class
  $f$ is the frequency of the $Q_i$ class
  $h$ is the magnitude of the $Q_i$ class
  $c$ is the cumulative frequency of the class preceding the $Q_i$ class
  $N = \sum f_i$ is the total frequency.
Computation of Quartiles

▶ Interquartile Range

$$\text{IQR} = Q_3 - Q_1$$

▶ Quartile Deviation

$$\text{QD} = \frac{\text{IQR}}{2} = \frac{Q_3 - Q_1}{2}$$

▶ Coefficient of Quartile Deviation

$$\text{CQD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

▶ Example: Compute the Upper and Lower Quartiles, Interquartile Range, Quartile Deviation and Coefficient of Quartile Deviation from the following data

Class       10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency   12      19      5       10      9       6       6

▶ Ans: $Q_1 = 22.5$ and $Q_3 = 54.72$, IQR = 32.22, QD = 16.11, CQD = 0.42.
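The grouped-quartile formula and the three derived measures can be checked with a Python sketch. The function name `grouped_quartile` and the `(lower, upper)` class representation are assumptions for illustration.

```python
def grouped_quartile(classes, freqs, i):
    """Q_i = l + (h / f) * (i*N/4 - c) for classes given as (lower, upper) pairs."""
    target = i * sum(freqs) / 4            # i * N / 4
    c = 0                                  # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and c + f >= target:      # c.f. just equal to or greater than i*N/4
            return lower + (upper - lower) / f * (target - c)
        c += f

score_classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
score_freqs = [12, 19, 5, 10, 9, 6, 6]

q1 = grouped_quartile(score_classes, score_freqs, 1)
q3 = grouped_quartile(score_classes, score_freqs, 3)
iqr = q3 - q1                              # interquartile range
qd = iqr / 2                               # quartile deviation
cqd = iqr / (q3 + q1)                      # coefficient of quartile deviation
print(round(q1, 2), round(q3, 2), round(iqr, 2), round(qd, 2), round(cqd, 2))
# 22.5 54.72 32.22 16.11 0.42
```

Here N = 67, so the Q1 class is 20-30 (c = 12, f = 19) and the Q3 class is 50-60 (c = 46, f = 9).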
Moments-Skewness-Kurtosis
Sample Moments

▶ In statistics, sample moments are a set of measures that are used to describe the properties of a sample distribution.
▶ They are calculated based on the data points in a sample, and provide information about the shape, location, and variability of the distribution.
▶ We will discuss two types of moments.
  ▶ Moments about the origin (the origin may be zero or any other constant, say A). These are also known as raw moments.
  ▶ Moments about the mean, which are also called central moments.
Moments about any Point

▶ Let $x_1, x_2, \ldots, x_n$ be $n$ observations with frequencies $f_1, f_2, \ldots, f_n$, respectively.
▶ For any non-negative integer $r$, the $r$th moment about the point $A$ is defined by
\[ m'_r = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - A)^r, \quad \text{where } N = \sum_{i=1}^{n} f_i. \]
▶ We can also write
\[ m'_r = \frac{1}{N} \sum_{i=1}^{n} f_i d_i^r, \quad \text{where } d_i = x_i - A. \]
▶ In particular, $m'_0 = 1$ and $m'_1 = \bar{x} - A$, where $\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$.
Moments about the Mean

▶ Let $x_1, x_2, \ldots, x_n$ be $n$ observations with frequencies $f_1, f_2, \ldots, f_n$, respectively.
▶ The mean is given by $\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$.
▶ The $r$th moment about the mean $\bar{x}$ is defined as
\[ m_r = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^r. \]
▶ We can also write
\[ m_r = \frac{1}{N} \sum_{i=1}^{n} f_i z_i^r, \quad \text{where } z_i = x_i - \bar{x}. \]
▶ In particular, $m_0 = 1$ and
\[ m_1 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x}) = 0. \]
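Relation between Moments about Mean and Moments about any Point

\[
\begin{aligned}
m_r &= \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \bar{x})^r \\
    &= \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - A + A - \bar{x})^r \\
    &= \frac{1}{N} \sum_{i=1}^{n} f_i (d_i - m'_1)^r \\
    &= \frac{1}{N} \sum_{i=1}^{n} f_i \left[ \sum_{j=0}^{r} \binom{r}{j} (-1)^j d_i^{\,r-j} (m'_1)^j \right] \\
    &= \sum_{j=0}^{r} (-1)^j \binom{r}{j} m'_{r-j} (m'_1)^j
\end{aligned}
\]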
Relation between Moments about Mean and Moments about any Point

▶ In particular, for $r = 2, 3$ and $4$, we get
\[
\begin{aligned}
m_2 &= m'_2 - (m'_1)^2 \\
m_3 &= m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 \\
m_4 &= m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4
\end{aligned}
\]
▶ Example: Calculate the first four moments about the mean of the following distribution.
x   0   1   2   3   4   5   6   7   8
f   1   8  28  56  70  56  28   8   1
▶ Ans: $m_1 = 0$, $m_2 = 2$, $m_3 = 0$, $m_4 = 11$.
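The example can be worked both ways: directly from the definition of central moments, and through raw moments about an arbitrary point using the binomial relation. The sketch below is one possible check. The helper names and the choice A = 3 are illustrative assumptions, not part of the slides.

```python
from math import comb

def moment_about(xs, fs, r, a=0.0):
    """r-th moment about the point a for values xs with frequencies fs."""
    n_total = sum(fs)
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / n_total

def central_from_raw(raw):
    """raw[j] is m'_j about a fixed point A for j = 0..r; returns m_r."""
    r = len(raw) - 1
    return sum((-1) ** j * comb(r, j) * raw[r - j] * raw[1] ** j
               for j in range(r + 1))

xs = list(range(9))
fs = [1, 8, 28, 56, 70, 56, 28, 8, 1]

A = 3  # arbitrary working origin (assumption for illustration)
raw = [moment_about(xs, fs, j, A) for j in range(5)]          # m'_0 .. m'_4
via_raw = [central_from_raw(raw[: r + 1]) for r in range(1, 5)]

mean = moment_about(xs, fs, 1)                                # mean = m'_1 about 0
direct = [moment_about(xs, fs, r, mean) for r in range(1, 5)]

print(via_raw)   # [0.0, 2.0, 0.0, 11.0]
print(direct)    # [0.0, 2.0, 0.0, 11.0]
```

Both routes give the same central moments, and the result does not depend on the working origin A, which is why a convenient A is usually chosen for hand computation.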
Skewness
▶ The average and the dispersion describe the location and scale of a distribution.
▶ In addition to measures of central tendency and dispersion, we also need an idea of the shape of the distribution.
▶ A measure of skewness quantifies the lack of symmetry of a distribution.
▶ For a frequency distribution, lack of symmetry is called skewness.
▶ In a perfectly symmetrical (unimodal) distribution, the mean, median and mode coincide. If they differ, the distribution is asymmetric.
▶ When the distribution is skewed to the right, the mean is typically greater than the mode.
▶ When the distribution is skewed to the left, the mean is typically less than the mode.
Measures of Skewness

Absolute Measures of Skewness

1. Skewness ($S_k$) = Mean − Median
2. Skewness ($S_k$) = Mean − Mode
3. Skewness ($S_k$) = $(Q_3 - Q_2) - (Q_2 - Q_1)$

▶ An absolute measure of skewness is expressed in the units of the data, so it cannot be used to compare distributions.
▶ For comparing distributions, we calculate relative measures, which are called coefficients of skewness.
▶ Coefficients of skewness are pure numbers, independent of the units of measurement.
Relative Measures of Skewness

β and γ Coefficients of Skewness

▶ Karl Pearson defined the following β and γ coefficients of skewness, based on the second and third central moments:
\[ \beta_1 = \frac{m_3^2}{m_2^3}, \qquad \gamma_1 = \pm\sqrt{\beta_1} = \frac{m_3}{m_2^{3/2}} \]
▶ For a symmetrical distribution, $\beta_1$ is zero.
▶ Because $\beta_1$ is never negative, it does not indicate the direction of skewness, i.e. positive or negative.
▶ This drawback is removed by Karl Pearson's gamma coefficient $\gamma_1$, which takes the sign of $m_3$.
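The difference between the two coefficients shows up when a distribution is mirrored. The following sketch uses a small made-up frequency table (an assumption for illustration only) and its mirror image: both have the same β1, while γ1 changes sign.

```python
def central_moment(xs, fs, r):
    """r-th moment about the mean for values xs with frequencies fs."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n_total

def beta1(xs, fs):
    # beta_1 = m3^2 / m2^3, always non-negative
    return central_moment(xs, fs, 3) ** 2 / central_moment(xs, fs, 2) ** 3

def gamma1(xs, fs):
    # gamma_1 = m3 / m2^(3/2), carries the sign of m3
    return central_moment(xs, fs, 3) / central_moment(xs, fs, 2) ** 1.5

xs = [0, 1, 2, 3]
right_tailed = [4, 3, 2, 1]   # hypothetical data with a longer right tail
left_tailed = [1, 2, 3, 4]    # mirror image, longer left tail

b_right, g_right = beta1(xs, right_tailed), gamma1(xs, right_tailed)
b_left, g_left = beta1(xs, left_tailed), gamma1(xs, left_tailed)
print(b_right, g_right)
print(b_left, g_left)
```

Here γ1 is positive for the right-tailed table and negative for its mirror image, while β1 is the same for both.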
Relative Measures of Skewness

Karl Pearson's Coefficient of Skewness

▶ Karl Pearson's coefficient of skewness is given by
\[ S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} \]
▶ The value of this coefficient is zero for a symmetrical distribution.
▶ If the mean is greater than the mode, the coefficient is positive; if the mean is less than the mode, it is negative.
▶ If the mode is not well defined, we use the formula
\[ S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \]
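The factor 3 in the median-based formula can be motivated by the empirical relation between mean, median and mode that is often quoted for moderately skewed, unimodal distributions. That relation is not stated on the slide and holds only approximately, so the following is a sketch under that assumption rather than an exact derivation.

```latex
\[
\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}
\;\Longrightarrow\;
\text{Mean} - \text{Mode} \approx 3\,(\text{Mean} - \text{Median})
\;\Longrightarrow\;
S_k = \frac{\text{Mean} - \text{Mode}}{\text{SD}}
\approx \frac{3\,(\text{Mean} - \text{Median})}{\text{SD}}.
\]
```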
Relative Measures of Skewness

Bowley's Coefficient of Skewness

▶ This method is based on quartiles.
▶ Bowley's coefficient of skewness is given by
\[ S_k = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} \]
▶ The value of $S_k$ is zero for a symmetrical distribution.
▶ If the value is greater than zero, the distribution is positively skewed; if it is less than zero, the distribution is negatively skewed.
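As an illustration, Bowley's coefficient can be computed for the grouped data used in the quartile deviation example earlier in this module (classes 10-20 to 70-80). The sketch assumes the usual interpolation formula for grouped quantiles; the function names are hypothetical.

```python
def grouped_quantile(lowers, width, freqs, p):
    """Interpolated p-th quantile (0 < p < 1) of a grouped frequency table."""
    target = p * sum(freqs)
    cum = 0  # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie strictly between 0 and 1")

def bowley(q1, q2, q3):
    """Bowley's coefficient: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

lowers = [10, 20, 30, 40, 50, 60, 70]
freqs = [12, 19, 5, 10, 9, 6, 6]

q1, q2, q3 = (grouped_quantile(lowers, 10, freqs, p) for p in (0.25, 0.5, 0.75))
absolute = (q3 - q2) - (q2 - q1)   # absolute measure, in the units of the data
sk_b = bowley(q1, q2, q3)          # relative measure, a pure number
print(round(q2, 2), round(absolute, 2), round(sk_b, 3))
```

For this table the coefficient is positive (about 0.224), which matches the longer right tail of the frequencies.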
Examples
▶ Example 1: The following are the marks of 150 students in an examination. Calculate Karl Pearson's coefficient of skewness.
Marks            0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Students    10     40     20      0     10     40     16     14
▶ Ans: $S_k \approx -0.754$, using the median-based formula because the table has two modal classes (mean ≈ 39.27, median = 45, standard deviation ≈ 22.81).
▶ Example 2: For a distribution, Karl Pearson's coefficient of skewness is 0.64, the standard deviation is 13 and the mean is 59.2. Find the mode and the median.
▶ Ans: Mode = 50.88 and Median ≈ 56.43.
▶ Example 3: Karl Pearson's coefficient of skewness of a distribution is 1.28, its mean is 164 and its mode is 100. Find the standard deviation.
▶ Ans: SD = 50.
▶ Example 4: For a frequency distribution, Bowley's coefficient of skewness is 1.2. If the sum of the first and third quartiles is 200 and the median is 76, find the value of the third quartile.
▶ Ans: $Q_3 = 120$.
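The four examples can be retraced in a few lines. The sketch below makes several assumptions: class midpoints represent each class, the standard deviation divides by N, the median comes from the usual grouped interpolation formula, and Example 1 uses the median-based formula because the table has two modal classes. Under these conventions Example 1 comes out at roughly −0.754. The numbers in Example 4 are treated purely as a formula exercise, since they imply Q1 = 80, which is above the stated median.

```python
from math import sqrt

def grouped_median(lowers, width, freqs):
    """Interpolated median of a grouped frequency table."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f

# Example 1: Pearson's coefficient from grouped marks
lowers = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [10, 40, 20, 0, 10, 40, 16, 14]
mids = [lo + 5 for lo in lowers]                 # class midpoints
n = sum(freqs)
mean = sum(f * x for x, f in zip(mids, freqs)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n)
median = grouped_median(lowers, 10, freqs)
sk1 = 3 * (mean - median) / sd

# Example 2: Sk = (mean - mode)/sd = 3(mean - median)/sd with Sk = 0.64
mode2 = 59.2 - 0.64 * 13
median2 = 59.2 - 0.64 * 13 / 3

# Example 3: sd = (mean - mode) / Sk
sd3 = (164 - 100) / 1.28

# Example 4: Sk = (Q3 + Q1 - 2*Q2) / (Q3 - Q1), with Q1 + Q3 = 200 and Q2 = 76
spread = (200 - 2 * 76) / 1.2                    # Q3 - Q1
q3 = (200 + spread) / 2

print(round(sk1, 3), round(mode2, 2), round(median2, 2), round(sd3, 2), round(q3, 2))
```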
Kurtosis

▶ Measures of central tendency, dispersion and skewness do not, by themselves, give a complete picture of a distribution.
▶ To complete the description of its shape we need one more measure, which is provided by kurtosis.
▶ Kurtosis measures the flatness or peakedness of a distribution.
▶ The degree of kurtosis of a distribution is measured relative to that of a normal curve.
▶ Curves that are more peaked than the normal curve are called "leptokurtic".
▶ Curves that are flatter than the normal curve are called "platykurtic".
▶ The normal curve is called "mesokurtic".
Measures of Kurtosis

Karl Pearson's Measures of Kurtosis

▶ Kurtosis is calculated from the second and fourth central moments of the variable, using the following formulas given by Karl Pearson:
\[ \beta_2 = \frac{m_4}{m_2^2}, \qquad \gamma_2 = \beta_2 - 3 \]
▶ If $\beta_2 = 3$ or $\gamma_2 = 0$, the curve is said to be mesokurtic.
▶ If $\beta_2 < 3$ or $\gamma_2 < 0$, the curve is said to be platykurtic.
▶ If $\beta_2 > 3$ or $\gamma_2 > 0$, the curve is said to be leptokurtic.
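The three cases can be sketched as a small classifier. The function names and the tolerance used for the "equal to 3" case are assumptions for illustration. The first call uses the central moments m2 = 2 and m4 = 11 from the earlier moments example, the second uses m2 = 2.5 and m4 = 18.75, and the third uses made-up values.

```python
def kurtosis_measures(m2, m4):
    """Return (beta_2, gamma_2) from the second and fourth central moments."""
    beta2 = m4 / m2 ** 2
    return beta2, beta2 - 3

def classify(beta2, tol=1e-9):
    """Label a curve relative to the normal curve, for which beta_2 = 3."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

for m2, m4 in [(2, 11), (2.5, 18.75), (1.0, 6.0)]:
    beta2, gamma2 = kurtosis_measures(m2, m4)
    print(m2, m4, beta2, gamma2, classify(beta2))
```

The distribution with m2 = 2 and m4 = 11 has β2 = 2.75 and is therefore slightly flatter than the normal curve.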
Examples

▶ Example 1: The first four moments about the mean of a distribution are 0, 2.5, 0.7 and 18.75. Find the coefficients of skewness and kurtosis.
▶ Ans: Skewness $\beta_1 = 0.031$ and kurtosis $\beta_2 = 3$.
▶ Example 2: The first four raw moments of a distribution are 2, 136, 320 and 40,000. Find the coefficients of skewness and kurtosis.
▶ Ans: The central moments are $m_2 = 132$, $m_3 = -480$ and $m_4 = 40656$, so skewness $\beta_1 \approx 0.1002$ and kurtosis $\beta_2 \approx 2.333$.
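Example 2 combines two steps: converting raw moments to central moments, then forming the β coefficients. A minimal sketch of that chain is shown below; the function name is hypothetical, and the raw moments are taken to be about the origin. With the stated raw moments it gives m3 = −480 and β1 of roughly 0.1002.

```python
def central_from_raw(r1, r2, r3, r4):
    """Central moments m2, m3, m4 from the first four raw moments."""
    m2 = r2 - r1 ** 2
    m3 = r3 - 3 * r2 * r1 + 2 * r1 ** 3
    m4 = r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4
    return m2, m3, m4

m2, m3, m4 = central_from_raw(2, 136, 320, 40000)
beta1 = m3 ** 2 / m2 ** 3     # magnitude of skewness
gamma1 = m3 / m2 ** 1.5       # signed skewness
beta2 = m4 / m2 ** 2          # kurtosis
print(m2, m3, m4)             # 132 -480 40656
print(round(beta1, 4), round(gamma1, 4), round(beta2, 3))
```

Since m3 is negative, γ1 is negative as well, so the distribution is negatively skewed, and β2 < 3 marks it as platykurtic.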
