Module 1
VIT Chennai
Learning Objective
▶ Rank correlation
▶ Partial and Multiple correlation
▶ Multiple regression
Module 4: Probability Distributions
▶ Binomial distribution
▶ Poisson distribution
▶ Normal distribution
▶ Gamma distribution
▶ Exponential distribution
▶ Weibull distribution
Module 5: Hypothesis Testing-I
▶ Types of errors
▶ Critical region
▶ Procedure for testing of hypothesis
▶ Large sample tests
▶ Z test for single proportion
▶ Difference of proportion
▶ Mean and difference of means
Module 6: Hypothesis Testing-II
▶ Basic concepts
▶ Hazard function
▶ Reliabilities of series and parallel systems
▶ System reliability
▶ Maintainability
▶ Preventive and repair maintenance
▶ Availability
Text and Reference Books
Motivation: Why Study Statistics?
▶ In the middle of the 20th century, Japanese industry succeeded where
others had failed in producing high-quality products.
▶ Much of this success was due to the use of statistical methods
and statistical thinking among management personnel.
▶ From the 1980s to the present, attention has focused on improving quality in
industry.
▶ With the boom in data science, statistics has become indispensable.
Important Aspects of Statistics
▶ What is data?
▶ plural of “datum” which means “a piece of information”.
▶ facts collected together for reference or analysis.
▶ the quantities, characters, or symbols on which operations are performed
by a computer, which may be stored and transmitted in the form of
electrical signals and recorded on magnetic, optical, or mechanical
recording media.
▶ What is statistics?
▶ plural of “statistic” which means “a fact or piece of data obtained from a
study of a large quantity of numerical data”.
▶ the practice or science of collecting and analysing numerical data in
large quantities, especially for the purpose of inferring proportions in a
whole from those in a representative sample.
Important Aspects of Statistics
▶ Types of Statistics
▶ Descriptive Statistics: A set of brief descriptive coefficients that
summarizes a given data set, which can either be a representation of the
entire population or a sample.
▶ Inferential Statistics:
▶ “inference” meaning “a conclusion reached on the basis of
evidence and reasoning”.
▶ Inferential statistics makes inferences about populations using data
drawn from the population.
Important Aspects of Statistics
▶ Sources of Variation
▶ Inherent
▶ External
Important Aspects of Statistics
▶ Variability in Scientific Data
▶ Variability refers to the extent to which the data points in a data set differ
from each other.
▶ Samples or Observations
▶ a small part or quantity intended to show what the whole is like.
▶ Populations
▶ a finite or infinite collection of items under consideration.
▶ Factors
▶ characteristics or quantities associated with the population.
▶ Experimental Design
▶ a design developed by the experimenter by controlling the factors in the
data.
▶ Observational Study
▶ a study of the data with no control on the factors.
The Role of Probability
What is Probability?
▶ A theory with a sound mathematical foundation (axioms) for
dealing with uncertainty and variation.
Sampling Procedures: Collection of Data
Why Sampling?
▶ It is not always possible to study the entire population.
▶ Time consuming.
▶ Not budget friendly.
What is Sampling?
▶ Sampling is the process of collecting a limited amount of data to form a sample.
▶ How can it be done?
▶ It might seem straightforward, but it is not: any method may tend to
produce a biased sample that does not represent the population under
study well.
▶ Are there any specific methods?
Sampling Methods
The Sample Mean: Individual Data
▶ The sample mean is also called the ‘arithmetic mean’ or the
‘average’.
▶ Let $x_1, x_2, \ldots, x_n$ denote the observations.
▶ The sample mean, denoted by $\bar{x}$, is
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}.
\]
The Sample Mean: Discrete Frequency Data
▶ Let $f_i$ be the frequency of the value $x_i$, $i = 1, 2, \ldots, n$.
▶ The sample mean or weighted sample mean is
\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i,
\]
where $N = \sum_{i=1}^{n} f_i$.
▶ Example: The frequency distribution of the number of telephone calls
received at an exchange in 245 successive one-minute intervals is
No. of Calls   0   1   2   3   4   5   6   7
Frequency     14  21  25  43  51  40  39  12
Find the mean number of calls per minute at the exchange.
▶ The mean number of calls per minute at the exchange is
\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{922}{245} \approx 3.763.
\]
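The weighted-mean computation in the telephone-call example can be checked with a short Python sketch. The function name weighted_mean is an illustrative choice and is not part of the original slides; the data are the call counts and frequencies from the table.

```python
# Call counts and their frequencies, taken from the example table.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [14, 21, 25, 43, 51, 40, 39, 12]


def weighted_mean(values, frequencies):
    """Return sum(f_i * x_i) / N, where N = sum(f_i)."""
    n_total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, values)) / n_total


n_total = sum(freqs)                                  # N
weighted_sum = sum(f * x for f, x in zip(freqs, calls))  # sum of f_i * x_i
mean_calls = weighted_mean(calls, freqs)
print(n_total, weighted_sum, round(mean_calls, 3))
```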
The Sample Mean
x̄ = a + bȳ.
The Sample Median: Discrete Frequency Data
▶ x      1   2   3   4   5   6    7    8    9
  f      8  10  11  16  20  25   15    9    6
  c.f.   8  18  29  45  65  90  105  114  120
▶ N = 120 and N/2 = 60. The cumulative frequency just greater than N/2 is 65,
and the value of x corresponding to it is 5. So, the sample median is 5.
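The cumulative-frequency rule used above can be sketched in Python as follows. The helper name discrete_median is hypothetical, and the sketch assumes the values are already listed in ascending order, as in the table.

```python
from itertools import accumulate

# Values and frequencies from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [8, 10, 11, 16, 20, 25, 15, 9, 6]


def discrete_median(values, frequencies):
    """Return the value whose cumulative frequency is the first to exceed N/2."""
    cumulative = list(accumulate(frequencies))  # running totals (c.f.)
    half = cumulative[-1] / 2                   # N/2
    for x, cf in zip(values, cumulative):
        if cf > half:
            return x


cumulative_freqs = list(accumulate(fs))
median_value = discrete_median(xs, fs)
print(cumulative_freqs, median_value)
```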
The Sample Median: Grouped Frequency Data
\[
\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right),
\]
where
▶ $l$ is the lower limit of the median class
▶ $f$ is the frequency of the median class
▶ $h$ is the magnitude (width) of the median class
▶ $c$ is the cumulative frequency (c.f.) of the class preceding the median class
▶ $N = \sum f$
The Sample Median: Grouped Frequency Data
\[
\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 4000 + \frac{1000}{20}\,(21.5 - 8) = 4675
\]
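The grouped-median formula can be evaluated with a small Python sketch. The function and parameter names are illustrative. The inputs are the values visible in the worked computation above; N = 43 is an assumption inferred from N/2 = 21.5, since the underlying frequency table is not shown here.

```python
def grouped_median(lower_limit, class_width, class_freq, n_total, cum_freq_before):
    """Median = l + (h / f) * (N/2 - c) for grouped frequency data."""
    return lower_limit + (class_width / class_freq) * (n_total / 2 - cum_freq_before)


# l = 4000, h = 1000, f = 20, c = 8; N = 43 is inferred from N/2 = 21.5.
median_estimate = grouped_median(4000, 1000, 20, 43, 8)
print(median_estimate)
```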
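The Geometric Mean
▶ Let $x_1, x_2, \ldots, x_n$ be an observed sample.
▶ The geometric mean $G$ is given as
\[
G = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}.
\]
▶ Taking logarithms,
\[
\log G = \frac{1}{n}\sum_{i=1}^{n} \log x_i,
\]
so that
\[
G = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right).
\]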
The Geometric Mean
▶ Let $f_1, f_2, \ldots, f_n$ be the frequencies of the observations $x_1, x_2, \ldots, x_n$,
respectively. Then the geometric mean is given by
\[
G = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right),
\]
where $N = \sum_{i=1}^{n} f_i$.
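The logarithmic form of the geometric mean translates directly into code. The sketch below is only an illustration: the function name and the sample values are hypothetical, natural logarithms are used (so the antilog is the exponential function), and all observations are assumed to be positive.

```python
import math


def geometric_mean(values, frequencies=None):
    """G = antilog((1/N) * sum(f_i * log x_i)); unit frequencies if none are given."""
    if frequencies is None:
        frequencies = [1] * len(values)
    n_total = sum(frequencies)
    mean_log = sum(f * math.log(x) for f, x in zip(frequencies, values)) / n_total
    return math.exp(mean_log)  # antilog of the natural logarithm


# Hypothetical data: the geometric mean of 2 and 8 is sqrt(16) = 4.
gm_simple = geometric_mean([2, 8])
# Hypothetical frequencies: 2 occurs three times and 8 once.
gm_weighted = geometric_mean([2, 8], [3, 1])
print(gm_simple, gm_weighted)
```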
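The Harmonic Mean
▶ The harmonic mean $H$ of $n$ observations $x_1, x_2, \ldots, x_n$ is given by
\[
\frac{1}{H} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}.
\]
▶ For a frequency distribution with frequencies $f_1, f_2, \ldots, f_n$,
\[
\frac{1}{H} = \frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i},
\]
where $N = \sum_{i=1}^{n} f_i$.
▶ Example: A cyclist pedals from his house to his college at a speed of
10 kmph and back from college to his house at 15 kmph. Find the
average speed.
▶ Ans: Both legs cover the same distance, so the average speed is the
harmonic mean of the two speeds: $H = 2 / (1/10 + 1/15) = 12$ kmph.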
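The cyclist example can be reproduced with a few lines of Python. The function name harmonic_mean is an illustrative choice, and the sketch assumes all observations are non-zero.

```python
def harmonic_mean(values):
    """H = n / sum(1 / x_i), i.e. the reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


# Speeds on the two equal-distance legs of the trip, in kmph.
average_speed = harmonic_mean([10, 15])
print(round(average_speed, 6))
```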
The Mode
3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3.
▶ Numbers    3  5  6  7  9
  Frequency  3  4  3  3  2
▶ The mode is 5, the value with the highest frequency.
▶ Example 3: The data set 1, 1, 2, 4, 4 has two modes, 1 and 4; i.e., its mode is
not unique.
▶ A dataset is said to be unimodal if it has one mode.
▶ A dataset with two modes is referred to as bimodal.
▶ A dataset with more than two modes is called multimodal.
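A frequency count like the one in the table can be produced with Python's collections.Counter. The sketch below returns every value that attains the maximum frequency, so it also covers the bimodal case; the helper name modes is an illustrative choice.

```python
from collections import Counter


def modes(data):
    """Return all values with the highest frequency, in ascending order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


sample = [3, 5, 7, 5, 9, 7, 5, 7, 6, 3, 9, 5, 6, 6, 3]
frequency_table = dict(sorted(Counter(sample).items()))
unimodal_result = modes(sample)           # data from the first example
bimodal_result = modes([1, 1, 2, 4, 4])   # data from Example 3
print(frequency_table, unimodal_result, bimodal_result)
```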
The Mode: Grouped Frequency Data
▶ In the case of a frequency distribution of a continuous variable, the mode is
obtained by the following formula
\[
\text{Mode} = l + \frac{h\,(f_1 - f_0)}{2 f_1 - (f_0 + f_2)},
\]
where $l$ - the lower limit of the modal class (i.e., the class interval with maximum
frequency)
$f_1$ - frequency of the modal class
$f_0$ - frequency of the pre-modal class
$f_2$ - frequency of the post-modal class
$h$ - size of the modal class (upper limit – lower limit).
▶ Example: The heights of 50 students are recorded as
Height (in cm)    125-130  130-135  135-140  140-145  145-150
No. of students      7       14       10       10        9
▶ Here, the maximum frequency is 14 and the corresponding class 130-135 is the
modal class. Therefore, $l = 130$, $h = 5$, $f_1 = 14$, $f_0 = 7$, $f_2 = 10$.
▶ Mode $= 130 + \dfrac{5\,(14 - 7)}{2 \cdot 14 - (7 + 10)} = 130 + \dfrac{35}{11} \approx 133.18$.
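The grouped-mode formula can be checked against the height example with a short Python sketch. The function and parameter names are illustrative; the modal class has frequency 14, the class before it 7, and the class after it 10.

```python
def grouped_mode(lower_limit, class_width, f_prev, f_modal, f_next):
    """Mode = l + h * (f1 - f0) / (2 * f1 - (f0 + f2))."""
    return lower_limit + class_width * (f_modal - f_prev) / (2 * f_modal - (f_prev + f_next))


# Modal class 130-135: l = 130, h = 5, f0 = 7, f1 = 14, f2 = 10.
mode_estimate = grouped_mode(130, 5, 7, 14, 10)
print(round(mode_estimate, 2))
```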
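Empirical Relationship between Mean, Median and Mode
▶ A frequency distribution is said to be symmetrical if
Mean = Median = Mode.
▶ For a moderately asymmetrical distribution, the empirical relationship is
Mean − Mode = 3(Mean − Median)
or
Mode = 3Median − 2Mean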
Measures of Dispersion
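The Mean Absolute Deviation
▶ Deviations from the sample mean: $x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$
▶ Sum of Deviations $= \sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
▶ The Mean Absolute Deviation is given as
\[
\text{Mean Absolute Deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|
\]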
Sample Variance and Sample Standard Deviation
▶ The Sample Variance, denoted by $s^2$, is given by
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right).
\]
Sample Range and Coefficient of Variation
\[
\text{C.V.} = \frac{s}{\bar{x}} \times 100\%
\]
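The dispersion measures in this part of the module (mean absolute deviation, sample variance in both of its algebraic forms, and the coefficient of variation) can be computed together. The following Python sketch uses a small hypothetical sample that does not come from the slides; the function name and dictionary keys are illustrative, and s is taken as the square root of the sample variance.

```python
import math


def dispersion_summary(xs):
    """Mean absolute deviation, sample variance (two forms), s, and C.V. in percent."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(x - mean) for x in xs) / n
    # Definition form: sum of squared deviations divided by n - 1.
    var_definition = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Shortcut form: (sum of x_i^2 - n * mean^2) divided by n - 1.
    var_shortcut = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)
    s = math.sqrt(var_definition)
    return {
        "mean": mean,
        "mad": mad,
        "var_definition": var_definition,
        "var_shortcut": var_shortcut,
        "s": s,
        "cv_percent": s / mean * 100,
    }


# Hypothetical sample, chosen only for illustration.
summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

The two variance expressions agree up to floating-point rounding, which is a quick check that the shortcut form has been applied correctly.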
Quartiles
▶ Quartiles are three values that split sorted data into four parts,
each with an equal number of observations.
▶ First quartile: Also known as Q1 , or the lower quartile.
▶ 25% of the data falls below the first quartile.
▶ Second quartile: Also known as Q2 , or the median.
▶ 50% of the data falls below the second quartile.
▶ Third quartile: Also known as Q3 , or the upper quartile.
▶ 75% of the data falls below the third quartile.
Computation of Quartiles: Individual Data
▶ There are several different ways to compute quartiles, but all of
them give approximately the same result.
▶ The simplest method when computing by hand is as follows:
▶ Let $n$ represent the sample size.
▶ Order the sample values from smallest to largest.
▶ To find the $i$th quartile, compute the value $\dfrac{i\,(n+1)}{4}$, for $i = 1, 2, 3$.
▶ If this is an integer, then the sample value in that position is the $i$th
quartile.
▶ If not, then take the average of the sample values on either side of this
position.
▶ Example:
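As an illustration that is not taken from the original slides, the hand-computation rule above can be sketched in Python. The function name quartile and the two small data sets are hypothetical, and the sketch assumes at least three observations so that every position i(n + 1)/4 falls inside the sample.

```python
def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) using the position i * (n + 1) / 4."""
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    position, remainder = divmod(i * (n + 1), 4)
    if remainder == 0:
        # Integer position: take the sample value at that (1-based) position.
        return ordered[position - 1]
    # Otherwise average the sample values on either side of the position.
    return (ordered[position - 1] + ordered[position]) / 2


# Hypothetical samples: n = 7 gives integer positions, n = 8 does not.
quartiles_n7 = [quartile([1, 2, 3, 4, 5, 6, 7], i) for i in (1, 2, 3)]
quartiles_n8 = [quartile([1, 2, 3, 4, 5, 6, 7, 8], i) for i in (1, 2, 3)]
print(quartiles_n7, quartiles_n8)
```

Statistical software often uses interpolation rules that differ slightly from this one, which is consistent with the remark that the various methods give approximately the same result.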
Computation of Quartiles: Discrete Frequency Data
Computation of Quartiles
▶ Interquartile Range
\[
\text{IQR} = Q_3 - Q_1
\]
▶ Quartile Deviation
\[
\text{QD} = \frac{\text{IQR}}{2} = \frac{Q_3 - Q_1}{2}
\]
▶ Coefficient of Quartile Deviation
\[
\text{CQD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}
\]
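The three quartile-based measures follow directly from Q1 and Q3. The sketch below uses hypothetical quartile values (Q1 = 2.5 and Q3 = 6.5) purely for illustration; the function name is also an assumption.

```python
def quartile_measures(q1, q3):
    """Return (IQR, QD, CQD) computed from the first and third quartiles."""
    iqr = q3 - q1
    qd = iqr / 2
    cqd = (q3 - q1) / (q3 + q1)
    return iqr, qd, cqd


# Hypothetical quartiles, not taken from the slides.
iqr_value, qd_value, cqd_value = quartile_measures(2.5, 6.5)
print(iqr_value, qd_value, round(cqd_value, 4))
```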
▶ In particular, $m'_0 = 1$ and $m'_1 = \bar{x} - A$, where $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$.
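Moments about the Mean
▶ Let $x_1, x_2, \ldots, x_n$ be $n$ observations with frequencies
$f_1, f_2, \ldots, f_n$, respectively.
▶ The mean is given by $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$.
▶ The $r$th moment about the mean is $m_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r$.
▶ In particular, $m_0 = 1$ and
\[
m_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x}) = 0.
\]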
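Relation between Moments about Mean and Moments about any Point
With $d_i = x_i - A$ and $m'_1 = \bar{x} - A$,
\[
\begin{aligned}
m_r &= \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A + A - \bar{x})^r \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i (d_i - m'_1)^r \\
    &= \frac{1}{N}\sum_{i=1}^{n} f_i \sum_{j=0}^{r} \binom{r}{j} (-1)^j d_i^{\,r-j} (m'_1)^j \\
    &= \sum_{j=0}^{r} (-1)^j \binom{r}{j} m'_{r-j} (m'_1)^j
\end{aligned}
\]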
Relation between Moments about Mean and Moments about any Point
\[
m_2 = m'_2 - (m'_1)^2
\]
▶ Example: Calculate the first four moments about the mean of the
following distribution
x   0   1   2   3   4   5   6   7   8
f   1   8  28  56  70  56  28   8   1
▶ Ans: $m_1 = 0$, $m_2 = 2$, $m_3 = 0$, $m_4 = 11$.
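The stated answer can be verified numerically in two ways: directly from the definition of the moments about the mean, and through the binomial relation with moments about an arbitrary point A. In the Python sketch below, the function names and the choice A = 3 are illustrative assumptions; the data are from the example table.

```python
from math import comb

xs = list(range(9))
fs = [1, 8, 28, 56, 70, 56, 28, 8, 1]


def moment_about(xs, fs, r, a=0.0):
    """r-th moment about the point a: (1/N) * sum(f_i * (x_i - a)^r)."""
    n_total = sum(fs)
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / n_total


def central_moment_direct(xs, fs, r):
    """r-th moment about the mean, computed from its definition."""
    mean = moment_about(xs, fs, 1)
    return moment_about(xs, fs, r, a=mean)


def central_moment_from_raw(xs, fs, r, a):
    """m_r = sum_j (-1)^j * C(r, j) * m'_{r-j} * (m'_1)^j, with moments taken about a."""
    m1_raw = moment_about(xs, fs, 1, a)
    return sum(
        (-1) ** j * comb(r, j) * moment_about(xs, fs, r - j, a) * m1_raw ** j
        for j in range(r + 1)
    )


direct = [central_moment_direct(xs, fs, r) for r in (1, 2, 3, 4)]
via_raw = [central_moment_from_raw(xs, fs, r, a=3) for r in (1, 2, 3, 4)]  # A = 3 is arbitrary
print(direct, via_raw)
```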
Skewness
▶ The average and dispersion provide the location and scale of the
distribution.
▶ In addition to measures of central tendency and dispersion, we
also need to have an idea about the shape of the distribution.
▶ A measure of skewness quantifies the lack of symmetry of a statistical
distribution.
▶ For a frequency distribution, this lack of symmetry is called
skewness.
▶ In a perfectly symmetrical distribution, mean, median and mode
coincide. Otherwise, the distribution becomes asymmetric.
▶ When the distribution is skewed to the right, the mean is greater
than the mode.
▶ When the distribution is skewed to the left, the mean is less than
the mode.
Measures of Skewness
Relative Measures of Skewness
\[
\beta_1 = \frac{m_3^2}{m_2^3}
\]
\[
\gamma_1 = \pm\sqrt{\beta_1} = \frac{m_3}{m_2^{3/2}}
\]
Relative Measures of Skewness
\[
S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}
\]
\[
S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}
\]
Relative Measures of Skewness
\[
S_k = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}
\]
Measures of Kurtosis
\[
\beta_2 = \frac{m_4}{m_2^2}
\]
\[
\gamma_2 = \beta_2 - 3
\]
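The moment-based coefficients of skewness and kurtosis can be computed from the central moments. As a sketch, the Python code below applies them to the moments m2 = 2, m3 = 0 and m4 = 11 from the earlier example; the function name is an illustrative choice, and the sign of gamma1 is taken from m3.

```python
def shape_coefficients(m2, m3, m4):
    """Return (beta1, gamma1, beta2, gamma2) from the central moments m2, m3, m4."""
    beta1 = m3 ** 2 / m2 ** 3
    gamma1 = m3 / m2 ** 1.5   # carries the sign of m3
    beta2 = m4 / m2 ** 2
    gamma2 = beta2 - 3
    return beta1, gamma1, beta2, gamma2


# Central moments from the example distribution: m2 = 2, m3 = 0, m4 = 11.
beta1, gamma1, beta2, gamma2 = shape_coefficients(2, 0, 11)
print(beta1, gamma1, beta2, gamma2)
```

For this distribution the skewness coefficients are zero, as expected for a symmetric frequency table, and gamma2 is slightly negative.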