Lecture Note 2

The document covers statistical methods, focusing on numerical descriptions of data, including mean, variance, standard deviation, skewness, and kurtosis. It also discusses box plots for data visualization and introduces concepts of probability, including classical and frequency definitions, along with axioms of probability. Examples and calculations illustrate these statistical concepts using sample data.

Uploaded by

Kabir Kumar


MTL390: Statistical Methods

Instructor: Dr. Biplab Paul

January 7, 2025

Lecture 2

Basic concepts and Data Visualization (cont.)

Numerical Description of Data

Let $x_1, x_2, \ldots, x_n$ be a set of sample values. Then the sample mean (or empirical mean)
$\bar{x}$ is defined by
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
\]
The sample variance is defined by
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\]

The sample standard deviation is
\[
s = \sqrt{s^2} .
\]

The sample skewness is defined by
\[
b_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3} .
\]
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution
that is symmetric about its mean has skewness zero; in particular, the skewness of a
normal distribution is zero. If $b_1 > 0$, the distribution has a longer right tail, and
if $b_1 < 0$, the distribution has a longer left tail.
The sample kurtosis is
\[
g_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} .
\]
The kurtosis for a standard normal distribution is three. For this reason, some sources
use the following definition of kurtosis (often referred to as "excess kurtosis"):
\[
g_2 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - 3 .
\]
This definition is used so that the standard normal distribution has a kurtosis of zero.
In addition, with the second definition, kurtosis is a measure of whether the distribution
is peaked or flat relative to a normal distribution. Kurtosis is based on the size of a
distribution's tails. Positive excess kurtosis indicates heavier tails than the normal
distribution (more observations far from the mean), whereas negative excess kurtosis
indicates lighter tails (fewer observations far from the mean).
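The definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `sample_moments` and the two small data sets are illustrative assumptions. It follows the conventions used here: divisor n − 1 for the variance and divisor n for the third and fourth central moments.

```python
import math


def sample_moments(xs):
    """Return (mean, variance, std, skewness b1, kurtosis g1, excess kurtosis g2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Sample variance uses the n - 1 divisor, as in the notes.
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    s = math.sqrt(var)
    if s == 0:
        raise ValueError("skewness and kurtosis are undefined when all values are equal")
    # Third and fourth central moments use the 1/n divisor.
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    b1 = m3 / s ** 3
    g1 = m4 / s ** 4
    return mean, var, s, b1, g1, g1 - 3.0


symmetric = [1, 2, 3, 4, 5]        # symmetric about 3, so b1 is zero
right_tailed = [1, 2, 3, 4, 50]    # one large value, so b1 is positive

print(sample_moments(symmetric))
print(sample_moments(right_tailed))
```

Other software may use different divisors for skewness and kurtosis, so values from a library routine need not match these exactly.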
For a data set, the median is the middle number of the ordered data set. If the data
set has an even number of elements, then the median is the average of the middle two
numbers.
The lower quartile is the middle number of the half of the data below the median,
and the upper quartile is the middle number of the half of the data above the median.
We will denote:
Q1 = lower quartile
Q2 = M = middle quartile (median)
Q3 = upper quartile
The difference between the quartiles is called the interquartile range (IQR):

IQR = Q3 − Q1 .

A possible outlier (mild outlier) is any data point that lies below

Q1 − 1.5 × IQR

or above
Q3 + 1.5 × IQR.
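As an illustration of the quartile and outlier rules just described, here is a short Python sketch. The function names `quartiles` and `fences` and the nine-point data set are assumptions made for this example. The quartiles are computed as the median of each half of the ordered data, which is the rule stated above; other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]        # observations below the median position
    upper = xs[n - half:]    # observations above the median position
    return median(lower), median(xs), median(upper)


def fences(q1, q3, k=1.5):
    """Return the lower and upper outlier limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q2, q3)        # 2.5 5 7.5
print(fences(q1, q3))    # (-5.0, 15.0)
```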
The mode is another commonly used measure of central tendency. It indicates where the
data tend to concentrate most.
The mode is the most frequently occurring member of the data set. If all the data
values are different, then by definition, the data set has no mode.
Example The following data give the time in months from hire to promotion to manager
for a random sample of 25 software engineers from all software engineers employed by a
large telecommunications firm.

5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24, 34, 37, 34, 49, 64, 47, 67, 69, 192, 125

Calculate the mean, median, mode, variance, and standard deviation for this sample.
Solution: The sample mean is:
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 83.28 \text{ months}.
\]
To obtain the median, first arrange the data in ascending order:

5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483

Now the median is the thirteenth number, which is 34 months.

Since 14 occurs most often (thrice), the mode is 14 months.
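The sample variance is:
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{1}{24} \left[ (5 - 83.28)^2 + (7 - 83.28)^2 + \cdots + (125 - 83.28)^2 \right]
    \approx 16{,}478 \text{ months}^2 .
\]
And the sample standard deviation is:
\[
s = \sqrt{s^2} \approx 128.36 \text{ months}.
\]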
Remark Note that the mean (83.28) is much larger than the median (34) and the mode (14)
because of a few large data values.
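The hand calculations in this example can be reproduced with Python's standard `statistics` module. This is only a cross-check sketch; the variable names are assumptions. Note that `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the definition of the sample variance in these notes.

```python
import statistics

# Time in months from hire to promotion (the 25 values from the example).
months = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24,
          34, 37, 34, 49, 64, 47, 67, 69, 192, 125]

mean = statistics.mean(months)            # 83.28
median = statistics.median(months)        # 34
mode = statistics.mode(months)            # 14
variance = statistics.variance(months)    # about 16478 (n - 1 divisor)
std_dev = statistics.stdev(months)        # square root of the variance

print(f"mean     = {mean:.2f} months")
print(f"median   = {median} months")
print(f"mode     = {mode} months")
print(f"variance = {variance:.0f} months^2")
print(f"std dev  = {std_dev:.2f} months")
```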

Box Plots
The sample mean or the sample standard deviation focuses on a single aspect of the data
set, whereas histograms express rather general ideas about data.
A pictorial summary called a box plot (also called box-and-whisker plots) can be used
to describe several prominent features of a data set, such as:

• the center,

• the spread,

• the extent and nature of any departure from symmetry, and

• identification of outliers.

Construction Procedure
• Draw a vertical measurement axis and mark Q1 , Q2 (median), and Q3 on this axis
as shown in Figure 1.

• Construct a rectangular box whose bottom edge lies at the lower quartile Q1 and
whose upper edge lies at the upper quartile Q3 .

• Draw a horizontal line segment inside the box through the median Q2 .

• Extend the lines from each end of the box out to the farthest observation that is
still within 1.5 × IQR of the corresponding edge. These lines are called whiskers.

• Draw an open circle (or asterisks ∗) to identify each observation that falls between
1.5 × IQR and 3 × IQR from the edge to which it is closest; these are called mild
outliers.

• Draw a solid circle to identify each observation that falls more than 3 × IQR from
the closest edge; these are called extreme outliers.

Figure 1: Box-and-whiskers plot

Example The following data identify the time (in months) from hire to promotion
to chief pharmacist for a random sample of 25 employees from a large corporation of
drugstores:

5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483.

Construct a box plot. Do the data appear to be symmetrically distributed along the
measurement axis?
Solution: Referring to the data:

• The median is Q2 = 34.

• The lower quartile is Q1 = (14 + 18)/2 = 16.

• The upper quartile is Q3 = (67 + 69)/2 = 68.

The interquartile range is:

IQR = Q3 − Q1 = 68 − 16 = 52.

To find outliers, compute:

Q1 − 1.5 · IQR = 16 − 1.5 · 52 = −62

Q3 + 1.5 · IQR = 68 + 1.5 · 52 = 146.

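No observation lies below −62. Data points greater than 146 are outliers: 192, 229, 453, 483.
Of these, 192 is a mild outlier, while 229, 453, and 483 lie more than 3 · IQR = 156 above
Q3 (that is, above 224) and are extreme outliers. The data are not symmetric: the median is
closer to Q1 than to Q3, the upper whisker (to 125) is much longer than the lower whisker
(to 5), and all outliers lie on the upper side, so the distribution is skewed to the right.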

Figure 2: Box plot
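The quantities needed to draw this box plot can be worked out with a short Python sketch. The function name `box_plot_summary` and the dictionary keys are assumptions for illustration. The quartiles Q1 = 16 and Q3 = 68 are taken from the solution above rather than recomputed, and the whisker and outlier rules follow the construction procedure given earlier.

```python
def box_plot_summary(data, q1, q3):
    """Whisker ends and mild/extreme outliers for given quartiles q1 and q3."""
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # limits for whiskers
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # limits for extreme outliers
    # Whiskers reach the farthest observations still within 1.5 * IQR of the box.
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    mild = sorted(x for x in data
                  if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi)
    extreme = sorted(x for x in data if x < outer_lo or x > outer_hi)
    return {
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "mild": mild,
        "extreme": extreme,
    }


months = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49,
          64, 67, 69, 125, 192, 229, 453, 483]

summary = box_plot_summary(months, q1=16, q3=68)
print(summary)
# {'whisker_low': 5, 'whisker_high': 125, 'mild': [192], 'extreme': [229, 453, 483]}
```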

Revision of Probability Distribution

A random (or statistical) experiment is an experiment in which:

(a) All outcomes of the experiment are known in advance.

(b) Any performance of the experiment results in an outcome that is not known in
advance.

(c) The experiment can be repeated under identical conditions.

In probability theory, we study the uncertainty of a random experiment. It is convenient
to associate with each such experiment a set Ω, the set of all possible outcomes of the
experiment.
The sample space of a statistical experiment is a pair (Ω, S), where:
(a) Ω is the set of all possible outcomes of the experiment.

(b) S is a σ-algebra of subsets of Ω.

The elements of Ω are called sample points. Any set A ∈ S is known as an event.
Clearly, A is a collection of sample points. We say that an event A happens if the outcome
of the experiment corresponds to a point in A. Each one-point set is known as a simple
or elementary event.
Two events A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
Mutually exclusive events cannot happen together.

Let Ω be a nonempty set, and let P (Ω) ≡ {A : A ⊂ Ω} be the power set of Ω, i.e.,
the class of all subsets of Ω.
A collection of sets S ⊂ P (Ω) is called an algebra if:
(a) Ω ∈ S,

(b) A ∈ S implies $A^c \in S$ (i.e., closure under complements),

(c) A, B ∈ S implies A ∪ B ∈ S (i.e., closure under pairwise unions).

A class S ⊂ P (Ω) is called a σ-algebra if it is an algebra and satisfies:
(d) $A_n \in S$ for all $n \ge 1$ implies $\bigcup_{n \ge 1} A_n \in S$ (i.e., closure under countable unions).
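For a finite set Ω, conditions (a) to (c) can be checked by brute force. The Python sketch below is an illustration only; the function name `is_algebra` and the example collections are assumptions. On a finite Ω every algebra contains only finitely many sets, so a countable union reduces to a finite one and the check also settles condition (d).

```python
from itertools import combinations


def is_algebra(omega, collection):
    """Check conditions (a)-(c) for a collection of subsets of a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(a) for a in collection}
    # (a) omega itself must belong to the collection.
    if omega not in sets:
        return False
    # (b) closure under complements.
    if any(omega - a not in sets for a in sets):
        return False
    # (c) closure under pairwise unions.
    return all(a | b in sets for a, b in combinations(sets, 2))


omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
bad = [set(), {1}, {2}, {1, 2, 3, 4}]    # complement of {1} is missing

print(is_algebra(omega, good))   # True
print(is_algebra(omega, bad))    # False
```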

Classical Definition of Probability If there are n equally likely possibilities, of which
one must occur, and m of these are regarded as favorable to an event (or as a "success"),
then the probability of the event (or a "success") is given by:
\[
P(\text{event}) = \frac{m}{n} .
\]
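As a small illustration of the classical definition, the following Python sketch counts favorable outcomes among equally likely ones. The function name `classical_probability` and the two-dice example are assumptions chosen for this sketch; exact fractions are used so that m/n is not rounded.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, is_favorable):
    """Return m/n, where n outcomes are equally likely and m of them are favorable."""
    outcomes = list(outcomes)
    m = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(m, len(outcomes))


# All 36 equally likely results of rolling two fair dice.
two_dice = list(product(range(1, 7), repeat=2))

p_seven = classical_probability(two_dice, lambda o: o[0] + o[1] == 7)
print(p_seven)   # 1/6
```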
The classical probability concept is not applicable in situations where the various
possibilities cannot be regarded as equally likely. Suppose we are interested in whether or
not it will rain on a given day with known meteorological conditions. Clearly, we cannot
assume that the events of rain or no rain are equally likely. In such cases, one could use
the so-called frequency interpretation of probability. The frequentist view is a
natural extension of the classical view of probability. This definition was developed as a
result of the work by R. von Mises in 1936.
Frequency Definition of Probability The probability of an outcome (event) is the
proportion of times the outcome (event) would occur in a long run of repeated experiments.
For example, to find the probability of heads (H) using a biased coin, we would
imagine the coin is repeatedly tossed. Let n(H) be the number of times H appears in n
trials. Then the probability of heads is defined as:
\[
P(H) = \lim_{n \to \infty} \frac{n(H)}{n} .
\]
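The limit above cannot be computed exactly, but a simulation can suggest how the relative frequency n(H)/n behaves as n grows. The following Python sketch is an illustration under stated assumptions: the coin's true probability of heads (0.3), the function name `relative_frequency`, and the fixed random seed are all choices made for this example.

```python
import random


def relative_frequency(p_heads, n_trials, seed=0):
    """Fraction of heads in n_trials simulated tosses of a coin with P(H) = p_heads."""
    rng = random.Random(seed)   # fixed seed so that the run is reproducible
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_heads)
    return heads / n_trials


# The relative frequency typically settles near 0.3 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.3, n))
```

A finite simulation only approximates the limit; the printed values depend on the seed and fluctuate around the assumed probability.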
The frequency interpretation of probability is often useful. However, it is not complete.
Because of the condition of repetition under identical circumstances, the frequency
definition of probability is not applicable to every event. For a more complete picture, it
makes sense to develop the probability theory through axioms. Now we will define
probabilities axiomatically. This definition results from the 1933 studies of A.N. Kolmogorov.

Probability Axioms
Let (Ω, S) be a sample space. A set function P defined on S is called a probability measure
(or simply, probability) if it satisfies the following conditions:

1. P (A) ≥ 0 for all A ∈ S.

2. P (Ω) = 1.

3. Let $\{A_j\}$, $A_j \in S$, $j = 1, 2, \ldots$, be a disjoint sequence of sets; that is,
\[
A_j \cap A_k = \emptyset \quad \text{for } j \neq k,
\]
where ∅ is the null set. Then
\[
P\left( \bigcup_{j=1}^{\infty} A_j \right) = \sum_{j=1}^{\infty} P(A_j),
\]
where we have used the notation $\bigcup_{j=1}^{\infty} A_j$ to denote the union of disjoint sets $A_j$.

We call P (A) the probability of event A. Property (3) is called countable additivity.
It follows from countable additivity that P (∅) = 0 and that P is also finitely additive.
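The two consequences of countable additivity mentioned here are not derived in the notes; one standard argument can be sketched as follows.

```latex
% P(\emptyset) = 0: apply axiom 3 to A_1 = \Omega and A_j = \emptyset for j \ge 2,
% a disjoint sequence whose union is \Omega.
\[
1 = P(\Omega) = P(\Omega) + \sum_{j=2}^{\infty} P(\emptyset)
  = 1 + \sum_{j=2}^{\infty} P(\emptyset)
\quad \Longrightarrow \quad
\sum_{j=2}^{\infty} P(\emptyset) = 0 .
\]
% Since P(\emptyset) \ge 0 by axiom 1, this forces P(\emptyset) = 0.

% Finite additivity: for disjoint A_1, \ldots, A_n \in S, set A_j = \emptyset for j > n.
\[
P\left( \bigcup_{j=1}^{n} A_j \right)
  = P\left( \bigcup_{j=1}^{\infty} A_j \right)
  = \sum_{j=1}^{n} P(A_j) + \sum_{j=n+1}^{\infty} P(\emptyset)
  = \sum_{j=1}^{n} P(A_j) .
\]
```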
The triple (Ω, S, P ) is called a probability space.
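For a finite Ω with S taken to be the power set, the axioms can be verified directly. The Python sketch below is a hypothetical illustration: the helper names `power_set`, `make_measure`, and `satisfies_axioms` and the biased-coin weights are assumptions. Because Ω is finite, checking additivity for pairs of disjoint events is enough; a countable disjoint sequence can contain only finitely many nonempty sets.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]


def make_measure(weights):
    """Build P(A) as the sum of the point weights of the outcomes in A."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P


def satisfies_axioms(omega, P):
    """Check non-negativity, P(omega) = 1, and additivity on disjoint pairs."""
    events = power_set(omega)
    nonneg = all(P(a) >= 0 for a in events)
    total = P(frozenset(omega)) == 1
    additive = all(P(a | b) == P(a) + P(b)
                   for a in events for b in events if not (a & b))
    return nonneg and total and additive


coin = {"H": Fraction(3, 10), "T": Fraction(7, 10)}       # a biased coin
broken = {"H": Fraction(3, 10), "T": Fraction(6, 10)}     # weights do not sum to 1

print(satisfies_axioms(coin, make_measure(coin)))         # True
print(satisfies_axioms(broken, make_measure(broken)))     # False
```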
