
MTL390: Statistical Methods

Instructor: Dr. Biplab Paul


January 7, 2025

Lecture 2

Basic Concepts and Data Visualization (cont.)

Numerical Description of Data


Let x1 , x2 , . . . , xn be a set of sample values. Then the sample mean (or empirical mean)
x̄ is defined by
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
\]
The sample variance is defined by
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.
\]

The sample standard deviation is



\[
s = \sqrt{s^2}.
\]

The sample skewness is defined by


\[
b_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}.
\]
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. If b1 = 0,
then the distribution is symmetric about the mean. If b1 > 0, the distribution has a
longer right tail, and if b1 < 0, the distribution has a longer left tail. Thus, the skewness
of a normal distribution is zero.
The sample kurtosis is
\[
g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4}.
\]
The kurtosis for a standard normal distribution is three. For this reason, some sources
use the following definition of kurtosis (often referred to as "excess kurtosis"):
\[
g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - 3.
\]
This definition is used so that the standard normal distribution has an excess kurtosis of zero. With this second definition, kurtosis measures whether the distribution is peaked or flat relative to a normal distribution. Kurtosis is based on the size of a distribution's tails: positive excess kurtosis indicates heavier tails than the normal distribution (more observations far from the mean), whereas negative excess kurtosis indicates lighter tails (fewer extreme observations).
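As a quick illustration (a minimal Python sketch, not part of the lecture; the helper function name is our own), the sample skewness b1 and excess kurtosis g2 defined above can be computed directly:

```python
import math

# Sample skewness b1 and excess kurtosis g2, using the definitions above,
# with s the sample standard deviation (divisor n - 1).
def skewness_excess_kurtosis(xs):
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    b1 = sum((x - xbar) ** 3 for x in xs) / n / s ** 3
    g2 = sum((x - xbar) ** 4 for x in xs) / n / s ** 4 - 3
    return b1, g2

# A symmetric data set has zero skewness; its light tails give negative g2.
b1, g2 = skewness_excess_kurtosis([1, 2, 3, 4, 5])
print(b1, g2)  # 0.0 and a negative excess kurtosis
```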
For a data set, the median is the middle number of the ordered data set. If the data
set has an even number of elements, then the median is the average of the middle two
numbers.
The lower quartile is the middle number of the half of the data below the median,
and the upper quartile is the middle number of the half of the data above the median.
We will denote:
Q1 = lower quartile
Q2 = M = middle quartile (median)
Q3 = upper quartile
The difference between the quartiles is called the interquartile range (IQR):

IQR = Q3 − Q1 .

A possible outlier (mild outlier) is any data point that lies below

Q1 − 1.5 × IQR

or above
Q3 + 1.5 × IQR.
The mode is another commonly used measure of central tendency. It indicates where the
data tend to concentrate most.
The mode is the most frequently occurring member of the data set. If all the data
values are different, then by definition, the data set has no mode.
Example The following data give the time in months from hire to promotion to manager
for a random sample of 25 software engineers from all software engineers employed by a
large telecommunications firm.

5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24, 34, 37, 34, 49, 64, 47, 67, 69, 192, 125

Calculate the mean, median, mode, variance, and standard deviation for this sample.
Solution: The sample mean is:
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 83.28 \text{ months}.
\]
To obtain the median, first arrange the data in ascending order:

5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483

Now the median is the thirteenth number, which is 34 months.


Since 14 occurs most often (thrice), the mode is 14 months.
The sample variance is:
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{24}\left[(5 - 83.28)^2 + (7 - 83.28)^2 + \cdots + (125 - 83.28)^2\right] = 16{,}478.
\]

And the sample standard deviation is:

\[
s = \sqrt{s^2} = 128.36 \text{ months}.
\]
Remark Note that the mean differs greatly from the other two measures of center because of a few large data values.
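The calculations above can be reproduced with Python's standard `statistics` module (a sketch; the variable names are our own):

```python
import statistics

# Promotion times (months) for the 25 software engineers in the example
times = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24,
         34, 37, 34, 49, 64, 47, 67, 69, 192, 125]

mean = statistics.mean(times)      # 83.28 months
med = statistics.median(times)     # 34 months
mode = statistics.mode(times)      # 14 months
var = statistics.variance(times)   # sample variance, divisor n - 1
sd = statistics.stdev(times)       # about 128.36 months

print(mean, med, mode, round(var), round(sd, 2))
```

Note that `statistics.variance` and `statistics.stdev` already use the n − 1 divisor from the definition above.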

Box Plots
Summary statistics such as the sample mean or the sample standard deviation each focus on a single aspect of the data set, whereas histograms convey more general impressions about the data.
A pictorial summary called a box plot (also called a box-and-whisker plot) can be used
to describe several prominent features of a data set, such as:

• the center,

• the spread,

• the extent and nature of any departure from symmetry, and

• identification of outliers.

Construction Procedure
• Draw a vertical measurement axis and mark Q1 , Q2 (median), and Q3 on this axis
as shown in Figure 1.

• Construct a rectangular box whose bottom edge lies at the lower quartile Q1 and
whose upper edge lies at the upper quartile Q3 .

• Draw a horizontal line segment inside the box through the median Q2 .

• Extend the lines from each end of the box out to the farthest observation that is
still within 1.5 × IQR of the corresponding edge. These lines are called whiskers.

• Draw an open circle (or asterisk ∗) to identify each observation that falls between 1.5 × IQR and 3 × IQR from the edge to which it is closest; these are called mild outliers.

• Draw a solid circle to identify each observation that falls more than 3 × IQR from
the closest edge; these are called extreme outliers.

Figure 1: Box-and-whiskers plot

Example The following data identify the time (in months) from hire to promotion
to chief pharmacist for a random sample of 25 employees from a large corporation of
drugstores:

5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483.

Construct a box plot. Do the data appear to be symmetrically distributed along the
measurement axis?
Solution: Referring to the data:

• The median is Q2 = 34.

• The lower quartile is Q1 = (14 + 18)/2 = 16.

• The upper quartile is Q3 = (67 + 69)/2 = 68.

The interquartile range is:

IQR = Q3 − Q1 = 68 − 16 = 52.

To find outliers, compute:

Q1 − 1.5 · IQR = 16 − 1.5 · 52 = −62

Q3 + 1.5 · IQR = 68 + 1.5 · 52 = 146.


Data points greater than 146 are outliers: 192, 229, 453, 483. Since Q3 + 3 × IQR = 68 + 156 = 224, the point 192 is a mild outlier, while 229, 453, and 483 are extreme outliers. The long upper whisker and the cluster of outliers on the high side show that the data are far from symmetric along the measurement axis: the distribution is strongly right-skewed.
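These fence calculations can be checked with a short Python sketch (our own helper code, using this lecture's median-of-halves quartile convention):

```python
from statistics import median

# Promotion times (months), already sorted
data = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37,
        47, 49, 64, 67, 69, 125, 192, 229, 453, 483]

n = len(data)
lower_half = data[: n // 2]        # observations below the median
upper_half = data[(n + 1) // 2:]   # observations above the median

q1, q2, q3 = median(lower_half), median(data), median(upper_half)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3, iqr)        # 16.0 34 68.0 52.0
print(low_fence, high_fence)  # -62.0 146.0
print(outliers)               # [192, 229, 453, 483]
```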

Figure 2: Box plot

Revision of Probability Distribution

A random (or statistical) experiment is an experiment in which:


(a) All outcomes of the experiment are known in advance.

(b) Any performance of the experiment results in an outcome that is not known in
advance.

(c) The experiment can be repeated under identical conditions.


In probability theory, we study the uncertainty of a random experiment. It is convenient
to associate with each such experiment a set Ω, the set of all possible outcomes of the
experiment.
The sample space of a statistical experiment is a pair (Ω, S), where:
(a) Ω is the set of all possible outcomes of the experiment.

(b) S is a σ-algebra of subsets of Ω.


The elements of Ω are called sample points. Any set A ∈ S is known as an event.
Clearly, A is a collection of sample points. We say that an event A happens if the outcome
of the experiment corresponds to a point in A. Each one-point set is known as a simple
or elementary event.
Two events A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅.
Mutually exclusive events cannot happen together.

Let Ω be a nonempty set, and let P (Ω) ≡ {A : A ⊂ Ω} be the power set of Ω, i.e.,
the class of all subsets of Ω.
A collection of sets S ⊂ P (Ω) is called an algebra if:
(a) Ω ∈ S,

(b) A ∈ S implies Ac ∈ S,

(c) A, B ∈ S implies A ∪ B ∈ S (i.e., closure under pairwise unions).


A class S ⊂ P(Ω) is called a σ-algebra if it is an algebra and satisfies:

(d) An ∈ S for n ≥ 1 implies
\[
\bigcup_{n \ge 1} A_n \in S.
\]

Classical Definition of Probability If there are n equally likely possibilities, of which one must occur, and m of these are regarded as favorable to an event (or as a "success"), then the probability of the event (or a "success") is given by:
\[
P(\text{event}) = \frac{m}{n}.
\]
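As a small illustration (a sketch, not from the lecture), the classical definition can be applied by direct enumeration, e.g. the probability that two fair dice sum to 7:

```python
from itertools import product
from fractions import Fraction

# All n = 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))
# The m favorable outcomes: the two faces sum to 7
favorable = [o for o in outcomes if sum(o) == 7]

prob = Fraction(len(favorable), len(outcomes))  # m / n
print(prob)  # 1/6
```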
The classical probability concept is not applicable in situations where the various possibilities cannot be regarded as equally likely. Suppose we are interested in whether or not it will rain on a given day with known meteorological conditions. Clearly, we cannot assume that the events of rain and no rain are equally likely. In such cases, one could use the so-called frequency interpretation of probability. The frequentist view is a natural extension of the classical view of probability. This definition was developed as a result of the work by R. von Mises in 1936.
Frequency Definition of Probability The probability of an outcome (event) is the
proportion of times the outcome (event) would occur in a long run of repeated experi-
ments.
For example, to find the probability of heads (H) using a biased coin, we would
imagine the coin is repeatedly tossed. Let n(H) be the number of times H appears in n
trials. Then the probability of heads is defined as:

\[
P(H) = \lim_{n \to \infty} \frac{n(H)}{n}.
\]
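A quick simulation (a sketch; the bias p = 0.7 is an assumed value, not from the lecture) illustrates how the relative frequency n(H)/n settles near P(H) as n grows:

```python
import random

random.seed(0)  # fixed seed for reproducibility
p = 0.7         # hypothetical bias of the coin (assumed)
n = 100_000     # number of tosses

# n(H): count of heads in n simulated tosses of the biased coin
heads = sum(random.random() < p for _ in range(n))

print(heads / n)  # close to 0.7 for large n
```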
The frequency interpretation of probability is often useful, but it is not complete: because of the condition of repetition under identical circumstances, the frequency definition of probability is not applicable to every event. For a more complete picture, it makes sense to develop probability theory through axioms. We will now define probabilities axiomatically. This definition results from the 1933 studies of A.N. Kolmogorov.

Probability Axioms
Let (Ω, S) be a sample space. A set function P defined on S is called a probability measure
(or simply, probability) if it satisfies the following conditions:

1. P (A) ≥ 0 for all A ∈ S.

2. P (Ω) = 1.

3. Let {Aj }, Aj ∈ S, j = 1, 2, . . ., be a disjoint sequence of sets; that is,

Aj ∩ Ak = ∅ for j ≠ k,

where ∅ is the null set. Then



\[
P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j),
\]
where the union on the left is taken over the disjoint sets Aj.

We call P(A) the probability of the event A. Property (3) is called countable additivity; it implies that P(∅) = 0 and that P is also finitely additive.
The triple (Ω, S, P ) is called a probability space.
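The axioms are easy to verify on a finite sample space. Here is a toy check (our own sketch, not from the lecture) for a fair die, taking S to be the power set of Ω and P(A) = |A|/6:

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # sample space of a fair die

def P(A):
    # classical probability on a finite space: |A| / |Ω|
    return Fraction(len(A), len(omega))

# S = power set of Ω: every subset is an event
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(omega, r)]

assert all(P(A) >= 0 for A in events)        # Axiom 1: nonnegativity
assert P(omega) == 1                         # Axiom 2: P(Ω) = 1
A, B = frozenset({1, 2}), frozenset({5, 6})  # disjoint events
assert P(A | B) == P(A) + P(B)               # additivity on disjoint sets
print("all axioms hold on this finite space")
```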
