Lecture Note 2
is peaked or flat relative to a normal distribution. Kurtosis is based on the size of a
distribution’s tails. Positive (excess) kurtosis indicates heavier tails than the normal
distribution, that is, too many observations in the tails, whereas negative kurtosis
indicates lighter tails, that is, too few observations in the tails of the distribution.
For a data set, the median is the middle number of the ordered data set. If the data
set has an even number of elements, then the median is the average of the middle two
numbers.
The lower quartile is the middle number of the half of the data below the median,
and the upper quartile is the middle number of the half of the data above the median.
We will denote:
Q1 = lower quartile
Q2 = M = middle quartile (median)
Q3 = upper quartile
The difference between the upper and lower quartiles is called the interquartile range (IQR):
IQR = Q3 − Q1.
A possible outlier (mild outlier) is any data point that lies below
Q1 − 1.5 × IQR
or above
Q3 + 1.5 × IQR.
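The quartile and outlier rules above can be sketched in Python. This is a minimal sketch, assuming the median-of-halves convention described here (when n is odd, the median itself is left out of both halves). The function names `quartiles` and `mild_outlier_fences` and the small data set are illustrative and not part of these notes. Library routines such as `statistics.quantiles` use other conventions and may return slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # half of the data below the median
    upper = xs[half + n % 2:]    # half of the data above the median
    return median(lower), median(xs), median(upper)

def mild_outlier_fences(q1, q3):
    """Return the cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Small made-up data set (hypothetical, not from the notes).
sample = [1, 3, 4, 6, 7, 8, 10, 30]
q1, q2, q3 = quartiles(sample)
low, high = mild_outlier_fences(q1, q3)
outliers = [x for x in sample if x < low or x > high]
print(q1, q2, q3, outliers)
```

For this made-up sample, the only point flagged as a possible outlier is 30.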
The mode is another commonly used measure of central tendency. It indicates where the
data tend to concentrate most.
The mode is the most frequently occurring member of the data set. If all the data
values are different, then by definition, the data set has no mode.
Example The following data give the time in months from hire to promotion to manager
for a random sample of 25 software engineers from all software engineers employed by a
large telecommunications firm.
5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24, 34, 37, 34, 49, 64, 47, 67, 69, 192, 125
Calculate the mean, median, mode, variance, and standard deviation for this sample.
Solution: The sample mean is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 83.28 \text{ months} \]
To obtain the median, first arrange the data in ascending order:
5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483
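Since n = 25 is odd, the median is the 13th ordered value, 34 months. The mode is
14 months, the only value that occurs three times.
The sample variance is:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 16477.54 \text{ months}^2 \]
And the sample standard deviation is:
\[ s = \sqrt{s^2} = 128.36 \text{ months} \]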
Remark Note that the mean differs markedly from the other two measures of center
because of a few large data values.
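The summary statistics for this example can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and the variance and standard deviation use the sample (n − 1) form, which is the form consistent with the value s = 128.36 reported above.

```python
import statistics

# Time in months from hire to promotion (data from the example).
months = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483, 22, 21, 25, 23, 24,
          34, 37, 34, 49, 64, 47, 67, 69, 192, 125]

mean = statistics.mean(months)
median = statistics.median(months)
mode = statistics.mode(months)
variance = statistics.variance(months)  # sample variance, divides by n - 1
std_dev = statistics.stdev(months)      # square root of the sample variance

print(f"mean     = {mean:.2f}")      # 83.28
print(f"median   = {median}")        # 34
print(f"mode     = {mode}")          # 14
print(f"variance = {variance:.2f}")  # 16477.54
print(f"std dev  = {std_dev:.2f}")   # 128.36
```

The median (34) and mode (14) are far below the mean (83.28), which is pulled upward by the few very large observations.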
Box Plots
The sample mean or the sample standard deviation focuses on a single aspect of the data
set, whereas histograms express rather general ideas about data.
A pictorial summary called a box plot (also called a box-and-whisker plot) can be used
to describe several prominent features of a data set, such as:
• the center,
• the spread,
• identification of outliers.
Construction Procedure
• Draw a vertical measurement axis and mark Q1 , Q2 (median), and Q3 on this axis
as shown in Figure 1.
• Construct a rectangular box whose bottom edge lies at the lower quartile Q1 and
whose upper edge lies at the upper quartile Q3 .
• Draw a horizontal line segment inside the box through the median Q2 .
• Extend the lines from each end of the box out to the farthest observation that is
still within 1.5 × IQR of the corresponding edge. These lines are called whiskers.
• Draw an open circle (or an asterisk ∗) to identify each observation that falls between
1.5 × IQR and 3 × IQR from the edge to which it is closest; these are called mild
outliers.
• Draw a solid circle to identify each observation that falls more than 3 × IQR from
the closest edge; these are called extreme outliers.
Figure 1: Box-and-whiskers plot
Example The following data identify the time (in months) from hire to promotion
to chief pharmacist for a random sample of 25 employees from a large corporation of
drugstores:
5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37, 47, 49, 64, 67, 69, 125, 192, 229, 453, 483.
Construct a box plot. Do the data appear to be symmetrically distributed along the
measurement axis?
Solution: Referring to the data:
IQR = Q3 − Q1 = 68 − 16 = 52.
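The remaining steps of the construction procedure can be carried out numerically for these data. The following is a minimal Python sketch, assuming the quartiles Q1 = 16 and Q3 = 68 used in the solution; the variable names are illustrative only.

```python
# Ordered data from the example (months from hire to promotion).
data = [5, 7, 12, 14, 14, 14, 18, 21, 22, 23, 24, 25, 34, 34, 37,
        47, 49, 64, 67, 69, 125, 192, 229, 453, 483]

q1, q3 = 16, 68          # quartiles taken from the solution
iqr = q3 - q1            # 52

# Cutoffs at 1.5 * IQR and 3 * IQR from the box edges.
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

# Whiskers reach the farthest observations within 1.5 * IQR of the box.
inside = [x for x in data if inner_low <= x <= inner_high]
whisker_low, whisker_high = min(inside), max(inside)

# Mild outliers lie between 1.5 * IQR and 3 * IQR from the nearest edge.
mild = [x for x in data
        if outer_low <= x < inner_low or inner_high < x <= outer_high]

# Extreme outliers lie more than 3 * IQR from the nearest edge.
extreme = [x for x in data if x < outer_low or x > outer_high]

print(whisker_low, whisker_high, mild, extreme)
```

Under these assumptions the whiskers end at 5 and 125, the observation 192 is a mild outlier, and 229, 453, and 483 are extreme outliers. The upper whisker is much longer than the lower one and every outlier lies on the high side, which suggests that the data are not symmetric but skewed toward large values.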
Figure 2: Box plot
Revision of Probability Distribution
(b) Any performance of the experiment results in an outcome that is not known in
advance.
Let Ω be a nonempty set, and let P (Ω) ≡ {A : A ⊂ Ω} be the power set of Ω, i.e.,
the class of all subsets of Ω.
A collection of sets S ⊂ P (Ω) is called an algebra if:
(a) Ω ∈ S,
(b) A ∈ S implies Ac ∈ S,
natural extension of the classical view of probability. This definition was developed as a
result of the work by R. von Mises in 1936.
Frequency Definition of Probability The probability of an outcome (event) is the
proportion of times the outcome (event) would occur in a long run of repeated
experiments.
For example, to find the probability of heads (H) using a biased coin, we would
imagine the coin is repeatedly tossed. Let n(H) be the number of times H appears in n
trials. Then the probability of heads is defined as:
\[ P(H) = \lim_{n \to \infty} \frac{n(H)}{n}. \]
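The limiting relative frequency can be illustrated by simulation. The following Python sketch is only an illustration under stated assumptions: the bias 0.3, the seed, and the function name `relative_frequency_of_heads` are hypothetical choices, and a finite simulation can only suggest the limit, not establish it.

```python
import random

def relative_frequency_of_heads(p_heads, n_tosses, seed=0):
    """Simulate n_tosses of a biased coin and return n(H) / n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

# Assumed bias for illustration: P(H) = 0.3.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(0.3, n))
```

As n grows, the printed ratios n(H)/n should settle near the assumed value 0.3, while for small n they can be far from it.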
The frequency interpretation of probability is often useful. However, it is not
complete. Because of the condition of repetition under identical circumstances, the frequency
definition of probability is not applicable to every event. For a more complete picture, it
makes sense to develop the probability theory through axioms. Now we will define
probabilities axiomatically. This definition results from the 1933 studies of A.N. Kolmogorov.
Probability Axioms
Let (Ω, S) be a sample space. A set function P defined on S is called a probability measure
(or simply, probability) if it satisfies the following conditions:
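1. P(A) ≥ 0 for all A ∈ S.
2. P(Ω) = 1.
3. If A1, A2, . . . is a sequence of sets in S that are pairwise disjoint, that is,
Aj ∩ Ak = ∅ for j ≠ k,
then
\[ P\left( \bigcup_{j=1}^{\infty} A_j \right) = \sum_{j=1}^{\infty} P(A_j), \]
where we have used the notation \bigcup_{j=1}^{\infty} A_j to denote the union of disjoint sets Aj.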
We call P (A) the probability of event A. Property (3) is called countable additivity.
That P(∅) = 0 and that P is also finitely additive both follow from it.
The triple (Ω, S, P ) is called a probability space.
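As a concrete illustration, a finite probability space can be built and checked by brute force. The following Python sketch assumes a fair six-sided die, takes S to be the power set of Ω, and assigns P(A) = |A| / |Ω|; these choices and all names are illustrative and not part of the notes. On a finite Ω, countable additivity reduces to finite additivity, so checking pairs of disjoint events is sufficient here.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4, 5, 6})  # outcomes of a fair die (assumed)

def power_set(s):
    """Return all subsets of s as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

S = power_set(omega)  # the collection of events

def P(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

nonnegative = all(P(A) >= 0 for A in S)
normalized = P(omega) == 1
additive = all(P(A | B) == P(A) + P(B)
               for A in S for B in S if not (A & B))
empty_is_zero = P(frozenset()) == 0

print(nonnegative, normalized, additive, empty_is_zero)
```

Exact fractions are used instead of floating-point numbers so that the additivity check is not affected by rounding.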