Chapter 1 - Solutions of Exercises
1. If the distribution of the observations is (approximately) symmetric, the difference between the sample median
and the sample mean will be small. So if the difference is large, the distribution is skewed either to the
right (x̄ > 𝑀) or to the left (x̄ < 𝑀).
3. The 𝑧-score of 40 is −2, so (40 − x̄)/𝑠 = −2. And the 𝑧-score of 90 is 3, so (90 − x̄)/𝑠 = 3.
This information leads to two equations in the unknowns x̄ and 𝑠: 40 − x̄ = −2𝑠 and
90 − x̄ = 3𝑠.
Solving the equations we find: 𝑠 = 10 and x̄ = 60.
(But you can also reason that the difference between the observations 90 and 40 is 5 standard deviations
(3 − (−2) = 5), so 𝑠 = (90 − 40)/5 = 10 and the sample mean is 40 + 2 × 10 = 60.)
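As a quick numerical check of exercise 3, the two 𝑧-score equations can be solved in a few lines of Python. This is only a sketch; the variable names are chosen here for illustration and do not come from the book.

```python
# Two observations with known z-scores: (x - mean) / s = z
x_low, z_low = 40.0, -2.0
x_high, z_high = 90.0, 3.0

# Subtracting the two equations eliminates the mean:
# x_high - x_low = (z_high - z_low) * s
s = (x_high - x_low) / (z_high - z_low)

# Substitute s back into the first equation: x_low - mean = z_low * s
mean = x_low - z_low * s

print("s =", s, "mean =", mean)
```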
4.
b. The largest value is 75.0; its 𝑧-score is (75.0 − 52.33)/9.22 ≈ 2.46.
The smallest value is 35.0; its 𝑧-score is (35.0 − 52.33)/9.22 ≈ −1.88.
Is the largest offer excessively large?
We know that, according to the empirical rule (for symmetric, mound-shaped distributions), 𝑧-scores with an
absolute value greater than 2 (so |𝑧| > 2) occur with a probability of about 5% (about 1 in 20 observations) and
𝑧-scores with an absolute value greater than 3 with a probability of about 0.3% (about 1 in 300). For 𝑧-scores
with an absolute value of at least 2.46 the probability is somewhere between 0.3% and 5%: not (very) exceptional.
To be more precise, one can use the standard normal distribution to find the probability
𝑃(|𝑍| > 2.46) = 2(1 − Φ(2.46)) ≈ 1.4%. This means that in about 70 observations you would
expect 1 observation with such a 𝑧-score. For larger samples this is not an exceptional value.
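The 𝑧-scores and the tail probability above can be reproduced with Python's standard library. This is a minimal sketch: it uses the rounded mean 52.33 and standard deviation 9.22 from the solution, and `statistics.NormalDist` stands in for the N(0,1)-table.

```python
from statistics import NormalDist

mean, s = 52.33, 9.22          # rounded sample mean and standard deviation

z_max = (75.0 - mean) / s      # z-score of the largest offer
z_min = (35.0 - mean) / s      # z-score of the smallest offer

# Two-sided tail probability P(|Z| > 2.46) under the standard normal distribution
p_tail = 2 * (1 - NormalDist().cdf(2.46))

print(round(z_max, 2), round(z_min, 2), round(100 * p_tail, 1))
```

The reciprocal `1 / p_tail` is roughly 70, which is where the "1 in about 70 observations" comes from.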
c. The quartiles for 𝑛 = 50:
50 × 0.25 = 12.5 (not an integer), so 𝑄1 = 𝑥(13) = 45.3
50 × 0.5 = 25 (integer), so 𝑀 = (𝑥(25) + 𝑥(26))/2 = (50.8 + 51.2)/2 = 51.0
50 × 0.75 = 37.5 (not an integer), so 𝑄3 = 𝑥(38) = 59.0
The largest and the smallest observations are 75.0 and 35.0.
5-number summary: 35.0, 45.3, 51.0, 59.0, 75.0.
90th percentile: 50 × 0.90 = 45 (integer), so 90th percentile = (𝑥(45) + 𝑥(46))/2 = 63.1.
95th percentile: 50 × 0.95 = 47.5 (not an integer), so 95th percentile = 𝑥(48) = 70.0.
99th percentile: 50 × 0.99 = 49.5 (not an integer), so 99th percentile = 𝑥(50) = 75.0.
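The percentile rule used in part c can be sketched as a small Python function: compute 𝑘 = 𝑛 × 𝑝, average 𝑥(𝑘) and 𝑥(𝑘+1) if 𝑘 is an integer, and otherwise round 𝑘 up. The function name `percentile` is hypothetical, and because the 50 salary offers are not listed in the solutions, the example runs on the stand-in data 1, 2, ..., 50, where the value of 𝑥(𝑖) is simply 𝑖.

```python
import math

def percentile(sorted_values, p):
    """Percentile for 0 < p < 1 by the rule in part c.

    k = n * p; if k is an integer, average x(k) and x(k+1),
    otherwise take x(ceil(k)). Positions are 1-based, as in the text.
    """
    n = len(sorted_values)
    k = n * p
    if math.isclose(k, round(k)):          # guard against floating-point noise
        k = round(k)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(k) - 1]

# Stand-in data (NOT the salary offers): x(i) = i for i = 1..50
data = list(range(1, 51))

for p in (0.25, 0.50, 0.75, 0.90, 0.95, 0.99):
    print(p, percentile(data, p))
```

With this stand-in data the output shows directly which order statistics are used: position 13 for 𝑄1, the average of positions 25 and 26 for the median, position 38 for 𝑄3, and so on. Other textbooks and software packages use slightly different percentile conventions, so results may differ from this rule.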
d. 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 59.0 − 45.3 = 13.7
Outliers are outside (𝑄1 − 1.5 × 𝐼𝑄𝑅, 𝑄3 + 1.5 × 𝐼𝑄𝑅) = (45.3 − 20.55, 59.0 + 20.55) = (24.75, 79.55)
All observations lie inside this interval, so there are no outliers.
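The 1.5 × 𝐼𝑄𝑅 rule from part d is easy to wrap in a helper. The sketch below uses a hypothetical function name `outlier_fences` and checks the two extreme salary offers against the interval.

```python
def outlier_fences(q1, q3, factor=1.5):
    """Return the interval (q1 - factor*IQR, q3 + factor*IQR)."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

# Quartiles of the salary offers from part c
low, high = outlier_fences(45.3, 59.0)

# Smallest and largest observation from part c
smallest, largest = 35.0, 75.0
has_outliers = smallest < low or largest > high

print(round(low, 2), round(high, 2), has_outliers)
```

Passing `factor=3` gives the wider interval that is often used to flag extreme outliers.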
e. Box plot: see below
f. There are 11 observations larger than 60: the frequency of the event is 11 and the relative frequency is
11/50 = 22% (an estimate of the probability in the population).
g. Though the graph does not have the perfect shape of a normal distribution, its shape approximates a
symmetric, mound-shaped distribution with one peak.
Quartiles and outliers
5.
a. 𝑃(𝑍 ≤ 𝑄3 ) = 0.75, if 𝑄3 ≈ 0.67 (N(0,1)-table)
Because of symmetry we have:
𝑄1 = − 𝑄3 ≈ −0.67
b. 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 1.34, so
(𝑄1 − 1.5 × 𝐼𝑄𝑅, 𝑄3 + 1.5 × 𝐼𝑄𝑅)
= (−0.67 − 1.5 × 1.34, 0.67 + 1.5 × 1.34)
= (−0.67 − 2.01, 0.67 + 2.01)
= (−2.68, +2.68)
c. Using symmetry: 2 × 𝑃(𝑍 ≥ 𝑄3 + 1.5 × 𝐼𝑄𝑅) = 2 × 𝑃(𝑍 ≥ 2.68) = 2 × (1 − 0.9963) = 0.0074 = 0.74%
d. (𝑄1 − 3 × 𝐼𝑄𝑅, 𝑄3 + 3 × 𝐼𝑄𝑅) = (−0.67 − 3 × 1.34, 0.67 + 4.02) = (−4.69, +4.69)
The probability: 𝑃(𝑍 < 𝑄1 − 3 × 𝐼𝑄𝑅 or 𝑍 > 𝑄3 + 3 × 𝐼𝑄𝑅) = 2 × 𝑃(𝑍 ≥ 4.69) < 2 × 0.0001
e. The interval is (𝜇 − 2.68𝜎, 𝜇 + 2.68𝜎) = (100 − 2.68 × 12, 100 + 2.68 × 12) = (67.84,132.16)
The probability remains the same as in c.
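The computations of exercise 5 can be checked without a table, using `statistics.NormalDist` from Python's standard library. Note that this sketch works with the unrounded quartile 𝑄3 ≈ 0.6745 instead of the table value 0.67, so the probability in part c comes out at about 0.70% rather than 0.74%; the difference is caused by rounding only.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal distribution N(0, 1)
q3 = Z.inv_cdf(0.75)        # upper quartile, about 0.6745
q1 = -q3                    # by symmetry
iqr = q3 - q1

def prob_outside(factor):
    """P(Z < q1 - factor*IQR or Z > q3 + factor*IQR), using symmetry."""
    upper = q3 + factor * iqr
    return 2 * (1 - Z.cdf(upper))

p_mild = prob_outside(1.5)      # part c
p_extreme = prob_outside(3.0)   # part d

# Part e: the same rule for N(100, 12^2) gives the same probability
X = NormalDist(mu=100, sigma=12)
lower_e = 100 + 12 * (q1 - 1.5 * iqr)
upper_e = 100 + 12 * (q3 + 1.5 * iqr)
p_e = X.cdf(lower_e) + (1 - X.cdf(upper_e))

print(round(q3, 4), round(100 * p_mild, 2), p_extreme, round(100 * p_e, 2))
```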
6. a.
Both graphs show skewness to the right, with an outlier on the right-hand side (528).
b. The sample skewness coefficient 1.453 deviates considerably from the reference value 0 of the normal
distribution. The sample kurtosis 2.942 is, however, close to the reference value 3.
Altogether we can conclude that a normal distribution is not the most likely distribution for these
observations.
c. The points in the normal Q-Q plot are not positioned closely and randomly around the line 𝑦 = 𝑥,
which is an indication of a non-normal distribution.
(In general you should base the assessment of a distribution on several techniques: in this case the
normal Q-Q plot is perhaps the least clear, but the histogram or the box plot and the classic numerical
summary are quite clear: the data cannot be assumed to be normally distributed. The graphs and the
numerical measures show skewness to the right.)
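To illustrate how the two numerical measures behave, the sketch below computes a moment-based skewness coefficient and kurtosis. Both the formulas and the data are assumptions: the plain moment definitions 𝑚3/𝑚2^1.5 and 𝑚4/𝑚2² are used here (the normal distribution then has reference values 0 and 3), whereas the book or SPSS may apply bias-corrected variants, and the two small data sets are made up because the observations of exercise 6 are not listed in the solutions.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness m3 / m2^1.5 and kurtosis m4 / m2^2.

    Assumption: plain central moments divided by n, without bias correction.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Made-up illustrative data, not the data of exercise 6
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]     # one large value on the right

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_kurtosis(right_skewed)

print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

The symmetric data give a skewness of 0, while the single large value in the second data set pulls the skewness well above 0, the same pattern as the outlier 528 produces in exercise 6.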
7.
a. First we determine the 25th and the 75th percentile: 𝑄1 = 𝑥(10) = 141 and 𝑄3 = 𝑥(29) = 154,
since 25% (75%) of 𝑛 = 38 is 9.5 (28.5), which is rounded up to 10 (29).
The interquartile range is 𝐼𝑄𝑅 = 154 − 141 = 13.
Possible outliers are outside (𝑄1 − 1.5 × 𝐼𝑄𝑅, 𝑄3 + 1.5 × 𝐼𝑄𝑅) = (141 − 19.5, 154 + 19.5) = (121.5, 173.5):
we have one (potential) outlier: 121.
b. 1. The numerical summary: the sample skewness coefficient −0.41 deviates somewhat from 0, the
reference value of the normal distribution. The kurtosis 3.09 is close to the reference value 3.
2. The histogram reveals a distribution that roughly follows the normal distribution, though there
seem to be two peaks. But the histogram is roughly symmetric and mound shaped.
3. The normal Q-Q plot shows points around the straight line 𝑦 = 𝑥: normality seems reasonable.
The overall "verdict", based on these three aspects, is that the normal distribution can be assumed as a
model for these observations.
8.
a. The histogram has roughly the shape of the exponential distribution: skewed to the right with a peak near
0.
The exponential Q-Q plot (produced by SPSS) should show points close to the straight line through the
origin. We observe some deviations: in the exponential Q-Q plot the ordered observations
𝑥(𝑖) are at first higher than expected and, for the larger values, lower than expected (under the line 𝑦 = 𝑥).
This pattern in the Q-Q plot is a reason to doubt the correctness of an exponential model.
The histogram shows the same deviation: the left side shows more observations than we
would expect for an exponential distribution.
b. If we assume an exponential distribution, where 𝐸(𝑋) = 1/𝜆, the estimate at hand for 𝜆 is 1/x̄ = 1/7.5113 ≈
0.1331 (see also the output of the histogram).
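The estimate of 𝜆 follows directly from the sample mean. The sketch below also evaluates the quantile function of the fitted exponential distribution, 𝐹⁻¹(𝑝) = −ln(1 − 𝑝)/𝜆, which is the kind of "expected" value an exponential Q-Q plot compares the ordered observations with. The function name `exp_quantile` is hypothetical, and the plotting positions SPSS actually uses are not specified here.

```python
import math

x_bar = 7.5113        # sample mean from the solution
lam = 1 / x_bar       # estimate of lambda, because E(X) = 1 / lambda

def exp_quantile(p, lam):
    """Quantile of the exponential distribution: -ln(1 - p) / lambda, 0 <= p < 1."""
    return -math.log(1 - p) / lam

# Median of the fitted exponential model, for comparison with the sample median
fitted_median = exp_quantile(0.5, lam)

print(round(lam, 4), round(fitted_median, 2))
```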
Descriptive statistics for Salary offer (Valid N (listwise) = 50):
Statistic        Value     Std. Error
N                50
Range            40.0
Minimum          35.0
Maximum          75.0
Mean             52.334    1.3045
Std. Deviation   9.2242
Variance         85.085
Skewness         0.378     0.337
Kurtosis         −0.342    0.662
Explore:
Descriptives
Median 51.000
Minimum 35.0
Maximum 75.0
Range 40.0
Percentiles
5 10 25 50 75 90 95
Extreme Values (Salary offer)
           Rank   Case number   Value
Highest    1      50            75.0
           2      49            72.3
           3      48            70.0
           4      47            65.4
           5      46            63.2
Lowest     1      1             35.0
           2      2             39.2
           3      3             39.6
           4      4             39.9
           5      5             40.0