
Chapter 1 - Solutions of Exercises

The document provides solutions to exercises involving descriptive statistics such as calculating measures of central tendency, dispersion, outliers and distributions from sample data. It demonstrates how to identify skewed distributions based on differences between the mean and median, determine standard deviation and the mean from z-scores, and assess normality using numerical summaries, histograms, normal Q-Q plots, and evaluating outliers. Methods for calculating quartiles, percentiles, and the interquartile range from a sample are also illustrated.

Uploaded by

DanyValentin

Chapter 1 – Solutions of exercises

1. If the distribution of the observations is (approximately) symmetric, the difference between the sample median
and the sample mean will be small. So if the difference is large, the distribution is skewed either to the
right (x̄ > M) or to the left (x̄ < M).

2.       Sample    Sample    Sample standard    Sample
         mean      median    deviation          variance
         x̄         M         s                  s²
   a.    2.5       3         3.15               9.9
   b.    3.08      3         1.19               1.41
   c.    49.6      49        8.77               76.93

3. The z-score of 40 is −2, so (40 − x̄)/s = −2. And the z-score of 90 is 3, so (90 − x̄)/s = 3.
This information leads to two equations with the unknowns x̄ and s: 40 − x̄ = −2s and
90 − x̄ = 3s.
Solving the equations we find: s = 10 and x̄ = 60.
(But you can also reason that the difference between the observations 90 and 40 is 5 standard deviations
(3 − (−2) = 5), so s = (90 − 40)/5 = 10 and the sample mean is 40 + 2 × 10 = 60.)
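
The two equations of exercise 3 can also be solved in a few lines of code. This is a minimal sketch; the function name mean_and_sd_from_z is chosen here for illustration and does not come from the exercise.

```python
def mean_and_sd_from_z(x1, z1, x2, z2):
    """Solve (x1 - mean)/s = z1 and (x2 - mean)/s = z2 for mean and s."""
    if z1 == z2:
        raise ValueError("the two z-scores must differ")
    # Subtracting the two equations gives x2 - x1 = (z2 - z1) * s.
    s = (x2 - x1) / (z2 - z1)
    # Substitute s back into the first equation.
    mean = x1 - z1 * s
    return mean, s


mean, s = mean_and_sd_from_z(40, -2, 90, 3)
print(mean, s)  # 60.0 10.0
```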

4.
b. The largest value: 75.0, its z-score is (75.0 − 52.33)/9.22 ≈ 2.46.
The smallest value: 35.0, its z-score is (35.0 − 52.33)/9.22 ≈ −1.88.
Is the largest offer excessively large?
According to the empirical rule (in symmetric distributions), z-scores with an absolute
value greater than 2 (so |z| > 2) occur with a probability of about 5% (about 1 in 20 observations) and
z-scores with an absolute value greater than 3 with a probability of about 0.3% (about 1 in 300). For
z-scores of at least 2.46 the probability is therefore somewhere between 0.3% and 5%: not (very) exceptional.
To be more precise, one can use the standard normal distribution to find the probability
P(|Z| > 2.46) = 2(1 − Φ(2.46)) ≈ 1.4%. This means that among 70 observations you
expect about 1 observation with such a z-score. For larger samples this is not an exceptional value.
c. The quartiles for n = 50: 50 × 0.25 = 12.5, so Q1 = x(13) = 45.3
50 × 0.5 = 25, so M = (x(25) + x(26))/2 = (50.8 + 51.2)/2 = 51.0
50 × 0.75 = 37.5, so Q3 = x(38) = 59.0
The largest and the smallest observations are 75.0 and 35.0.
5-number summary: 35.0, 45.3, 51.0, 59.0, 75.0.
90th percentile: 50 × 0.90 = 45 (integer), so 90th percentile = (x(45) + x(46))/2 = 63.1.
95th percentile: 50 × 0.95 = 47.5 (not an integer), so 95th percentile = x(48) = 70.0.
99th percentile: 50 × 0.99 = 49.5 (not an integer), so 99th percentile = x(50) = 75.0.
d. IQR = Q3 − Q1 = 59.0 − 45.3 = 13.7
Outliers are outside (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR) = (45.3 − 20.55, 59.0 + 20.55) = (24.75, 79.55).
All observations are inside the interval, so there are no outliers.
e. Box plot: see below
f. There are 11 observations larger than 60: the frequency of the event is 11 and the relative frequency is
11/50 = 22% (an estimate of the probability in the population).
g. Though the graph does not have the perfect shape of a normal distribution, its shape is approximately
symmetric and mound shaped, with one peak.
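
The calculations of parts b to d of exercise 4 can be sketched in code as follows. This is an illustration under stated assumptions, not part of the original solution: the function names are invented here, the percentile rule is the one applied in part c (average two order statistics when n × p is an integer, otherwise round up), and because the 50 salary offers are not listed in this document the percentile function is demonstrated on a made-up sample 1, 2, ..., 50. The z-score and fence calculations use the summary values quoted above.

```python
import math


def two_sided_tail(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def percentile(sorted_values, pct):
    """Percentile by the rule of part c; pct is a whole percentage, 0 < pct < 100.

    If n*pct/100 is an integer k, return the average of x(k) and x(k+1);
    otherwise round up and return that single order statistic.
    """
    n = len(sorted_values)
    k, remainder = divmod(n * pct, 100)  # integer arithmetic avoids rounding issues
    if remainder == 0:
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]  # 0-based index k is the order statistic x(k+1)


def fences(q1, q3, factor=1.5):
    """Interval (Q1 - factor*IQR, Q3 + factor*IQR) outside which values are outliers."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


# Part b: z-score of the largest offer and its two-sided tail probability.
z_max = (75.0 - 52.33) / 9.22
print(round(z_max, 2), round(100 * two_sided_tail(2.46), 1))

# Part c: the rule applied to a made-up sample (NOT the salary data).
illustrative = list(range(1, 51))
print([percentile(illustrative, p) for p in (25, 50, 75, 90, 95, 99)])

# Part d: fences from the quartiles found in part c.
low, high = fences(45.3, 59.0)
print(round(low, 2), round(high, 2))
```

Passing the percentage as a whole number keeps the test "is n × p an integer?" exact, which matters for cases such as 50 × 0.95 = 47.5.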
Quartiles and outliers
5.
a. 𝑃(𝑍 ≤ 𝑄3 ) = 0.75, if 𝑄3 ≈ 0.67 (N(0,1)-table)
Because of symmetry we have:
𝑄1 = − 𝑄3 ≈ −0.67
b. 𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 1.34, so
(𝑄1 − 1.5 × 𝐼𝑄𝑅, 𝑄3 + 1.5 × 𝐼𝑄𝑅)
= (−0.67 − 1.5 × 1.34, 0.67 + 2.01)
= (−2.68, +2.68)
c. Using symmetry: 2 × P(Z ≥ Q3 + 1.5 × IQR) = 2 × P(Z ≥ 2.68) = 2[1 − 0.9963] = 0.74%
d. (Q1 − 3 × IQR, Q3 + 3 × IQR) = (−0.67 − 3 × 1.34, 0.67 + 4.02) = (−4.69, +4.69)
The probability: 𝑃(𝑍 < 𝑄1 − 3 × 𝐼𝑄𝑅 or 𝑍 > 𝑄3 + 3 × 𝐼𝑄𝑅 ) = 2× 𝑃(𝑍 ≥ 4.69) < 2×0.0001
e. The interval is (𝜇 − 2.68𝜎, 𝜇 + 2.68𝜎) = (100 − 2.68 × 12, 100 + 2.68 × 12) = (67.84,132.16)
The probability remains the same as in c.
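
The steps of exercise 5 can be checked numerically with Python's standard library. This is a sketch only; the function name normal_outlier_fences is an assumption made here. Because it uses unrounded quartiles (Q3 ≈ 0.6745 instead of the table value 0.67), the fence comes out near 2.70 and the probability near 0.70%, slightly different from the 2.68 and 0.74% obtained above with rounded table values.

```python
from statistics import NormalDist


def normal_outlier_fences(mu=0.0, sigma=1.0, factor=1.5):
    """Fences Q1 - factor*IQR and Q3 + factor*IQR for N(mu, sigma^2),
    plus the probability that an observation falls outside them."""
    dist = NormalDist(mu, sigma)
    q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    prob_outside = dist.cdf(lower) + (1 - dist.cdf(upper))
    return lower, upper, prob_outside


# Parts b and c: standard normal distribution, factor 1.5.
print(normal_outlier_fences())
# Part d: factor 3 ("extreme" outliers).
print(normal_outlier_fences(factor=3))
# Part e: N(100, 12^2); the probability is the same as for N(0, 1).
print(normal_outlier_fences(mu=100, sigma=12))
```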

6. a.

Both graphs show skewness to the right, with an outlier on the right-hand side (528).
b. The sample skewness coefficient 1.453 deviates considerably from the reference value 0 of the normal
distribution. The sample kurtosis 2.942 is, however, close to the reference value 3.
Altogether we can conclude that a normal distribution is not the most likely distribution for these
observations.
c. The points in the normal Q-Q plot are not positioned closely and randomly around the line y = x,
which is an indication of a non-normal distribution.
(In general you should base the assessment of a distribution on several techniques: in this case the
normal Q-Q plot is perhaps the least clear, but the histogram or the box plot and the classic numerical
summary are quite clear: the data cannot be assumed to be normally distributed. The graphs and the
numerical measures show a skewness to the right.)
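
The numerical summary used in part b of exercise 6 can be sketched as follows. The exact formulas behind the values 1.453 and 2.942 are not stated in this document, so the code assumes the plain moment-based definitions (skewness m3/m2^1.5 and kurtosis m4/m2^2, with reference values 0 and 3 for the normal distribution). Statistical software may report bias-adjusted versions, or the kurtosis minus 3, instead. The data in the example are made up.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness m3 / m2**1.5 and kurtosis m4 / m2**2 (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (dividing by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# A symmetric made-up sample has skewness 0.
print(skewness_and_kurtosis([1, 2, 3, 4, 5]))
# A made-up sample with a long right tail has positive skewness.
print(skewness_and_kurtosis([1, 1, 1, 2, 10]))
```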

7.
a. First we determine the 25th and the 75th percentiles: Q1 = x(10) = 141 and Q3 = x(29) = 154,
since 25% (75%) of n = 38 is 9.5 (28.5).
The interquartile range is IQR = 154 − 141 = 13.
Possible outliers are outside (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR) = (121.5, 173.5):
we have one (potential) outlier: 121.
b. 1. The numerical summary: the sample skewness coefficient = −0.41 deviates somewhat from 0, the
reference value of the normal distribution. The kurtosis = 3.09 is close to the reference value 3.
2. The histogram reveals a distribution which roughly follows the normal distribution, though there
seem to be two peaks. But the histogram is roughly symmetric and mound shaped.
3. The normal Q-Q plot shows points around the straight line y = x: normality seems reasonable.
The overall “verdict” based on these three aspects is that the normal distribution can be assumed as a
model for these observations.

8.
a. The histogram has roughly the shape of the exponential distribution: skewed to the right and a peak near
0.
The exponential Q-Q plot (produced by SPSS) should show points close to the straight line through the
origin. We observe some deviations: in the exponential Q-Q plot the values of the ordered observations
x(i) are at first higher than expected and, for the larger values, lower than expected (under the line y = x).
This pattern in the Q-Q plot is a reason to doubt the correctness of an exponential model.
In the histogram the above-mentioned deviation returns: the left side shows more observations than we
would expect for an exponential distribution.
b. If we assume an exponential distribution, where E(X) = 1/λ, the estimate at hand for λ is 1/x̄ = 1/7.5113 ≈
0.1331 (see also the output of the histogram).
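
The estimate in part b of exercise 8 and the quantiles one would expect under the fitted exponential model can be sketched as follows. The function names are chosen here for illustration, and the quantile formula −ln(1 − p)/λ is the standard inverse of the exponential distribution function, not something taken from the SPSS output.

```python
import math


def exponential_rate(sample_mean):
    """Estimate of lambda based on E(X) = 1/lambda."""
    return 1 / sample_mean


def exponential_quantile(p, rate):
    """Value x with P(X <= x) = p for an exponential distribution with the given rate."""
    return -math.log(1 - p) / rate


rate = exponential_rate(7.5113)
print(round(rate, 4))  # 0.1331
# Quartiles expected under the fitted model, e.g. for comparison with sample quartiles.
print([round(exponential_quantile(p, rate), 2) for p in (0.25, 0.5, 0.75)])
```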

9. Results using SPSS (data exercise 4):


Descriptives:
Descriptive Statistics (Salary offer)

                              Statistic    Std. Error
N                             50
Range                         40,0
Minimum                       35,0
Maximum                       75,0
Mean                          52,334       1,3045
Std. Deviation                9,2242
Variance                      85,085
Skewness                      ,378         ,337
Kurtosis                      -,342        ,662
Valid N (listwise)            50

Explore:
Descriptives (Salary offer)

                                                  Statistic    Std. Error
Mean                                              52,334       1,3045
95% Confidence Interval for Mean   Lower Bound    49,713
                                   Upper Bound    54,955
5% Trimmed Mean                                   52,009
Median                                            51,000
Variance                                          85,085
Std. Deviation                                    9,2242
Minimum                                           35,0
Maximum                                           75,0
Range                                             40,0
Interquartile Range                               13,9
Skewness                                          ,378         ,337
Kurtosis                                          -,342        ,662

Percentiles (Salary offer)

                                   5        10       25       50       75       90       95
Weighted Average (Definition 1)    39,420   40,140   45,125   51,000   59,050   63,180   71,035
Tukey's Hinges                                       45,300   51,000   59,000

Extreme Values (Salary offer)

                Case Number    Value
Highest    1    50             75,0
           2    49             72,3
           3    48             70,0
           4    47             65,4
           5    46             63,2
Lowest     1    1              35,0
           2    2              39,2
           3    3              39,6
           4    4              39,9
           5    5              40,0

(no outliers, see the Box plot as well)
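
The "Weighted Average" percentiles in the SPSS output differ slightly from the hand-calculated values of exercise 4 (for example 45,125 versus Q1 = 45.3). The sketch below assumes that this definition interpolates linearly between the two order statistics around position (n + 1) × p; that assumption is not stated in the output itself, but it reproduces the 5th and 95th percentiles from the values listed under Extreme Values. The function names are invented for this illustration.

```python
import math
from fractions import Fraction


def weighted_average_position(n, pct):
    """Split the position (n + 1) * pct / 100 into an integer rank k and a fraction."""
    position = Fraction((n + 1) * pct, 100)  # exact arithmetic, no rounding error
    k = math.floor(position)
    return k, float(position - k)


def interpolate(lower, upper, fraction):
    """Value at the given fraction of the way from x(k) to x(k+1)."""
    return lower + fraction * (upper - lower)


# 5th percentile: position 51 * 0.05 = 2.55, between x(2) = 39.2 and x(3) = 39.6.
k, frac = weighted_average_position(50, 5)
print(k, frac, round(interpolate(39.2, 39.6, frac), 3))

# 95th percentile: position 51 * 0.95 = 48.45, between x(48) = 70.0 and x(49) = 72.3.
k, frac = weighted_average_position(50, 95)
print(k, frac, round(interpolate(70.0, 72.3, frac), 3))
```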
