
Solution Chapter 2 Mandenhall

The document contains descriptive statistics and analysis of several datasets: - Jobs data was evenly split between close match, not a match, and unemployed. - Root causes of incidents were evenly split between engineering, procedures, and management. - Most PWB manufacturers were from Japan, USA, Taiwan, and Europe. - 90% of software code was defect-free. - Most aquifers were bedrock and wells were mostly below detectable MTBE limits. - Public wells outnumbered private wells. - Voltage readings decreased with a new process. - Surface roughness data were evenly spread out. - Oxon/thion ratios were higher for clear air days than foggy days.

Uploaded by

Uzair Khan
Copyright
© Attribution Non-Commercial (BY-NC)

CHAPTER 2

Descriptive Statistics
2.2 The data were entered into Excel and the following pie chart was created:

[Pie chart: Jobs Related to Studies. Slice labels as extracted: Unemployed; Close Match; Not Engineering Not a Match]

The responses were pretty evenly divided between close match, not a match and unemployed.


2.4

The Pareto diagram is shown below:

The root causes of the 83 incidents are fairly evenly split among Engineering & Design, Procedures & Practices, and Management & Oversight.


2.6

The countries for all the European PWB manufacturers were combined and the following bar graph was produced:

[Bar graph: PWB Manufacturers by Country of Origin. Categories in the order shown: Japan, USA, Taiwan, Europe, Hong Kong, S Korea, Thailand; vertical axis from 0 to 30.]

While Europe is not a leading location of PWB manufacturers, we do not see a major cause for concern about the viability of the PWB industry in Europe.

2.8 A pie chart for the data appears below:

[Pie chart: Analysis of Software Code. "True" 10%; "False" 90%]

It appears that a very high percentage of the software code is defect free.


2.10

A bar graph was used to describe the aquifer variable below:

[Bar graph: Aquifer. Categories: Bedrock, Unconsolidated; vertical axis from 0 to 250.]

It appears that most of the aquifer types are bedrock.

A pie chart was used to describe the detectable levels of MTBE information below:

[Pie chart: Detectable MTBE. Detect 31%; Below Limit 69%]

It appears that about twice as many wells are below the detectable level as are above it.


A bar graph was used to describe the well class variable below:

[Bar graph: Well Class. Categories: Private, Public; vertical axis from 90 to 125.]

There are more public than private wells in the data set.

2.12 a. We will allow each number to represent the stems and mark the leaves with an asterisk (*). The stem-and-leaf display is:

Stems  Leaves
1      **********
2      *************
3      ***********
4      ****
5
6      **

b. 10 of the 40, or 10/40 = .25, of the asteroid observations resulted in exactly 1 spectral image exposure.

2.14 a. To construct a frequency histogram, first calculate the range by subtracting the smallest endpoint of the histogram from the largest. To include the largest and smallest values, we will start the histogram at 7.95 and end at 10.7625. Range = 10.7625 - 7.95 = 2.8125. Next, find the class width for 9 classes by dividing the range by the number of classes: Class width = 2.8125/9 = .3125


The first class will begin at 7.95, below the smallest voltage reading. The classes are shown below:

Class  Class Interval     Data Tabulation       Frequency  Relative Frequency
1      7.9500–8.2625      |                     1          .03
2      8.2625–8.5750                            0          0
3      8.5750–8.8875      |||                   3          .10
4      8.8875–9.2000                            0          0
5      9.2000–9.5125                            0          0
6      9.5125–9.8250      |||||                 5          .17
7      9.8250–10.1375     ||||| ||||| |||||     15         .50
8      10.1375–10.4500    |||||                 5          .17
9      10.4500–10.7625    |                     1          .03
Totals                                          n = 30     1.00

To obtain the class frequency, count the number of observations that fall within each class interval. The class relative frequency is the class frequency divided by the total number of observations (30).
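The class arithmetic above can be checked with a short script. This is only a sketch: the starting point, class width, and class frequencies are copied from the solution, and the variable names are illustrative, not part of the original.

```python
# Class boundaries and relative frequencies for the voltage histogram.
start = 7.95      # lower endpoint chosen in the solution
width = 0.3125    # class width = 2.8125 / 9
frequencies = [1, 0, 3, 0, 0, 5, 15, 5, 1]  # class frequencies from the table

n = sum(frequencies)  # total number of observations
# Ten boundaries delimit the nine classes; rounding hides float noise.
edges = [round(start + i * width, 4) for i in range(len(frequencies) + 1)]
# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in frequencies]

for i, (f, r) in enumerate(zip(frequencies, relative), start=1):
    print(f"{i}  {edges[i - 1]:.4f}-{edges[i]:.4f}  {f:2d}  {r:.2f}")
```

The printed relative frequencies are rounded to two decimals, as in the table.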

b.
Stems  Leaves
8.     05 72 72 80
9.     98 97 87 80 87 55 95 70 84 80 73 98 84
10.    26 05 29 03 55 26 12 05 15 00 15 02 01

The histogram in part a more effectively describes how the data fall.


c.

Using the same classes as in part a, our histogram now becomes:

d. The new process appears to be worse than the old process. More voltage readings are less than 9.2 volts with the new process than with the old process.

2.16 Statistix was used to generate the following stem-and-leaf plot for the surface roughness data:

Stem and Leaf Plot of ROUGH
Leaf Digit Unit = 0.1
1 0 represents 1.0

Minimum 1.0600   Median 2.0400   Maximum 2.6400

Depth  Stem  Leaves
  3    1     001
  5    1     22
  7    1     45
  8    1     7
  9    1     9
 (5)   2     00111
  6    2     23
  4    2     455
  1    2     6

The data appear to be fairly evenly spread out across the entire range of the data.

2.18 a. The stem will be the first decimal value (tenths) and the leaves will be the second and third decimal values (hundredths and thousandths). The stem-and-leaf display is:
Stems .1 .2 .3 .4 .5 .6 Leaves 12 70 41 05 70 25 39 30 75

23 91 18

b.

The three oxon/thion ratios for the clear air days have been marked in boxes above. The ratios for the fog air days have been circled above. The clear air days do seem to produce oxon/thion ratios that are larger than those for the fog air days. Keep in mind, however, that such a statement cannot be made with any measure of reliability (yet).


2.20

A stem-and-leaf display for the ratios is shown below:


Stem and Leaf Plot of RATIO
Leaf Digit Unit = 0.1
2 2 represents 2.2

Minimum 2.2500   Median 3.3750   Maximum 5.0600

Depth  Stem  Leaves
  1    2     2
  5    2     5779
 (9)   3     012223333
 12    3     678889
  6    4     0000
  2    4     5
  1    5     0

Only two of the 26 till ratios exceed the value of 4.5, so we would estimate this proportion to be 2/26 = 0.0769.

2.22 a. The sample mean radioactivity level is:

$$\bar{y} = \frac{\sum y}{n} = \frac{43.75}{9} = 4.861$$

The median is the middle observation once they have been ordered. The 5th observation is 4.85. Thus the median is 4.85. The mode is 5.00.

b. The average radioactivity level is 4.861. Half of the radioactivity levels are less than 4.85 and half are greater. The modal observation occurred 2 times.

2.24 a. The mean of the data is

$$\bar{y} = \frac{\sum y}{n} = \frac{11.77}{8} = 1.4713$$

b. The median is the average of the middle two numbers once the data are arranged in order. The data arranged in order are: 1.37, 1.41, 1.42, 1.48, 1.50, 1.51, 1.53, 1.55. The middle two numbers are 1.48 and 1.50. The median is

$$\frac{1.48 + 1.50}{2} = 1.49$$

c. Since the mean is less than the median, the data are somewhat skewed to the left.

2.26 The following measures of central tendency were calculated for the 174 ship sanitation scores:

Descriptive Statistics
Variable  N    Mean    Median  Mode
Rating    174  94.420  95.000  97.000


The average sanitation score of the 174 ships sampled was 94.420. Half of the 174 ships sampled had a sanitation score that was below 95 and half had a sanitation score that was above 95. The most frequently occurring sanitation score among the 174 ships sampled was the score 97.

2.28 a. The mean number of ant species discovered is:

$$\bar{y} = \frac{\sum y}{n} = \frac{3 + 3 + \cdots + 4}{11} = \frac{141}{11} = 12.82$$

The median is the middle number once the data have been arranged in order: 3, 3, 4, 4, 4, 5, 5, 5, 7, 49, 52. The median is 5. The mode is the value with the highest frequency. Since both 4 and 5 occur 3 times, both 4 and 5 are modes.

b. For this case, we would recommend the median as a better measure of central tendency than the mean. There are 2 very large numbers compared to the rest. The mean is greatly affected by these 2 numbers, while the median is not.

c. The mean total plant cover percentage for the Dry Steppe region is:

$$\bar{y} = \frac{\sum y}{n} = \frac{40 + 52 + \cdots + 27}{5} = \frac{202}{5} = 40.4$$

The median is the middle number once the data have been arranged in order: 27, 40, 40, 43, 52. The median is 40. The mode is the value with the highest frequency. Since 40 occurs 2 times, 40 is the mode.

d. The mean total plant cover percentage for the Gobi Desert region is:

$$\bar{y} = \frac{\sum y}{n} = \frac{30 + 16 + \cdots + 14}{6} = \frac{168}{6} = 28$$

The median is the mean of the middle 2 numbers once the data have been arranged in order: 14, 16, 22, 30, 30, 56. The median is

$$\frac{22 + 30}{2} = \frac{52}{2} = 26$$


The mode is the value with the highest frequency. Since 30 occurs 2 times, 30 is the mode.

e. Yes, the total plant cover percentage distributions appear to be different for the 2 regions. The percentage of plant coverage in the Dry Steppe region is much greater than that in the Gobi Desert region.

2.30 a. The mean number of power plants is:

$$\bar{y} = \frac{\sum y}{n} = \frac{5 + 3 + \cdots + 3}{20} = \frac{80}{20} = 4$$

The median is the mean of the middle 2 numbers once the data have been arranged in order: 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 13. The median is

$$\frac{3 + 4}{2} = \frac{7}{2} = 3.5$$

The mode is the value with the highest frequency. Since 1 occurs 5 times, 1 is the mode.

b. Deleting the largest number, 13, the new mean is:

$$\bar{y} = \frac{\sum y}{n} = \frac{5 + 3 + \cdots + 3}{19} = \frac{67}{19} = 3.526$$

The median is the middle number once the data have been arranged in order: 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9. The median is 3. The mode is the value with the highest frequency. Since 1 occurs 5 times, 1 is the mode. By dropping the largest measurement from the data set, the mean drops from 4 to 3.526. The median drops from 3.5 to 3. There is no effect on the mode.

c. Deleting the lowest 2 and highest 2 measurements leaves the following: 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7. The new mean is:

$$\bar{y} = \frac{\sum y}{n} = \frac{5 + 3 + \cdots + 3}{16} = \frac{56}{16} = 3.5$$

The trimmed mean has the advantage that any possible outliers have been eliminated.
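The three parts of Exercise 2.30 can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the 20 ordered values listed in part a; the helper `trimmed_mean` is a hypothetical name introduced here for illustration.

```python
from statistics import mean, median, multimode

# Number of power plants, as listed (in order) in part a.
plants = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 13]

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

# Part a: all 20 measurements.
full = (mean(plants), median(plants), multimode(plants))

# Part b: drop the largest measurement (13).
without_max = sorted(plants)[:-1]
reduced = (mean(without_max), median(without_max), multimode(without_max))

# Part c: drop the lowest 2 and highest 2 measurements.
trimmed = trimmed_mean(plants, 2)
```

`multimode` returns a list, so a data set with tied modes (such as the ant species counts in Exercise 2.28) is handled without special cases.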


2.32

a.

Range = 13 - 1 = 12

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{498 - \frac{80^2}{20}}{20 - 1} = \frac{178}{19} = 9.3684$$

$$s = \sqrt{s^2} = \sqrt{9.3684} = 3.061$$

b. Dropping the largest measurement: Range = 9 - 1 = 8

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{329 - \frac{67^2}{19}}{19 - 1} = \frac{92.7368421}{18} = 5.1520$$

$$s = \sqrt{s^2} = \sqrt{5.1520} = 2.270$$

By dropping the largest observation from the data set, the range decreased from 12 to 8, the variance decreased from 9.3684 to 5.1520 and the standard deviation decreased from 3.061 to 2.270.

c. Dropping the largest and smallest measurements: Range = 9 - 1 = 8

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{328 - \frac{66^2}{18}}{18 - 1} = \frac{86}{17} = 5.0588$$

$$s = \sqrt{s^2} = \sqrt{5.0588} = 2.249$$

By dropping the largest and smallest observations from the data set, the range decreased from 12 to 8, the variance decreased from 9.3684 to 5.0588 and the standard deviation decreased from 3.061 to 2.249.

2.34 Comparing the means of the two distributions, we see that the center of the "true" distribution will be located to the right of the "false" distribution when viewed on a number line. Comparing the standard deviations of the two distributions, we see that the spread of the "true" distribution will be much greater than the spread of the "false" distribution. Graphically, this would be represented by a "true" distribution that looked shorter and more spread out than the "false" distribution.

2.36 In mound-shaped symmetric distributions, the Empirical Rule tells us to expect approximately 95% of the data to fall within two standard deviations of the mean.


$$\bar{y} \pm 2s \Rightarrow 19.5 \pm 2(4.7) \Rightarrow (10.1, 28.9)$$

We expect approximately 95% of the SNR values in the population to fall between 10.1 and 28.9. We would not expect to see an SNR value outside of this interval. Therefore, a value of 30 would not be expected.

2.38 a. Since no information is given about the distribution of the velocities of the Winchester bullets, we can only use Chebyshev's Rule to describe the data. We know that at least 3/4 of the velocities will fall within the interval:

$$\bar{y} \pm 2s \Rightarrow 936 \pm 2(10) \Rightarrow 936 \pm 20 \Rightarrow (916, 956)$$

Also, at least 8/9 of the velocities will fall within the interval:

$$\bar{y} \pm 3s \Rightarrow 936 \pm 3(10) \Rightarrow 936 \pm 30 \Rightarrow (906, 966)$$
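The two Chebyshev intervals can be generated from the same rule, which guarantees at least 1 - 1/k² of the measurements within k standard deviations of the mean for any k > 1. The function below is an illustrative sketch; its name and return layout are assumptions made here, not part of the original solution.

```python
def chebyshev_interval(mean, sd, k):
    """Return (lower, upper, minimum proportion) for mean +/- k*sd, k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k**2

# Winchester bullet velocities: mean 936, standard deviation 10.
two_sd = chebyshev_interval(936, 10, 2)    # at least 3/4 of the velocities
three_sd = chebyshev_interval(936, 10, 3)  # at least 8/9 of the velocities
```

Because the rule makes no assumption about the shape of the distribution, the proportions are lower bounds rather than estimates.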

b. Since a velocity of 1,000 is much larger than the largest value in the second interval in part a, it is very unlikely that the bullet was manufactured by Winchester.

2.40 a. From the printout, $\bar{y} = 2.425$ and $s = 1.259$.

b.
$$\bar{y} \pm s \Rightarrow 2.425 \pm 1.259 \Rightarrow (1.166, 3.684)$$
$$\bar{y} \pm 2s \Rightarrow 2.425 \pm 2(1.259) \Rightarrow 2.425 \pm 2.518 \Rightarrow (-0.093, 4.943)$$
$$\bar{y} \pm 3s \Rightarrow 2.425 \pm 3(1.259) \Rightarrow 2.425 \pm 3.777 \Rightarrow (-1.352, 6.202)$$

c. 24 observations fall in the interval $\bar{y} \pm s$, or 24/40 = .60. The Empirical Rule says there should be approximately .68 of the measurements within 1 standard deviation of the mean. This is fairly close. 38 observations fall in the interval $\bar{y} \pm 2s$, or 38/40 = .95. The Empirical Rule says there should be approximately .95 of the measurements within 2 standard deviations of the mean. This agrees with the Empirical Rule. 40 observations fall in the interval $\bar{y} \pm 3s$, or 40/40 = 1.00. The Empirical Rule says approximately all of the measurements should fall within 3 standard deviations of the mean. Again, this agrees with the Empirical Rule.
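The counts in part c can be verified directly. The sketch below assumes the 40 observations are the ones summarized by the stem-and-leaf display of Exercise 2.12 (ten 1s, thirteen 2s, eleven 3s, four 4s, no 5s, two 6s); that reading of the display is an assumption, although it reproduces the printed mean and standard deviation.

```python
from statistics import mean, stdev

# Assumed value counts, read off the stem-and-leaf display in Exercise 2.12.
counts = {1: 10, 2: 13, 3: 11, 4: 4, 6: 2}
data = [value for value, c in counts.items() for _ in range(c)]

ybar = mean(data)   # sample mean
s = stdev(data)     # sample standard deviation (n - 1 divisor)

def share_within(k):
    """Count and proportion of observations inside ybar +/- k*s."""
    lower, upper = ybar - k * s, ybar + k * s
    inside = sum(lower <= y <= upper for y in data)
    return inside, inside / len(data)

within = {k: share_within(k) for k in (1, 2, 3)}
```

Comparing `within` with the Empirical Rule targets (.68, .95, about 1.00) gives the same conclusion as the text.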

2.42

a.

The lower quartile is the observation with rank equal to:

$$\ell = \tfrac{1}{4}(n + 1) = \tfrac{1}{4}(40 + 1) = 10.25 \Rightarrow \ell = 10$$

$$Q_L = y_{(10)} = 1$$


The upper quartile is the observation with rank equal to:

$$u = \tfrac{3}{4}(n + 1) = \tfrac{3}{4}(40 + 1) = 30.75 \Rightarrow u = 31$$

$$Q_U = y_{(31)} = 3$$

25% of the observations fall at or below 1 and 75% of the observations fall at or below 3.

b. The z-score for an observation is $z = \frac{y - \bar{y}}{s}$, where

$$\bar{y} = \frac{\sum y}{n} = \frac{97}{40} = 2.425$$

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{297 - \frac{97^2}{40}}{40 - 1} = 1.584$$

$$s = \sqrt{s^2} = \sqrt{1.584} = 1.259$$

The z-score for 6 is

$$z = \frac{6 - 2.425}{1.259} = 2.84$$

This observation falls 2.84 standard deviations above the mean of 2.425.

2.44 a. Using SAS, the output is:
EGG LENGTH
The UNIVARIATE Procedure
Variable: length

Moments
N                 130          Sum Weights        130
Mean              60.6538462   Sum Observations   7885
Std Deviation     43.9861168   Variance           1934.77847
Skewness          2.1091362    Kurtosis           4.70026314
Uncorrected SS    727842       Corrected SS       249586.423
Coeff Variation   72.5199136   Std Error Mean     3.85783765

Basic Statistical Measures
Location              Variability
Mean     60.65385     Std Deviation         43.98612
Median   49.50000     Variance              1935
Mode     35.00000     Range                 220.00000
                      Interquartile Range   32.00000

Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t   15.72224    Pr > |t|    <.0001
Sign           M   65          Pr >= |M|   <.0001
Signed Rank    S   4257.5      Pr >= |S|   <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max      236.0
99%           218.0
95%           160.0
90%           122.0
75% Q3        67.0
50% Median    49.5
25% Q1        35.0
10%           23.0
5%            19.0
1%            16.0
0% Min        16.0

Extreme Observations
----Lowest----     ----Highest---
Value    Obs       Value    Obs
16.0     102       195      128
16.0     90        205      123
17.0     101       216      122
18.0     103       218      129
18.5     100       236      130

Missing Values
                          -----Percent Of-----
Missing Value   Count     All Obs   Missing Obs
.               2         1.52      100.00

The 10th percentile egg length is 23.

b. From the printout, $\bar{y} = 60.65$ and $s = 43.99$. The z-score corresponding to the Moas, P. australis bird species egg length is:

$$z = \frac{y - \bar{y}}{s} = \frac{205 - 60.65}{43.99} = 3.28$$
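As a quick check of the arithmetic, the z-score can be computed from the rounded mean and standard deviation quoted from the printout. The function name `z_score` is illustrative only.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Egg length 205 against the sample mean 60.65 and standard deviation 43.99.
z_egg = z_score(205, 60.65, 43.99)
print(round(z_egg, 2))
```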


The egg length for Moas, P. australis is 3.28 standard deviations above the mean. This is a fairly large value for a z-score. This indicates that the egg length for Moas, P. australis could be very unusual.

2.46 Since the 90th percentile of the study sample in the subdivision was .00372 mg/L, which is less than the USEPA level of .015 mg/L, the water customers in the subdivision are not at risk of drinking water with unhealthy lead levels.

2.48 We will use SAS to compute the percentiles. The output is:

The UNIVARIATE Procedure
Variable: LEFTEYE

Moments
N                 25            Sum Weights        25
Mean              -0.1544       Sum Observations   -3.86
Std Deviation     0.19676721    Variance           0.03871733
Skewness          -4.5220839    Kurtosis           21.7019614
Uncorrected SS    1.5252        Corrected SS       0.929216
Coeff Variation   -127.4399     Std Error Mean     0.03935344

Basic Statistical Measures
Location              Variability
Mean     -0.15440     Std Deviation         0.19677
Median   -0.11000     Variance              0.03872
Mode     -0.17000     Range                 1.03000
                      Interquartile Range   0.08000

NOTE: The mode displayed is the smallest of 4 modes with a count of 3.

Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t   -3.92342    Pr > |t|    0.0006
Sign           M   -12.5       Pr >= |M|   <.0001
Signed Rank    S   -162.5      Pr >= |S|   <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max      -0.04
99%           -0.04
95%           -0.06
90%           -0.06
75% Q3        -0.08
50% Median    -0.11
25% Q1        -0.16
10%           -0.20
5%            -0.21
1%            -1.07
0% Min        -1.07


Extreme Observations
----Lowest----     ----Highest---
Value    Obs       Value    Obs
-1.07    7         -0.07    23
-0.21    18        -0.06    2
-0.20    11        -0.06    12
-0.17    22        -0.06    21
-0.17    17        -0.04    16

a. From the printout above, the 10th percentile is -0.20. Thus, 10% of the cylinder power measurements are below -0.20 and 90% are above -0.20.

b. From the printout above, the 95th percentile is -0.06. Thus, 95% of the cylinder power measurements are below -0.06 and 5% are above -0.06.

c.
$$z = \frac{y - \bar{y}}{s} = \frac{-1.07 - (-.1544)}{.19677} = -4.65$$

A power measurement of -1.07 is 4.65 standard deviations below the mean. Since this z-score is so far below zero, it is an extremely unlikely value to observe. The cylinder value of -1.07 is an extreme value.

2.50

a.

From the data, median = 2.000, QL = 1.250, and QU = 3.000. The interquartile range is IQR = QU - QL = 3.000 - 1.250 = 1.750. The inner and outer fences are located a distance of 1.5(IQR) = 1.5(1.75) = 2.625 and 3(IQR) = 3(1.75) = 5.25 below QL and above QU, respectively. Values between the inner and outer fences are suspect outliers and are designated by *. Highly suspect outliers lie outside the outer fences and are designated by o. The closest points to the inner fences which are still inside the inner fences are marked by x and whiskers are drawn between these points and the box. The box plot is shown below:

b. From Exercise 2.40, $\bar{y} = 2.425$ and $s = 1.259$. The z-score corresponding to the suspect outliers is:

$$z = \frac{6 - 2.425}{1.259} = 2.84$$

Since this value is fairly close to 3, the two observations with the value 6 are suspect outliers.

2.52

a.

Using MINITAB, the box plot of the data is:


[Box plot: Boxplot of Rating. Vertical axis: Rating, from 75 to 100.]

The descriptive statistics are:


Descriptive Statistics: Rating

Variable  N    Mean    Median  TrMean  StDev  SE Mean
Rating    174  94.420  95.000  94.814  4.380  0.332

Variable  Minimum  Maximum  Q1      Q3
Rating    74.000   100.000  92.000  98.000

The median is 95, the upper quartile is QU = 98 and the lower quartile is QL = 92. The interquartile range is IQR = QU - QL = 98 - 92 = 6.

The lower inner fence is QL - 1.5(IQR) = 92 - 1.5(6) = 92 - 9 = 83.
The upper inner fence is QU + 1.5(IQR) = 98 + 1.5(6) = 98 + 9 = 107.
The lower outer fence is QL - 3(IQR) = 92 - 3(6) = 92 - 18 = 74.
The upper outer fence is QU + 3(IQR) = 98 + 3(6) = 98 + 18 = 116.

There are 5 observations below the lower inner fence. These are suspect outliers. The observations have values 74, 79, 79, 81, and 81. There are no observations below the lower outer fence.

b. The z-scores corresponding to these points are:

$$74:\ z = \frac{y - \bar{y}}{s} = \frac{74 - 94.42}{4.38} = -4.66$$
$$79:\ z = \frac{y - \bar{y}}{s} = \frac{79 - 94.42}{4.38} = -3.52$$
$$81:\ z = \frac{y - \bar{y}}{s} = \frac{81 - 94.42}{4.38} = -3.06$$

c. According to the box plot, there are 5 suspect outliers. According to the z-score method, all of the suspect outliers have z-scores more than 3 standard deviations from the mean. They would all be considered outliers.
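Both outlier checks can be combined in a few lines. This sketch uses only the summary values quoted from the MINITAB output (Q1 = 92, Q3 = 98, mean 94.42, standard deviation 4.38) and the five low ratings named in the text; `fences` and `classify` are hypothetical helper names.

```python
def fences(q_lower, q_upper):
    """Inner and outer fences located 1.5(IQR) and 3(IQR) beyond the quartiles."""
    iqr = q_upper - q_lower
    return {
        "inner": (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr),
        "outer": (q_lower - 3 * iqr, q_upper + 3 * iqr),
    }

def classify(y, f):
    """Label a value using the box plot convention described in Exercise 2.50."""
    (lo_in, hi_in), (lo_out, hi_out) = f["inner"], f["outer"]
    if y < lo_out or y > hi_out:
        return "highly suspect"
    if y < lo_in or y > hi_in:
        return "suspect"
    return "not flagged"

rating_fences = fences(92, 98)
low_scores = [74, 79, 79, 81, 81]
labels = [classify(y, rating_fences) for y in low_scores]
# z-scores from the printed mean and standard deviation.
z_scores = [round((y - 94.42) / 4.38, 2) for y in low_scores]
```

The score 74 sits exactly on the lower outer fence, so the strict inequality keeps it in the suspect group, in line with the statement that no observation falls below that fence.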


2.54

Statistix was used to get the following information:


Breakdown for TRANERR

                          INTRINSICS
PROJECTIONS         No           Yes       Total
No         N        1            5         6
           Mean     0.0000034    1.6200    1.3500
           SD       M            0.7918    0.9690
Yes        N        5            0         5
           Mean     25.160       M         25.160
           SD       6.8002       M         6.8002
Total      N        6            5         11
           Mean     20.967       1.6200    12.173
           SD       11.937       0.7918    13.175

Cases Included 11    Missing Cases 0

a. For trials with perturbed intrinsics but no perturbed projections, we find: $\bar{y} = 1.62$ and $s = 0.7918$.

b. For trials with perturbed projections but no perturbed intrinsics, we find: $\bar{y} = 25.16$ and $s = 6.8002$.

c. To determine if the value is an outlier, we calculate the z-score for the observation. For trials with perturbed intrinsics but no perturbed projections, we find:

$$z = \frac{y - \bar{y}}{s} = \frac{4.5 - 1.62}{0.7918} = 3.637$$

For trials with perturbed projections but no perturbed intrinsics, we find:

$$z = \frac{y - \bar{y}}{s} = \frac{4.5 - 25.16}{6.8002} = -3.038$$

This observation is considered an outlier in both these situations. We would not expect it to come from either of these camera perturbations. If we had to choose, we would select the trials with perturbed projections but no perturbed intrinsics since the z-score for that camera perturbation is the smallest (in absolute value).

2.56 a. The variable reason is qualitative because the responses are categories. There are 7 values for the variable: Infant, Child, Medical, Infant & Medical, Child & Medical, Infant & Child, and Infant, Child & Medical.

b. To compute the relative frequencies, divide the frequencies (or number of requests) by the total sample size of 30,337. The relative frequency for Infant is 1,852 / 30,337 = .061. The rest of the relative frequencies are computed in the table.


Reason                     Number of Requests   Computation        Relative Frequency
Infant                     1,852                1,852 / 30,337     .061
Child                      17,148               17,148 / 30,337    .565
Medical                    8,377                8,377 / 30,337     .276
Infant & Medical           44                   44 / 30,337        .001
Child & Medical            903                  903 / 30,337       .030
Infant & Child             1,878                1,878 / 30,337     .062
Infant & Child & Medical   135                  135 / 30,337       .004
Total                      30,337                                  .999
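The relative frequency column can be regenerated from the request counts. A minimal sketch follows; the dictionary layout and variable names are illustrative choices.

```python
# Number of requests for each stated reason, from the table.
requests = {
    "Infant": 1852,
    "Child": 17148,
    "Medical": 8377,
    "Infant & Medical": 44,
    "Child & Medical": 903,
    "Infant & Child": 1878,
    "Infant & Child & Medical": 135,
}

total = sum(requests.values())
# Relative frequency = frequency / total sample size, rounded to 3 decimals.
relative = {reason: round(count / total, 3) for reason, count in requests.items()}
rounded_total = round(sum(relative.values()), 3)
```

The rounded column sums to .999 rather than 1 only because each entry is rounded separately.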

c.

A bar chart for the data is:


[Bar chart: Chart of Frequency vs Reason. Horizontal axis: Reason (Infant; Child; Medical; Infant & Medical; Child & Medical; Infant & Child; Infant & Child & Medical); vertical axis: Frequency, from 0 to 18000.]

d. The proportion of car owners who requested on-off bag switches who gave Medical as one of the reasons is (8,377 + 44 + 903 + 135) / 30,337 = 9,459 / 30,337 = .312.

2.58 a. Mean = 2.1197: the average aftershock of these 2929 earthquake aftershocks was 2.1197 measured on the Richter scale.
Median = 2.000: half of the 2929 earthquake aftershocks registered above 2.000 on the Richter scale and half registered below 2.000 on the Richter scale.
Q1 = 1.700: 25% of the 2929 earthquake aftershocks registered below 1.700 on the Richter scale and 75% registered above 1.700 on the Richter scale.
Q3 = 2.400: 75% of the 2929 earthquake aftershocks registered below 2.400 on the Richter scale and 25% registered above 2.400 on the Richter scale.


s = 0.6636: Assuming a mound-shaped symmetric distribution, the Empirical Rule tells us that approximately 95% of the aftershocks register between $\bar{y} \pm 2s \Rightarrow 2.1197 \pm 2(0.6636) \Rightarrow (0.7925, 3.4469)$ on the Richter scale.

b. Statistix constructed the following box plot for the aftershock magnitude data:

We can see that a number of suspect outliers (shown as *) and highly suspect outliers (shown as o) are identified in this box plot.

2.60 A pie chart of the data is:

[Pie chart: Pie Chart of Count vs Drive Star. 2 stars: 4.1%; 3 stars: 17.3%; 4 stars: 60.2%; 5 stars: 18.4%]

More than half of the cars received 4 star ratings (60.2%). A little less than a quarter of the cars tested received ratings of 3 stars or less.


2.62

Excel was used to generate the following summary statistics:


RATIO
Mean                 3.50731
Standard Error       0.12444
Median               3.375
Mode                 4.09
Standard Deviation   0.63454
Sample Variance      0.40264
Kurtosis             0.2953
Skewness             0.27643
Range                2.81
Minimum              2.25
Maximum              5.06
Sum                  91.19
Count                26

a. Mean = 3.50731: the average of the 26 till ratios is 3.50731.
Median = 3.375: half of the 26 till ratios fall above 3.375, half fall below 3.375.
Mode = 4.09: the most frequently occurring till ratio is the value 4.09.

b. Range = 2.81: the difference between the largest and the smallest till ratio is 2.81.
Variance = 0.40264: the variance has no useful interpretation. It is the calculation that allows us to determine the value of the standard deviation.
Standard Deviation = 0.63454: Assuming a mound-shaped symmetric distribution, the Empirical Rule tells us that approximately 95% of the till ratios fall between $\bar{y} \pm 2s \Rightarrow 3.50731 \pm 2(0.63454) \Rightarrow (2.23823, 4.77639)$.


c. Statistix constructed the following box plot for the till ratio data:

We see one suspect outlier (5.06) identified in the box plot.

2.64 a. To convert a frequency histogram to a relative frequency histogram, we must first divide each of the frequencies by the sum of all the frequencies, which is 1 + 1 + 3 + 4 + 10 + 7 + 5 + 4 + 3 + 2 + 3 + 2 + 3 + 2 = 50. The relative frequency table is:

Class Interval   Frequency   Relative Frequency
5.5–15.5         1           1/50 = .02
15.5–25.5        1           1/50 = .02
25.5–35.5        3           3/50 = .06
35.5–45.5        4           4/50 = .08
45.5–55.5        10          10/50 = .20
55.5–65.5        7           7/50 = .14
65.5–75.5        5           5/50 = .10
75.5–85.5        4           4/50 = .08
85.5–95.5        3           3/50 = .06
95.5–105.5       2           2/50 = .04
145.5–155.5      3           3/50 = .06
165.5–175.5      2           2/50 = .04
175.5–185.5      3           3/50 = .06
195.5–205.5      2           2/50 = .04


The relative frequency histogram is:

b. It would be very unusual to observe a drill chip with a length of at least 190 mm. There are only 2 out of 50 drill chips that are 190 mm or longer. The proportion of drill chips with lengths of at least 190 mm is .04.

2.66 a. To construct a stem and leaf display for the PCB levels of rural soil samples, we will use the digits to the left of the decimal point as the stems and the digits to the right of the decimal point as the leaves.

Stems  Leaves  Frequency  Relative Frequency
1      0568    4          .286
2              0          .000
3      5       1          .071
4              0          .000
5      3       1          .071
6              0          .000
7              0          .000
8      12      2          .143
9      078     3          .214
10             0          .000
11             0          .000
12     0       1          .071
13             0          .000
14             0          .000
15     0       1          .071
⋮
23     0       1          .071
Totals         n = 14     .998


b. For the urban samples, the stem-and-leaf display is:

Stems  Leaves  Frequency  Relative Frequency
11     00      2          .133
12     0       1          .067
13     0       1          .067
14             0          .000
15             0          .000
16     0       1          .067
17             0          .000
18     00      2          .133
19             0          .000
20             0          .000
21     0       1          .067
22     0       1          .067
23             0          .000
24     0       1          .067
⋮
29     0       1          .067
⋮
49     0       1          .067
⋮
94     0       1          .067
⋮
107    0       1          .067
⋮
141    0       1          .067
Totals         n = 15     1.003


c. For the combined rural and urban samples, the stem-and-leaf display is:

Stems  Leaves  Frequency  Relative Frequency
1      0568    4          .138
2              0          .000
3      5       1          .034
4              0          .000
5      3       1          .034
6              0          .000
7              0          .000
8      12      2          .069
9      078     3          .103
10             0          .000
11     00      2          .069
12     00      2          .069
13     0       1          .034
14             0          .000
15     0       1          .034
16     0       1          .034
17             0          .000
18     00      2          .069
19             0          .000
20             0          .000
21     0       1          .034
22     0       1          .034
23     0       1          .034
24     0       1          .034
⋮
29     0       1          .034
⋮
49     0       1          .034
⋮
94     0       1          .034
⋮
107    0       1          .034
⋮
141    0       1          .034
Totals         n = 29     .993

The graph supports the researchers' claim that a significant difference exists between the PCB levels for urban and rural areas. The rural PCB levels are almost all less than the lowest reading for the urban areas.

2.68 a.
$$\text{mean} = \bar{y} = \frac{\sum y}{n} = \frac{5891}{50} = 117.82$$

$$\text{median} = \frac{117 + 118}{2} = 117.5,$$

the average of the middle two observations, once the observations have been arranged in order.

modes = 97, 112, 124, 128, and 131, all of which appear 3 times.


b. range = 150 - 88 = 62

$$\text{variance} = s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{705119 - \frac{(5891)^2}{50}}{49} = \frac{11041.38}{49} = 225.3343$$

$$\text{standard deviation} = s = \sqrt{225.3343} = 15.01$$
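The shortcut formula needs only the sample size, the sum of the observations, and the sum of their squares, so it is easy to check against the totals quoted here and in Exercise 2.32. The function name below is an illustrative choice.

```python
from math import sqrt

def shortcut_variance(sum_y, sum_y_sq, n):
    """Sample variance from totals: (sum(y^2) - (sum(y))^2 / n) / (n - 1)."""
    return (sum_y_sq - sum_y**2 / n) / (n - 1)

# Exercise 2.68: sum(y) = 5891, sum(y^2) = 705119, n = 50.
var_268 = shortcut_variance(5891, 705119, 50)
sd_268 = sqrt(var_268)

# Exercise 2.32 a: sum(y) = 80, sum(y^2) = 498, n = 20.
var_232 = shortcut_variance(80, 498, 20)
```

With raw data available, `statistics.variance` gives the same result without forming the totals by hand.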


c. $\bar{y} \pm s \Rightarrow 117.82 \pm (15.01) \Rightarrow (102.81, 132.83)$. The number of observations that appear in this interval is 31. This is close to the 68% or .68(50) ≈ 34 given by the Empirical Rule.

$\bar{y} \pm 2s \Rightarrow 117.82 \pm 2(15.01) \Rightarrow 117.82 \pm 30.02 \Rightarrow (87.80, 147.84)$. The number of observations that appear in this interval is 49. This is close to the 95% or .95(50) ≈ 48 given by the Empirical Rule.

$\bar{y} \pm 3s \Rightarrow 117.82 \pm 3(15.01) \Rightarrow 117.82 \pm 45.03 \Rightarrow (72.79, 162.85)$. The number of observations that appear in this interval is 50. This agrees with 100% or 1.00(50) = 50 given by the Empirical Rule.

d. To construct a box plot, we first must make some preliminary calculations. First, we compute QL and QU. The lower quartile, QL, is the data point with the rank of (n + 1)/4 = (50 + 1)/4 = 12.75 ≈ 13. The 13th ranked data point is 109. The upper quartile, QU, is the data point with the rank of 3(n + 1)/4 = 3(50 + 1)/4 = 38.25 ≈ 38. The 38th ranked data point is 131. The interquartile range, IQR, is QU - QL = 131 - 109 = 22.

The inner fences are located 1.5(IQR) = 1.5(22) = 33 below QL and above QU. The inner fences are 109 - 33 = 76 and 131 + 33 = 164.

The outer fences are located 3(IQR) = 3(22) = 66 below QL and above QU. The outer fences are 109 - 66 = 43 and 131 + 66 = 197.

The box plot is shown below.

There are no suspect outliers in the data set because no data points lie outside the inner fence.
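The rank rule used for the quartiles here and in Exercise 2.42 can be written as a small helper. This is a sketch under two stated assumptions: ranks are rounded to the nearest integer, with an exact half rounded up for QL and down for QU, and the demonstration data are the 40 values read off the stem-and-leaf display of Exercise 2.12 (the 50 observations of Exercise 2.68 are not listed in the solution).

```python
import math

def quartile_ranks(n):
    """Ranks of QL and QU from (n + 1)/4 and 3(n + 1)/4, rounded to the nearest integer."""
    lower = math.floor((n + 1) / 4 + 0.5)      # an exact half rounds up for QL
    upper = math.ceil(3 * (n + 1) / 4 - 0.5)   # an exact half rounds down for QU
    return lower, upper

def quartiles(values):
    """Return (QL, QU) as the ordered observations at the quartile ranks."""
    ordered = sorted(values)
    lower, upper = quartile_ranks(len(ordered))
    return ordered[lower - 1], ordered[upper - 1]  # ranks are 1-based

ranks_50 = quartile_ranks(50)   # Exercise 2.68
ranks_40 = quartile_ranks(40)   # Exercise 2.42

# Assumed value counts from the Exercise 2.12 display.
counts = {1: 10, 2: 13, 3: 11, 4: 4, 6: 2}
spectral = [value for value, c in counts.items() for _ in range(c)]
q_lower, q_upper = quartiles(spectral)
```

Statistical packages often interpolate between ranks instead, which is why the software quartile in Exercise 2.50 (QL = 1.250) differs from the hand-computed QL = 1 in Exercise 2.42.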


e. The 70th percentile is the data point that has a rank of .70(n) = .70(50) = 35. The 35th data point is 128. Seventy percent of the data points have a value less than or equal to 128.

2.70 a. Using Minitab, a stem-and-leaf display of the data is:

Stem-and-leaf of VELOCITY   N = 51
Leaf Unit = 100

  1   18  4
  3   18  79
 12   19  001112444
 18   19  566788
 20   20  12
 21   20  7
 21   21
 23   21  99
 (5)  22  11344
 23   22  5666777777889
 10   23  001222344
  1   23
  1   24
  1   24  9

b. From this stem-and-leaf display, it is fairly obvious that there are two different distributions since there are two groups of data.

c. Since there appear to be two distributions, we will compute two sets of numerical descriptive measures. We will call the group with the smaller velocities A1775A and the group with the larger velocities A1775B.

For A1775A:

$$\bar{y} = \frac{\sum y}{n} = \frac{408{,}707}{21} = 19{,}462.2$$

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{7{,}960{,}019{,}531 - \frac{408{,}707^2}{21}}{21 - 1} = 283{,}329.3$$

$$s = \sqrt{283{,}329.3} = 532.29$$

For A1775B:

$$\bar{y} = \frac{\sum y}{n} = \frac{685{,}154}{30} = 22{,}838.5$$

$$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{15{,}656{,}992{,}942 - \frac{685{,}154^2}{30}}{30 - 1} = 314{,}694.83$$

$$s = \sqrt{314{,}694.83} = 560.98$$


d. To determine which of the two clusters this observation probably belongs to, we will compute z-scores for this observation for each of the two clusters.

For A1775A:

$$z = \frac{20{,}000 - 19{,}462.2}{532.29} = 1.01$$

Since this z-score is so small, it would not be unlikely that this observation came from this cluster.

For A1775B:

$$z = \frac{20{,}000 - 22{,}838.5}{560.98} = -5.06$$

Since this z-score is so large (in magnitude), it would be very unlikely that this observation came from this cluster. Thus, this observation probably came from the cluster A1775A.

2.72 a. To sketch the distributions, we need to calculate the intervals $\bar{y} \pm s$, $\bar{y} \pm 2s$, and $\bar{y} \pm 3s$ for each of the four scenarios.

Sex  Lifts/Minute  $\bar{y} \pm s$   $\bar{y} \pm 2s$   $\bar{y} \pm 3s$
M    1             21.69–38.81       13.13–47.37        4.57–55.93
M    4             17.13–30.53       10.43–37.23        3.73–43.93
F    1             16.68–22.90       13.57–26.01        10.46–29.12
F    4             12.59–19.05       9.36–22.28         6.13–25.51

The distributions are sketched below:


b. The $\bar{y} \pm 2s$ intervals are calculated in part a. We expect approximately 95% of the measurements to fall in these calculated intervals.

c. The average male could be expected to safely lift the 25 kilogram box at a rate of 4 lifts per minute based on the $\bar{y} \pm 2s$ interval calculated (10.43–37.23). The average female could not be expected to safely lift the 25 kilogram box at a rate of 4 lifts per minute based on the $\bar{y} \pm 2s$ interval calculated (9.36–22.28).
