List of Corrections for Applied Statistics Module
Mistake: Shift 1 → 4: Sum → $\sum x^2$, $\sum x$
Correction: Shift 1 → 3: Sum → $\sum x^2$, $\sum x$
data is taken from the population. These values are called parameters.
‒ the calculated variance is $s^2 = 6.8$ and the standard deviation is $s = 2.6077$ if the data
However, if the two samples do not have the same units of measurement or the
variables are different, the variance and standard deviation of each sample cannot be compared
directly. As an example, suppose a car dealer wants to compare the variation between the
number of car sales in a year and the commission (in RM) made by the salesperson. These
two variables clearly have different units. Hence, the best way to compare the
variability of these two variables is by using the coefficient of variation. This means that if
$CVar_1 < CVar_2$, then variable one is less variable than variable two.
Solution:
The sample coefficients of variation are:
Number of sales of cars: $CVar_{sales} = \frac{3}{95} \times 100\% = 3.16\%$.
Commission: $CVar_{commission} = \frac{2823}{20675} \times 100\% = 13.65\%$.
Hence, since $CVar_{sales} < CVar_{commission}$, the number of sales of cars is less variable
than the commissions.
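The comparison above can be checked with a short Python sketch. It assumes that 3 and 95 are the sample standard deviation and mean of the number of cars sold, and that 2823 and 20675 are the corresponding values for the commission (in RM); the function name is illustrative only.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the sample coefficient of variation as a percentage."""
    return std_dev / mean * 100


# Values read from the worked solution (assumed to be s and x-bar for each variable)
cv_sales = coefficient_of_variation(3, 95)
cv_commission = coefficient_of_variation(2823, 20675)

# The variable with the smaller coefficient of variation is the less variable one
less_variable = "sales" if cv_sales < cv_commission else "commission"

print(f"CVar sales = {cv_sales:.2f}%, CVar commission = {cv_commission:.2f}%")
print(f"Less variable: {less_variable}")
```

Because the coefficient of variation is unit-free, the two results can be compared directly even though one variable is a count and the other is in RM.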
37 Mistake: Example 1.8
A manufacturer measured the volume of a sample of 11 bottles of chemical solvents. The results are recorded (in ml) as follows.
40 45 38 25 42 31 30 44 26 27 36
The data in increasing (ascending) order:
25 26 27 30 31 36 38 40 42 44 45
Hence the following relative position of a data based on percentiles, deciles and quartiles values can be calculated.
Correction: Example 1.9:
A manufacturer measured the volume of a sample of 11 bottles of chemical solvents. The results are recorded (in ml) as follows.
40 45 38 25 42 31 30 44 26 27 36
Show that $Q_1$ is equivalent to $P_{25}$, $Q_2$ is equivalent to $P_{50}$, $Q_3$ is equivalent to $P_{75}$, and $D_i$ is equivalent to $P_{i(10)}$, where $i = 1, 2, \ldots, 9$.
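The equivalence of quartiles, deciles and percentiles for the 11 bottle volumes can be illustrated with a small Python sketch. It uses `statistics.quantiles` from the standard library with its default "exclusive" method, which is an assumption: the module's own positioning rule may differ, although the equivalence holds for any rule applied consistently.

```python
from math import isclose
from statistics import quantiles

volumes_ml = [40, 45, 38, 25, 42, 31, 30, 44, 26, 27, 36]

quartiles = quantiles(volumes_ml, n=4)       # Q1, Q2, Q3
deciles = quantiles(volumes_ml, n=10)        # D1 .. D9
percentiles = quantiles(volumes_ml, n=100)   # P1 .. P99

# Q_k should coincide with P_(25k); list index is 25k - 1
quartiles_match = all(
    isclose(q, percentiles[25 * k - 1]) for k, q in enumerate(quartiles, start=1)
)
# D_i should coincide with P_(10i); list index is 10i - 1
deciles_match = all(
    isclose(d, percentiles[10 * i - 1]) for i, d in enumerate(deciles, start=1)
)

print("Quartiles:", quartiles)
print("Quartiles equal P25, P50, P75:", quartiles_match)
print("Deciles equal P10, P20, ..., P90:", deciles_match)
```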
43-45 1.4.1 stem and leaf plot Remove this topic. The stem and leaf plot is
moved to page 24 Table 1.7.
50 Mistake: Example 1.12: Draw Boxplots for both schools in the same x-axis.
Correction: Draw Boxplots for both schools on the same x-axis.
Note:
If the variable used has no negative value, then the lower limit for one-sided upper bound is zero,
(0, b).
100 ADD THE FOLLOWING EXAMPLE FOR ONE SIDED CI:
EXAMPLE 2.3:
A packet of baking powder is supposed to have a mean weight of 200 g. The distribution of
weight is normal and the population standard deviation is 7 g. A random sample of nine packets
of baking powder had the following weights.
218, 207, 219, 200, 205, 221, 206, 205, 211
Construct a one-sided upper-bound 98% confidence interval for the population mean weight
of a packet of baking powder. Interpret the result.
Solution:
X : Weight of a packet of baking powder
$n = 9$, $\mu = 200$ g, $\sigma = 7$ g, $\bar{x} = 210.2222$ g, $z_{0.02} = 2.0537$
A one-sided upper bound of a 98% confidence interval for the population mean, $\mu$:
$$\left(0,\ \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = \left(0,\ 210.2222 + 2.0537\left(\frac{7}{\sqrt{9}}\right)\right) = (0,\ 210.2222 + 4.7920) = (0,\ 215.0142)\ \text{g}$$
We are 98% confident that the population mean weight of a packet of baking powder is at most
215.0142 g, that is, it lies between 0 g and 215.0142 g.
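As a cross-check of Example 2.3, the following Python sketch recomputes the one-sided upper bound from the raw weights. The variable names are illustrative, and the critical value is taken from `statistics.NormalDist` rather than a table, so the last decimal place may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist, mean

weights_g = [218, 207, 219, 200, 205, 221, 206, 205, 211]
sigma_g = 7.0        # known population standard deviation
confidence = 0.98    # one-sided confidence level

x_bar = mean(weights_g)
# One-sided interval: the whole of alpha = 0.02 sits in a single tail
z_alpha = NormalDist().inv_cdf(confidence)
margin = z_alpha * sigma_g / sqrt(len(weights_g))
upper_bound = x_bar + margin

# Weight cannot be negative, so the lower limit is 0
print(f"One-sided upper-bound interval: (0, {upper_bound:.4f}) g")
```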
101 Mistake: By the end of this topic, you should be able to:
1. Estimate the confidence interval for the difference between two population means of independent samples when the population variances are known.
2. Estimate the difference between two population means of dependent samples.
Correction:
1. Estimate the confidence interval for the difference between two population means of independent samples when the population variances are known or unknown.
2. Estimate the difference between two population means of dependent samples when the population variance of the differences is known or unknown.
101 ADD THE FOLLOWING NOTES:
Estimating the difference between two population means is as important as the estimation
of a single population mean, $\mu$. This sub-topic considers estimation procedures for the mean
difference of two independent samples, $\mu_1 - \mu_2$. Independent samples are measurements
made on two different sets of items, where the samples are taken independently (refer to Figure 2.3).
[Figure 2.3: two independent samples, Sample A and Sample B]
For example, in a factory that produces a certain chemical, it is thought that a new
process for producing this chemical is cheaper than the current process. So, we have two
independent samples, which come from the new process and the current process for producing
the chemical in the factory. For this example, there are two populations: the first with
mean and variance $\mu_1$ and $\sigma_1^2$, and the second with mean and variance $\mu_2$ and $\sigma_2^2$. A random
sample of size $n_1$ is drawn from population 1 and a second random sample of size $n_2$ is
drawn from population 2.
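The sampling distribution of the difference between two sample means is given by
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$
Based on the CLT, the test statistic can be written as
$$z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}.$$
Thus, the interval estimate of $\mu_1 - \mu_2$ is given as follows.

Confidence interval for the mean difference of dependent (paired) samples:
$$\left(\bar{x}_D - t_{\frac{\alpha}{2},\,n-1}\frac{s_D}{\sqrt{n}},\ \bar{x}_D + t_{\frac{\alpha}{2},\,n-1}\frac{s_D}{\sqrt{n}}\right)$$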
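The sampling distribution of $\bar{X}_1 - \bar{X}_2$ given in these notes leads to the usual two-sided z interval when both population variances are known. The sketch below is a minimal illustration of that standard interval, not a formula copied from the module; the function name and all summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_interval(x_bar_1, x_bar_2, sigma_1, sigma_2, n_1, n_2, confidence):
    """Two-sided z interval for mu_1 - mu_2 with known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(sigma_1**2 / n_1 + sigma_2**2 / n_2)
    diff = x_bar_1 - x_bar_2
    return diff - z * standard_error, diff + z * standard_error


# Hypothetical summary statistics, chosen only to exercise the formula
lower, upper = two_mean_z_interval(
    x_bar_1=50.0, x_bar_2=46.0, sigma_1=4.0, sigma_2=3.0, n_1=16, n_2=9,
    confidence=0.95,
)
print(f"95% CI for mu_1 - mu_2: ({lower:.4f}, {upper:.4f})")
```

An interval that excludes zero, as here, suggests that the two population means differ at the chosen confidence level.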
Statisticians and researchers may also use an interval estimate for a proportion. A population
proportion, $\pi$, is a parameter that describes a percentage value or probability of success
associated with a population. One common application involves consumer preference or
opinion polls, in which we use a random sample of n people to estimate the proportion, $\pi$, of
people in the population who have a specified characteristic. A proportion can equivalently be
expressed as a percentage, rate, probability or fraction of the population.
If x of the sampled people have this characteristic, then the sample proportion, p, can be
used to estimate the population proportion, where $p = \frac{x}{n}$. This event is called a Binomial
event, and the random variable X is said to have a binomial distribution, $X \sim Bin(n, p)$, which
corresponds to n independent trials with probability of success p on each trial. There are
many practical examples of the binomial random variable X (see Appendix B.2). The
outcome of each trial can be classified in two mutually exclusive and exhaustive ways, say,
success or failure (e.g. female or male, life or death, non-defective or defective).
The Binomial random variable X has a probability distribution $P(X)$ with mean $np$
and variance $np(1-p)$. The sampling distribution of the proportion P is approximated by a
normal distribution, where
$$P \sim N\left(\pi,\ \frac{\pi(1-\pi)}{n}\right).$$
Based on the CLT, the test statistic can be written as
$$z_{test} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}.$$
As with population means, we can estimate the population
proportion using the sample proportion, and the interval estimate is given as follows.
$$\left(p - z_{\alpha}\sqrt{\frac{p(1-p)}{n}},\ 1\right) = \left(0.0650 - 1.6449\sqrt{\frac{0.0650(1 - 0.0650)}{200}},\ 1\right) = (0.0363,\ 1)$$
Interpretation: We are 95% confident that the fraction of defective integrated circuits
produced in a photolithography process is at least 3.63%, that is, between
3.63% and 100%.
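The one-sided lower bound above can be reproduced with a few lines of Python. The sketch takes the sample proportion 0.0650 and sample size 200 directly from the calculation; the variable names are illustrative, and the critical value comes from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.0650      # sample proportion of defective circuits
n = 200             # sample size
confidence = 0.95   # one-sided confidence level

# One-sided interval: the whole of alpha = 0.05 sits in the lower tail
z_alpha = NormalDist().inv_cdf(confidence)
lower_bound = p_hat - z_alpha * sqrt(p_hat * (1 - p_hat) / n)

# A proportion cannot exceed 1, so the upper limit is 1
interval = (lower_bound, 1.0)
print(f"One-sided lower-bound interval: ({interval[0]:.4f}, {interval[1]:.0f})")
```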
112 ADD THE FOLLOWING NOTES:
Suppose we have two population proportions. The sampling distribution for the difference
between two proportions is given by
$$P_1 - P_2 \sim N\left(\pi_1 - \pi_2,\ \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right).$$
Thus, an interval estimate of the difference between two population proportions is given as follows.
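A minimal Python sketch of a two-sided interval for $\pi_1 - \pi_2$ is given here for illustration. It assumes the usual large-sample form in which the sample proportions replace $\pi_1$ and $\pi_2$ in the variance, and it borrows the counts from Exercise 3.5 (2) in this list (53 of 400 owners of Model 1 and 78 of 500 owners of Model 2, at 98% confidence). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x_1, n_1, x_2, n_2, confidence):
    """Large-sample two-sided interval for pi_1 - pi_2."""
    p_1, p_2 = x_1 / n_1, x_2 / n_2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample proportions stand in for the unknown population proportions
    standard_error = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    diff = p_1 - p_2
    return diff - z * standard_error, diff + z * standard_error


lower, upper = two_proportion_interval(53, 400, 78, 500, confidence=0.98)
print(f"98% CI for pi_1 - pi_2: ({lower:.4f}, {upper:.4f})")
```

If the resulting interval contains zero, the data do not show a difference between the two proportions at that confidence level.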
2
n 1 s 2 2
n 1 s 2
2 , 2 ,
, ( a, ) , ( a, )
2 2
2 2
where a 0 where a, b 0
126 Mistake: ME.1 Assume that a small simple random sample is selected form a normally distributed population for which population standard deviation is unknown. Since the sample size is small, the t-distribution should be used to the construct a confidence interval for mean. But, does the confidence interval limit affected if the normal distribution is incorrectly used instead?
Correction: ME.1 Assume that a small simple random sample is selected from a normally distributed population for which population standard deviation is unknown. Since the sample size is small, the t-distribution should be used to construct a confidence interval for the mean. But is the confidence interval limit affected if the normal distribution is incorrectly used instead?
128 2.1.3 (2) Mistake: $P_1 - P_2 \sim N(0.7,\ 0.000035)$ Correction: $P \sim N(0.7,\ 0.00035)$
2.4.1 (1) Mistake: (2.6461, 3.3574) grams per litre Correction: (2.6424, 3.3574) hours
2.7 (1) Mistake: (0.1820, 0.4420) Correction: (0.4266, 0.6648)
129 REMOVE Q1 and Q2
135 REMOVE Q1
139 b. Find a 94% confidence interval for the difference in the mean plant growths with
normal air atmospheric conditions and those with enriched CO2 concentrations. (Note:
Assuming equal population variances)
139 ADD THE FOLLOWING QUESTION:
SEMESTER II SESSION 2015/2016
19. A random sample of 100 bottles of a particular brand of cough syrup is selected and the
alcohol content of each bottle is determined. Suppose that the 95% confidence interval for
the population mean of the alcohol content of all bottles is between 7.8 mg and 9.4 mg.
a. What is the population under study?
b. What is the parameter?
c. Calculate the estimation error of the given confidence interval.
d. Use your answer in (c) to calculate the unbiased estimate of the population mean of the
alcohol content from the random sample of cough syrup.
e. “We are 95% confident that the interval (7.8, 9.4) mg includes the true mean of
alcohol content.” Is this statement correct? Justify your answer.
f. The pharmaceutical company that produced the cough syrup claims that the alcohol
content for a bottle of cough syrup is equal to 8.3 mg. Based on the given confidence
interval, can we accept this hypothesis?
g. Would a 90% confidence interval calculated from the same sample be narrower or
wider than the given interval? Give a reason.
142 19. a. All bottles of a particular brand of syrup
b. The population mean of alcohol content
c. $E = 0.8$
d. $\bar{x} = 8.6$ mg
e. Yes. A confidence interval gives an estimated range of values which is likely to include an
unknown parameter with a specified probability. The confidence level
0.95 is the probability value associated with a 95% confidence interval. The probability that
the interval (7.8, 9.4) mg includes $\mu$ is 0.95. Hence, there is a 95% chance that the mean
alcohol content for the population of all bottles is between 7.8 and 9.4 mg:
$P(7.8 < \mu < 9.4) = 0.95$.
f. Yes. Accept $H_0: \mu = 8.3$.
g. Narrower.
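The answers to parts (c), (d), (f) and (g) of Question 19 follow from simple arithmetic on the interval endpoints, sketched below in Python. The comparison for part (g) assumes a z-based interval, which is a reasonable but unstated assumption for a sample of 100 bottles; the variable names are illustrative.

```python
from statistics import NormalDist

lower_mg, upper_mg = 7.8, 9.4   # given 95% confidence interval

# (c) The estimation error is half the width of the interval
margin_of_error = (upper_mg - lower_mg) / 2
# (d) The point estimate is the midpoint of the interval
point_estimate = (upper_mg + lower_mg) / 2
# (f) The claimed mean is compatible with the data if it lies inside the interval
claim_inside = lower_mg <= 8.3 <= upper_mg

# (g) Assumption: z-based interval, so the margin scales with the critical value
z_95 = NormalDist().inv_cdf(0.975)
z_90 = NormalDist().inv_cdf(0.95)
margin_90 = margin_of_error * z_90 / z_95

print(f"E = {margin_of_error:.1f} mg, x-bar = {point_estimate:.1f} mg")
print(f"8.3 mg inside the interval: {claim_inside}")
print(f"90% interval narrower: {margin_90 < margin_of_error}")
```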
Mistake:
4. Is there any different between the mean yields? Assume the variances population are equal.
5. Determine whether the mean of breaking strength for composite A is similar to composite B at 0.01 level of significance. Assume both population variances are different.
Correction:
5. Determine whether the mean of breaking strength of composite A is similar to composite B at 0.01 level of significance. Assume both population variances are different.
170 Mistake: EXERCISE 3.5
2. A particular consumer association wants to determine whether there is a difference between the population proportions of the two leading car manufacturers that need major repairs within two years of their purchase. A sample of 400 two-year owners of car Model 1 is contacted, and a sample of 500 two-year owner of Model 2 is contacted. The number of owners for Model 1 and Model 2 who report that their cars needed major repairs within the first two years are 53 and 78, respectively.
a) Construct a 98% confidence interval on the difference in the two population proportions of cars that needed major repairs.
b) Determine whether the population proportion of cars for Model 1 is less than 0.25 at 10% significance level.
c) Test the consumer association’s hypothesis at 1% significance level.
Correction:
2. A particular consumer association wants to determine whether there is a difference between the population proportions of the two leading car manufacturers that need major repairs within two years of their purchase. A sample of 400 two-year owners of car Model 1 is contacted, and a sample of 500 two-year owner of car Model 2 is contacted. The number of owners for Model 1 and Model 2 who report that their cars needed major repairs within the first two years are 53 and 78, respectively.
a) Construct a 98% confidence interval for the difference in the two population proportions of cars that needed major repairs.
b) Determine whether the population proportion of cars for the Model 1 is less than 0.25 at 10% significance level.
c) Test the consumer association’s hypothesis at 1% significance level.
172 Mistake: EXAMPLE 3.6:
Listed are waiting times (in mins) of customers at a bank.
…….
Step 1: X : The waiting times of customers at a bank
……
Step 3: Given $\alpha = 0.01$ and the test is left-tailed test, hence the critical value is $\chi^2_{0.01,5} = 0.5543$.
Step 4: Since ($\chi^2_{test} = 1.1520$) > ($\chi^2_{0.99,5} = 0.5443$)
Correction: EXAMPLE 3.7:
Listed are waiting times (in mins) of customers in a bank.
…….
Step 1: X : The waiting times of customers in a bank
…….
Step 3: Given $\alpha = 0.01$ and the test is left-tailed test, hence the critical value is $\chi^2_{0.99,5} = 0.5543$.
Step 4: Since
the test statistic is $t_{test} = \frac{(8.5 - 7.8) - 0}{3.2024\sqrt{\frac{1}{6} + \frac{1}{5}}} = 0.3610$.
Step 3: Given $\alpha = 0.05$ and the test is right-tailed test, hence the critical value is
$t_{0.05,\,6+5-2} = t_{0.05,9} = 1.8331$.
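The test statistic and degrees of freedom quoted above can be verified with the Python sketch below. It assumes that 3.2024 is the pooled standard deviation and that 6 and 5 are the two sample sizes, which is how the formula reads. The critical value 1.8331 is copied from the correction rather than computed, because the standard library has no t-distribution. The rejection rule applied is the usual one for a right-tailed test.

```python
from math import sqrt

mean_1, mean_2 = 8.5, 7.8    # sample means
n_1, n_2 = 6, 5              # sample sizes
pooled_sd = 3.2024           # assumed to be the pooled standard deviation
hypothesised_diff = 0.0      # value of mu_1 - mu_2 under H0
t_critical = 1.8331          # t_{0.05, 9}, copied from the correction

degrees_of_freedom = n_1 + n_2 - 2
t_test = ((mean_1 - mean_2) - hypothesised_diff) / (
    pooled_sd * sqrt(1 / n_1 + 1 / n_2)
)

# Right-tailed test: reject H0 only if the statistic exceeds the critical value
reject_h0 = t_test > t_critical

print(f"df = {degrees_of_freedom}, t_test = {t_test:.4f}, reject H0: {reject_h0}")
```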
179 Topic 3.9: Remove objective 1, then rename the section as HYPOTHESIS TESTING USING P-VALUE APPROACH
CHAPTER 4: ANALYSIS OF VARIANCE
PAGE MISTAKES CORRECTION
223 ADD THE FOLLOWING NOTE UNDER TABLE 4.2.
Note:
1. The treatments are also known as the levels of the factor that affect the dependent variable.
2. The total number of treatments is equal to the total number of levels of the factor.
3. The number of populations is equal to the number of treatments, k.
225 For H0 add the following statement:
No differences between the population means
232 ADD THE FOLLOWING NOTE UNDER HYPOTHESIS FOR MARGINAL EFFECT:
Marginal effect
(If there is no interaction effect between factor A and factor B)
234 Replace Yes and No in Figure 4.2 with:
Yes (Reject H0AB)
No (Do not Reject H0AB)
235 Mistake: EXAMPLE 4.2:
A chemical engineer studies the effects of various reagents and catalysts on the yield of a chemical process. The yield is expressed as a percentage of a theoretical maximum. Two runs (replications) of the process are made for each combination of three reagents and four catalysts. The data is given as follows.
Correction: EXAMPLE 4.2:
A chemical engineer studies the effects of various reagents and catalysts on the yield of a chemical process. The yield is expressed as a percentage of a theoretical maximum. Two runs (replications) of the process are done for each combination of three reagents and four catalysts. The data is given as follows.
Shift 1 → 3: Sum → $\sum x^2$, $\sum x$, $\sum y^2$, $\sum y$, $\sum xy$
Shift 1 → 4: Var → $n$, $\bar{x}$, $\bar{y}$
STEP 3: Clear data → Shift 9
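For readers without the calculator, the quantities returned by the Sum and Var menus can be computed by hand or in a few lines of Python. The sketch below uses a small hypothetical paired data set. The correspondence between each variable and a calculator menu item is an assumption based on the key sequences listed above.

```python
# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

# Quantities listed under the Sum menu
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Quantities listed under the Var menu
n = len(x)
mean_x = sum_x / n
mean_y = sum_y / n

print(f"sum x^2 = {sum_x2}, sum x = {sum_x}, sum y^2 = {sum_y2}, "
      f"sum y = {sum_y}, sum xy = {sum_xy}")
print(f"n = {n}, x-bar = {mean_x}, y-bar = {mean_y}")
```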
297 Mistake: EXERCISE 5.5 (2) c) write the predicted regression model.
Correction: c) write the predicted regression model and interpret the regression coefficients.
301 Mistake: Example 5.9 Correction: Example 5.8
309-310 ANSWER FOR EXERCISES
5.1 (e) $r^2 = 0.5912$; 59.12% of the variation in the final exam marks can be explained by the
carry mark.
(f) Some students have good carry marks but did not perform well in the final exam. For example,
students no. 1 and 9.
5.3(1)
a) ..
b) REMOVE
c) move to b and replace the answer (please double check)
(i) t-test = 12.3765
(ii) F-test = 153.1769
5.5 (2)
Wrong numbering and real answer for (a) is missing
a) $r = 0.7831$; there is a strong correlation between final exam marks, carry marks and the hours
students spent studying.
b) replace with old (a)
c) replace with old (b)
d) replace with old (c) and (d)
5.5(3)
e) Coefficients table: The significant parameters are those for the income and temperature variables, with
(P-value = 0.0225) < (α = 0.05) and (P-value = 0.0000) < (α = 0.05), respectively. The price
parameter is insignificant, since (P-value = 0.6261) > (α = 0.05).
When the GoF test is used to fit a hypothesised distribution, the step-by-step procedure in
Microsoft Excel 2010 is similar to that for testing the GoF of a free distribution. The expected
frequency should be calculated using the probability value. The probability value, in turn, is
calculated from the probability mass function for a Poisson or Binomial
distribution, or from the probability density function for a normal distribution.
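As a rough illustration of the paragraph above, the Python sketch below computes expected frequencies for a normal GoF test. Each class probability is the difference between normal cdf values at the class boundaries, and each expected frequency is the sample size times that probability. This is not the module's Excel procedure; the function name, class boundaries, mean, standard deviation and sample size are all hypothetical.

```python
from statistics import NormalDist


def expected_frequencies(boundaries, mean, sd, n):
    """Expected class frequencies under N(mean, sd^2).

    The first and last classes are open-ended, so len(boundaries) + 1
    classes are returned.
    """
    dist = NormalDist(mean, sd)
    # Cumulative probabilities at the class boundaries, padded with 0 and 1
    cdf_values = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    probabilities = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
    return [n * p for p in probabilities]


# Hypothetical classes: below 9, 9 to 10, 10 to 11, above 11
expected = expected_frequencies(boundaries=[9, 10, 11], mean=10, sd=1, n=100)
print([round(e, 2) for e in expected])
```

For a Poisson or Binomial hypothesis, the same pattern applies with the class probabilities taken from the probability mass function instead of cdf differences.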
In this module you are advised to use the following procedure to calculate the
probability value for a normal distribution.
EXAMPLE 6.10:
By considering Example 6.4, the GoF test can be solved using Microsoft Excel as follows.
Step 5: Since (P-value = 0.2839) > (α = 0.025), do not reject H0.
Step 6: At α = 0.025, there is enough evidence to conclude that the sugar concentration in apple
juice is normally distributed.
367 Answer to exercises
6.1.2 (2): $\chi^2_{test} = 50.5590$, reject H0
APPENDIX B
PAGE MISTAKES CORRECTION
398 Mistake: Para 3: probability denstiy function (pdf)
Correction: probability density function (pdf)
408 Mistake: Standard Normal Distribution:
A standard normal distribution is a normal distribution with mean zero ($\mu = 0$) and variance is one ($\sigma^2 = 0$).
Correction: Standard Normal Distribution:
A standard normal distribution is a normal distribution with mean zero ($\mu = 0$) and variance is one ($\sigma^2 = 1$).