
List of Correction For Applied Statistics Module

This document provides a list of corrections for chapters in an applied statistics module. The corrections include fixing mistakes, adding examples, and clarifying explanations. Key corrections involve adding definitions for coefficient of variation and examples calculating it for different data sets to allow easier comparison of variability between variables with different units.

Uploaded by

Thurgah Vshiny
Copyright © All Rights Reserved

LIST OF CORRECTION FOR APPLIED STATISTICS MODULE

NEW LIST OF CONTENT


Chapter 1: Introduction To Statistics
1.1 Statistical Terminologies
1.1.1 What is Statistics?
1.1.2 Why Do We Need Statistics?
1.1.3 Population and Sample
1.1.4 Descriptive and Inferential Statistics
1.1.5 Role of Computer in Statistics
1.2 Statistical Problem-Solving Methodology
1.2.1 Identifying the Problem or Opportunity
1.2.2 Deciding on the Method of Data Collection
1.2.3 Collecting the Data (Sampling Techniques)
1.2.4 Classifying and Summarising the Data
1.2.5 Presenting and Analysing the Data
1.2.6 Making the Decision and Conclusion
1.3 Reviews on Descriptive Statistics
1.3.1 Measures of Central Tendency
1.3.2 Measures of Variation (Dispersion)
1.3.3 Measures of Position
1.3.4 Descriptive Statistics using Microsoft Excel
1.4 Exploratory Data Analysis
1.4.1 Outliers
1.4.2 Boxplot
1.5 Normal Probability Plot

Chapter 2: Confidence Interval


2.1 Estimate, Estimation, Estimator
2.1.1 Properties of Good Point Estimator
2.1.2 Confidence Interval
2.2 Confidence Interval for Population Mean
2.3 Confidence Interval for Difference Between Two Population Means
2.3.1 Independent Samples
2.3.2 Dependent Samples
2.4 Confidence Interval for Population Proportion
2.5 Confidence Interval for Difference Between Two Population Proportions
2.6 Confidence Interval for Population Variance and Population Standard Deviation
2.7 Confidence Interval for Ratio of Two Population Variances

Chapter 3: Hypothesis Testing


3.1 Introduction to Hypothesis Testing
3.1.1 Terms and Definitions
3.1.2 Procedure in Hypothesis Testing
3.2 Hypothesis Testing for a Population Mean
3.3 Hypothesis Testing for the Difference Between Two Population Means
3.3.1 Independent Samples
3.3.2 Dependent Samples
3.4 Hypothesis Testing for Population Proportion
3.5 Hypothesis Testing for Difference Between Two Population Proportions
3.6 Hypothesis Testing for Population Variance
3.7 Hypothesis Testing for the Ratio of Two Population Variances
3.8 Hypothesis Testing Using Confidence Interval Approach
3.9 Hypothesis Testing Using P-Value Approach
3.10 Type I and Type II Errors
Chapter 4: Analysis of Variance (ANOVA)
4.1 Introduction to ANOVA
4.2 One-Way ANOVA
4.3 Two-Way ANOVA
4.4 ANOVA using Microsoft Excel

Chapter 5: Linear Regression and Correlation


5.1 Correlation
5.1.1 Scatter Diagram
5.1.2 Correlation Coefficient
5.1.3 Coefficient of Determination
5.2 Simple Linear Regression Model
5.2.1 Least-squares Estimation of the Model Parameters
5.3 Linearity Test
5.3.1 Hypothesis testing for slope using t-test
5.3.2 Hypothesis testing for slope using ANOVA
5.4 Regression Analysis Using Microsoft Excel
5.5 Multiple Linear Regression Analysis
5.5.1 Multiple Linear Regression Model
5.5.2 Multiple Linear Regression Analysis Using Microsoft Excel
5.6 Model Selection

Chapter 6: Goodness of Fit Test and Contingency Tables


6.1 Goodness of Fit Test
6.1.1 Goodness of Fit Test for Categorical Data
6.1.2 Fitting of the Distribution
6.2 Contingency Table
6.2.1 Testing for Independence Between Two Variables
6.2.2 Test of Homogeneity of Proportions
6.3 Goodness of Fit Test & Contingency Table Using Microsoft Excel
CHAPTER 1: INTRODUCTION TO STATISTICS
PAGE MISTAKES CORRECTION
Page 5: Exercise 1.1.3 (d)
Mistake: What is the variable of the study?
Correction: What is the variable of the study? IQ pre-test score
Page 17: Table 1.4
Mistake: Cluster sampling advantage: safe time
Correction: Save time
Page 18: Exercise 1.2.3 (c)
Mistake: A researcher is called to judge a particular shade of colour to be typical for a sample at certain sites. Then from these sites samples are drawn.
Correction: Researchers or farm managers may be called in when a crop shows a certain growing pattern or when surface differences are observed in a soil. For example, differences may occur in soil colour, which may be the result of many factors. A researcher is called to judge a particular shade of colour to be typical for a sample at certain sites. Samples are then drawn from these sites.
Page 20: Table 1.5
Mistake: Grade (A, B, C, D, etc.); Judging (first place, second place, etc.)
Correction: Grade (A, B, C, D, E); Judging (first place, second place, ...)
Page 21: Exercise 1.2.4.1 (1)
Mistake: The SuperMotor Marketing Corporation has asked you for information about the car you drive. For each question, identify each of the types of data requested as either attribute data or numeric data. When attribute data is requested, identify the variable either as nominal or ordinal. When numeric data is requested, identify the variable either as discrete or continuous.
Correction: The SuperMotor Marketing Corporation has asked you for information about the car you drive. For each question, identify each of the types of data requested as either attribute data or numeric data. When attribute data is requested, identify the variable either as nominal or ordinal. When numeric data is requested, identify the variable either as discrete or continuous. Then, identify the level of measurement for each variable.

Page 22: Table 1.6
Quantitative (ratio scale): Histogram, Bar Chart
Graphical summary (bar representing means), stem and leaf plot, Boxplot
Skewed Distribution: Histogram, Stem and leaf plot, Boxplot
Page 28: Casio fx-570ES
Mistake: Shift 1 → 4: Sum → $\sum x^2$, $\sum x$; Shift 1 → 5: Var → $n$, $\bar{x}$, $\sigma_x$, $s_x$
Correction: Shift 1 → 3: Sum → $\sum x^2$, $\sum x$; Shift 1 → 4: Var → $n$, $\bar{x}$, $\sigma_x$, $s_x$


Pages 32-34: Add coefficient of variation

$\mathrm{CVar} = \frac{\sigma}{\mu} \times 100\%$, for population
$\mathrm{CVar} = \frac{s}{\bar{x}} \times 100\%$, for sample

‒ The coefficient of variation is the standard deviation divided by the mean.
‒ The result is expressed as a percentage.
‒ It is a parameter/statistic that allows users to compare standard deviations when the units are different (the variables are different).
EXAMPLE 1.1:
Suppose the data set is 1, 6, 3, 7, 8, 5. Then
‒ the calculated range is $R = 8 - 1 = 7$.
‒ the calculated variance is $\sigma^2 = 5.6667$ and the standard deviation is $\sigma = 2.3805$ if the data are taken from the population. These values are called parameters.
‒ the calculated variance is $s^2 = 6.8$ and the standard deviation is $s = 2.6077$ if the data are taken from a sample. These values are called statistics.
‒ the calculated sample mean is $\bar{x} = 5$. Hence the sample coefficient of variation is
$\mathrm{CVar} = \frac{2.6077}{5} \times 100\% = 52.15\%$.
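The numbers in Example 1.1 can be checked by recomputing them. The sketch below is a minimal illustration, assuming Python's standard `statistics` module: `pvariance`/`pstdev` divide by $N$ (population, giving $\sigma^2$ and $\sigma$), while `variance`/`stdev` divide by $n-1$ (sample, giving $s^2$ and $s$). The variable names are illustrative only.

```python
import statistics

data = [1, 6, 3, 7, 8, 5]                  # Example 1.1 data set

data_range = max(data) - min(data)         # R = largest - smallest
pop_var = statistics.pvariance(data)       # sigma^2: divides by N
pop_sd = statistics.pstdev(data)           # sigma
sample_var = statistics.variance(data)     # s^2: divides by n - 1
sample_sd = statistics.stdev(data)         # s
sample_mean = statistics.mean(data)        # x-bar

# Sample coefficient of variation, as a percentage
cvar_sample = sample_sd / sample_mean * 100

print(f"R = {data_range}")
print(f"sigma^2 = {pop_var:.4f}, sigma = {pop_sd:.4f}")
print(f"s^2 = {sample_var:.4f}, s = {sample_sd:.4f}")
print(f"x-bar = {sample_mean}, CVar = {cvar_sample:.2f}%")
```

Running it should reproduce the values quoted in Example 1.1, up to rounding.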

However, if two samples do not have the same units of measurement, or the variables are different, the variance and standard deviation of each sample cannot be compared directly. For example, suppose a car dealer wants to compare the variation between the number of cars sold in a year and the commission (in RM) earned by the salesperson. These two variables clearly have different units. Hence, the best way to compare the variability of these two variables is to use the coefficient of variation: if $\mathrm{CVar}_1 < \mathrm{CVar}_2$, then variable one is less variable than variable two.

EXAMPLE 1.2: (new)

A car dealer wants to compare the variation between the number of cars sold in a year and the commission (in RM) earned by salespersons in Kuala Lumpur. A total of 27 salespersons were selected randomly. The mean number of cars sold is 95 and the standard deviation is 3. The mean commission is RM 20675 and the standard deviation is RM 2823. Compare the variation of these two variables.

Solution:
The sample coefficients of variation are:
number of cars sold: $\mathrm{CVar}(\text{sales}) = \frac{3}{95} \times 100\% = 3.16\%$
commission: $\mathrm{CVar}(\text{commission}) = \frac{2823}{20675} \times 100\% = 13.65\%$

Hence, since $\mathrm{CVar}(\text{sales}) < \mathrm{CVar}(\text{commission})$, the number of cars sold is less variable than the commission.
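The comparison in Example 1.2 reduces to two divisions. A minimal sketch follows; `coefficient_of_variation` is a hypothetical helper name, and the summary statistics are those given in the example.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return CVar = sd / mean * 100, in percent."""
    if mean == 0:
        raise ValueError("CVar is undefined when the mean is zero")
    return sd / mean * 100


# Example 1.2 summary statistics (27 randomly selected salespersons)
cvar_sales = coefficient_of_variation(sd=3, mean=95)
cvar_commission = coefficient_of_variation(sd=2823, mean=20675)

# The variable with the smaller CVar is the less variable one
less_variable = "number of sales" if cvar_sales < cvar_commission else "commission"

print(f"CVar(sales)      = {cvar_sales:.2f}%")
print(f"CVar(commission) = {cvar_commission:.2f}%")
print(f"Less variable: {less_variable}")
```

The same helper can be used on the Tutorial 1 coefficient-of-variation questions, for example `coefficient_of_variation(3, 15)` for the newspaper data.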
Page 37: Example 1.8
Mistake: Example 1.8: A manufacturer measured the volume of a sample of 11 bottles of chemical solvents. The results are recorded (in ml) as follows.
40 45 38 25 42 31 30 44 26 27 36
The data in increasing (ascending) order:
25 26 27 30 31 36 38 40 42 44 45
Hence the following relative position of a data based on percentiles, deciles and quartiles values can be calculated.
Correction: Example 1.9: A manufacturer measured the volume of a sample of 11 bottles of chemical solvents. The results are recorded (in ml) as follows.
40 45 38 25 42 31 30 44 26 27 36
Show that $Q_1$ is equivalent to $P_{25}$, $Q_2$ is equivalent to $P_{50}$, $Q_3$ is equivalent to $P_{75}$, and $D_i$ is equivalent to $P_{10i}$, where $i = 1, 2, \ldots, 9$.
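One way to check the equivalence for these 11 volumes is sketched below, under explicit assumptions: percentiles are located at position $(n+1)p/100$ in the ordered data with linear interpolation, and quartiles are found independently as medians of the lower and upper halves (the overall median is excluded when $n$ is odd). The module may use a different positioning rule, in which case individual values can differ; the function names are hypothetical.

```python
import math


def percentile(data, p):
    """P_p located at position (n + 1) * p / 100, with linear interpolation.

    Assumption: this positioning rule is one common textbook choice; the
    module's own rule may differ.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100              # 1-based position in the ordered data
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)
    frac = pos - k
    # interpolate between the k-th and (k+1)-th ordered values
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


def median(xs):
    """Median of an already sorted list."""
    m = len(xs) // 2
    return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2


def quartiles_by_halves(data):
    """Q1, Q2, Q3 as medians of the lower half, whole data, upper half.

    The overall median is excluded from both halves when n is odd.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)


volumes = [40, 45, 38, 25, 42, 31, 30, 44, 26, 27, 36]   # ml, Example 1.9
print("Quartiles (halves):", quartiles_by_halves(volumes))
print("P25, P50, P75:", [percentile(volumes, p) for p in (25, 50, 75)])
print("D1..D9 = P10..P90:", [round(percentile(volumes, 10 * i), 2) for i in range(1, 10)])
```

For this data set both methods give $Q_1 = P_{25} = 27$, $Q_2 = P_{50} = 36$ and $Q_3 = P_{75} = 42$ ml.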
Pages 43-45: Section 1.4.1 (stem and leaf plot)
Correction: Remove this topic. The stem and leaf plot is moved to page 24, Table 1.7.
Page 50: Example 1.12
Mistake: Draw Boxplots for both schools in the same x-axis.
Correction: Draw Boxplots for both schools on the same x-axis.

Page 51
Mistake:
(i) Shape: Based on the location of median, School A has right-skewed distribution where most of teachers' age is concentrated at the lower age (< 30 years old). However, School B has almost symmetric distribution where the teachers' age is evenly split at the median of 42 years old.
(ii) Average: Based on the median value, 50% of teacher at School A age less than 30.5 years old, whereas 50% of teacher at School B age less than 42 years.
Correction:
(i) Shape: Based on the location of the median, School A has a right-skewed distribution, where most teachers' ages are concentrated at the lower end (< 30 years old). However, School B has a left-skewed distribution, where most teachers are older than 42 years.
(ii) Average: Based on the median value, 50% of the teachers at School A are younger than 30.5 years, whereas 50% of the teachers at School B are younger than 42 years. On average, teachers at School B are older than teachers at School A.
Page 52: Exercise 1.4.3, Q3 part (c)
Mistake: Compare the boxplots in terms of shape and variability.
Correction: Compare the boxplots in terms of shape, average and variability.
Page 58: ME.8
Mistake: ME.8 In an analysis of the accuracy of weather forecasts, the actual high temperature are compared to the high temperatures predicted one day earlier and the temperatures predicted five days earlier. Listed below are the errors between the predicted temperatures and the actual high temperatures for seven consecutive days in Kuala Lumpur.
Correction: ME.8 In an analysis of the accuracy of weather forecasts, the actual high temperatures are compared to the high temperatures predicted one day earlier and the temperatures predicted five days earlier. Listed below are the errors between the predicted temperatures and the actual high temperatures for 14 consecutive days in Kuala Lumpur.
Page 63: Answer to exercise 1.2.4.1 (1)
Mistake:
numeric and continuous: a, d, g, h, i, l
numeric and discrete: c, f, k
attribute and nominal: b, e, j
Correction:
numeric and continuous: a, d, g, h, i, l
numeric and discrete: c, f, k
attribute and nominal: b, e, j
nominal level: b, e, j
interval level: h
ratio level: a, c, d, f, g, i, k, l
Page 63: Answer to exercise 1.3.1 (2a)
Mistake: Mean = 151.3462, Median = 148, mode = 108
Correction: Mean = 151.3462, Median = 147.5, mode = 108, midrange = 151.5
Page 63: Answer to exercise 1.3.2 (3)
Correction: $\mathrm{CVar}(\text{age}) = 12.90\%$, $\mathrm{CVar}(\text{income}) = 17.63\%$
Page 63: Answer to exercise 1.4.1
Correction: Remove

Page 64: Answer to exercise 1.4.3 (2)
Mistake: The distribution of Company A is left-skewed meanwhile for Company B is right-skewed. Based on IQR, Company A is more variable than Company B.
Correction: The distribution of Company A is left-skewed, whereas that of Company B is right-skewed. Based on the IQR, Company A is more variable than Company B. Springs supplied by Company B are more flexible.

Page 64: Answer to exercise ME.8

Summary   x̄        Median   s
Data 1    1.5      1.0      2.4152
Data 2    3.8333   4.5      2.4014

Page 65: Tutorial 1 Q4
Correction: Remove all. Replace with:

4. Identify each study as being either observational or experimental.
a. A researcher wants to conduct a study on the busiest airports in the world.
b. One study used 40 university students as subjects. Twenty students were given smokeless tobacco to chew and another twenty students were given a substance that looked and tasted like smokeless tobacco. The students' blood pressure and heart rate were measured before they started chewing and after they had been chewing for 20 minutes.
c. A study is conducted to examine the behaviour of children in preschool.
d. A researcher finds that vitamin E is a proven antioxidant and may help in fighting cancer and heart disease.
Page 69: Tutorial 1 Q20 (a)
Correction: Find the sample mean, variance and standard deviation for both machines using the correct notation and the appropriate rounding rule.
Page 71: Tutorial 1 Q26 (c)
Correction: Remove
Page 72: Tutorial 1 Q28 (b & c)
Correction: Remove c and d. Replace with:
Based on the information obtained in (b), construct a boxplot of these data. Determine the shape of the distribution of these data.
Page 75: Tutorial 1 Q35 (a (iv))
Correction: Remove

Page 76: ADD 2 questions related to the coefficient of variation:

1. The average number of newspapers for sale in a convenience shop is 15 and the standard deviation is 3. The average age of the salesmen is 34 years with a standard deviation of 6 years. Which data set is more variable?
Answer: $\mathrm{CVar}(\text{newspaper}) = 20\%$, $\mathrm{CVar}(\text{age}) = 17.65\%$
2. A survey showed that the average number of brands of toothpaste sold in grocery stores was 11 with a standard deviation of 2. The same survey showed that the average length of time each store was in business was 8 years with a standard deviation of 1.9 years. Which is less variable, the number of brands or the number of years?
Answer: $\mathrm{CVar}(\text{brand}) = 18.18\%$, $\mathrm{CVar}(\text{years}) = 23.75\%$
Page 77: Answer to tutorial 1 (Q4)
Correction: observational: a, c; experimental: b, d
Page 77: Answer to tutorial 1 (Q9)
Mistake: nominal: f, m, p; ordinal: b, i, o; interval: c, l, q; ratio: a, d, e, g, h, j, k, n
Correction: nominal: f, m, p, q; ordinal: b, i, o; interval: c, l; ratio: a, d, e, g, h, j, k, n
CHAPTER 2: CONFIDENCE INTERVAL
PAGE MISTAKES CORRECTION
Page 83: Learning objectives
Mistake:
• Identify the sampling distributions for sample mean, sample proportion, sum and difference between two sample means and two sample proportions.
• Find the confidence interval for the difference between two population means of dependent samples.
Correction:
• REMOVE OBJ 1.
• Find the confidence interval for the difference between two population means of dependent samples when the population variance of the differences is known or unknown.
Pages 84-91: MOVE ALL CONTENT TO APPENDIX C
Page 84
Mistake: By the end of this topic, you should be able to:
1. Identify the sampling distributions for sample mean, sample proportion, sum and difference between two sample means and two sample proportions.
Correction: By the end of this topic, you should be able to:
1. Identify the sampling distributions of sample mean, sample proportion, and difference between two sample means and two sample proportions.
Page 86
Mistake: $X \sim (\mu_X, \sigma_X^2)$
Correction: $X \sim N(\mu_X, \sigma_X^2)$
Page 89: EXERCISE 2.1.2
Mistake:
1. An elevator has a maximum occupant of 16 people and a weight limit of 1150 kg. The average weight of the population is 62.5 kg, with a standard deviation of 11.2 kg, and the distribution of weight in the population is approximately normal. Suppose that a random sample of size 10 people is chosen. Find a sampling distribution for average weight of occupant who occupies the elevator.
2. The mean height of Species 1 is 30 cm while the mean height of Species 2 is 26 cm. The variances of the two species are 50 and 65, respectively, and the heights of both species are normally distributed. You randomly sample 18 members of Species 1 and 15 members of Species 2. Find a sampling distribution for the sum and difference of height between the two species.
Correction:
1. An elevator has a maximum occupancy of 16 people and a weight limit of 1150 kg. The average weight of the population is 62.5 kg, with a standard deviation of 11.2 kg, and the distribution of weight in the population is approximately normal. Suppose that a random sample of 10 people is chosen. Find the sampling distribution of the average weight of the occupants of the elevator.
2. The mean height of Species 1 is 30 cm while the mean height of Species 2 is 26 cm. The variances of the two species are 50 and 65, respectively, and the heights of both species are normally distributed. You randomly sample 18 members of Species 1 and 15 members of Species 2. Find the sampling distribution of the difference in mean height between the two species.
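The corrected exercises only require the parameters of normal sampling distributions. A minimal sketch, assuming the standard results $\bar{X} \sim N(\mu, \sigma^2/n)$ and, for independent samples, $\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$, computes them from the values in the questions:

```python
# Q1: mean weight of n = 10 occupants; population mu = 62.5 kg, sigma = 11.2 kg
mu, sigma, n = 62.5, 11.2, 10
mean_xbar = mu                   # E(X-bar) = mu
var_xbar = sigma**2 / n          # Var(X-bar) = sigma^2 / n
print(f"X-bar ~ N({mean_xbar}, {var_xbar:.4f})")

# Q2: difference of sample mean heights, Species 1 minus Species 2
mu1, var1, n1 = 30, 50, 18
mu2, var2, n2 = 26, 65, 15
mean_diff = mu1 - mu2                    # mu1 - mu2
var_diff = var1 / n1 + var2 / n2         # sigma1^2/n1 + sigma2^2/n2
print(f"X1-bar - X2-bar ~ N({mean_diff}, {var_diff:.4f})")
```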
Page 91: EXERCISE 2.1.3
Mistake:
2. In a survey, 600 mothers and fathers were asked about the importance of sports for boys and girls. Of the parents interviewed, 70% agreed that the genders are equal and should have equal opportunities to participate in sports. Describe the sampling distribution of the sample proportion of parents who agree that the genders are equal and should have equal opportunities.
3. Assume that 0.7 of high school graduates but only 0.35 of high school drop outs are able to pass a basic literacy test. If 25 students are sampled from the population of high school graduates and 30 students are sampled from the population of high school drop outs, find the sampling distribution that the proportion of drop outs that pass will be as high as the proportion of graduates?
Correction:
2. A previous study claims that 70% of parents in Malaysia agree that the genders are equal and should have equal opportunities to participate in sports. In a survey, 600 parents were asked about the importance of sports for boys and girls. Describe the sampling distribution of the sample proportion of parents who agree that the genders are equal and should have equal opportunities.
3. Assume that 0.7 of high school graduates, but only 0.35 of high school dropouts, are able to pass a basic literacy test. If 25 students are sampled from the population of high school graduates and 30 students are sampled from the population of high school dropouts, find the sampling distribution needed to determine whether the proportion of dropouts who pass will be as high as the proportion of graduates.
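Both questions use the normal approximation $P \sim N(\pi, \pi(1-\pi)/n)$ for a sample proportion, and the sum of the two variances for a difference of independent proportions. The sketch below computes those parameters; the last step, which evaluates $P(P_{drop} - P_{grad} \ge 0)$ with `statistics.NormalDist`, is one reading of "as high as" and is offered as an illustration rather than the module's intended answer.

```python
import math
from statistics import NormalDist

# Q2: sample proportion of parents, with pi = 0.70 and n = 600
pi, n = 0.70, 600
var_p = pi * (1 - pi) / n                 # Var(P) = pi(1 - pi)/n
print(f"P ~ N({pi}, {var_p:.6f})")

# Q3: difference of sample proportions, dropouts minus graduates
pi_grad, n_grad = 0.70, 25
pi_drop, n_drop = 0.35, 30
mean_diff = pi_drop - pi_grad
var_diff = pi_drop * (1 - pi_drop) / n_drop + pi_grad * (1 - pi_grad) / n_grad
print(f"P_drop - P_grad ~ N({mean_diff:.2f}, {var_diff:.6f})")

# Illustration: normal-approximation probability that dropouts do at least as well
dist = NormalDist(mu=mean_diff, sigma=math.sqrt(var_diff))
prob_as_high = 1 - dist.cdf(0)
print(f"P(P_drop - P_grad >= 0) ~ {prob_as_high:.4f}")
```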
Page 92
Mistake: (ii) Interval Estimation is a process that produces range of values calculated from the sample data, forming an interval within which the parameter is estimated to lies, $a < \theta < b$ where $a, b \in (-\infty, \infty)$. In Section 2.2.3, this concept will be used in constructing confidence interval.
Correction: (ii) Interval Estimation is a process that produces a range of values calculated from the sample data, forming an interval within which the parameter is estimated to lie, $a < \theta < b$ where $a, b \in (-\infty, \infty)$. In Section 2.2.3, this concept will be used in constructing confidence intervals.
Page 95: (ii) The sampling distribution of means is $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Page 97
Mistake: The value of estimation error is affected by the standard deviation ($\sigma$) or sample size ($n$). Hence, a large sample size will result in smaller errors and a subsequently smaller confidence interval. As an example, a 90% confidence level provides a smaller interval estimate for $\mu$ as compared with a 99% confidence limit. However, if the variance ($\sigma^2$) is increased, the error will also increase and result in wider the confidence interval.
Consider $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$:
• if $n \uparrow$, $E \downarrow$ (the width of the confidence interval is small)
• if $\sigma \uparrow$, $E \uparrow$ (the width of the confidence interval is large)
Correction: The value of the estimation error is affected by the standard deviation ($\sigma$), the sample size ($n$) and the level of significance ($\alpha$). Hence, a large sample size results in a smaller error and consequently a narrower confidence interval. However, if the variance ($\sigma^2$) is increased, the error also increases, resulting in a wider confidence interval. Meanwhile, if $\alpha$ is increased, the error decreases, resulting in a narrower confidence interval. For example, a 90% confidence level provides a narrower interval estimate for $\mu$ than a 99% confidence level.
Consider $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$:
• if $n \uparrow$, $E \downarrow$ (the width of the confidence interval is small)
• if $\sigma \uparrow$, $E \uparrow$ (the width of the confidence interval is large)
• if $\alpha \uparrow$, $E \downarrow$ (the width of the confidence interval is small)
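The three cases listed above for $E = z_{\alpha/2}\sigma/\sqrt{n}$ can be checked numerically. The sketch below obtains the critical value from `statistics.NormalDist().inv_cdf`; the baseline values ($\sigma = 2$, $n = 25$, $\alpha = 0.05$) are hypothetical, chosen only to show the direction of each effect, and are not taken from Example 2.1.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, alpha: float) -> float:
    """E = z_{alpha/2} * sigma / sqrt(n), for a known population sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)


# Hypothetical baseline, for illustration only
base = {"sigma": 2.0, "n": 25, "alpha": 0.05}
e_base = margin_of_error(**base)
e_big_n = margin_of_error(**{**base, "n": 100})           # n up -> E down
e_big_sigma = margin_of_error(**{**base, "sigma": 4.0})   # sigma up -> E up
e_big_alpha = margin_of_error(**{**base, "alpha": 0.10})  # alpha up (90% CI) -> E down

for label, e in [("baseline", e_base), ("n = 100", e_big_n),
                 ("sigma = 4", e_big_sigma), ("alpha = 0.10", e_big_alpha)]:
    print(f"{label:>12}: E = {e:.4f}")
```

Quadrupling $n$ halves $E$, doubling $\sigma$ doubles it, and moving from 95% to 90% confidence shrinks it.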

Page 99: ADD THE FOLLOWING EXAMPLE AFTER EXAMPLE 2.4(1)
Based on Example 2.1, investigate the effect of the sample size, the standard deviation and the confidence level on the confidence limits if we
a. change the sample size to 100,
b. change the population standard deviation to 1 kg,
c. change the confidence level to 95%.

Page 99: Example 2.4 (2)
Mistake: A random sample of 25 bulbs was selected and the specified brightness was evaluated for each bulb by measuring the amount of electric current required. The sample mean of the current required is 280.3 micro amps and the sample standard deviation is 10.3 micro amps. Construct a 90% confidence interval for the population mean and interpret your answers.
Correction: A random sample of 25 bulbs was selected and the specified brightness was evaluated for each bulb by measuring the amount of electrical current required. The sample mean of the electrical current required for the bulbs is 280.3 micro amps and the sample standard deviation is 10.3 micro amps. Construct a 90% confidence interval for the population mean and interpret your answer.
Solution:
X: The electrical current required for bulbs
Interpretation:
We are 90% confident that the population (true) mean of the electrical current required for the bulbs lies between 276.7755 and 283.8245 micro amps.
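The interval in the interpretation can be reproduced with the $t$ interval $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$, since $\sigma$ is unknown and $n = 25$. The sketch hard-codes $t_{0.05,24} = 1.7109$ as read from a $t$-table (an assumption about the table value used); if SciPy is available, `scipy.stats.t.ppf(0.95, 24)` returns the same critical value to four decimals.

```python
import math

n, xbar, s = 25, 280.3, 10.3      # sample size, mean and sd (micro amps)
t_crit = 1.7109                   # t_{0.05, 24} read from a t-table (90% two-sided)

error = t_crit * s / math.sqrt(n)    # margin of error
lower, upper = xbar - error, xbar + error
print(f"90% CI for mu: ({lower:.4f}, {upper:.4f}) micro amps")
```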
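Page 100
Mistake: one-sided lower bound: $\mu \in (-\infty, a)$; one-sided upper bound: $\mu \in (a, \infty)$, where $a \in \mathbb{R}$
Correction: one-sided lower bound: $\mu \in (a, \infty)$; one-sided upper bound: $\mu \in (-\infty, b)$, where $a, b \in \mathbb{R}$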
Page 100: ADD THE FOLLOWING NOTE:

A one-sided confidence bound for $\mu$ results from replacing $z_{\alpha/2}$ by $z_{\alpha}$ and the $\pm$ sign by either $+$ or $-$ in the confidence interval formula for $\mu$. In all cases, the confidence level is approximately $(1-\alpha)100\%$ as follows.
….

Note:
If the variable has no negative values, then the lower limit of the one-sided upper bound is zero, $\mu \in (0, b)$.
Page 100: ADD THE FOLLOWING EXAMPLE FOR ONE-SIDED CI:
EXAMPLE 2.3:

A packet of baking powder is supposed to have a mean weight of 200 g. The distribution of weight is normal and the population standard deviation is 7 g. A random sample of nine packets of baking powder had the following weights (in g).
218, 207, 219, 200, 205, 221, 206, 205, 211
Construct a one-sided upper bound of a 98% confidence interval for the population mean weight of a packet of baking powder. Interpret the result.

Solution:
X: Weight of a packet of baking powder
$n = 9$, $\mu = 200$ g, $\sigma = 7$ g, $\bar{x} = 210.2222$ g, $z_{0.02} = 2.0537$
A one-sided upper bound of a 98% confidence interval for the population mean, $\mu$:
$$\mu \in \left(0,\ \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = \left(0,\ 210.2222 + 2.0537\left(\frac{7}{\sqrt{9}}\right)\right) = (0,\ 210.2222 + 4.7920) = (0,\ 215.0142) \text{ g}$$

We are 98% confident that the population mean weight of a packet of baking powder is at most 215.0142 g, that is, it lies between 0 g and 215.0142 g.
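The one-sided bound in Example 2.3 can be checked as below. The only change from a two-sided interval is the critical value: $z_{\alpha}$ (here $z_{0.02}$) replaces $z_{\alpha/2}$, and only the $+$ side is kept. This is a sketch using the standard library; `NormalDist().inv_cdf(0.98)` gives about 2.0537, so the last digit of the bound may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist, mean

weights = [218, 207, 219, 200, 205, 221, 206, 205, 211]   # g
sigma = 7        # known population standard deviation (g)
alpha = 0.02     # 98% confidence

xbar = mean(weights)
z_alpha = NormalDist().inv_cdf(1 - alpha)    # one-sided: z_alpha, not z_{alpha/2}
upper = xbar + z_alpha * sigma / math.sqrt(len(weights))

# Weight cannot be negative, so the lower limit is taken as 0
print(f"x-bar = {xbar:.4f} g, z_0.02 = {z_alpha:.4f}")
print(f"98% one-sided upper bound: mu in (0, {upper:.4f}) g")
```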

Page 101
Mistake: By the end of this topic, you should be able to:
1. Estimate the confidence interval for the difference between two population means of independent samples when the population variances are known.
2. Estimate the difference between two population means of dependent samples.
Correction:
1. Estimate the confidence interval for the difference between two population means of independent samples when the population variances are known or unknown.
2. Estimate the difference between two population means of dependent samples when the population variance of the differences is known or unknown.
Page 101: ADD THE FOLLOWING NOTES:
Estimating the difference between two population means is as important as estimating a single population mean, $\mu$. This subtopic considers estimation procedures for the difference between the means of two independent samples, $\mu_1 - \mu_2$. Independent samples are measurements made on two different sets of items, where the samples are taken independently (Refer Figure 2.3).

Sample A    Sample B
Figure 2.1: Independent samples

For example, in a factory that produces a certain chemical, it is thought that a new process for producing this chemical is cheaper than the current process. So, we have two independent samples, one from the new process and one from the current process. For this example, there are two populations: the first with mean and variance $\mu_1$ and $\sigma_1^2$, and the second with mean and variance $\mu_2$ and $\sigma_2^2$. A random sample of size $n_1$ is drawn from population 1 and a second random sample of size $n_2$ is drawn from population 2.
The sampling distribution of the difference between two sample means is given by
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$
Based on the CLT, the test statistic can be written as
$$z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}.$$
Thus, the confidence interval for the difference between two population means, $\mu_1 - \mu_2$, is given as follows.
(For dependent samples, the corresponding interval for the mean difference $\mu_D$ is $\left(\bar{x}_D - t_{\alpha/2,\,n-1}\frac{s_D}{\sqrt{n}},\ \bar{x}_D + t_{\alpha/2,\,n-1}\frac{s_D}{\sqrt{n}}\right)$.)

For $n_1 \ge 30$, $n_2 \ge 30$
Page 102
Mistake: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_1}}$
Correction: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Page 103
Mistake: 1. Two chemical companies, Company A and Company B supply a raw material where the most important element in this material is the concentration. The standard deviations of concentration produced by both companies are known as 5.81 and 4.70, respectively. The mean of concentration in a random sample of 10 batches produced by Company A is 90.25 grams per litre, while for Company B, the mean of concentration in a random sample 15 batches yields is 87.54 grams per litre. Consider the random variable concentration is normally distributed, construct a 96% confidence interval for the difference mean of the concentration produced by both companies. Give an interpretation for the parameter estimate.
Correction: 1. Two chemical companies, Company A and Company B, supply a raw material in which the most important element is the concentration. The standard deviations of the concentration produced by the two companies are known to be 5.81 and 4.70, respectively. The mean concentration in a random sample of 10 batches produced by Company A is 90.25 grams per litre, while for Company B, the mean concentration in a random sample of 15 batches is 87.54 grams per litre. Assuming that concentration is normally distributed, construct a 96% confidence interval for the difference between the population means of the concentration produced by both companies. Give an interpretation of the parameter estimate.
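Because both population standard deviations are known, the question calls for the $z$ interval $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A minimal sketch follows; `ci_diff_means_known_sigma` is a hypothetical helper name, and the printed interval is computed by the code rather than quoted from the module.

```python
import math
from statistics import NormalDist


def ci_diff_means_known_sigma(x1, x2, sd1, sd2, n1, n2, conf):
    """Two-sided CI for mu1 - mu2 when both population sds are known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)        # standard error of x1-bar - x2-bar
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Company A vs Company B, 96% confidence
lower, upper = ci_diff_means_known_sigma(90.25, 87.54, 5.81, 4.70, 10, 15, 0.96)
print(f"96% CI for mu_A - mu_B: ({lower:.4f}, {upper:.4f}) g/L")
```

The same helper fits Exercise 2.4.1: the variances 0.81 and 0.25 correspond to standard deviations 0.9 and 0.5, so the call would be `ci_diff_means_known_sigma(10, 7, 0.9, 0.5, 30, 40, 0.95)`.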
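Page 105
Mistake:
one-sided lower bound: $\mu_1 - \mu_2 > (\bar{x}_1 - \bar{x}_2) - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, $\mu_1 - \mu_2 \in (-\infty, a)$
one-sided upper bound: $\mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, $\mu_1 - \mu_2 \in (a, \infty)$
where $a \in \mathbb{R}$
Correction:
one-sided lower bound: $\mu_1 - \mu_2 > (\bar{x}_1 - \bar{x}_2) - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, $\mu_1 - \mu_2 \in (a, \infty)$
one-sided upper bound: $\mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, $\mu_1 - \mu_2 \in (-\infty, b)$
where $a, b \in \mathbb{R}$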
Page 105: EXERCISE 2.4.1
Mistake: 1. Find the 95% confidence interval for the difference population mean of children's sleep time and adult's sleep time if given that the variances for the children's sleep time is 0.81 hours while for the adults is 0.25 hours. The mean sample sleep time for 30 children is 10 hours while for 40 adults are 7 hours and give the comment on the parameter estimate. Assume that the children's and adult's sleep time are normally distributed.
Correction: 1. Find the 95% confidence interval for the difference between the population means of children's sleep time and adults' sleep time, given that the variance of children's sleep time is 0.81 while that of adults is 0.25. The mean sleep time of a sample of 30 children is 10 hours, while that of a sample of 40 adults is 7 hours. Assume that children's and adults' sleep times are normally distributed. Give a comment on the parameter estimate.
106 2  D2
 D : a population standard deviation of the
differences
Page 107
Mistake: A tyre manufacturer wishes to compare the made of thread wear of tyres between the new and conventional materials. One tyre of each type is placed on each front wheel of each of tenth front wheel drive vehicles. The choice as to which type of tyre goes on the right wheel and which goes on the left is made with the flip of a coin. Each car is driven for 40 000 km, then the tyres are removed, and the depth of the thread (in cm) on each is measured. The results were as follows.
Correction: A tyre manufacturer wishes to compare the tread wear of tyres made of the new and conventional materials. One tyre of each type is placed on each front wheel of each of ten front-wheel-drive vehicles. The choice as to which type of tyre goes on the right wheel and which goes on the left is made with the flip of a coin. Each car is driven for 40 000 km, then the tyres are removed, and the depth of the tread (in cm) on each is measured. The results were as follows.

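Page 108
Mistake:
one-sided lower bound: $\mu_D > \bar{x}_D - z_{\alpha}\frac{\sigma_D}{\sqrt{n}}$, $\mu_D \in (-\infty, a)$
one-sided upper bound: $\mu_D < \bar{x}_D + z_{\alpha}\frac{\sigma_D}{\sqrt{n}}$, $\mu_D \in (a, \infty)$
where $a \in \mathbb{R}$
Correction:
one-sided lower bound: $\mu_D > \bar{x}_D - z_{\alpha}\frac{\sigma_D}{\sqrt{n}}$, $\mu_D \in (a, \infty)$
one-sided upper bound: $\mu_D < \bar{x}_D + z_{\alpha}\frac{\sigma_D}{\sqrt{n}}$, $\mu_D \in (-\infty, b)$
where $a, b \in \mathbb{R}$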
Page 108
Mistake: 2. A sample of 38 diesel lorries were run for both hot and cold engines. A study is conducted to estimate the difference in fuel economy. The mean difference for fuel mileage between hot and cold engines is 0.250 litre per km and the sample variance of the differences is 0.013 litre per km. Assume that the fuel mileage is normally distributed, find the 98% confidence interval for the population mean difference between fuel mileage between hot and cold engines. Give comment on the parameter estimates.
Correction: 2. A sample of 38 diesel lorries was run with both hot and cold engines. A study is conducted to estimate the difference in fuel economy. The mean difference in fuel mileage between hot and cold engines is 0.250 litre per km and the sample variance of the differences is 0.013 (litre per km)². Assuming that fuel mileage is normally distributed, find the 98% confidence interval for the population mean difference in fuel mileage between hot and cold engines. Give a comment on the parameter estimates.
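A sketch for the lorry question, assuming a large-sample $z$ interval $\bar{x}_D \pm z_{\alpha/2}\, s_D/\sqrt{n}$ because $n = 38 \ge 30$; if the module prescribes $t_{\alpha/2,\,n-1}$ for an unknown $\sigma_D$, the interval will be slightly wider. Note that $s_D$ is the square root of the given sample variance of the differences, 0.013.

```python
import math
from statistics import NormalDist

n, d_bar, var_d = 38, 0.250, 0.013     # sample size, mean difference, variance of differences
s_d = math.sqrt(var_d)                 # sample standard deviation of the differences
alpha = 0.02                           # 98% confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.01}, about 2.3263
error = z * s_d / math.sqrt(n)
lower, upper = d_bar - error, d_bar + error
print(f"98% CI for mu_D: ({lower:.4f}, {upper:.4f}) litre per km")
```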
109 ADD THE FOLLOWING NOTES:

Statisticians and researchers may also use an interval estimate for a proportion. A population
proportion, (  ) is a parameter that describes a percentage value or probability of success
associated with a population. One common application involves consumer preference or
opinion polls, in which we use a random sample of n people to estimate the proportion (  ) of
people in the population who have a specified characteristic. Proportion is the same as
percentage, rate, probability or fraction of the population.

If x of the sampled people have this characteristic, then the sample proportion $p = \frac{x}{n}$ can be used to estimate the population proportion π. This event is called a binomial event, and the random variable X is said to have a binomial distribution, $X \sim \text{Bin}(n, \pi)$, which corresponds to n independent trials with probability of success π on each trial. There are many practical examples of the binomial random variable X (see Appendix B.2). The outcome of each trial can be classified in two mutually exclusive and exhaustive ways, say, success or failure (e.g. female or male, life or death, non-defective or defective).

The binomial random variable X has a probability distribution P(X) with mean $n\pi$ and variance $n\pi(1-\pi)$. The sampling distribution of the sample proportion P is approximately normal, $P \sim N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$. Based on the CLT, the test statistic can be written as
$z_{test} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$.
As with population means, we can estimate the population proportion using the sample proportion, and the interval estimate is given as follows.
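A minimal Python sketch of the test statistic above follows. It only evaluates $z_{test}$ for a hypothesised value π0; the counts x = 13 and n = 200 are taken from Example 2.4, while π0 = 0.05 is a hypothetical value chosen purely for illustration.

```python
from math import sqrt

def z_test_proportion(x, n, pi0):
    """Return z_test = (p - pi0) / sqrt(pi0 * (1 - pi0) / n), where p = x / n."""
    p = x / n                                   # sample proportion
    standard_error = sqrt(pi0 * (1 - pi0) / n)  # standard error under the hypothesised pi0
    return (p - pi0) / standard_error

# Hypothetical check: 13 defects in 200 circuits against pi0 = 0.05
z = z_test_proportion(13, 200, 0.05)
print(round(z, 4))
```

The standard error uses π0 rather than p because the statistic is evaluated under the hypothesised value.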

110
Mistake: EXAMPLE 2.4:
The fraction of defective integrated circuits produced in a photolithography process is being studied. A random sample of 200 circuits is tested, revealing 13 defects.
a) Calculate a 95% confidence interval on the fraction of defective circuits produced by this process and give a comment on the resulted confidence interval.
b) How large the sample would be if we wish to be at least 95% confident that the error in estimating p is less than 0.02?
Solution:
X : The defective integrated circuits produced in a photolithography process
Correction: EXAMPLE 2.5:
The fraction of defective integrated circuits produced in a photolithography process is being studied. A random sample of 200 circuits is tested, revealing 13 defects.
a) Calculate a 95% confidence interval on the fraction of defective circuits produced by this process and give a comment on the resulting confidence interval.
b) How large would the sample need to be if we wish to be at least 95% confident that the error in estimating the fraction of defective integrated circuits produced in a photolithography process is less than 0.02?
Solution:
X : The number of defective integrated circuits produced in a photolithography process
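The computations asked for in parts (a) and (b) can be sketched in Python as below. This assumes the usual large-sample interval $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$ and the sample-size formula $n = (z_{\alpha/2}/E)^2\, p(1-p)$, with p = 13/200 used as the planning value; `statistics.NormalDist` supplies the z critical value in place of a table.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence):
    """Two-sided large-sample confidence interval for a population proportion."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

def sample_size_for_error(p, error, confidence):
    """Smallest n such that z_{alpha/2} * sqrt(p(1 - p) / n) <= error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / error) ** 2 * p * (1 - p))

lower, upper = proportion_ci(13, 200, 0.95)                  # part (a)
n_needed = sample_size_for_error(13 / 200, 0.02, 0.95)       # part (b)
print(f"95% CI: ({lower:.4f}, {upper:.4f}); n = {n_needed}")
```

If no prior estimate of p were available, a common conservative choice is p = 0.5, which maximises p(1 − p) and therefore gives a larger n.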
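111
Mistake:
One-sided lower bound: $\pi \ge p - z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$, $\pi \in [0, a]$
One-sided upper bound: $\pi \le p + z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$, $\pi \in [a, 1]$
where $a \in [0, 1]$
Correction:
One-sided lower bound: $\pi \ge p - z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$, $\pi \in [a, 1]$
One-sided upper bound: $\pi \le p + z_{\alpha}\sqrt{\frac{p(1-p)}{n}}$, $\pi \in [0, b]$
where $a, b \in [0, 1]$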
111 ADD THE FOLLOWING EXAMPLE FOR ONE-SIDED CI:
c) Construct a 95% lower bound confidence interval on the fraction of defective circuits produced by this process. Give an interpretation of the parameter estimate.
Solution:
$z_{\alpha} = z_{0.05} = 1.6449$
A 95% lower bound confidence interval for the population proportion, π:
$\pi \in \left[p - z_{\alpha}\sqrt{\frac{p(1-p)}{n}},\ 1\right] = \left[0.0650 - 1.6449\sqrt{\frac{0.0650(1-0.0650)}{200}},\ 1\right] = [0.0363,\ 1]$
Interpretation: We are 95% confident that the fraction of defective integrated circuits produced in the photolithography process is at least 3.63%, that is, between 3.63% and 100%.
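Part (c) can be checked with a short Python sketch of the one-sided bounds for a proportion. It uses `statistics.NormalDist` for $z_{\alpha}$ instead of the table value 1.6449, and it clips the computed bound to the admissible range [0, 1], which is an assumption of this sketch rather than part of the module's formula.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_bound(x, n, confidence, side):
    """Large-sample one-sided confidence bound for a population proportion.

    side="lower" gives [p - z_alpha * SE, 1]; side="upper" gives [0, p + z_alpha * SE].
    The computed bound is clipped to [0, 1] (an assumption of this sketch).
    """
    p = x / n
    z_alpha = NormalDist().inv_cdf(confidence)  # about 1.6449 when confidence = 0.95
    se = sqrt(p * (1 - p) / n)
    if side == "lower":
        return (max(0.0, p - z_alpha * se), 1.0)
    if side == "upper":
        return (0.0, min(1.0, p + z_alpha * se))
    raise ValueError("side must be 'lower' or 'upper'")

lower_interval = one_sided_proportion_bound(13, 200, 0.95, "lower")
print([round(v, 4) for v in lower_interval])
```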
112 ADD THE FOLLOWING NOTES:
Suppose we have two population proportions. The sampling distribution for the difference between two proportions is given by
$P_1 - P_2 \sim N\left(\pi_1 - \pi_2,\ \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right)$.
Thus, an interval estimate of the difference between two population proportions is given as follows.

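113
Mistake:
One-sided lower bound: $\pi_1 - \pi_2 \ge (p_1 - p_2) - z_{\alpha}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, $\pi_1 - \pi_2 \in [0, a]$
One-sided upper bound: $\pi_1 - \pi_2 \le (p_1 - p_2) + z_{\alpha}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, $\pi_1 - \pi_2 \in [a, 1]$
where $a \in [0, 1]$
Correction:
One-sided lower bound: $\pi_1 - \pi_2 \ge (p_1 - p_2) - z_{\alpha}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, $\pi_1 - \pi_2 \in [a, 1]$
One-sided upper bound: $\pi_1 - \pi_2 \le (p_1 - p_2) + z_{\alpha}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$, $\pi_1 - \pi_2 \in [-1, b]$
where $a, b \in [-1, 1]$, since a difference of two proportions can be negative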
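A two-sided interval for $\pi_1 - \pi_2$ can be sketched in Python using the unpooled standard error from the sampling distribution above. The counts are taken from Exercise 3.5 No. 2 (53 of 400 Model 1 owners and 78 of 500 Model 2 owners) purely as sample input; note that the resulting interval contains negative values, because a difference of proportions can be negative.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence):
    """Two-sided large-sample CI for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Counts from Exercise 3.5 No. 2, 98% confidence
interval = diff_proportion_ci(53, 400, 78, 500, 0.98)
print([round(v, 4) for v in interval])
```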
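116
Mistake:
One-sided lower bound: $\sigma^2 \ge \frac{(n-1)s^2}{\chi^2_{\alpha,\nu}}$, $\sigma^2 \in (a, \infty)$
One-sided upper bound: $\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\nu}}$, $\sigma^2 \in (0, a)$, where $a > 0$
Correction:
One-sided lower bound: $\sigma^2 \ge \frac{(n-1)s^2}{\chi^2_{\alpha,\nu}}$, $\sigma^2 \in (a, \infty)$
One-sided upper bound: $\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\nu}}$, $\sigma^2 \in (0, b)$, where $a, b > 0$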
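119
Mistake:
One-sided lower bound: $\frac{\sigma_1^2}{\sigma_2^2} \ge \frac{s_1^2}{s_2^2} f_{1-\alpha,\nu_2,\nu_1}$, $\frac{\sigma_1^2}{\sigma_2^2} \in (a, \infty)$
One-sided upper bound: $\frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} f_{\alpha,\nu_2,\nu_1}$, $\frac{\sigma_1^2}{\sigma_2^2} \in (0, a)$, where $a > 0$
Correction:
One-sided lower bound: $\frac{\sigma_1^2}{\sigma_2^2} \ge \frac{s_1^2}{s_2^2} f_{1-\alpha,\nu_2,\nu_1}$, $\frac{\sigma_1^2}{\sigma_2^2} \in (a, \infty)$
One-sided upper bound: $\frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} f_{\alpha,\nu_2,\nu_1}$, $\frac{\sigma_1^2}{\sigma_2^2} \in (0, b)$, where $a, b > 0$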
126
Mistake: ME.1 Assume that a small simple random sample is selected form a normally distributed population for which population standard deviation is unknown. Since the sample size is small, the t-distribution should be used to the construct a confidence interval for mean. But, does the confidence interval limit affected if the normal distribution is incorrectly used instead?
Correction: ME.1 Assume that a small simple random sample is selected from a normally distributed population for which the population standard deviation is unknown. Since the sample size is small, the t-distribution should be used to construct a confidence interval for the mean. But are the confidence interval limits affected if the normal distribution is incorrectly used instead?
128
Mistake:
2.1.3 (2): $P_1 - P_2 \sim N(0.7,\ 0.000035)$
2.4.1 (1): (2.6461, 3.3574) grams per litre
2.7 (1): (0.1820, 0.4420)
Correction:
2.1.3 (2): $P \sim N(0.7,\ 0.00035)$
2.4.1 (1): (2.6424, 3.3574) hours
2.7 (1): (0.4266, 0.6648)
129 REMOVE Q1 and Q2
135 REMOVE Q1
139 b. Find a 94% confidence interval for the difference in the mean plant growths with
normal air atmospheric conditions and those with enriched CO2 concentrations. (Note:
Assuming equal population variances)
139 ADD THE FOLLOWING QUESTION:
SEMESTER II SESSION 2015/2016
19. A random sample of 100 bottles of a particular brand of cough syrup is selected and the
alcohol content of each bottle is determined. Suppose that the 95% confidence interval for
the population mean of the alcohol content of all bottles is between 7.8 mg and 9.4 mg.
a. What is the population under study?
b. What is the parameter?
c. Calculate the estimation error of the given confidence interval.
d. Use your answer in (c) to calculate the unbiased estimate of the population mean of the
alcohol content from the random sample of cough syrup.
e. “We are 95% confident that the interval (7.8, 9.4) mg includes the true mean of alcohol content.” Is this statement correct? Justify your answer.
f. The pharmaceutical company that produced the cough syrup claims that the alcohol
content for a bottle of cough syrup is equal to 8.3 mg. Based on the given confidence
interval, can we accept this hypothesis?
g. Would a 90% confidence interval calculated from the same sample be narrower or
wider than the given interval? Give a reason.
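142 19. a. All bottles of a particular brand of cough syrup
b. The population mean of alcohol content, μ
c. E = (9.4 − 7.8)/2 = 0.8 mg
d. x̄ = (7.8 + 9.4)/2 = 8.6 mg
e. Yes. A confidence interval gives an estimated range of values which is likely to include an unknown parameter. The confidence level 0.95 describes the procedure: before the sample is drawn, $P(\bar{X} - E < \mu < \bar{X} + E) = 0.95$, so about 95% of the intervals constructed in this way from repeated random samples would include μ. Once the interval (7.8, 9.4) mg has been calculated, it either includes μ or it does not, so we say that we are 95% confident (rather than that there is a 0.95 probability) that the mean alcohol content for the population of all bottles is between 7.8 and 9.4 mg.
f. Yes. Since 8.3 mg lies inside the interval (7.8, 9.4) mg, we do not reject $H_0: \mu = 8.3$ at the 5% significance level.
g. Narrower, because a lower confidence level uses a smaller critical value ($z_{0.05} = 1.645$ instead of $z_{0.025} = 1.96$), which gives a smaller estimation error for the same sample.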
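The answers to (c), (d) and (g) can be reproduced with a short Python sketch. Part (g) additionally assumes that the 95% interval was built as $\bar{x} \pm z_{0.025}\, s/\sqrt{n}$, which is reasonable for n = 100; a 90% interval from the same sample then only swaps $z_{0.025}$ for $z_{0.05}$.

```python
from statistics import NormalDist

lower, upper = 7.8, 9.4            # 95% confidence interval from Question 19 (mg)

error = (upper - lower) / 2        # (c) estimation error E
x_bar = (upper + lower) / 2        # (d) point estimate: the midpoint of the interval

# (g) Assumption: the interval is z-based, so E = z * s / sqrt(n) with the same s and n.
z95 = NormalDist().inv_cdf(0.975)  # z_{0.025}
z90 = NormalDist().inv_cdf(0.95)   # z_{0.05}
error_90 = error * z90 / z95       # smaller critical value gives a smaller error
print(round(error, 4), round(x_bar, 4),
      (round(x_bar - error_90, 4), round(x_bar + error_90, 4)))
```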

CHAPTER 3: HYPOTHESIS TESTING


PAGE MISTAKES CORRECTION
143 REMOVE THE FOLLOWING OBJ:
• Calculate the P-value and apply it in the hypothesis testing.
ADD THE FOLLOWING OBJ:
• Identify the type of error in hypothesis testing.
148 Move the paragraph after Table 3.3 up to page 150, Example 3.2 and Exercise 3.11 number 2 to a new Section 3.10 Type I and Type II Errors.
159 ADD THE FOLLOWING INFO BEFORE EXAMPLE 3.3.
For each type of hypothesis for $\mu_1 - \mu_2$, the same rejection procedure as given in Section 3.1 and Section 3.2 is applied. For cases 2 to 5, if the equality of the population variances is not given, the researcher needs to test the equality of variances using the procedure given in Section 3.7 before choosing the correct formula for the hypothesis test of the difference between two population means of independent samples.
162 EXERCISE 3.3.1
Mistake:
1. The lifetime standard deviation for the battery type A is 1 and the lifetime standard deviation for the battery type B is 0.7 hours. If the mean lifetime for 30 battery type A is 5.3 hours while the mean lifetime for 35 battery type B is 4.8 hours, can we conclude that the lifetime for both batteries type A and type B are same at significance level, α = 0.05?
3. A study was conducted to investigate the effectiveness of using music during the working hours of a business. When the music was turned off during the working hours, the mean and standard deviation of productivity level for 45 employees were found to be 5.2 and 2.4, respectively. On a different day, the music was turned on and the mean and standard deviation of productivity level for 40 employees were 4.8 and 1.2, respectively. Do employees perform better at work with music playing? What can we conclude at the 0.05 level of significance if both population variances are the same?
4. Is there any different between the mean yields? Assume the variances population are equal.
5. Determine whether the mean of breaking strength for composite A is similar to composite B at 0.01 level of significance. Assume both population variances are different.
Correction:
1. The lifetime standard deviation for battery type A is 1 hour and the lifetime standard deviation for battery type B is 0.7 hours. If the mean lifetime of 30 type A batteries is 5.3 hours while the mean lifetime of 35 type B batteries is 4.8 hours, can we conclude that the mean lifetimes of type A and type B batteries are the same at significance level α = 0.05?
3. A study was conducted to investigate the effectiveness of using music during the working hours of a business. When the music was turned off during the working hours, the mean and standard deviation of productivity level of 45 employees were found to be 5.2 and 2.4, respectively. On a different day, the music was turned on and the mean and standard deviation of productivity level of 40 employees were 4.8 and 1.2, respectively. Do employees perform better at work with music playing? What can we conclude at the 0.05 level of significance if both population variances are the same?
4. Is there any difference between the mean yields? Assume the population variances are equal.
5. Determine whether the mean breaking strength of composite A is similar to that of composite B at the 0.01 level of significance. Assume both population variances are different.
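Exercise 3.3.1 No. 1 gives the population standard deviations, so a two-sample z test applies. The Python sketch below assumes a two-tailed test of $H_0: \mu_1 = \mu_2$ at α = 0.05 and uses `statistics.NormalDist` for the critical value instead of the z-table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """z statistic for mu1 - mu2 when the population standard deviations are known."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2 - delta0) / se

alpha = 0.05
z_test = two_sample_z(5.3, 1.0, 30, 4.8, 0.7, 35)   # battery A vs battery B
z_crit = NormalDist().inv_cdf(1 - alpha / 2)         # two-tailed critical value
decision = "reject H0" if abs(z_test) > z_crit else "do not reject H0"
print(round(z_test, 4), round(z_crit, 4), decision)
```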
170 EXERCISE 3.5
Mistake:
2. A particular consumer association wants to determine whether there is a difference between the population proportions of the two leading car manufacturers that need major repairs within two years of their purchase. A sample of 400 two-year owners of car Model 1 is contacted, and a sample of 500 two-year owner of Model 2 is contacted. The number of owners for Model 1 and Model 2 who report that their cars needed major repairs within the first two years are 53 and 78, respectively.
a) Construct a 98% confidence interval on the difference in the two population proportions of cars that needed major repairs.
b) Determine whether the population proportion of cars for Model 1 is less than 0.25 at 10% significance level.
c) Test the consumer association’s hypothesis at 1% significance level.
Correction:
2. A particular consumer association wants to determine whether there is a difference between the population proportions of the two leading car manufacturers that need major repairs within two years of their purchase. A sample of 400 two-year owners of car Model 1 is contacted, and a sample of 500 two-year owners of car Model 2 is contacted. The numbers of owners of Model 1 and Model 2 who report that their cars needed major repairs within the first two years are 53 and 78, respectively.
a) Construct a 98% confidence interval for the difference in the two population proportions of cars that needed major repairs.
b) Determine whether the population proportion of cars for car Model 1 is less than 0.25 at 10% significance level.
c) Test the consumer association’s hypothesis at 1% significance level.
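Parts (b) and (c) can be sketched in Python as large-sample z tests. Two choices here are assumptions of the sketch, not statements from the module: (b) is treated as a left-tailed test of $H_1: \pi_1 < 0.25$, and (c) as a two-tailed test of $H_1: \pi_1 \ne \pi_2$ that uses the pooled proportion in the standard error.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 53, 400   # Model 1
x2, n2 = 78, 500   # Model 2
p1, p2 = x1 / n1, x2 / n2

# (b) H0: pi1 = 0.25 vs H1: pi1 < 0.25, left-tailed at alpha = 0.10
z_b = (p1 - 0.25) / sqrt(0.25 * 0.75 / n1)
crit_b = NormalDist().inv_cdf(0.10)            # negative critical value, about -1.2816

# (c) H0: pi1 = pi2 vs H1: pi1 != pi2 at alpha = 0.01, pooled proportion (assumption)
p_pool = (x1 + x2) / (n1 + n2)
z_c = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
crit_c = NormalDist().inv_cdf(1 - 0.01 / 2)    # about 2.5758

print(round(z_b, 4), "reject H0" if z_b < crit_b else "do not reject H0")
print(round(z_c, 4), "reject H0" if abs(z_c) > crit_c else "do not reject H0")
```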
172
Mistake: EXAMPLE 3.6:
Listed are waiting times (in mins) of customers at a bank.
…….
Step 1: X : The waiting times of customers at a bank
……
Step 3: Given α = 0.01 and the test is left-tailed test, hence the critical value $\chi^2_{0.01,5} = 0.5543$.
Step 4: Since $\chi^2_{test} = 1.1520 > \chi^2_{0.99,5} = 0.5443$, then we do not reject $H_0$.
Correction: EXAMPLE 3.7:
Listed are waiting times (in mins) of customers in a bank.
…….
Step 1: X : The waiting times of customers in a bank
…….
Step 3: Given α = 0.01 and the test is a left-tailed test, hence the critical value is $\chi^2_{0.99,5} = 0.5543$.
Step 4: Since $\chi^2_{test} = 1.1520 > \chi^2_{0.99,5} = 0.5543$, then we do not reject $H_0$.
173 EXERCISE 3.6
Mistake:
1. A cigarette manufacturer wishes to test the claim that the variance of nicotine content of its cigarettes is 0.644 milligrams. Nicotine content is assumed to be normally distributed. A sample of 20 cigarettes is randomly selected and it is found that the standard deviation is 1.75 milligram. Is there enough evidence to reject the manufacturer’s claim?
3. In a wood cutting process to produce rulers, the variance of ruler’s height is set to be equal 2 cm at all times. If the variance of ruler’s height is not equal to 2 cm, the process will stop immediately. The height for a sample of 10 rulers produces by the process is given as follows:
……
Correction:
1. A cigarette manufacturer wishes to test the claim that the variance of nicotine content of its cigarettes is 0.644. Nicotine content is assumed to be normally distributed. A sample of 20 cigarettes is randomly selected and it is found that the standard deviation is 1.75 milligrams. Is there enough evidence to reject the manufacturer’s claim?
3. In a wood cutting process to produce rulers, the variance of the rulers’ height is set to be equal to 2 at all times. If the variance of the rulers’ height is not equal to 2, the process will stop immediately. The heights (in cm) of a sample of 10 rulers produced by the process are given as follows:
100.23, 100.11, 100.42, 99.66, 99.68, 100.14, 100.33, 100.10, 99.50, 100.21
Can we stop the process at 0.1 significance level?
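Both exercises use the statistic $\chi^2_{test} = (n-1)s^2/\sigma_0^2$. The Python sketch below computes it for the ruler data in No. 3 and for the cigarette summary in No. 1; it stops at the statistic, since the decision needs a chi-square critical value from the table.

```python
from statistics import variance

def chi_square_variance_stat(sample, sigma2_0):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma2_0   # statistics.variance is the sample variance

heights = [100.23, 100.11, 100.42, 99.66, 99.68, 100.14,
           100.33, 100.10, 99.50, 100.21]
chi2_ruler = chi_square_variance_stat(heights, 2)

# No. 1 only gives summary values: n = 20, s = 1.75, sigma0^2 = 0.644
chi2_cig = (20 - 1) * 1.75**2 / 0.644
print(round(variance(heights), 4), round(chi2_ruler, 4), round(chi2_cig, 4))
```

For the ruler data, ν = 9 and a two-tailed test at α = 0.1 compares the statistic with the tabulated values $\chi^2_{0.95,9} = 3.325$ and $\chi^2_{0.05,9} = 16.919$.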
174 ADD THE FOLLOWING NOTES AFTER TABLE 3.12
The two-tailed test for the ratio of two population variances is often used to test for the variance
equality assumption before a hypothesis testing of the difference between two population means
of independent samples is conducted (refer to Section 3.3.1).
175
Mistake: EXAMPLE 3.8:
A manager in a large computer operation company wants to study the use of computer from two departments within the company. The departments are Human Resource Department and Research Department. The usage time (in seconds) for each job is recorded as follows:
…………
Step 1:
X1: The usage time for each job from for Human Resource Department
X2: The usage time for each job from for Research Department
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$ (Claim: Difference in the variability)
Correction: EXAMPLE 3.9:
A manager in a large computer operation company wants to study the use of computer from two departments within the company. The departments are the Human Resource Department and Research Department. The usage time (in seconds) for each job is recorded as follows:
…………
Step 1:
X1: The usage time for each job from Human Resource Department
X2: The usage time for each job from Research Department
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$ (Claim: Difference in variability)

175 ADD THE FOLLOWING SUBQUESTION UNDER EXAMPLE 3.10


b. The director of this company has agreed to upgrade the computer processing time in
Research Department if the mean processing time of Research Department is greater
than the mean processing time of Human Resource Department. Is there enough
evidence to support the director’s claim at 5% level of significance?
Solution (b):
Step 1:
$H_0: \mu_2 = \mu_1$
$H_1: \mu_2 > \mu_1$ (Claim: upgrade the computer processing time in Research Department)
Step 2: Based on (a), $\sigma_1^2$ and $\sigma_2^2$ are unknown and $\sigma_1^2 = \sigma_2^2$.
$n_1 = 5,\ \bar{x}_1 = 7.8,\ s_1 = 3.2711$
$n_2 = 6,\ \bar{x}_2 = 8.5,\ s_2 = 3.1464$
then the pooled standard deviation is
$s_p = \sqrt{\frac{(6-1)(3.1464^2) + (5-1)(3.2711^2)}{6+5-2}} = 3.2024$
and the test statistic is
$t_{test} = \frac{(8.5 - 7.8) - 0}{3.2024\sqrt{\frac{1}{6} + \frac{1}{5}}} = 0.3610$.
Step 3: Given α = 0.05 and the test is a right-tailed test, hence the critical value is $t_{0.05,\,6+5-2} = t_{0.05,9} = 1.8331$.
Step 4: Since $t_{test} = 0.3610 < t_{0.05,9} = 1.8331$, we fail to reject $H_0$.
Step 5: At the 5% significance level, there is not enough evidence to support the director’s claim.
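The arithmetic in Steps 2 to 4 can be checked with the Python sketch below. It implements the pooled-variance t statistic under the equal-variance conclusion from part (a); the critical value 1.8331 is copied from Step 3 because the Python standard library has no t quantile function.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """Pooled-variance t statistic for mu1 - mu2 (equal population variances assumed)."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (mean1 - mean2 - delta0) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t, n1 + n2 - 2

# Research Department first, so that H1: mu2 > mu1 is a right-tailed test on the difference
sp, t_test, df = pooled_t(8.5, 3.1464, 6, 7.8, 3.2711, 5)
t_crit = 1.8331   # t_{0.05,9} from the t-table, as in Step 3
print(round(sp, 4), round(t_test, 4), df, "reject H0" if t_test > t_crit else "fail to reject H0")
```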
176 EXERCISE 3.7
Mistake: 1. Before service, a machine can packed 10 packets of sugar with variance weight 64 g while after service the variance weight for 5 packets of sugar are 25 g. Do the services improve the packaging process?
Correction: 1. Before service, a machine packed 10 packets of sugar with a weight variance of 64, while after service the weight variance for 5 packets of sugar is 25. Does the service improve the packaging process?
179 EXERCISE 3.8
Mistake: 2. Consider the following Exercise. Calculate the corresponding confidence interval and describe the relationship between the hypothesis test and the confidence interval for each problem.
a) Exercise 3.2 number 3
Correction: 2. Consider the following exercises. Calculate the corresponding confidence interval and describe the relationship between the hypothesis test and the confidence interval for each problem.
a) Exercise 3.2 number 2

179 Topic 3.9: Remove obj 1, then rename the section as HYPOTHESIS TESTING USING P-VALUE APPROACH
CHAPTER 4: ANALYSIS OF VARIANCE
PAGE MISTAKES CORRECTION
223 ADD THE FOLLOWING NOTE UNDER TABLE 4.2.
Note:
1. The treatments are also known as the levels of the factor that affect the dependent variable.
2. The total number of treatments is equal to the total number of levels of the factor.
3. The number of populations is equal to the number of treatments, k.
225 For H0 add the following statement:
• No differences between the population means
232 ADD THE FOLLOWING NOTE UNDER HYPOTHESIS FOR MARGINAL EFFECT:
Marginal effect
(If there is no interaction effect between factor A and factor B)
234 Replace Yes and No in Figure 4.2 with:
Yes (Reject H0AB)
No (Do not Reject H0AB)
235
Mistake: EXAMPLE 4.2: A chemical engineer studies the effects of various reagents and catalysts on the yield of a chemical process. The yield is expressed as a percentage of a theoretical maximum. Two runs (replications) of the process are made for each combination of three reagents and four catalysts. The data is given as follows.
Correction: EXAMPLE 4.2: A chemical engineer studies the effects of various reagents and catalysts on the yield of a chemical process. The yield is expressed as a percentage of a theoretical maximum. Two runs (replications) of the process are done for each combination of three reagents and four catalysts. The data is given as follows.

CHAPTER 5: LINEAR REGRESSION & CORRELATION


PAGE MISTAKES CORRECTION
273 Add procedure for calculator model fx 570ES

STEP 1: Insert data → MODE 3: STAT, 2: A+BX, insert data, =, AC


STEP 2: Data summary:
Shift 1 → 3: Sum → Σx², Σx, Σy², Σy, Σxy
Shift 1 → 4: Var → n, x̄, ȳ
STEP 3: Clear data → Shift 9
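For readers without an fx-570ES, the sketch below computes the same summaries in Python (n, Σx, Σx², Σy, Σy², Σxy and the means) together with the least-squares coefficients A and B of the calculator's A+BX model. The data are hypothetical and chosen so that y = 1 + 2x exactly.

```python
def regression_summary(x, y):
    """Sums and means shown by the calculator's Sum/Var menus, plus the least-squares line."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    s_xx = sum_x2 - sum_x**2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx                      # slope (B in the A+BX model)
    a = sum_y / n - b * sum_x / n        # intercept (A)
    return {"n": n, "sum_x": sum_x, "sum_x2": sum_x2, "sum_y": sum_y,
            "sum_y2": sum_y2, "sum_xy": sum_xy,
            "mean_x": sum_x / n, "mean_y": sum_y / n, "A": a, "B": b}

# Hypothetical data for illustration only
summary = regression_summary([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(summary["A"], summary["B"])
```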

274 Example 5.1 Example 5.3


279 Example 5.4 (a)
Mistake: $\hat{\beta}_0 = 26.8922$, $\hat{y} = 26.8922 + 4.0593x$
Correction: $\hat{\beta}_0 = 26.8926$, $\hat{y} = 26.8926 + 4.0593x$
279 Example 5.4 (b), Figure 5.5
Mistake: $\hat{y} = 26.8922 + 4.0593x$
Correction: $y = 4.0593x + 26.893$
280 Example 5.4 (d)
Mistake:
When x = 16, $\hat{y} = 26.8922 + 4.0593(16) = 91.841$
When x = 20, $\hat{y} = 26.8922 + 4.0593(20) = 108.0782$
Correction:
When x = 16, $\hat{y} = 26.8926 + 4.0593(16) = 91.8414$
When x = 20, $\hat{y} = 26.8926 + 4.0593(20) = 108.0786$
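To see how the two intercept values in this list affect the fitted values, the Python sketch below evaluates $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ at x = 16 and x = 20 for both 26.8922 and 26.8926; the predictions differ only by 0.0004.

```python
def predict(b0, b1, x):
    """Fitted value y_hat = b0 + b1 * x for a simple linear regression line."""
    return b0 + b1 * x

for b0 in (26.8922, 26.8926):
    print(b0, [round(predict(b0, 4.0593, x), 4) for x in (16, 20)])
```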
280 EXERCISE 5.2 (2)
Mistake: A study is conducted to investigate the relationship between the cost of fire damage and the distance between the fire station and the location involves in the fire accident using regression method. The data is given as follows.
Correction: A study is conducted to investigate the relationship between the cost of fire damage (in RM million) and the distance between the fire station and the location involved in the fire accident (in km) using a regression method. The data is given as follows.
281
Mistake: 5.3 HYPOTHESIS TESTING FOR SIMPLE LINEAR REGRESSION MODEL
By the end of this topic, you should be able to:
1. Conduct hypotheses testing for intercept and slope of simple regression model.
2. Conduct hypothesis testing for slope of simple regression model using ANOVA.
3. Conduct hypothesis testing based on Microsoft Excel output
Correction: 5.3 LINEARITY TEST
By the end of this topic, you should be able to:
1. Conduct hypothesis testing for slope of simple regression model using t-test.
2. Conduct hypothesis testing for slope of simple regression model using ANOVA.
281
Mistake: In this section we will present both methods for testing the significance of regression using t-test and ANOVA.
Correction: In this section we will present both methods for testing the linearity of the regression model using t-test and ANOVA.
281-283 REMOVE SECTION 5.3.1 and EXAMPLE 5.5
283
Mistake: 5.3.2 HYPOTHESIS TESTING FOR SLOPE USING t-TEST
Correction: 5.3.1 HYPOTHESIS TESTING FOR SLOPE
284
Mistake: Example 5.6, Step 1: $\hat{y} = 26.8926 + 4.0593x$ where $\hat{\beta}_0 = 26.8926$
Correction: Example 5.5, Step 1: $\hat{y} = 26.8922 + 4.0593x$ where $\hat{\beta}_0 = 26.8922$
285
Mistake: 5.3.3 HYPOTHESIS TESTING FOR SLOPE USING ANOVA
Correction: 5.3.2 HYPOTHESIS TESTING FOR SLOPE USING ANOVA
286 Example 5.7 Example 5.6
287 Exercise 5.3 (1b) REMOVE
289 Interpretation of Regression Coefficient: REMOVE no 2
Table 5.10
Add: $H_0: \beta_1 = 0,\ H_1: \beta_1 \ne 0$
289 Interpretation of Regression Coefficient:
Table 5.10 No 3
291
Mistake: OBJ 4: choose the most significant model from the multiple linear regression analysis.
Correction: OBJ 4: choose the most significant predictors from the multiple linear regression analysis.
295 Example 5.8 Example 5.7
296 Replace answer for (e) as follows:
For $\beta_1$: $H_0: \beta_1 = 0,\ H_1: \beta_1 \ne 0$. Since P-value = 0.0399 < α = 0.05, $x_1$ is a significant predictor for y.
For $\beta_2$: $H_0: \beta_2 = 0,\ H_1: \beta_2 \ne 0$. Since P-value = 0.0264 < α = 0.05, $x_2$ is a significant predictor for y.
297 Table 5.12: Interpretation of regression coefficients
$\hat{\beta}_1 = 20.4921$: When $x_2$ is held constant, the estimated y value will increase by 20.4921 for each one-unit increase in $x_1$.
$\hat{\beta}_2 = 0.2805$: When $x_1$ is held constant, the estimated y value will increase by 0.2805 for each one-unit increase in $x_2$.

297 EXERCISE 5.5 (2)
Mistake: c) write the predicted regression model.
Correction: c) write the predicted regression model and interpret the regression coefficients.
301 Example 5.9 Example 5.8
309-310 ANSWER FOR EXERCISES
5.1 (e) $r^2 = 0.5912$, so 59.12% of the variation in the final exam marks can be explained by the carry mark.
(f) Some students have good carry marks but did not perform well in the final exam, for example, students no. 1 and 9.

5.3(1)
a) ..
b) REMOVE
c) move to b and replace the answer (please double check)
(i) t-test=12.3765
(ii) f-test =153.1769
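A quick consistency check on (i) and (ii): in simple linear regression, the ANOVA F statistic for the slope equals the square of the slope t statistic. The Python lines below square the t value and compare it with the F value; the small gap is consistent with rounding t to four decimal places.

```python
t_test = 12.3765    # answer (i)
f_test = 153.1769   # answer (ii)

t_squared = t_test ** 2
print(round(t_squared, 4), round(abs(t_squared - f_test), 4))
```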

5.3(2) REMOVE THE ANOVA TABLE.

5.5 (2)
Wrong numbering, and the real answer for (a) is missing.
a) r = 0.7831, there is a strong correlation between final exam marks, carry marks and the hours spent studying.
b) replace with old (a)
c) replace with old (b)
d) replace with old (c) and (d)

5.5(3)
e) Coefficients table: The significant parameters are for the income and temperature variables, with (P-value = 0.0225) < (α = 0.05) and (P-value = 0.0000) < (α = 0.05), respectively. The price parameter is insignificant, with (P-value = 0.6261) > (α = 0.05).

334-340 ANSWER FOR TUTORIAL (final exam questions)
2 (d): The given answer for (d) is also the answer for (c). The next answers are for (d) and (e).
6 (c): The best is x2 only, since x3 is not significant.
16 (b): The best is W only.
18: The real answer for (a) is missing; the given answers (a) to (d) are actually the answers for (b) to (e).
18 (a).
Response variable: quarterly sales, including the demand for the small size economic bottle of Dyenamo, y (in millions of bottles).
Predictor variables: the price of Dyenamo as offered by the enterprise, x1 (in RM), the average industry price of competitors’ similar detergents, x2 (in RM), and its advertising expenditure to promote Dyenamo, x3 (in RM millions).
CHAPTER 6: GOODNESS OF FIT TEST AND CONTINGENCY TABLE
PAGE MISTAKES CORRECTION
342
Mistake: Obj 2: Test the data whether it fits to Binomial, Poisson or Normal distribution.
Correction: Obj 2: Test whether a set of data fits the Binomial, Poisson or Normal distribution.
342
Mistake: Assumptions for chi-squared GoF Test:
Correction: Assumptions of GoF Test for Categorical Data:
343
Mistake: Procedures for Chi-squared GoF Test:
Correction: Procedures of GoF Test for Categorical Data:
344
Mistake: The Hypothesis Testing for Chi-squared GoF Test for Categorical Data:
Correction: The Hypothesis Testing of GoF Test for Categorical Data:
344
Mistake: The Test Statistic for Chi-squared GoF Test:
Correction: The Test Statistic of GoF Test for Categorical Data:
344 Last paragraph:
Mistake: The test called as GoF test because the hypothesis tested how good the observed frequencies fit a given distribution.
Correction: The test is called a GoF test because the hypothesis tests how well the observed frequencies fit a given distribution.
347 Exercise 6.1.1 No. 2
Mistake: At 2.5% significance level, is there any sufficient evidence to conclude that the claim of recent study can be accepted?
Correction: At 2.5% significance level, is there sufficient evidence to conclude that the claim of the recent study can be accepted?
352-353 Example 6.4
Mistake: $P(1.3 < X < 1.5)$
Correction: $P(1.25 < X < 1.55) = P\left(\frac{1.25 - 1.5}{0.5} < Z < \frac{1.55 - 1.5}{0.5}\right) = P(-0.5 < Z < 0.1) = 0.2313$
Mistake: $P(1.6 < X < 1.8)$
Correction: $P(1.55 < X < 1.85) = P\left(\frac{1.55 - 1.5}{0.5} < Z < \frac{1.85 - 1.5}{0.5}\right) = P(0.1 < Z < 0.7) = 0.2182$
Mistake: $P(1.9 < X < 2.1)$
Correction: $P(1.85 < X < 2.15) = P\left(\frac{1.85 - 1.5}{0.5} < Z < \frac{2.15 - 1.5}{0.5}\right) = P(0.7 < Z < 1.3) = 0.1452$
354 EXERCISE 6.1.2 No. 1
Mistake: Test the claim that the number of calls per hour follows a Poisson distribution with a mean of one hour.
Correction: Test the claim that the number of calls per hour follows a Poisson distribution with a mean of one call per hour.
354 EXERCISE 6.1.2 No. 2
Mistake: Test at 2.5% significance level whether these data fit to the Binomial distribution or not.
Correction: Test at 2.5% significance level whether or not these data fit the Binomial distribution with parameters n = 4 and p = 3/4.
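For both exercises, the hypothesised probabilities $P_i$ (and hence $E_i = nP_i$) follow directly from the probability mass functions. The Python sketch below lists them for a Poisson distribution with mean 1 and a Binomial distribution with n = 4 and p = 3/4; the observed frequencies are not reproduced here. Because the Poisson distribution is unbounded, the last class collects the upper tail, and classes may still need to be combined if some $E_i$ are small.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

poisson_probs = [poisson_pmf(k, 1.0) for k in range(3)]
tail = 1 - sum(poisson_probs)          # P(X >= 3), the open-ended last class
binomial_probs = [binomial_pmf(k, 4, 0.75) for k in range(5)]
print([round(p, 4) for p in poisson_probs], round(tail, 4))
print([round(p, 4) for p in binomial_probs])
```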
354 Section 6.2 Obj 1
Mistake: 1. Test two variables for independence.
Correction: Test two variables for independence using Chi-squared test.
360 Example 6.6
Mistake: A researcher selects a sample of 50 seniors from each of three secondary school areas and asked each students, “Do you come to school on your own or sent by your parents?”. The data are shown as follows. Test the claim that the proportion of students who came to school on their own or sent by their parents is the same for all schools.
Correction: A researcher selects a sample of 50 seniors from each of three secondary school areas and asks each student, “Do you come to school on your own?”. The data are shown as follows. Test the claim that the proportion of students who came to school on their own is the same for all schools.
360 Example 6.6 Step 1
Mistake:
$H_0$: All proportions are the same
$H_1$: At least one proportion is different from the others.
Correction:
$H_0: \pi_1 = \pi_2 = \pi_3$
$H_1: \pi_i \ne \pi_j$ for at least one $i \ne j$
OR
$H_0$: All proportions are the same
$H_1$: At least one proportion is different from the others.
361 Example 6.6 Step 5
Mistake: At α = 0.05, there is sufficient evidence to conclude that the proportion of student come to school on their own or sent by their parents is the same for all schools.
Correction: At α = 0.05, there is sufficient evidence to conclude that the proportion of students who came to school on their own is the same for all schools.
363 ADD THE FOLLOWING NOTES AND EXAMPLE:
Goodness of Fit Test for Fitting of the Distribution Using Microsoft Excel:

In the GoF test for fitting a distribution (the hypothesised distribution), the step-by-step procedure using Microsoft Excel 2010 is similar to that of the GoF test for categorical data. The expected frequency is calculated from the probability values. The probability values are calculated from the probability mass function for a Poisson or Binomial distribution, or from the probability density function for a normal distribution.
In this module you are advised to use the following procedure to calculate the probability value for a normal distribution.

Goodness of Fit Test for Normal Distribution Using Microsoft Excel

STEP 1 : Apply continuity correction on class interval (if needed)

STEP 2 : Find the normal probability, Pi = NORMDIST(x, mean, standard deviation, cumulative)
Note: Cumulative
TRUE (cumulative distribution function, example: P(X < 1.25))
FALSE (probability density function value f(x), example: f(1.25); this is a density, not a probability, since P(X = 1.25) = 0 for a continuous variable)
For a class with boundaries l and u, Pi = NORMDIST(u, mean, standard deviation, TRUE) − NORMDIST(l, mean, standard deviation, TRUE).

STEP 3 : Find the expected frequency, Ei = nPi

STEP 4 : Find P-value = CHISQ.TEST(Actual range, Expected range)
STEP 5 : Make a decision on H0. Reject H0 if P-value < α.
STEP 6 : Draw a conclusion to reject or not to reject the claim.
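STEPS 3 and 4 can be mirrored in Python as a sketch. The function below computes $E_i = nP_i$ and the statistic $\sum (O_i - E_i)^2 / E_i$ that underlies CHISQ.TEST; the observed counts and probabilities in the usage line are hypothetical. Excel's CHISQ.TEST returns the P-value directly, whereas this sketch returns only the statistic, to be compared with a tabulated critical value.

```python
def chi_square_gof_statistic(observed, probabilities):
    """Return expected counts E_i = n * P_i and the statistic sum((O_i - E_i)^2 / E_i)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat

# Hypothetical observed counts for three classes with hypothesised probabilities
expected, stat = chi_square_gof_statistic([18, 22, 10], [0.4, 0.4, 0.2])
print(expected, round(stat, 4))
```

If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` computes both the statistic and a P-value.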

EXAMPLE 6.10:

By considering Example 6.4, the GoF test can be solved using Microsoft Excel as follows.
Step 5: Since (P-value = 0.2839) > (α = 0.025), do not reject H0.
Step 6: At α = 0.025, there is enough evidence to conclude that the sugar concentration in apple
juice is normally distributed.
367 Answer to exercises
6.1.2 (2): $\chi^2_{test} = 50.5590$, reject $H_0$
6.1.2 (3): KIV
ME.7: $\chi^2_{test} = 11.5752$, reject $H_0$
APPENDIX B
PAGE MISTAKES CORRECTION
398 Para 3:
Mistake: probability denstiy function (pdf)
Correction: probability density function (pdf)
408 Standard Normal Distribution:
Mistake: A standard normal distribution is a normal distribution with mean zero (μ = 0) and variance is one (σ² = 0).
Correction: A standard normal distribution is a normal distribution with mean zero (μ = 0) and variance one (σ² = 1).

415 P  Z  2.13  P  2.13  0.0166 P  Z  2.13  P  2.13  0.0166
