"Business Statistics For Managers" Unit 5


Some Definitions for estimates
• A population is a collection of entities that share some
attributes, called characteristics.
• A sample is a finite subset of the population. The
number ‘n’ of elements in the sample is called the
size of the sample.
• Population characteristics such as the mean µ and the
variance σ² are known as ‘parameters’. The
corresponding measures computed from sample
observations, such as the sample mean x̄ and the
sample variance s², are called ‘statistics’.
• An estimate of a population parameter given by a single
number is called a point estimate of the
parameter. An estimate given by two numbers
between which the parameter is expected to lie is
called an interval estimate.
One-tailed and two-tailed tests: In a test at a given level of
significance, the critical region is represented by a portion of
the area under the probability curve of the sampling
distribution of the test statistic z=[θ-E(θ)]/√[Var(θ)].
If H0 is tested against a one-sided H1 (right or left), the test is
called a one-tailed test. For example, testing H0: μ=μ0
against H1: μ>μ0 (right-tailed) or H1: μ<μ0 (left-tailed) is a
one-tailed test. A one-tailed test has more power
to detect an effect in one direction because it does not test for
an effect in the other direction.
If H0 is tested against a two-sided H1, the test is called a
two-tailed test. For example, if H0: μ=μ0 is tested against
H1: μ≠μ0, then the test is two-tailed. A two-tailed test
checks both whether the mean is significantly greater than μ0
and whether it is significantly less than μ0. At α=0.05, the mean
is considered significantly different from μ0 if the test statistic
is in the top 2.5% or bottom 2.5% of its probability distribution,
resulting in a p-value less than 0.05 (see Figure 1).
[Figures: critical regions for the two-tailed test (α=0.05), the left-tailed test (α=0.05) and the right-tailed test (α=0.05); P value in a right-tailed test]

Note: Very unlikely observations lie in the tails, within a total area of α=0.1 or 0.05.
If the P value for the test statistic is more than this α value, H0 is accepted (not rejected).
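The tail areas described above translate into cut-off values of the standard normal statistic. The following is a minimal Python sketch, assuming a Z test and the standard library's `statistics.NormalDist` (Python 3.8+); the variable names are illustrative and do not come from the unit.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution (mean 0, S.D. 1)

# Two-tailed test: alpha is split equally between the two tails.
two_tailed = z.inv_cdf(1 - alpha / 2)
# One-tailed tests: the whole of alpha sits in a single tail.
right_tailed = z.inv_cdf(1 - alpha)
left_tailed = z.inv_cdf(alpha)

print(f"two-tailed:   reject H0 if |Z| > {two_tailed:.3f}")
print(f"right-tailed: reject H0 if Z > {right_tailed:.3f}")
print(f"left-tailed:  reject H0 if Z < {left_tailed:.3f}")
```

The one-tailed cut-off is smaller in magnitude than the two-tailed one at the same α, which is why a one-tailed test detects an effect in its chosen direction more easily.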
• Null hypothesis: a claim (contention or
conjecture), denoted by H0, that is initially believed to be
true.
• Alternative hypothesis: the hypothesis complementary
to H0, denoted by H1 or Ha.
• The procedure for deciding whether to accept or
reject H0 is called testing of hypothesis.
• A statistical hypothesis is a claim made about
the values of one or more population
characteristics.
• A region in the sample space S which amounts to
rejection of H0 is called the critical region or region
of rejection. The region of S which amounts to the
acceptance of H0 is called the acceptance region.
Type I error is the error made in rejecting H0 when it is
true. This is like a good product being rejected by the
consumer, hence it is also called producer’s risk.
Type II error is the error committed in accepting H0
when it is false; it is also called consumer’s risk. The
probabilities of the type I and type II errors are denoted
by α and β respectively, that is, P[Reject a lot when it is
good]=α; P[Accept a lot when it is bad]=β.
Note: The probability α that a random value of the test
statistic belongs to the critical region when H0 is true is
called the level of significance (LOS or los).
Test of Significance in sampling theory enables us to
decide, on the basis of the results of the samples,
whether there exists significant deviation between (i)
the observed sample statistics and hypothetical
parameter value; or (ii) two sample statistics.
Simple Linear Regression equation: $\mu_{Y|x} = \beta_0 + \beta_1 x$
Let $b_0$ and $b_1$ be the estimates for $\beta_0$ and $\beta_1$ respectively,
and let $e_i$ (called the residual) be the vertical distance from
a point $(x_i, y_i)$ to the estimated line of regression. Then
each data point satisfies $y_i = b_0 + b_1 x_i + e_i$.
Least-Squares Estimation: The sum of squares of the
errors (SSE) about the estimated regression line is:
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
Differentiating SSE w.r.t. $b_0$ and $b_1$, and equating to zero:
$\partial(SSE)/\partial b_0 = 0$ gives $-2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$
$\partial(SSE)/\partial b_1 = 0$ gives $-2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)\,x_i = 0$
Rewriting these, we get the normal equations as:
$n b_0 + b_1 \sum_i x_i = \sum_i y_i$,
$b_0 \sum_i x_i + b_1 \sum_i x_i^2 = \sum_i x_i y_i$
They are solved to get the least-squares estimates of $\beta_0$, $\beta_1$ as
$b_1 = [n\sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)] / [n\sum_i x_i^2 - (\sum_i x_i)^2] = c_{xy}/s_X^2$
$b_0 = \sum_i y_i/n - b_1 \sum_i x_i/n$.
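The closed-form estimates above can be checked numerically. The sketch below, in Python, uses a small hypothetical data set (not taken from the unit) and a helper named `least_squares` introduced only for illustration. It verifies that the fitted line satisfies both normal equations, which is the same as saying that the residuals sum to zero and are uncorrelated with x.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the closed-form solution of the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1

# Hypothetical illustrative data (an assumption, not from the unit).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]

# The two normal equations are equivalent to these two sums being zero.
sum_e = sum(residuals)
sum_ex = sum(e * x for e, x in zip(residuals, xs))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum(e) = {sum_e:.2e}, sum(e*x) = {sum_ex:.2e}")
```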
Q. 1: A low-noise transistor for use in computing
products is being developed. It is claimed that
the mean noise level will be below the 2.5 dB
level of products currently in use.
a) Set up the appropriate null and alternative
hypothesis for verifying the claim.
b) A sample of 16 transistors yields sample
mean 1.8 with sample S.D. 0.8. Find the T
statistic value.
c) Draw the conclusions concerning the noise
level of these transistors at 5% level of
significance [LOS].
Sol. 1: a) H0: μ=2.5, H1: μ<2.5 (left-tailed test).
b) At LOS α=5%=0.05 with dof ν=n-1=16-1=15, P[T15>t0.05]=0.05,
so P[T15<t0.05]=0.95. This gives t0.05=1.753 from the T table.
Given x̄=1.8, s=0.8, n=16, the test statistic is:
T15=(x̄-μ0)/(s/√n)=(1.8-2.5)/(0.8/√16)=-3.5.
For a left-tailed test we reject H0 if T15<-t0.05. Here
T15=-3.5<-1.753, i.e. |Tcomp|=3.5>ttable=1.753 at α=0.05.
Therefore, we reject H0 at LOS α=0.05.
c) Conclusion: The mean noise level is below 2.5 dB. If a
type I error is made, we shall assume that the new
product reduces noise when, in fact, it does not.
Note: P value calculation:
From the T table, 3.5 lies between 2.947 and 3.733 for dof 15,
where P[T15>2.947]=0.005 and P[T15>3.733]=0.001.
By symmetry, P[T15<-2.947]=0.005 and P[T15<-3.733]=0.001.
Thus 0.001<P<0.005. The P value is very small.
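The arithmetic of Sol. 1 can be reproduced with a short Python sketch. The critical value 1.753 is hard-coded from the T table, as in the solution, because the standard library has no t distribution; the variable names are illustrative.

```python
import math

n, xbar, s = 16, 1.8, 0.8   # sample size, sample mean, sample S.D. (Q. 1)
mu0 = 2.5                    # mean noise level under H0
t_crit = 1.753               # t_0.05 for 15 dof, read from the T table

t_stat = (xbar - mu0) / (s / math.sqrt(n))
# Left-tailed test: reject H0 when the statistic falls below -t_crit.
reject_h0 = t_stat < -t_crit

print(f"T = {t_stat:.2f}, reject H0: {reject_h0}")
```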
Q. 2: In order to be effective, reflective highway
signs must be picked up by the automobile’s
headlights. To do so at long distances requires
that the beams be on “high.” A study conducted
by highway engineers reveals that 45 of 50
randomly selected cars in a high-traffic-volume
area have the headlights on low beam.
a) Find a point estimate for p, the proportion
of automobiles in this type area that use
low beams.
b) Find a 90% confidence interval on p.
c) How large a sample is required to estimate
p to within 0.02 with 90% confidence?
Sol. 2: a) Given n=50, pe=45/50=0.9 is the point
estimate of the population proportion p.
b) The two-sided 90% confidence interval for p is
pe ± zα/2√[peqe/n], where qe=1-pe=0.1,
or pe ± z0.05√[peqe/n] [since 100(1-α)%=90% gives α=0.1].
Here z0.05 is the value for which P[Z>z0.05]=0.05,
i.e. P[Z<z0.05]=0.95. So z0.05=1.645 [see normal table].
Thus, pe ± z0.05√[peqe/n] gives 0.9 ± 1.645√[0.9(0.1)/50],
or 0.9 ± 0.06979, so [0.83021, 0.96979] is the 90%
confidence interval for the population proportion p.
c) Given d=0.02. The sample size for estimating p to within d,
given a prior estimate pe, is n=z²α/2 peqe/d²
=(1.645)²(0.9)(0.1)/(0.02)²=608.9, which is rounded up to n=609.
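The interval and the sample size in Sol. 2 can be recomputed as follows. This is a sketch in Python, assuming `statistics.NormalDist` for the z value in place of the normal table; the names `p_hat`, `half_width` and `n_required` are illustrative.

```python
import math
from statistics import NormalDist

n, low_beam = 50, 45
p_hat = low_beam / n                       # point estimate of p
conf = 0.90
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}, about 1.645

# Confidence interval: p_hat +/- z * sqrt(p_hat * q_hat / n)
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half_width, p_hat + half_width

# Sample size needed to estimate p to within d, rounded up.
d = 0.02
n_required = math.ceil(z ** 2 * p_hat * (1 - p_hat) / d ** 2)

print(f"90% CI: ({lower:.4f}, {upper:.4f}), n required: {n_required}")
```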
Q.3: A new computer network is being designed.
The makers claim that it is compatible with
more than 99% of the equipment already in use.
a) Set up the null and alternative hypothesis
needed to get evidence to support this
claim.
b) A sample of 300 programs is run, and 298
of these run with no changes necessary.
That is, they are compatible with the new
network. Can H0 be rejected?
c) What practical conclusion can be drawn on
the basis of your test?
Sol. 3: a) H0: p=0.99, H1: p>0.99 (right-tailed test).
b) Given pe=298/300=0.9933, p0=0.99, q0=1-p0=0.01, n=300.
Test statistic is:
Z=(pe-p0)/√[p0q0/n]
=(0.9933-0.99)/√[0.99(0.01)/300]=0.57
From normal table: P[Z<0.57]=0.7157.
So P[Z>0.57]=1-0.7157=0.2843, the P value in the right tail.
Since this P value is large (well above 0.05), H0 cannot be
rejected.
Note: If the above P value had been less than
0.05, we could reject H0 at test level α=0.05.
c) Here we are not able to show that the new
network is compatible with more than 99% of the
equipment already in use.
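The P value approach of Sol. 3 can be sketched in Python, again assuming `statistics.NormalDist` in place of the normal table. Without intermediate rounding of pe the statistic comes out near 0.58 rather than the table-rounded 0.57, and the P value near 0.28; the conclusion is unchanged.

```python
import math
from statistics import NormalDist

n, compatible = 300, 298
p0 = 0.99                                  # proportion under H0
p_hat = compatible / n                     # sample proportion

z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z_stat)     # area in the right tail
reject_h0 = p_value < 0.05

print(f"Z = {z_stat:.2f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```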
Q. 4: It is thought that over 60% of the business
offices in the United States have a mainframe
computer as part of their equipment.
a) Set up the appropriate null and alternative
hypothesis for supporting this claim.
b) Find the critical point for an α=0.05 level test.
c) When data are gathered, it is found that 233
of the 375 offices studied have mainframe
computers. Can H0 be rejected at the α=0.05
level?
d) Explain in the context of this problem, the
practical consequences of making the type of
error to which you are subject.
Sol. 4: a) H0: p=0.6, H1: p>0.6 (right-tailed test).
b) For an α=0.05 level test, the critical point zα=z0.05 is the
value for which P[Z>z0.05]=0.05, i.e. P[Z<z0.05]=0.95.
So z0.05=1.645 [see normal table].
c) Given pe=233/375=0.6213, p0=0.6, q0=1-p0=0.4, n=375.
Test statistic is: Z=(pe-p0)/√[p0q0/n]
=(0.6213-0.6)/√[0.6(0.4)/375]=0.843
Since Z=0.843<z0.05=1.645, H0 cannot be rejected at
test level α=0.05. The only option left is to accept H0.
In doing so, we may commit a Type II error.
d) If a Type II error is made, we fail to show that more than
60% of the business offices in the US have a mainframe even
though, in reality, this is true.
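Sol. 4 uses the critical-point rule rather than a P value. A minimal Python sketch of that rule follows, with the critical point taken from `statistics.NormalDist` instead of the normal table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.05
n, with_mainframe = 375, 233
p0 = 0.6                                   # proportion under H0

# Right-tailed critical point: P[Z > z_crit] = alpha.
z_crit = NormalDist().inv_cdf(1 - alpha)
z_stat = (with_mainframe / n - p0) / math.sqrt(p0 * (1 - p0) / n)
reject_h0 = z_stat > z_crit

print(f"Z = {z_stat:.3f}, critical point = {z_crit:.3f}, reject H0: {reject_h0}")
```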
Q. 5: A machinist is making engine parts with axle
diameters of 0.700 inch. A random sample of 10
parts shows a mean diameter of 0.742 inch with a
standard deviation of 0.040 inch. Construct 95%
confidence limits for true mean axle diameter.
Sol. 5: Sample size n=10, x̄=0.742, s=0.040.
For a small sample (n<30), the 100(1-α)% confidence
interval is given by x̄ ± tα/2 s/√n. For the 95%
confidence limits, α=0.05, so we
need the tα/2=t0.025 value from the T table for dof=n-1
=10-1=9. t0.025 is such that P[T>t0.025]=0.025, or
P[T<t0.025]=1-0.025=0.975; we get t0.025=2.262.
Confidence limits are 0.742 ± 2.262(0.04)/√10, or
0.742 ± 0.0286, i.e. (0.7134, 0.7706).
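The limits in Sol. 5 follow directly from the formula. A short Python sketch, with the t value hard-coded from the T table as in the solution and illustrative variable names:

```python
import math

n, xbar, s = 10, 0.742, 0.040   # sample size, mean and S.D. (Q. 5)
t_crit = 2.262                  # t_0.025 for 9 dof, read from the T table

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"95% confidence limits: ({lower:.4f}, {upper:.4f})")
```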
Q.6: Metal conduits or hollow pipes are used in electrical
wiring. In testing 1-inch pipes, these data are obtained on the
outside diameter (in inches) of the pipe:
1.281 1.288 1.292 1.289 1.291
1.293 1.293 1.291 1.289 1.288
1.287 1.291 1.290 1.286 1.289
1.286 1.295 1.296 1.291 1.286
Assume that sampling is from a normal distribution and find
(i) 95% confidence interval (ii) 90% confidence interval on the
mean outside diameter of pipes of this type.
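Sol. 6: Assuming normal distribution, the end values L & U of the
100(1-α)% confidence interval are given by x̄ ± Zα/2 s/√n.
(With n=20 and σ estimated by s, the t value for 19 dof, e.g. t0.025=2.093,
would give a slightly wider interval; the normal value is used here as an approximation.)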
Sample mean and sample S.D. calculation for outside
diameter (in inches) of the pipe:
No.   x       (x-x̄)²       No.   x       (x-x̄)²
 1    1.281   0.000074     11    1.287   0.000007
 2    1.288   0.000003     12    1.291   0.000002
 3    1.292   0.000006     13    1.290   0.000000
 4    1.289   0.000000     14    1.286   0.000013
 5    1.291   0.000002     15    1.289   0.000000
 6    1.293   0.000012     16    1.286   0.000013
 7    1.293   0.000012     17    1.295   0.000029
 8    1.291   0.000002     18    1.296   0.000041
 9    1.289   0.000000     19    1.291   0.000002
10    1.288   0.000003     20    1.286   0.000013
x̄=Σxi/n=1.2896, s²=0.0000123
For the given sample size n=20, the mean and variance
calculated in the above table are
x̄=Σxi/n=1.2896 and s²=Σ(xi-x̄)²/(n-1)
=0.0000123, which gives s=0.0035.
(i) For the 95% confidence limits, α=0.05, so
we need the Zα/2=Z0.025 value from the Normal
table. Z0.025 is such that P[Z>Z0.025]=0.025 or
P[Z<Z0.025]=1-0.025=0.975. The Normal table gives
Z0.025=1.96. So the confidence limits are given by
1.2896 ± 1.96(0.0035)/√20, or 1.2896 ±
0.001534, i.e. (1.288066, 1.291134).
(ii) For the 90% confidence limits, α=0.10, so
we need the Zα/2=Z0.05 value from the Normal
table. Z0.05 is such that P[Z>Z0.05]=0.05 or
P[Z<Z0.05]=1-0.05=0.95, for which the Normal table
gives Z0.05=1.645. So the confidence limits are
given by 1.2896 ± 1.645(0.0035)/√20, or
1.2896 ± 0.001287, i.e. (1.288313, 1.290887).
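The sample mean, sample S.D. and both intervals of Sol. 6 can be recomputed from the raw data. The sketch below is in Python and follows the solution in using normal (Z) critical values, taken from `statistics.NormalDist`; the helper `z_interval` is introduced only for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

diameters = [
    1.281, 1.288, 1.292, 1.289, 1.291,
    1.293, 1.293, 1.291, 1.289, 1.288,
    1.287, 1.291, 1.290, 1.286, 1.289,
    1.286, 1.295, 1.296, 1.291, 1.286,
]
n = len(diameters)
xbar = mean(diameters)
s = stdev(diameters)   # sample S.D. with divisor n - 1

def z_interval(conf):
    """Normal-based confidence interval for the mean, as in the solution."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half

ci95 = z_interval(0.95)
ci90 = z_interval(0.90)

print(f"mean = {xbar:.4f}, s = {s:.4f}")
print(f"95% CI: ({ci95[0]:.6f}, {ci95[1]:.6f})")
print(f"90% CI: ({ci90[0]:.6f}, {ci90[1]:.6f})")
```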
Q. 7: Find the correlation between X and Y for the
following data.
Enzyme Level (X)           95  110  118  124  145  140  185  190  205  222
Detoxification Level (Y)  108  126  102  121  118  155  158  178  159  184
Sol. 7: n=10, Σx=1534, Σy=1409, Σx²=252684, Σy²=206319, Σxy=226463
(computed in the table at the end of this solution).
To find: $\rho_{XY} = c_{XY}/(s_X s_Y)$
$c_{XY} = \sum xy/n - (\sum x)(\sum y)/n^2 = 226463/10 - 1534(1409)/100 = 1032.24$.
$s_X = [\sum x^2/n - (\sum x)^2/n^2]^{1/2} = [252684/10 - (1534)^2/100]^{1/2} = 41.6754$
$s_Y = [\sum y^2/n - (\sum y)^2/n^2]^{1/2} = [206319/10 - (1409)^2/100]^{1/2} = 27.9122$
So $\rho_{XY} = c_{XY}/(s_X s_Y) = 1032.24/[41.6754(27.9122)] = 0.887375$
y      x      y²       x²       xy
108    95     11664    9025     10260
126    110    15876    12100    13860
102    118    10404    13924    12036
121    124    14641    15376    15004
118    145    13924    21025    17110
155    140    24025    19600    21700
158    185    24964    34225    29230
178    190    31684    36100    33820
159    205    25281    42025    32595
184    222    33856    49284    40848
1409   1534   206319   252684   226463
(Σyi)  (Σxi)  (Σyi²)   (Σxi²)   (Σxiyi)
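The correlation in Sol. 7 can be recomputed from the raw data. The following Python sketch uses the same divisor-n formulas for the covariance and standard deviations as the solution; the variable names are illustrative.

```python
import math

x = [95, 110, 118, 124, 145, 140, 185, 190, 205, 222]    # enzyme level
y = [108, 126, 102, 121, 118, 155, 158, 178, 159, 184]   # detoxification level
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
# Covariance and standard deviations with divisor n, as in the solution.
c_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
s_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
s_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
r = c_xy / (s_x * s_y)

print(f"c_xy = {c_xy:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r = {r:.6f}")
```

Because the divisor n cancels in the ratio, the same value of r results if n-1 is used consistently in all three quantities.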
Q. 8: Find correlation between X and Y for the following data.
Percent of copper (X)       0.01  0.03  0.01  0.02  0.10  0.08  0.12  0.15  0.10  0.11
Rockwell hardness rate (Y)  58    66    55    63    58    57    69    70    65    62
Sol. 8: n=10, Σx=0.73, Σy=623, Σx²=0.0769, Σy²=39057, Σxy=46.83
(computed in the table at the end of this solution).
To find: $\rho_{XY} = c_{XY}/(s_X s_Y)$
$c_{XY} = \sum xy/n - (\sum x)(\sum y)/n^2 = 46.83/10 - 0.73(623)/100 = 0.1351$.
$s_X = [\sum x^2/n - (\sum x)^2/n^2]^{1/2} = [0.0769/10 - (0.73)^2/100]^{1/2} = 0.0486$
$s_Y = [\sum y^2/n - (\sum y)^2/n^2]^{1/2} = [39057/10 - (623)^2/100]^{1/2} = 4.9406$
So $\rho_{XY} = c_{XY}/(s_X s_Y) = 0.1351/[0.0486(4.9406)] = 0.56276$ (using the unrounded value of $s_X$)
y      x      y²      x²       xy
58     0.01   3364    0.0001   0.58
66     0.03   4356    0.0009   1.98
55     0.01   3025    0.0001   0.55
63     0.02   3969    0.0004   1.26
58     0.10   3364    0.0100   5.80
57     0.08   3249    0.0064   4.56
69     0.12   4761    0.0144   8.28
70     0.15   4900    0.0225   10.50
65     0.10   4225    0.0100   6.50
62     0.11   3844    0.0121   6.82
623    0.73   39057   0.0769   46.83
(Σyi)  (Σxi)  (Σyi²)  (Σxi²)   (Σxiyi)
Q. 9: The relationship between energy consumption and
household income was studied, yielding the following
data on household income X (in units of 1000/year) and
energy consumption Y (in units of 10⁸ Btu/year):
Energy consumption (Y)   1.8   3.0   4.8   5.0   6.5   7.0   9.0   9.1
Household income (X)    20.0  30.5  40.0  55.1  60.3  74.9  88.4  95.2
a) Estimate the linear regression equation μY|x=β0+β1x.
b) If x=50 (household income of 50,000), estimate the
average energy consumed for households of this income.
What would your estimate be for a single household?
c) How much would you expect the change in consumption
to be if any household income increases 2000/year
(2 units of 1000)?
d) How much would you expect consumption to change if
any household income decreases 2000/year?
Sol. 9: Given sample size n=8, xi and yi are as given in
the table. From the table we find Σxi=464.4, Σyi=46.2,
Σyi²=315.34, Σxi²=32089.96 and Σxiyi=3173.17.
Y       X        X²         XY
1.80    20.00    400.00     36.00
3.00    30.50    930.25     91.50
4.80    40.00    1600.00    192.00
5.00    55.10    3036.01    275.50
6.50    60.30    3636.09    391.95
7.00    74.90    5610.01    524.30
9.00    88.40    7814.56    795.60
9.10    95.20    9063.04    866.32
46.20   464.40   32089.96   3173.17
(Σyi)   (Σxi)    (Σxi²)     (Σxiyi)
(a) Using these values, we get
$b_1 = [\sum_i x_i y_i/n - (\sum_i x_i/n)(\sum_i y_i/n)] / [\sum_i x_i^2/n - (\sum_i x_i/n)^2] = 0.0957$;
$b_0 = \sum_i y_i/n - b_1 \sum_i x_i/n = 0.2177$.
Using the estimates b0=0.2177 and b1=0.0957 for β0 and β1 in μY|x=β0+β1x, we get
the estimated linear regression equation μY|x = 0.2177+0.0957x (1)
(b) If x=50, eq. (1) gives μY|x=0.2177+0.0957(50)=5.0043 (computed with
unrounded coefficients). The same value is the point estimate for a single
household of this income.
(c) The change in consumption for an increase of 2 units is given
by ΔμY|x=0.0957Δx=0.0957(2)=0.1915.
(d) Similarly, the change in consumption for a decrease of 2
units is given by ΔμY|x=0.0957Δx=0.0957(-2)=-0.1915, i.e. a decrease of 0.1915.
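Parts (a) to (d) of Sol. 9 can be reproduced with a few lines of Python. This is a sketch using the covariance/variance form of the estimates; the helper `predict` and the other names are illustrative and not part of the unit.

```python
income = [20.0, 30.5, 40.0, 55.1, 60.3, 74.9, 88.4, 95.2]   # X, in 1000/year
energy = [1.8, 3.0, 4.8, 5.0, 6.5, 7.0, 9.0, 9.1]           # Y, energy consumption
n = len(income)

mean_x, mean_y = sum(income) / n, sum(energy) / n
c_xy = sum(x * y for x, y in zip(income, energy)) / n - mean_x * mean_y
var_x = sum(x * x for x in income) / n - mean_x ** 2

b1 = c_xy / var_x            # slope estimate
b0 = mean_y - b1 * mean_x    # intercept estimate

def predict(x):
    """Estimated mean consumption at income x (in units of 1000/year)."""
    return b0 + b1 * x

at_50 = predict(50)          # part (b)
change_up = b1 * 2           # part (c): income rises by 2 units
change_down = b1 * (-2)      # part (d): income falls by 2 units

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"x = 50: {at_50:.4f}; change for +2: {change_up:.4f}; for -2: {change_down:.4f}")
```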
Q. 10: The following data represent
carbon dioxide (CO2) emissions from
coal-fired boilers (in units of 1000 tons)
over a period of years between 2010 and
2016. The independent variable (year) has
been standardized to yield the following
table:
Year (x) 1 2 3 4 5 6 7
CO2 emission(Y) 910 680 520 450 370 380 340
a. Estimate the linear regression equation
y= β0+β1x.
b. Estimate the average CO2 emission from coal-
fired boilers for the year 2018.
Sol. 10: (a) Given sample size n=7, xi and yi are as given
in the table. From the table we find Σxi=28,
Σyi=3650, Σyi²=2160300, Σxi²=140 and Σxiyi=12140.
S.N.   Year   Y       X       X²      XY
1      2010   910     1       1       910
2      2011   680     2       4       1360
3      2012   520     3       9       1560
4      2013   450     4       16      1800
5      2014   370     5       25      1850
6      2015   380     6       36      2280
7      2016   340     7       49      2380
Total         3650    28      140     12140
              (Σyi)   (Σxi)   (Σxi²)  (Σxiyi)
Using the above values, we get
$b_1 = [\sum xy/n - (\sum x/n)(\sum y/n)] / [\sum x^2/n - (\sum x/n)^2] = c_{xy}/s_X^2 = -87.857$;
$b_0 = \sum y/n - b_1 \sum x/n = 872.86$.
Using the estimates b0=872.86 and b1=-87.857 for β0 and β1 in μY|x=β0+β1x,
we get the estimated linear regression equation
μY|x = 872.86-87.857x (1)
(b) For year 2018, we have x=9, for which eq. (1) gives
μY|x = 872.86-87.857×9 = 82.143.
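The step that is easy to miss in Sol. 10 is the mapping from calendar year to the standardized x (2010 maps to 1, so 2018 maps to 9). A minimal Python sketch follows; the helper `standardize` and its `base` argument are names introduced here for illustration.

```python
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
co2 = [910, 680, 520, 450, 370, 380, 340]   # emissions, in units of 1000 tons

def standardize(year, base=2009):
    """Map a calendar year to the standardized x of the table (2010 -> 1)."""
    return year - base

xs = [standardize(yr) for yr in years]
n = len(xs)
sx, sy = sum(xs), sum(co2)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, co2))

# Least-squares estimates from the normal equations.
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = sy / n - b1 * sx / n

estimate_2018 = b0 + b1 * standardize(2018)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, estimate for 2018 = {estimate_2018:.3f}")
```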
Q. 11: Compute the least square regression equation of Y on
X and estimate the blood pressure when the age is 45 years.
Age (X) 36 38 42 47 53 60 65
Blood pressure (Y) 118 115 125 128 147 140 150
Sol. 11: For the given sample size n=7, with xi and yi as in
the table, we find Σxi=341, Σyi=923, Σyi²=122867,
Σxi²=17347 and Σxiyi=45825.
Y       X       X²       XY
118     36      1296     4248
115     38      1444     4370
125     42      1764     5250
128     47      2209     6016
147     53      2809     7791
140     60      3600     8400
150     65      4225     9750
923     341     17347    45825
(Σyi)   (Σxi)   (Σxi²)   (Σxiyi)
Using the above values:
$b_1 = c_{xy}/s_X^2 = [\sum xy/n - (\sum x/n)(\sum y/n)] / [\sum x^2/n - (\sum x/n)^2] = 1.1717$;
$b_0 = \sum y/n - b_1 \sum x/n = 74.78$.
Using the estimates b0=74.78 and b1=1.1717 for β0 and β1 in μY|x=β0+β1x,
we get the estimated linear regression equation μY|x = 74.78+1.1717x (1)
For age 45 years, we have x=45. So eq. (1) gives
μY|x = 74.78+1.1717×45 = 127.5 as the estimated blood pressure at 45 years.
