Excel Statistical Analysis
Conjoint Analysis is used by marketers to determine which attributes of a product are most important to a consumer, and how important each one is.
Step 1 - Make a list of product attributes to be evaluated by the consumer.

Brand  Color  Price
A      Red    $50
B      Blue   $100
C             $150

Step 2 - Make a complete list of all possible attribute combinations.

Card  Brand  Color  Price
1     A      Red    50
2     A      Red    100
3     A      Red    150
4     A      Blue   50
5     A      Blue   100
6     A      Blue   150
7     B      Red    50
8     B      Red    100
9     B      Red    150
10    B      Blue   50
11    B      Blue   100
12    B      Blue   150
13    C      Red    50
14    C      Red    100
15    C      Red    150
16    C      Blue   50
17    C      Blue   100
18    C      Blue   150

Step 3 - Have the consumer rate each combination on a scale of 1 (worst) to 10 (best). Brands and colors are coded as numbers (Brand: A = 1, B = 2, C = 3; Color: Red = 1, Blue = 2).

Card  Brand  Color  Price
1     1      1      50
2     1      1      100
3     1      1      150
4     1      2      50
5     1      2      100
6     1      2      150
7     2      1      50
8     2      1      100
9     2      1      150
10    2      2      50
11    2      2      100
12    2      2      150
13    3      1      50
14    3      1      100
15    3      1      150
16    3      2      50
17    3      2      100
18    3      2      150
Step 4 - Final data preparation step prior to running the regression - Remove 1 variable from each set of variables with more than 1 choice. Removing these variables ensures that no remaining variable can be predicted from the others.

All attribute levels coded as 0/1 columns (before removal):

Card  A  B  C  Red  Blue  $50  $100  $150
1     1  0  0  1    0     1    0     0
2     1  0  0  1    0     0    1     0
3     1  0  0  1    0     0    0     1
4     1  0  0  0    1     1    0     0
5     1  0  0  0    1     0    1     0
6     1  0  0  0    1     0    0     1
7     0  1  0  1    0     1    0     0
8     0  1  0  1    0     0    1     0
9     0  1  0  1    0     0    0     1
10    0  1  0  0    1     1    0     0
11    0  1  0  0    1     0    1     0
12    0  1  0  0    1     0    0     1
13    0  0  1  1    0     1    0     0
14    0  0  1  1    0     0    1     0
15    0  0  1  1    0     0    0     1
16    0  0  1  0    1     1    0     0
17    0  0  1  0    1     0    1     0
18    0  0  1  0    1     0    0     1

After removing Brand A, Red, and $50:

Card  B  C  Blue  $100  $150
1     0  0  0     0     0
2     0  0  0     1     0
3     0  0  0     0     1
4     0  0  1     0     0
5     0  0  1     1     0
6     0  0  1     0     1
7     1  0  0     0     0
8     1  0  0     1     0
9     1  0  0     0     1
10    1  0  1     0     0
11    1  0  1     1     0
12    1  0  1     0     1
13    0  1  0     0     0
14    0  1  0     1     0
15    0  1  0     0     1
16    0  1  1     0     0
17    0  1  1     1     0
18    0  1  1     0     1
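The card list and the 0/1 coding can also be generated programmatically. The following is a minimal Python sketch, not part of the original workbook; the names `cards`, `dummy_code`, and `design` are illustrative assumptions, and Brand A, Red, and $50 are treated as the removed baseline levels.

```python
from itertools import product

brands = ["A", "B", "C"]
colors = ["Red", "Blue"]
prices = [50, 100, 150]

# Every attribute combination, in card order (brand varies slowest, price fastest).
cards = list(product(brands, colors, prices))


def dummy_code(card):
    """Return the five 0/1 regression inputs for one card.

    Brand A, Red and $50 are the dropped baseline levels (assumption
    taken from the text), so they get no column of their own.
    """
    brand, color, price = card
    return [
        int(brand == "B"),
        int(brand == "C"),
        int(color == "Blue"),
        int(price == 100),
        int(price == 150),
    ]


design = [dummy_code(card) for card in cards]

for number, (card, row) in enumerate(zip(cards, design), start=1):
    print(number, card, row)
```

Each printed row lists the columns in the order B, C, Blue, $100, $150.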
Conjoint
Conjoint is an analysis that provides a marketer with a method to predict how much more or less a consumer will value one combination of product attributes over another combination of product attributes. The degree to which a consumer likes a product attribute is called the "utility" of that attribute. For example, a product might come in three brands, two colors, and at three levels of price. Each color, brand, and price level will have its own utility calculated during the conjoint analysis. Conjoint is done using Multiple Regression. Each product attribute variation will be assigned as one of the independent variable inputs to the Multiple Regression equation. For example, the color red will be represented by one independent variable while the color blue will be represented by another independent variable. The resulting regression equation assigns a coefficient to each independent variable. These coefficients are the utilities of the attributes: the more positive an individual coefficient is, the more highly valued is the associated product attribute.
In this conjoint exercise, we are going to determine the utilities of eight product attributes. They are as follows: Brands A, B, and C; colors Red and Blue; and prices $50, $100, and $150.
There are 18 possible combinations of these attributes (3 brands x two colors x three prices). The consumer rates each combination on a scale of 0 to 10 (10 being the best). The consumer test results are modified for the regression equation and then run through the regression. The resulting regression analysis calculates a coefficient for each independent variable as part of the regression output equation. Each coefficient is the measure of value that the consumer places on the product attribute associated with that utility.
The chart on the left side provides the choices that the consumer had to analyze. The consumer was provided with 18 separate cards. Each card contained one of the 18 possible variations of product attributes. The consumer had to rate their overall preference of each combination of attributes on a scale of 1 to 10.
The chart on the right shows the consumer's stated preference for each combination of attributes. Non-numerical attributes were assigned numbers. Brand A and Red are shown as 1's in their respective columns. Brand B and Blue are shown as 2's in their respective columns. Brand C was assigned a 3 in its column.
The chart is now further prepared for Regression Analysis. Each individual product attribute is given its own column. Each product attribute now has either the value of 1 or 0.
One problem must be corrected before this data can be submitted for Regression Analysis. Independent variables or combinations of independent variables should not be able to predict each other. Using independent variables that are highly correlated with each other (either positively or negatively) produces a regression error known as collinearity. For example, if the color is either red or blue, knowing the state of one color tells us the state of the other (if the state of Blue = 1, the state of Red must = 0). This error condition also occurs when there are 3 variables: if you know the states of 2, you know the state of the remaining one. These error conditions are solved by removing one column of data from each type of variation. Here, the columns for Brand A, Red, and Price level $50 were removed. We will see below that this has no effect on the accuracy of the Regression output.
Card        1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
Preference  5  5  0  8  5  2  7  5  3  9   6   5   10  7   5   9   7   8
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.933190299
R Square            0.870844134
Adjusted R Square   0.812136922
Standard Error      1.141319161
Observations        17

ANOVA
            df  SS           MS
Regression  5   96.61247277  19.3224946
Residual    11  14.3287037   1.30260943
Total       16  110.9411765

           Coefficients  Standard Error  t Stat
Intercept  5.916666667   0.807034518     7.33136753
B          1.513888889   0.698912395     2.16606387
C          3.347222222   0.698912395     4.7891871
Blue       1.231481481   0.559992106     2.19910507
$100       -2.319444444  0.698912395     -3.31864832
$150       -4.319444444  0.698912395     -6.18023729
For example, Price level $50 is the most preferred price, with a utility of 0, while Price level $150 has the lowest utility, -4.319444444. Blue has a utility of 1.231481481, which is that much higher than the utility of Red, which is 0. Brand C is the most liked brand, with a utility of 3.347222222, while Brand A is liked the least, with a utility of 0. The resulting Regression Equation still does a good job of predicting overall preference. For example, the consumer rated the combination of attributes on card 13 (Brand C, Red, $50) with a 10. The predicted preference for card 13 is: (5.9167) + (3.3472)(1) = 9.264, which is close to the consumer's rating of 10.
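The prediction for any card is the intercept plus the utilities of its three attribute levels. A small Python sketch of that arithmetic follows; the coefficients are copied from the SUMMARY OUTPUT, the assignment of 1.5139 to Brand B and -2.3194 to $100 assumes the coefficients appear in column order, and `predicted_preference` is an illustrative name.

```python
# Coefficients copied from the SUMMARY OUTPUT. Baseline levels
# (Brand A, Red, $50) have a utility of 0 by construction.
INTERCEPT = 5.916666667
UTILITY = {
    "A": 0.0, "B": 1.513888889, "C": 3.347222222,
    "Red": 0.0, "Blue": 1.231481481,
    50: 0.0, 100: -2.319444444, 150: -4.319444444,
}


def predicted_preference(brand, color, price):
    """Intercept plus the utility of each attribute level on the card."""
    return INTERCEPT + UTILITY[brand] + UTILITY[color] + UTILITY[price]


card_13 = predicted_preference("C", "Red", 50)
print(round(card_13, 3))  # 9.264
```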
The regression appears to be a good one because Adjusted R Square is high (close to 1). R Square is the explained variance divided by the total variance; Adjusted R Square applies a penalty for the number of independent variables. Here, Adjusted R Square is 0.812. The variables with large t Stats ($150, C, and $100) are clearly significant predictors; the 95% confidence intervals for B and Blue just barely include 0, so these two are borderline at the 0.05 level. The absolute value of each coefficient indicates the effect that the attribute has on the consumer's overall liking of the product. For example, Brand C (coefficient = 3.347) produced the highest positive influence while the $150 price (coefficient = -4.319) reduces consumer liking the most.
The low Significance F (the p-value of the regression's F statistic) indicates that the regression, overall, is valid.
           Lower 95%     Upper 95%
Intercept  4.140395669   7.692937664
B          -0.024406919  3.052184697
C          1.808926414   4.88551803
Blue       -0.001052832  2.464015795
$100       -3.857740252  -0.78114864
$150       -5.857740252  -2.78114864
Regression
Regression is a statistical technique that is used to create predictive models. The models receive input (independent variables) and predict the outcome of the dependent variable.
When performing Multiple Regression, Correlation Analysis should first be performed on all independent and dependent variables.
S&P      Viacom
0.8799   0.7541
7.5187   14.9701
5.558    11.9792
1.3716   7.907
-1.6289  -5.1724
2.4171   3.4091
Correlation Analysis
Tools / Data Analysis / Correlation
        S&P          Viacom        AT&T          GM           Coke
S&P     1
Viacom  0.938661647  1
AT&T    0.128558379  -0.098932814  1
GM      0.470349107  0.350437967   -0.26371086   1
Coke    0.255052662  0.342337358   -0.501490208  0.627513676  1
Coke has a low correlation with the S&P and is therefore not a good predictor of the S&P. Also, if two of the independent variables above are highly correlated with each other, only one of them should be used in the Multiple Regression below. This is not the case here because none of the variables above have a high correlation with another variable. Using highly correlated variables as inputs to a Multiple Regression causes an error called Multicollinearity and should be avoided. Multiple Regressions should be built up by adding one new independent variable at a time and evaluating the results after each addition. Good new independent variables noticeably raise R-Square and lower Standard Error without causing much change to Coefficients. Poor new independent variables don't change R-Square much but have unpredictable effects on Coefficients.
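Each entry in the correlation matrix is a Pearson correlation coefficient. As a rough check outside Excel, the following Python sketch recomputes the S&P / Viacom entry from the six pairs of returns listed above; the function name `pearson` is an illustrative assumption.

```python
from math import sqrt

# The six S&P and Viacom returns from the data table.
sp500 = [0.8799, 7.5187, 5.558, 1.3716, -1.6289, 2.4171]
viacom = [0.7541, 14.9701, 11.9792, 7.907, -5.1724, 3.4091]


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson(sp500, viacom)
print(round(r, 4))  # about 0.9387, the S&P / Viacom entry
```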
Multiple Regression
Coke was not used because it has a low correlation with S&P and is therefore not a good predictor of the S&P. All others (Viacom, AT&T, GM) were used because they had a relatively high correlation with S&P and low correlations with each other. Regressions are Predictive, not Forecasting. All new independent variables must be chosen from within the range of the previously sampled independent variable.
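SUMMARY OUTPUT

Regression Statistics
Multiple R          0.987732311
R Square            0.975615119
Adjusted R Square   0.939037796
Standard Error      0.821001266
Observations        6

ANOVA
            df  SS          MS           F            Significance F
Regression  3   53.9356009  17.97853363  26.67267746  0.036353424
Residual                    0.674043079

Adjusted R Square indicates that 94% of the variance of the S&P return is explained by the model - This is good.
The high coefficient of Viacom indicates that it is the biggest predictor of the S&P. Its high correlation indicates this as well.
The standard error of regression is used to determine confidence intervals.
95% confidence interval = Predicted S&P Value +/- z(95%) * (Standard Error)
MS (Mean Square) shows a high ratio of explained (regression) over unexplained (residual) variance.
F Ratio = Explained variance (17.9) / Unexplained variance (0.67) = 26.6 - This is high and is good. A low p-value (Significance F) shows that this is significant.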
Low Significance F (0.0364) - indicates that, overall, the regression output is statistically significant (valid).
p-Values for each variable - The lower the p-Value, the better predictor the variable was. Viacom returns are a good predictor of the S&P. AT&T and GM returns are much less effective predictors of the S&P return (higher p-Values) - These would not be valid predictors. The small coefficients of these two company returns also indicate that they are less valid predictors. Adding new independent variables to a regression equation never decreases R Square.
Adjusted R Square is increased only when a newly added independent variable increases predictability of the dependent variable.
Testing to determine if a change has occurred, for example, after an ad co using the Confidence Interval
DEALER  BEFORE Average Daily Sales  AFTER Average Daily Sales
A       100                         110
B       130                         135
C       120                         122
D       140                         157
E       155                         160
F       200                         206
G       300                         309
H       260                         283
I       190                         202
J       185                         192
K       100                         110
L       130                         135
M       120                         122
N       140                         157
O       155                         160
P       200                         206
Q       300                         309
R       260                         283
S       190                         202
T       185                         192
U       100                         110
V       130                         135
W       120                         122
X       140                         157
Y       155                         160
Z       200                         206
A1      300                         309
B1      260                         283
C1      190                         202
D1      185                         192
Sampling the same thing before and after to determine if something has changed. We are trying to determine if the "after" samples are statistically different from the "before" samples. 30 samples should always be taken, unless the population is known to be normally distributed. (Here only 6 samples are taken for brevity.)
In this case, we want to determine with 95% certainty whether or not there has been a change from before to after. The Null Hypothesis is that the mean difference is 0, and α = 0.05 (1 - 0.95).
Here, because α is less than both p-Values, we cannot reject the Null Hypothesis in either case. The Null Hypothesis states that there is no change in the mean.
Problem:
A tire manufacturer wants to determine if a new rubber formulation will improve tire wear. 12 sets of tires were created with the old rubber formula and 12 sets with the new rubber formulation. They were placed on the following cars and driven until they wore out. Determine at a 0.05 level of significance whether the new rubber produces longer tread life.
Car  Tire Location  Old Rubber  New Rubber
1    Front          37661       31902
1    Rear           42342       41203
2    Front          31108       38816
2    Rear           41239       43305
3    Front          32903       35375
3    Rear           42658       52353
4    Front          29829       30883
4    Rear           39616       49424
5    Front          34625       38724
5    Rear           42650       43234
6    Front          31923       34565
6    Rear           39990       43861
The NULL Hypothesis here is that the mean tread wear of the old rubber equals the mean tread wear of the new rubber. The p-Value for both the one-tailed test and the two-tailed test is less than the level of significance (0.05), so the NULL Hypothesis is rejected - Therefore, we have 95% certainty that the new rubber compound increases tread life.
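The test statistic behind this conclusion can be sketched in Python. This assumes a paired t-test (each old/new pair shares the same car and tire location), which is consistent with the stated result but is not named in the original; the critical value 1.796 is the standard one-tailed t table entry for 11 degrees of freedom at 0.05.

```python
from math import sqrt
from statistics import mean, stdev

old = [37661, 42342, 31108, 41239, 32903, 42658,
       29829, 39616, 34625, 42650, 31923, 39990]
new = [31902, 41203, 38816, 43305, 35375, 52353,
       30883, 49424, 38724, 43234, 34565, 43861]

# Paired test (assumption): work on the per-position differences.
diffs = [n - o for n, o in zip(new, old)]
count = len(diffs)

mean_diff = mean(diffs)
std_error = stdev(diffs) / sqrt(count)   # sample standard deviation / sqrt(n)
t_stat = mean_diff / std_error

T_CRIT_ONE_TAILED = 1.796  # t table: 11 degrees of freedom, alpha = 0.05

print(mean_diff, round(t_stat, 2))
print("reject NULL Hypothesis:", t_stat > T_CRIT_ONE_TAILED)
```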
Problem:
Evaluate the returns of these two stocks to determine if there is a real difference. Use a 0.05 level of significance.
Viacom 0.7541 14.9701 11.9792 7.907 -5.1724 3.4091 0.7541 14.9701 11.9792 7.907 -5.1724 3.4091 0.7541 14.9701 11.9792 7.907 -5.1724 3.4091 3.4091
GM -4.6296 18.986 -1.7226 -0.5535 6.679 1.8261 -4.6296 18.986 -1.7226 -0.5535 6.679 1.8261 -4.6296 18.986 -1.7226 -0.5535 6.679 1.8261 1.8261
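p-Values for both the one-tailed and two-tailed tests are greater than the stated level of significance (0.05), so it cannot be stated with 95% certainty that there is a difference in the returns of these companies.
The NULL Hypothesis that the means of both returns are equal cannot be rejected.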
0.7541 14.9701 11.9792 7.907 -5.1724 3.4091 0.7541 14.9701 11.9792 7.907 -5.1724
-4.6296 18.986 -1.7226 -0.5535 6.679 1.8261 -4.6296 18.986 -1.7226 -0.5535 6.679
Problem:
A company is testing light bulbs from 2 suppliers. Below is listed the hours of usage before each sample burned out. Determine using a 0.05 level of significance whether the new supplier's light bulbs really last longer than the old supplier's.
The one-tailed p-value (one-tailed because we are only testing if one is better) is very close to, but still above, the stated level of significance (0.05), so we cannot reject the NULL Hypothesis, which states that the mean light bulb life for both suppliers is the same.
Difference (AFTER - BEFORE), dealers A through D1: 10 5 2 17 5 6 9 23 12 7 10 5 2 17 5 6 9 23 12 7 10 5 2 17 5 6 9 23 12 7
In this case, we want to determine with 95% certainty whether an advertising campaign increased average daily sales to our large dealer network. To determine this, we must take Before and After samples of average daily sales at a minimum of 30 dealers. The keys to success of this sampling are the following:

1) At least 30 dealers must be sampled.
2) Before and After samples must be taken from the same dealers.
3) The samples must be AVERAGE sales, for example, average daily sales over a week or a month. It cannot just be one sample of one day's sales.
4) The dealers sampled must be random and representative of the overall population.

We are trying to determine whether the Mean Difference falls inside or outside the 95% Confidence Interval around a Mean Difference of 0. If the Mean Difference falls within this 95% Confidence Interval, we cannot conclude that any change occurred. If the Mean Difference falls outside this Confidence Interval, we can say with 95% certainty that average daily sales for the whole network have changed.

To determine the 95% Confidence Interval for a 0 Mean, we need the following information:

Sample size (COUNT) = 30
Sample Standard Deviation (STDEV) = 6.11
Sample Standard Error = (Sample Standard Deviation) / (Square Root of Sample Size) = 1.11
Sample Mean (AVERAGE) = 9.60
α (1 - Confidence Level) = 0.05

The 95% confidence interval will contain 95% of the area under the Normal curve. The remaining 5% (α) will be split between the two outer tails of the Normal curve. The Z Score represents the right outer edge of the confidence interval. The total area under the Normal curve to the left of this Z value for a 95% two-tailed confidence interval is 97.5%. The Z Score for this is 1.96. This means that 97.5% of the total area under the Normal curve is to the left of the point 1.96 standard deviations to the right of the mean.

Z Score = NORMSINV(0.975) = 1.96

The 95% Confidence Interval around a Mean of 0 = 0 +/- (Z Score for 95% CI) x (Standard Error) = 0 +/- (1.96) x (1.11)
The 95% Confidence Interval for the Mean = 0 is from -2.18 to +2.18

If the Sample Mean (9.60) is outside of the 95% Confidence Interval for the Mean Difference being 0, we can say with 95% certainty that average daily sales throughout the entire population of dealers have increased.

This is the case because the Mean of 9.60 is outside of the confidence interval of -2.18 to +2.18.

We can now state with 95% certainty that the advertising campaign has caused a change in the daily sales of the dealer network.
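The same calculation can be sketched in Python using only the standard library. The 30 differences are the AFTER minus BEFORE values for the dealers; `NormalDist().inv_cdf` plays the role of NORMSINV, and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# AFTER - BEFORE for the 30 dealers (the same 10 values repeat three times).
differences = [10, 5, 2, 17, 5, 6, 9, 23, 12, 7] * 3

n = len(differences)                       # COUNT
sample_mean = mean(differences)            # AVERAGE
sample_std = stdev(differences)            # STDEV (sample standard deviation)
std_error = sample_std / sqrt(n)           # standard error of the mean

z = NormalDist().inv_cdf(0.975)            # NORMSINV(0.975)
half_width = z * std_error                 # interval around 0 is +/- half_width

changed = abs(sample_mean) > half_width
print(round(sample_std, 2), round(std_error, 2), round(z, 2), round(half_width, 2))
print("mean difference outside the interval:", changed)
```

Carrying full precision gives a half width slightly above the 2.18 obtained from the rounded 1.11; the conclusion is unaffected.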
ANOVA is a technique for testing the equality of different population means. ANOVA is very useful because it can be extended to any number of populations. All ANOVA tests evaluate the NULL Hypothesis - that is - that all samples drawn have the same mean.
ANOVA is often used by marketers to test whether different marketing campaigns with multiple varying elements actually produce different results.
The NULL Hypothesis is rejected - that is - there are real differences between the means - if the p-Value pertaining to the item being evaluated is less than the desired level of significance. For example, in the 1st ANOVA below, the p-Value pertaining to "Between Methods (Groups)" is less than the desired level of significance - So there is a difference between the groups.
Problem: 3 different sales training methods are used. Three groups of four randomly chosen new salespeople are chosen. Each group is trained using one of the methods. After the course is completed, the sales total of each salesperson over the next two weeks is collected. Determine within a 0.05 level of significance whether there is a difference in the effectiveness of the courses.

Student  Method 1  Method 2  Method 3
1        16        19        24
2        21        20        21
3        18        21        22
4        13        20        25
Count    4         4         4
Sum      68        80        92
The p-Value for Methods (Between Groups, which are the Methods) (0.014419203) is less than the level of significance (0.05), so there is a difference between the effectiveness of the teaching methods.
The p-Value calculated by Excel agrees with the hand-calculated p-Value, which is less than the level of significance.
Problem:
Here are 3 different types of typing keyboards. 5 typists each got to use all three keyboards. Here are the typing speeds of each typist on each of the 3 keyboard types. Determine at a 0.01 level of significance (99% certainty) whether typing speed differs between the 3 keyboard types.
In this example, the two factors that influence the speed of typing are 1) the keyboard, and 2) the typing ability of each typist.
Anova: Two-Factor Without Replication

SUMMARY     Count  Sum
Typist 1    3      180
Typist 2    3      338
Typist 3    3      141
Typist 4    3      303
Typist 5    3      216
Keyboard A  5      375
Keyboard B  5      379
Keyboard C  5      424

ANOVA
Source of Variation  SS           df
Rows                 9151.066667  4
Columns              296.1333333  2
Error                94.53333333  8
Total                9541.733333  14
The p-Value for the Rows (5.42004E-08) is much less than the level of significance (0.01) so there is a difference between the typists.
The p-Value for Columns (0.003428581) is less than the level of significance (0.01) so there is a difference between the keyboards.
Two factors are being evaluated and the tests are performed more than once (in this case, each test is performed in two markets).
Problem
A perfume company was testing a product using 3 different advertising focuses (Sophisticated, Athletic, Popular), 3 different package designs (Design 1, Design 2, Design 3), and 2 separate markets. Using a 0.05 level of significance, determine whether 1) Advertising Focus, 2) Package Design, or 3) the Interaction between them had any effect on sales. The chart shows the sales with each combination in each of the two markets.
2 4 9 17
The p-Value for Sample (0.076062) is more than the level of significance (0.05). We cannot reject the NULL Hypothesis that states that the package does not affect sales.
The p-Value for Columns (0.00037339) is less than the level of significance (0.05). This indicates that overall advertising focus affects sales. The p-Value for Interaction (0.022409) is less than the level of significance. This indicates that different combinations of advertising focus and package design affect sales differently.
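                                           Method 1  Method 2  Method 3
Student 1                                  16        19        24
Student 2                                  21        20        21
Student 3                                  18        21        22
Student 4                                  13        20        25
Column Total                               68        80        92
Column Mean                                17        20        23
Grand Mean = (17 + 20 + 23) / 3 = 20
Column Mean - Grand Mean                   -3        0         3
(Column Mean - Grand Mean)^2               9         0         9
# Rows x [ (Column Mean - Grand Mean)^2 ]  36        0         36
Sum of Squares Between Groups = 36 + 0 + 36 = 72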
Degrees of Freedom
Between Groups DOF = # groups - 1 = c - 1 = 3 - 1 = 2
Within Groups DOF = c(r - 1) = 3 (4 - 1) = 9
Total Degrees of Freedom = 11

Sum of Squares
Between Groups Sum of the Squares = 72
Within Groups Sum of the Squares = 46
Total Sum of the Squares = 118

Mean Squares
MS Between Groups = 72 / 2 = 36
MS Within Groups = 46 / 9 = 5.111111

F Statistic
F Statistic = (MS Between Groups) / (MS Within Groups) = 36 / 5.111111 = 7.043478261

p Value
p-Value = FDIST(F Statistic, DOF Between Groups, DOF Within Groups) = FDIST(7.043478,2,9) = 0.014419203
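The p-value of 0.014419 is less than the designated level of significance of 0.05. This indicates that there is less than a 5% chance that this result could have occurred if there was no difference in effectiveness between the courses. Therefore, there is at least 95% certainty that there is a real difference in effectiveness of the courses.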
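The hand calculation can be reproduced with a short Python sketch. The sums of squares and F statistic follow the steps described here; the p-value line uses a closed form of the F distribution's right tail that holds only when the Between Groups degrees of freedom equal 2, which is an assumption specific to this three-group example and not a general replacement for FDIST.

```python
groups = {
    "Method 1": [16, 21, 18, 13],
    "Method 2": [19, 20, 21, 20],
    "Method 3": [24, 21, 22, 25],
}

values = [v for g in groups.values() for v in g]
grand_mean = sum(values) / len(values)

# Between groups: group size times squared distance of group mean from grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# Within groups: squared distance of every value from its own group mean.
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Right-tail area of F(2, df_within); valid only because df_between == 2.
assert df_between == 2
p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)

print(ss_between, ss_within, round(f_stat, 6), round(p_value, 6))
```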
F crit 4.256494729
          Keyboard A  Keyboard B  Keyboard C  Average      Variance
Typist 1  51          57          72          60           117
Typist 2  109         112         117         112.6666667  16.3333333
Typist 3  47          43          51          47           16
Typist 4  98          98          107         101          27
Typist 5  70          69          77          72           19

Average   75          75.8        84.8
Variance  767.5       819.7       724.2
MS 0.403605556
F crit 4.256494729
4.256494729 3.633088512
The p-Value represents the proportion of area under the F Distribution curve to the right of the given F value. If this p-Value is less than the stated level of significance, this demonstrates that there is a difference in the objects or processes being analyzed - in other words, there is a difference in the means.
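Sum of Squares Within Groups

               Method 1  Method 2  Method 3
Student 1      16        19        24
Student 2      21        20        21
Student 3      18        21        22
Student 4      13        20        25
Column Total   68        80        92
Column Mean    17        20        23

Value - Column Mean
               Method 1      Method 2      Method 3
Student 1      16 - 17 = -1  19 - 20 = -1  24 - 23 = 1
Student 2      21 - 17 = 4   20 - 20 = 0   21 - 23 = -2
Student 3      18 - 17 = 1   21 - 20 = 1   22 - 23 = -1
Student 4      13 - 17 = -4  20 - 20 = 0   25 - 23 = 2

(Value - Column Mean)^2
               Method 1  Method 2  Method 3
Student 1      1         1         1
Student 2      16        0         4
Student 3      1         1         1
Student 4      16        0         4
Sum            34        2         10

Sum of Squares Within Groups = 34 + 2 + 10 = 46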
The Chi Square Distribution is used to determine if a population's variance has changed. The Chi Square Distribution is skewed, with the high point of the curve occurring near the point on the x axis that equals the number of degrees of freedom (n-1 --> Sample Size - 1); the mean of the distribution equals the degrees of freedom and the peak sits slightly to its left. The total area under the Chi Square curve is 1.0. The area under the curve to the left or right of the outer limits determines whether it can be said with a certain degree of confidence that the population variance has changed. If the area outside the Chi Square Statistic (the p value) is less than the desired level of significance, then the population variance has changed.
If Sample Standard Deviation, s, is greater than Population Standard Deviation, σ, then the Chi Square Statistic will be to the right of (greater than) the degrees of freedom point, and the p value produced by CHIDIST(Chi Square Statistic, degrees of freedom) will be the p value of the right tail.
If Sample Standard Deviation, s, is less than Population Standard Deviation, σ, then the Chi Square Statistic will be to the left of (less than) the degrees of freedom point, and the value produced by CHIDIST(Chi Square Statistic, degrees of freedom) will still be the area under the Chi Square curve to the right of the Chi Square Statistic point. To get the area under the left tail (area to the left of the Chi Square point), the p-value = 1 - CHIDIST(Chi Square Statistic, degrees of freedom).
A manufacturer wants to check if the variance on a process has changed. A machine drills a hole as part of the manufacturing process. The standard deviation of the hole diameter has historically been 1.6 ml. A random sample of 50 hole diameters was checked in one batch. The measured sample standard deviation was 1.9 ml. At a 0.05 level of significance, has the population standard deviation increased above 1.6 ml?

Givens:
n = 50
Degrees of Freedom = n - 1 = 49
Level of Significance, α, = 0.05
Population Standard Deviation, σ, = 1.6
Sample Standard Deviation, s, = 1.9

Use the Chi Square Test to determine if there has been a change in variance.
1) Calculate the Chi Square Statistic = [ (n-1)*(s*s) ] / (σ*σ) = 69.09766
2) Obtain the p value from the Chi Square Statistic: Upper p value = CHIDIST(69.09766,49) = 0.030749

This p value states the portion of total area under the Chi Square distribution curve for 49 degrees of freedom to the right of the Chi Square Statistic. The Chi Square Statistic is calculated from sample size (n - 1), population standard deviation, and sample standard deviation. If the p value (the area under the Chi Square distribution curve to the right of the Chi Square Statistic on that curve) is greater than the level of significance value we are evaluating (α = 0.05 on a one-tailed test), then we do not reject the NULL Hypothesis.

In this case the p value (0.030749) is less than the desired level of significance (α = 0.05), and we reject the NULL Hypothesis that there has been no change. It appears that the population standard deviation has increased above 1.6 ml.
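For readers without Excel, the statistic and the right-tail area can be sketched in Python. The helper `chi2_sf_odd_df` is a hypothetical stand-in for CHIDIST that uses the textbook finite-series form of the Chi Square right tail for odd degrees of freedom; it is a sketch for this 49-degree-of-freedom example, not a general implementation.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_odd_df(x, df):
    """Right-tail area of the Chi Square distribution for odd df (like CHIDIST).

    Uses the finite series: erfc(sqrt(x/2)) plus twice the standard normal
    density at sqrt(x) times the sum of x**(r - 1/2) / (1*3*...*(2r - 1)).
    """
    if df % 2 != 1:
        raise ValueError("this sketch only handles odd degrees of freedom")
    root = sqrt(x)
    density = exp(-x / 2) / sqrt(2 * pi)   # standard normal density at sqrt(x)
    term = root                            # series term for r = 1
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)            # advance to the term for r + 1
    return erfc(root / sqrt(2)) + 2 * density * total


n = 50
sigma = 1.6          # historical population standard deviation
s = 1.9              # measured sample standard deviation
df = n - 1

chi_square = df * s ** 2 / sigma ** 2
p_upper = chi2_sf_odd_df(chi_square, df)

print(round(chi_square, 5), round(p_upper, 6))
print("reject NULL Hypothesis:", p_upper < 0.05)
```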
A manufacturer wants to check if the variance on a process has changed. A machine drills a hole as part of the manufacturing process. The standard deviation of the hole diameter has historically been 1.6 ml. The engineers believe that they have improved the process. A random sample of 50 hole diameters was checked in one batch. The measured sample standard deviation was 1.35 ml. At a 0.05 level of significance, has the population standard deviation decreased below 1.6 ml?

Givens:
n = 50
Degrees of Freedom = n - 1 = 49
Level of Significance, α, = 0.05
Population Standard Deviation, σ, = 1.6
Sample Standard Deviation, s, = 1.35

Use the Chi Square Test to determine if there has been a change in variance.
1) Calculate the Chi Square Statistic = [ (n-1)*(s*s) ] / (σ*σ) = 36.18774 (this value corresponds to s = 1.375; with s = 1.35 the statistic is 34.88379)
2) Obtain the p value from the Chi Square Statistic:
Area under curve to right = CHIDIST(36.18774,49) = 0.912951
p value = Area to the left of Chi Square point = 1 - CHIDIST(36.18774,49) = 0.087049

This p value states the portion of total area under the Chi Square distribution curve for 49 degrees of freedom to the left of the Chi Square Statistic. The Chi Square Statistic is calculated from sample size (n - 1), population standard deviation, and sample standard deviation. If the p value (here, the area under the Chi Square distribution curve to the left of the Chi Square Statistic on that curve) is greater than the level of significance value we are evaluating (α = 0.05 on a one-tailed test), then we do not reject the NULL Hypothesis.

In this case the p value (0.087049) is greater than the desired level of significance (α = 0.05), and we do not reject the NULL Hypothesis that there has been no change. It cannot be concluded that the population standard deviation has decreased below 1.6 ml.
Normal Distribution
Any Normal distribution can be identified by two parameters - the mean and the standard deviation. The area under the entire density function = 1.
The Normal distribution is a continuous distribution, as opposed to a discrete distribution such as the binomial distribution.
Most problems involving the Normal distribution fall into two categories: 1) Determining the probability of a normally distributed random variable having a value within a given interval
2) Determining a Confidence Interval - that is - Determining an interval within which the value of a normally distributed random variable will fall with a given probability.
To be able to apply the Normal distribution, it is extremely important that the underlying population can be assumed to be normally distributed.
For any population, whether Normally distributed or not, the distribution of x bar (the sample mean) is approximately Normally distributed if the sample size is large (30 or more). This is a basic tenet of the Central Limit Theorem - Statistics' most fundamental rule.
It is important to note that the problems on this page do not deal with samples. These problems only use parameters.
z = number of standard deviations that a point lies from the mean
Population Mean = μ = "mu"
Population Standard Deviation = σ = "sigma"
z = (x - μ) / σ = ( x - mean ) / ( Length of 1 Standard Deviation )
The z distribution, sometimes called the standard normal distribution, is a normal distribution with the mean, μ, = 0 and the standard deviation, σ, = 1.
Population parameters are generally described with Greek letters, such as μ (population mean) and σ (population standard deviation), while sample statistics are generally described with Roman letters, such as x bar (sample mean) and s (sample standard deviation).
Statistical Function NORMSDIST(z) tells what percentage of the total area of the standardized normal curve (mean = 0 and standard deviation = 1) is to the left of a point z standard deviations from the mean, which is 0.
NORMSDIST(0) = 0.5
NORMSDIST(1.96) = 0.975
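This means that half of the area under the standardized normal curve exists to the left of z when z = 0 (z is exactly on top of the mean, that is, 0 standard deviations away from the mean).
This means that 97.5% of the total area under the standardized normal curve is to the left of z when z is 1.96 standard deviations above the mean. This point of z = 1.96 is often used to calculate the 95% Confidence Interval. That is, the section under the normal curve that starts at 1.96 standard deviations to the left of the mean and extends to 1.96 standard deviations to the right of the mean will contain 95% of the total area under the bell-shaped Normal curve.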
Statistical Function NORMSINV(p) tells how many standard deviations from the mean a point on the standardized normal curve must be so that the area under the curve to the left of that point equals the percentage given as the argument for the function.
NORMSINV(0.975) = 1.96
This means that 97.5% of the total area under the normal curve is to the left of the point 1.96 standard deviations above the mean.
Statistical Function NORMDIST(x, mean, standard dev, TRUE) will calculate the area under the curve to the left of point x on a normal curve with the given mean and standard deviation. (The TRUE argument requests the cumulative area. This is nearly always TRUE.)
NORMDIST(1.96,0,1,TRUE) = 0.975
Problem: A store has normally distributed daily sales. The average daily sales = $2,000 and the daily sales standard deviation = $500. What is the probability that the sales of one random day will be below $1,000?
Population Mean = μ = "mu" = $2,000
Population Standard Deviation = σ = "sigma" = $500
x = $1,000
NORMDIST(1000,2000,500,TRUE) = 0.02275 = 2.28%
This can be interpreted by saying that only 2.28% of the total area under this particular Normal curve falls to the left of x = 1,000.
Problem: A brand of car has a mean fuel consumption of 27 mpg with a standard deviation of 5 mpg. What percentage of the cars can be expected to have a fuel consumption of between 25 mpg and 30 mpg? Fuel consumption is normally distributed for this population.
Percentage of cars with fuel efficiency between 25 mpg and 30 mpg
= Percentage of cars with fuel efficiency less than 30 mpg - Percentage of cars with fuel efficiency less than 25 mpg
= NORMDIST(30,27,5,TRUE) - NORMDIST(25,27,5,TRUE)
= 0.725747 - 0.344578 = 0.381169
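The two NORMDIST problems above can be cross-checked outside Excel. The following is a minimal sketch in Python, assuming the standard library's statistics.NormalDist as a stand-in for NORMDIST(x, mean, standard dev, TRUE); the variable names are illustrative and not part of the worksheet.

```python
from statistics import NormalDist

# Store problem: daily sales assumed Normal with mean 2000 and sd 500.
sales = NormalDist(mu=2000, sigma=500)
p_below_1000 = sales.cdf(1000)           # plays the role of NORMDIST(1000,2000,500,TRUE)

# Car problem: fuel consumption assumed Normal with mean 27 and sd 5.
mpg = NormalDist(mu=27, sigma=5)
p_between = mpg.cdf(30) - mpg.cdf(25)    # area between 25 mpg and 30 mpg

print(round(p_below_1000, 5))
print(round(p_between, 6))
```

The cumulative area to the left of a point is all that is needed here: an interval probability is the difference of two left-tail areas.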
For the regular Normal curve, x = μ + zσ. The standardized Normal curve has μ = 0 and σ = 1.
Statistical Function NORMINV(probability, mean, standard dev) returns the point x on a normal curve with the given mean and standard deviation for which the area under the curve to the left of x equals the probability given as the first argument.
NORMINV(0.975,0,1) = 1.96
This means that 97.5% of the total area under the normal curve is to the left of the point 1.96 standard deviations above the mean.
Problem: A company's package delivery time is normally distributed with a mean of 10 hours and a standard deviation of 3 hours. What delivery time will be beaten by only 2.5% of all deliveries?
μ = 10
σ = 3
NORMINV(0.025,10,3) = 4.12
Meaning that only 2.5% of all package delivery times will be quicker than 4.12 hours.
Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and a standard deviation of 5,300 miles. What tread life would be exceeded by 98% of all tires?
μ = 39,000
σ = 5,300
NORMINV(0.02,39000,5300) = 28115
Meaning that only 2% of all tires will wear out before 28,115 miles.
Problem: A tire company makes a tire with a normally distributed tread life that has a mean of 39,000 miles and a standard deviation of 5,300 miles. What would the range of tread life be that 95% of all tires would wear out in?
μ = 39,000
σ = 5,300
Calculation of the left boundary: NORMINV(0.025,39000,5300) = 28612
Meaning that only 2.5% of all tires will wear out before 28,612 miles.
Calculation of the right boundary: NORMINV(0.975,39000,5300) = 49388
Meaning that only 2.5% of all tires will wear out after 49,388 miles.
So, 95% of tires will wear out in the range of 28,612 miles to 49,388 miles.
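The tread-life boundaries can be reproduced with an inverse cumulative Normal function. This is a small sketch in Python, assuming statistics.NormalDist.inv_cdf as the counterpart of NORMINV and using the standard deviation of 5,300 miles that appears in the NORMINV calls; the names are illustrative.

```python
from statistics import NormalDist

# Tread life assumed Normal with mean 39,000 and sd 5,300 (the value used in the NORMINV calls).
tread = NormalDist(mu=39000, sigma=5300)

exceeded_by_98_pct = tread.inv_cdf(0.02)   # counterpart of NORMINV(0.02,39000,5300)
left_boundary = tread.inv_cdf(0.025)       # counterpart of NORMINV(0.025,39000,5300)
right_boundary = tread.inv_cdf(0.975)      # counterpart of NORMINV(0.975,39000,5300)

print(round(exceeded_by_98_pct), round(left_boundary), round(right_boundary))
```

Because the Normal curve is symmetric, the two boundaries sit the same distance (1.96 standard deviations) on either side of the mean.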
e value of a normally distributed random variable will fall with a given probability
nderlying population can be proven to be normally distributed. This is often not the case.
andardized normal curve exists to the left of z when z = 0 (z is exactly on top of the mean, that is, 0 standard deviations away from the mean
er that staandardized normal curve is to the left of the z when z is 1.96 standard deviations from the mean. ate the 95% Confidence interval. That is, the section under the normal curve that starts a 1.96 nd extends to 1.96 standard deviations to the right of the normal curve will contain Normal curve.
e is to the left of the mean that the stated total area under the normal curve
er the normal curve is to the left of the point 1.96 standard deviations from the mean
e curve to the left of point x on a normal curve with the given mean and standard deviation.
g the only 2.28% of the total area under this particular Normal curve falls to the left of x = 1,000
Confidence Intervals
Collection of 40 individual test scores
210 340 490 610
Calculate with 95% certainty an interval in which the population mean must fall, based upon a random sample of 40 test scores taken from that population.
In other words, calculate a 95% Confidence Interval for the population mean.
Sample size (COUNT) = 40
Sample Standard Deviation (STDEV) = 159.48
α = (1 - Confidence Level) = 0.05
Mean (AVERAGE) = 473.5
Sample size must be at least 30 and must be random and representative of the population.
Excel calculates the half-width of the Confidence Interval to be 49.42 using the following statistical function: CONFIDENCE(alpha, standard_dev, size). Inputs for this function are CONFIDENCE(0.05,159.48,40) = 49.42
Let's see how Excel's calculation holds up to the correct, manual calculation of Confidence Interval calculated from (Excel hits it just about right on)
The 95% Confidence Interval around a Sample Mean = Sample Mean +/- (Z Score for 95% Confidence Interval) * (Sample Standard Error)
Z Score for 95% Confidence Interval (two sided) = Z(0.975) = 1.96
Sample Standard Error = (Sample Standard Deviation) / (Square Root of Sample Size)
Sample Standard Error = (159.48) / (Square Root [40]) = 25.21
Confidence Interval = Sample Mean +/- Z Score(95% Confidence Interval) * (Sample Standard Error)
Confidence Interval = 473.5 +/- (1.96) x (25.21) = 473.5 +/- 49.41, that is, from 424.09 to 522.91
This means that we can be 95% confident that the mean of the entire population is between the endpoints of this 95% Confidence Interval.
Statistically this is written as: Confidence Interval = Sample Mean +/- Z(α/2) * (Sample Standard Deviation / Square root of Sample Size)
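The manual calculation can be checked with a short script. This is a sketch in Python, assuming the sample figures quoted in this section (n = 40, s = 159.48, mean = 473.5) and using statistics.NormalDist in place of NORMSINV; it is not the worksheet's own method.

```python
from math import sqrt
from statistics import NormalDist

n, s, mean, alpha = 40, 159.48, 473.5, 0.05    # figures quoted in this section

z = NormalDist().inv_cdf(1 - alpha / 2)        # two-sided z score, about 1.96
standard_error = s / sqrt(n)                   # sample standard error
margin = z * standard_error                    # the quantity Excel's CONFIDENCE returns

lower, upper = mean - margin, mean + margin
print(round(standard_error, 2), round(margin, 2))
print(round(lower, 2), round(upper, 2))
```

The small gap between 49.42 and 49.41 comes only from rounding z to 1.96 and the standard error to 25.21 in the manual version.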
Determining Sample Size (n) for a Given Confidence Level and Bound (B)
n = number of samples needed to establish a specified confidence interval of width B on either side of the mean
e.g. How many samples must be taken to estimate the population diameter (of, for example, holes drilled by a machine) to within 0.05 mm of the mean sample diameter with 99% confidence? Standard deviation (determined from previous sampling) is 0.75 mm.
n = [ (Z score of two-tailed 99% confidence)**2 x (sample standard deviation)**2 ] / [Interval**2]
NORMSINV(0.995) = 2.576 (taken as 2.575 in the calculation below)
n = [ (2.575)**2 x (0.75)**2 ] / [ (0.05)**2 ] = 1,492
Problem: A restaurant owner wants to estimate within $2.00 the average amount that customers spend during lunch. From experience, the standard deviation of the population is $5.00. How many samples need to be taken to get a sample that is 92% certain of being within $2.00 of the population mean?
Z score of two-tailed 92% confidence = NORMSINV(0.96) = 1.751
Population Standard Deviation = 5.00
Interval = 2.00
n = [ (Z score of two-tailed 92% confidence)**2 x (sample standard deviation)**2 ] / [Interval**2]
n = [ (1.751)**2 x (5.00)**2 ] / [ (2.00)**2 ] = 19.16, or about 19 samples
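The sample-size formula can be sketched as a small function. The Python below is an illustration under stated assumptions: required_sample_size is a hypothetical helper name, and it uses the unrounded z score, so its result for the hole-diameter example differs slightly from the figure obtained with z rounded to 2.575.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, sigma, bound):
    """Unrounded n = (z * sigma / bound)**2 for a two-tailed confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / bound) ** 2

n_holes = required_sample_size(0.99, 0.75, 0.05)   # hole-diameter example
n_lunch = required_sample_size(0.92, 5.00, 2.00)   # restaurant example

print(round(n_holes, 2), ceil(n_holes))
print(round(n_lunch, 2), ceil(n_lunch))
```

Rounding the result up with ceil, rather than to the nearest whole number, is the conservative choice when the bound must be guaranteed; the worksheet itself reports the rounded figures.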
n interval in which the population mean must fall of 40 test scores taken from that population.
49.42
n of 0 = 0 +/- (Z Score for 95% Confidence Interval) * (Sample Standard Error) 1.96 Insert / NORMSINV(0.975) Function
Z(0.975) = 1.96
40] ) = 25.21
49.41
32 to 223.16
dence Interval
area in each tail.
1.96
0.975
dence Interval
1.64
0.95
2.576
0 the average amount that customers spend during lunch. $5.00. How many samples need to be taken to get a sample average mean expenditure during lunch?
1.751
19
Samples
However, 30 samples should be the minimum taken unless you know for certain that the underlying population is normally distributed.
Inveral, = 0.05)
ORMSINV(0.975)
ulation of 49.41)
Binomial Distribution
Binomial distributions are collections of discrete values as opposed to, for example, the Normal distribution, which is continuous.
Any binomial distribution can be identified by the value of two of its variables - the number of trials (n) and the probability of success on a single trial (p).
In this case, generate 5 random numbers, each with possible outcomes of 2 or 3. Each event has a 20% probability of a "2" outcome and an 80% probability of a "3" outcome. (You could easily do the same thing with outputs of 1 and 0 - measuring something occurring or not occurring.)
Generated values: 3 3 3 2 2
Number of variables = 1 (The value of the 1 variable is 1 or 0)
Number of random numbers = 5
Distribution type is Discrete
Value and probability input range - the Yellow highlighted range
Output range - Highlight the tan range
Outcome   Probability
2         0.2
3         0.8
Sum of 2's = 2. Statistical function COUNTIF - Select the range of outputs to be counted and then select the cell that has the output to be counted (where outcome = 2). The sum is the number of successes in 5 random trials, each having a 0.20 chance of a "2" outcome.
Calculating the probability of a certain number of a given outcome to occur in a certain number of trials if the probability of that outcome on a single trial is known.
Problem: What is the probability of 3 successful outcomes in 5 trials if the probability of a success on a single trial is 0.2?
s = number of successes = 3
n = number of trials = 5
p = probability of successful outcome on 1 trial = 0.2
Find Cumulative distribution (NO) - Use 0
Statistical Function / BINOMDIST (in this case, you don't want the cumulative distribution - use 0 as the last argument)
Probability of this is BINOMDIST(3,5,0.2,0) = 0.0512 = 5.12%
Problem - In 12 trials (n = 12), what is the probability that at least 10 of them (sum of the probabilities that s = 10, s = 11, and s = 12) will have the 1 of the 2 possible outcomes that has a probability of occurring of 65%? The probabilities of each outcome need to be added up.
Statistical function BINOMDIST(s,n,p,FALSE)
BINOMDIST(10,12,0.65,0) = 0.108846
BINOMDIST(11,12,0.65,0) = 0.036753
BINOMDIST(12,12,0.65,0) = 0.005688009
BINOMDIST(10,12,0.65,0) + BINOMDIST(11,12,0.65,0) + BINOMDIST(12,12,0.65,0) = 0.151288
This represents a combined probability of 15.13%
Equals
BINOMDIST(3,10,0.5,1)
Problem - If 10% of products require servicing, what is the probability that fewer than 15 of 200 products will need servicing?
The problem actually asks what is the probability that up to 14 products will need servicing. Therefore, you are solving for the cumulative probability that up to 14 products need servicing.
s = 14
p = 0.10
n = 200
TRUE = 1
BINOMDIST(14,200,0.10,1) = 0.092946 = 9.29%
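The BINOMDIST results above can be reproduced from the binomial formula directly. The following Python sketch assumes two hypothetical helpers, binom_pmf and binom_cdf, that mirror BINOMDIST with the last argument set to 0 and 1 respectively.

```python
from math import comb

def binom_pmf(s, n, p):
    """Probability of exactly s successes in n trials (BINOMDIST with last argument 0)."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

def binom_cdf(s, n, p):
    """Probability of at most s successes in n trials (BINOMDIST with last argument 1)."""
    return sum(binom_pmf(k, n, p) for k in range(s + 1))

exactly_3_of_5 = binom_pmf(3, 5, 0.2)
at_least_10_of_12 = sum(binom_pmf(s, 12, 0.65) for s in (10, 11, 12))
at_most_14_of_200 = binom_cdf(14, 200, 0.10)

print(round(exactly_3_of_5, 4))
print(round(at_least_10_of_12, 6))
print(round(at_most_14_of_200, 6))
```

Note the argument order (successes, trials, probability), which matches BINOMDIST(s, n, p, cumulative).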
, the Normal distribution, which is continuous. of trials (n) and the probability of success on a single trial (p)
has a 20% probability of a "2" outcome and an 80% of a "3" outcome. . r not occurring)
This = p - This is the probability that the outcome of the event will be "1" and not "0". This = q - This is the probability that the outcome of the event will be "0" and not "1".
elect the range of outputs to be counted and then select the cell that has the output to be counted, (Where outcome = 2) 0.20 chance of a "2" outcome.
ome to occur
15.13%
0.65625 65.63%
outcome = 2)
Population Proportions
When a sample of size n is used to estimate a population proportion (e.g. a proportion of a population who would vote f...), it can be analyzed using the binomial distribution.
The population proportion of success will be the same as p, the probability of success of a single trial.
The following relationships hold true for population proportions:
The mean of sample proportions = μp = p
The standard deviation of sample proportions = σp = SQRT { [ p (1 - p) ] / n }
The confidence interval of a population proportion would be = p ± z σp = p ± z SQRT { [ p (1 - p) ] / n }
Problem: A random sample of 350 people was chosen and each person was asked if they recognized a particular brand. 112 people recognized the brand. Calculate a 95% confidence interval of the proportion of the total population who recognize the brand.
Givens:
n = 350
p = 112 / 350 = 0.32
Confidence level = 0.95 - This means that 2.5% of the area under the Normal curve exists in each tail, above and below the confidence interval.
z = NORMSINV(0.975) = 1.96 - 97.5% of the total area under the normal curve is to the left of a point 1.96 standard deviations above the mean.
The confidence interval = p ± z σp = p ± z SQRT { [ p (1 - p) ] / n }
The confidence interval = 0.32 ± 0.04887
The confidence interval = 0.27113 to 0.36887
Which means that we can be 95% confident that between 27.1% and 36.9% of the total population are aware of the brand.
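The brand-recognition interval can be verified numerically. This is a minimal Python sketch of the p ± z SQRT{[p(1 - p)]/n} calculation, assuming the sample counts from the problem above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, recognized, confidence = 350, 112, 0.95     # figures from the problem above

p_hat = recognized / n                                  # sample proportion, 0.32
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n)              # z times the standard deviation of sample proportions

lower, upper = p_hat - margin, p_hat + margin
print(round(margin, 5), round(lower, 5), round(upper, 5))
```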
The minimum number of samples needed, n, to obtain a confidence interval of a certain width, e (or given sample error
It is better to use the binomial distribution to calculate the p value when dealing wit
(The p value is the area under the Normal curve outside of x - NOT the probability of a successful trial)
Problem: A manufacturer of circuit boards wants to keep the proportion of defective boards at 0.098. The manufacturer tested 156 randomly chosen boards and found 20 to be defective. Determine with 95% certainty (0.05 level of significance) whether the defective proportion has increased above 0.098.
n = 156
p = 0.098
x = 19 (the p value is the probability of 20 or more defective boards, which is 1 - the probability that 19 or fewer are defective)
p value = 1 - Cumulative probability of 19 defective = 1 - BINOMDIST(19,156,0.098,1) = 1 - 0.870142 = 0.129858
This p-value of 0.129858 is greater than α (0.05 - the level of significance). We therefore conclude that the large x value could have happened by chance and we fail to reject the NULL Hypothesis.
To determine whether a known population has changed, take a sample of the population and use the binomial distribution to calculate the probability of that sampling event (the number of successes, x, per given sample size, n, given p - the previously known probability of success in a single trial) and compare that probability to the desired level of significance. If this probability is less than the level of significance you have established (α for a one-tailed test and α/2 for a two-tailed test), then the NULL Hypothesis is rejected.
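The circuit-board p value can be recomputed from the binomial formula. The Python below is a sketch under the assumption that the sample size is the 156 boards stated in the problem; binom_cdf is a hypothetical helper standing in for BINOMDIST with the cumulative flag set to 1.

```python
from math import comb

def binom_cdf(x, n, p):
    """Probability of at most x successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

n, p, found_defective, alpha = 156, 0.098, 20, 0.05

# One-tailed p value: probability of 20 or more defective boards
# if the true defective proportion is still 0.098.
p_value = 1 - binom_cdf(found_defective - 1, n, p)

reject_null = p_value < alpha
print(round(p_value, 6), reject_null)
```

The comparison against α rather than α/2 reflects the one-tailed question being asked (has the proportion increased?).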
here is a 95% chance that between 27.1% and 36.9% of the total population
cessful trial)
= 1 - BINOMDIST(19,156,0.098,1)
and use the binomial distribution to mple size,n, given p - the previously know probability of success in a single trial)
Females
14,974 15,580 16,285 17,000 17,593 17,957 17,492 18,266 19,456 19,591 20,093 20,455 20,689 21,608 21,758 22,134 22,734 23,351 24,043 25,003 25,642 26,770 27,954 28,810 29,580 30,148 31,491 32,972 34,214 35,399 37,323 38,959 40,747 41,866 42,952 44,255 44,994 46,740 47,852 49,085 50,436 51,996
2nd - In the 2nd step of creating the chart, click the Series t
Descriptive Statistics - Tools / Data Analysis / Descriptive S Males Mean Standard Error Median Mode Standard Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count
Year   Males    Females
1990   64,805   52,925
1991   65,149   53,328
1992   65,767   54,356
1993   66,329   54,982
1994   66,788   56,322
1995   67,516   56,871
1996   67,434   57,503
1997   68,884   58,788
1998   69,547   59,583
1999   70,295   60,718
# of points (Statistical Function COUNT): n = 6
n - 1 = 5
Sum (Arithmetic Function SUM) = 708
Mean (Statistical Function AVERAGE) = 118
Sample Variance = 39117.2
Standard Deviation = 197.7807
Descriptive Statistics
Median Value Owner Occupied Mean Standard Error Median Mode Standard Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count
Histogram
Bin Range Requested By Histogram (in Yellow) Interval 1 2 3 4 5 6 7 8
Frequency
New Hampshire     $ 129,400
New Jersey        $ 162,300
New Mexico        $  70,100
New York          $ 131,600
North Carolina    $  65,800
North Dakota      $  50,800
Ohio              $  63,500
Oklahoma          $  48,100
Oregon            $  67,100
Pennsylvania      $  69,700
Rhode Island      $ 133,500
South Carolina    $  61,100
South Dakota      $  45,200
Tennessee         $  58,400
Texas             $  59,600
Utah              $  68,900
Vermont           $  95,500
Virginia          $  91,000
Washington        $  93,400
West Virginia     $  47,900
Wisconsin         $  62,500
Wyoming           $  61,600
Histogram - Tools / Data Analysis / Histogram 45000 70000 95000 120000 145000 170000 195000 220000 More
Sorted Data
Gross Domestic Product Per Capita using Purchasing Power Parity 1991 Country Turkey Greece Portugal Ireland Spain New Zealand
France            $ 18,227
Germany           $ 19,500
Greece            $  7,775
Iceland           $ 17,237
Ireland           $ 11,507
Italy             $ 16,896
Japan             $ 19,107
Luxembourg        $ 21,372
Netherlands       $ 16,530
New Zealand       $ 13,883
Norway            $ 16,904
Portugal          $  9,191
Spain             $ 12,719
Sweden            $ 16,729
Switzerland       $ 21,747
Turkey            $  3,491
United Kingdom    $ 15,720
United States     $ 22,204
United Kingdom Finland Australia Netherlands Sweden Italy Norway Iceland Austria Belgium Denmark France Japan Canada Germany Luxembourg Switzerland United States
ght the Males and Females column of data to create the chart. Do not highlight the year column.
e 2nd step of creating the chart, click the Series tab and highlight the Year column as the x-axis.
Descriptive Statistics - Tools / Data Analysis / Descriptive Statistics
Statistic            Males           Females
Mean                 52371.30769     34646.6
Standard Error       1362.939367     2051.745
Median               50125.5         30819.5
Mode                 #N/A            #N/A
Standard Deviation   9828.295549     14795.34
Sample Variance      96595393.39     2.19E+08
Kurtosis             -1.329434402    -1.36515
Skewness             0.41216714      0.341443
Range                29676           45744
Minimum              40619           14974
Maximum              70295           60718
Sum                  2723308         1801623
Count                52              52
and Variance
x - x bar    (x - x bar)^2
-98          9604
-88          7744
-76          5776
-78          6084
-63          3969
403          162409
Sum of (x - x bar)^2 = 195586
n - 1 = 5
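The sample variance and standard deviation follow from the squared deviations and n - 1. The Python sketch below assumes the six data points implied by the mean of 118 and the listed deviations (the raw values are reconstructed here for illustration, not quoted from the worksheet).

```python
from statistics import mean, stdev, variance

# Data points reconstructed as mean + deviation (an assumption made for this sketch).
x_bar = 118
deviations = [-98, -88, -76, -78, -63, 403]
data = [x_bar + d for d in deviations]

sum_sq = sum((x - mean(data)) ** 2 for x in data)   # sum of squared deviations
sample_variance = sum_sq / (len(data) - 1)          # divide by n - 1, not n

print(sum(data), mean(data), sum_sq)
print(sample_variance, round(stdev(data), 4))
```

Dividing by n - 1 rather than n gives the sample variance, which is what Excel's STDEV and the Descriptive Statistics tool report.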
Standard Deviation Sample Variance Kurtosis Skewness Range Minimum Maximum Sum Count
ptive Statistics
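Mean                 84209.80392
Standard Error       6018.541452
Median               68900
Mode                 #N/A
Standard Deviation   42980.98303
Sample Variance      1847364902
Kurtosis             3.556208617
Skewness             1.84961606
Range                200100
Minimum              45200
Maximum              245300
Sum                  4294700
Count                51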
Bin Range Requested By Histogram (in Yellow)
Interval   More than..   But not more than..
1          45000         70000
2          70000         95000
3          95000         120000
4          120000        145000
5          145000        170000
6          170000        195000
7          195000        220000
8          220000        245000
Median Income
Frequency
The data needs to be copied here and then sorted Data / Sort Domestic Product Per Capita urchasing Power Parity 1991 per capita GDP (dollars) $ 3,491 $ 7,775 $ 9,191 $ 11,507 $ 12,719 $ 13,883
Histogram
Bin        Frequency
3491       1
8169.25    1
12847.5    3
17525.75   11
More       8
$ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $
15,720 15,997 16,085 16,530 16,729 16,896 16,904 17,237 17,280 17,454 17,621 18,227 19,107 19,178 19,500 21,372 21,747 22,204
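The histogram frequencies can be recounted from the sorted per capita GDP values. This Python sketch assumes that each bin counts values greater than the previous bin boundary and no greater than its own, with everything above the last boundary falling into "More"; the variable names are illustrative.

```python
from bisect import bisect_left

# The 24 sorted per capita GDP values listed in this section.
gdp = [3491, 7775, 9191, 11507, 12719, 13883,
       15720, 15997, 16085, 16530, 16729, 16896, 16904, 17237, 17280,
       17454, 17621, 18227, 19107, 19178, 19500, 21372, 21747, 22204]

bins = [3491, 8169.25, 12847.5, 17525.75]   # upper boundaries; the last slot is "More"

frequency = [0] * (len(bins) + 1)
for value in gdp:
    # Index of the first boundary that is >= value; len(bins) means "More".
    frequency[bisect_left(bins, value)] += 1

print(frequency)
```

The boundaries are evenly spaced: each step is (22204 - 3491) / 4 = 4678.25.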
Males Females
[Figure: Histogram chart, vertical axis = Frequency]
The first hypothesis, the NULL Hypothesis, is usually stated in terms such as "There has been no change in the population mean." This will normally involve an equal sign.
The second hypothesis, the Alternative Hypothesis, states that the population mean has changed in one of three ways:
1) The population mean has changed (increased OR decreased) - This involves a two-tailed test
2) The population mean has decreased - This involves a one-tailed test with the left tail
3) The population mean has increased - This involves a one-tailed test with the right tail
In summary, hypothesis testing involves:
1) Determining the NULL hypothesis. This is normally that the original population mean has not changed.
2) Determining the level of certainty to which that NULL Hypothesis will be tested. If you want to establish a 95% certainty level, then α, "alpha", = 0.05
3) Take a sample of the population.
4) Calculate the sample mean. This value will be called x bar.
5) Graph this sample mean on the normal curve created from the original population mean
6) The NULL Hypothesis is accepted or rejected based upon the results of either of the following tests (which are both equivalent)
6a) The critical value test - The level of certainty, α, is converted to a "critical value." This "critical value" is the number of standard deviations that the level of certainty is from the mean. For example, on a two-tailed test, an α of 0.05 translates to a 95% level of certainty. On a two-tailed test, this would result in 2.5% of the total area under the Normal curve being greater than the right critical value and 2.5% of the area under the Normal curve being less than the left critical value. Each critical value is 1.96 standard deviations from the mean on the normal curve - NORMSINV(0.975) = 1.96. The z value of the sample mean is calculated. The z-value is the number of standard deviations that the sample mean is from the population mean on a Normal curve derived from the population mean. If the z-value of the sample is farther away from the mean than the critical value (the z value of that level of certainty), then the NULL hypothesis is normally rejected.
6b) The p-value test - This is equivalent to the above test. A Normal curve is constructed based upon the population mean. α is the significance level. The significance level represents the percentage of the area under the normal curve that is outside the required level of certainty. For example, on a two-tailed test with a 95% required level of certainty, α = 0.05. The test is two-tailed so 2.5% of the total area will be in one tail above the 95% certainty level and 2.5% of the area under the normal curve will be below the 95% confidence area. The p value is equal to the percentage of area under the normal curve that is outside of x bar on the normal curve. If the p value is less than the percentage of the area under the normal curve corresponding to α, the NULL Hypothesis is rejected.
Problem: A manufacturer claims that the average thickness of metal sheets is 15 mls. and that the population standard deviation, σ, is 0.1 mls. 50 sheets are sampled, having a sample mean of 14.982 mls. Determine at the 0.05 significance level (95% confidence level) whether the manufacturer's claim that the average thickness is 15 mls. is correct.
Givens:
n = 50
α = 0.05
σ = 0.1
x bar = 14.982
μ = 15
The NULL Hypothesis is that μ = 15 mls.
The ALTERNATE Hypothesis is that μ ≠ 15 mls. (Since we are testing whether a difference exists in either direction, this is a two-tailed test.)
1) Calculate Sample Standard Error
Sample Standard Error = σ / SQRT(n) = 0.014142
2) Calculate z value for sample
Z value = (x bar - μ) / (Sample Standard Error) = -1.27279
3) Calculate p value - the area under the Normal curve outside the sample z value.
NORMSDIST(-1.272792) = 0.101546
This states that 10.15% of the total area under the Normal curve lies outside a point 1.27 standard deviations from the mean on either side (tail) of the Normal curve.
THE P TEST CAN BE PERFORMED AT THIS POINT
The NULL Hypothesis is rejected if the p-value (the percentage of area under the Normal curve outside point x bar) is less than α/2 (in a two-tailed test) or α (in a one-tailed test).
The p-value = 0.101546 and is much larger than α/2 (0.025) so the NULL Hypothesis is not rejected - The manufacturer's claim is not rejected.
TO PERFORM THE EQUIVALENT CRITICAL VALUE TEST, DO THE FOLLOWING:
1) Calculate the critical value of α - NORMSINV(0.975) = 1.96
This states that an α of 0.05 on a two-tailed test produces a confidence interval that goes from 1.96 standard deviations above the mean to 1.96 standard deviations below the mean. If x bar is outside of this range (the absolute z value of x bar is greater than 1.96), then the NULL Hypothesis is rejected.
In this case, the absolute z value of x bar (1.27279) is less than the critical value (1.96) and therefore x bar is closer to the mean than the critical value, and we do not reject the NULL Hypothesis.
Problem: A furniture company states that its average delivery time is 15 days with a (population) standard deviation of 4 days. A random sample of 50 deliveries showed an average delivery time of 17 days. Determine within 98% certainty (0.02 significance level) whether delivery time has increased.
Givens:
n = 50
α = 0.02
σ = 4
x bar = 17
μ = 15
This is a one-tailed test because we are checking whether delivery time increased.
NULL Hypothesis - μ = 15
ALTERNATE Hypothesis - μ > 15
Using the P-test, we will determine if the p value (area above x bar under the normal curve) is less than α (since this is a one-tailed test).
1) Calculate Sample Standard Error
Sample Standard Error = σ / SQRT(n) = 0.565685
2) Calculate z value for sample
Z value = (x bar - μ) / (Sample Standard Error) = 3.535534
3) Calculate p value - the area under the Normal curve outside the sample z value = 1 - NORMSDIST(3.535534) = 0.000203
This states that 0.000203 of the total area under the Normal curve lies above the point 3.535534 standard deviations above the mean.
This p-value (0.000203) is less than α (0.02) so the NULL Hypothesis is rejected - It appears likely that delivery time has increased.
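Both z tests above follow the same three steps (standard error, z value, tail area), which can be sketched as one function. The Python below is an illustration only: z_test is a hypothetical helper, and statistics.NormalDist stands in for NORMSDIST.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu, sigma, n):
    """Return (z value, area in the single tail beyond the sample z value)."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    tail_area = NormalDist().cdf(-abs(z))    # area beyond |z| in one tail
    return z, tail_area

# Metal sheets: two-tailed test, so the tail area is compared with alpha / 2.
z_sheets, p_sheets = z_test(14.982, 15, 0.1, 50)
reject_sheets = p_sheets < 0.05 / 2

# Furniture delivery: one-tailed test, so the tail area is compared with alpha.
z_delivery, p_delivery = z_test(17, 15, 4, 50)
reject_delivery = p_delivery < 0.02

print(round(z_sheets, 5), round(p_sheets, 6), reject_sheets)
print(round(z_delivery, 6), round(p_delivery, 6), reject_delivery)
```

Comparing the one-tail area with α/2 in a two-tailed test is equivalent to comparing the absolute z value with the critical value NORMSINV(1 - α/2).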
s not changed. ant to establish a 95% certainty level, then , "alpha" , = 0.05
"critical value" is the number of standard deviations that translates to a 95% level of certainty. to be greater than the right critical value h critical value is 1.96 standard deviations
value of that level of certainty), then the NULL hypothesis is normally rejected
area under the normal curve that is outside the required level of certainty. est is two-tailed so 2.5% of the total area will be in one tail above the 95% certainty level
s. And that the population standard deviation, , is 0.1 mls. icance level (95% confidence level) whether
NORMSDIST(1.272792) = 0.101546 1.27 standard deviations from the mean on either side (tail) of the Normal curve.
curve ouside point x) is less than /2 (in a two-talied test) or (in a one-tailed test)
m 1.96 standard deviations above the mean to 1.96 standard deviations below the mean.
is is rejected.
s closer to the mean than the critical value, and we do not reject the NULL Hypothesis.
me has increased.
0.000203
Discrete Variables
Calculating Means, Standard Deviations, and Variances of the distributions of Discrete Variables.
Grade (x)   Probability P(x)   x * P(x)
4           0.1                0.4
3           0.2                0.6
2           0.35               0.7
1           0.25               0.25
0           0.1                0
Total       1                  1.95
Mean = μ = 1.95
Grade (x)   Mean   ( x - Mean )   Square of ( x - Mean )   Probability P(x)
4           1.95   2.05           4.2025                   0.1
3           1.95   1.05           1.1025                   0.2
2           1.95   0.05           0.0025                   0.35
1           1.95   -0.95          0.9025                   0.25
0           1.95   -1.95          3.8025                   0.1
Variance = Sum of [ Square of ( x - Mean ) * P(x) ] = 1.2475
Standard Deviation = Square Root of Variance = 1.116915
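The mean, variance, and standard deviation of the grade distribution can be computed directly from the probability table. This is a minimal Python sketch using the grades and probabilities listed above; the variable names are illustrative.

```python
from math import sqrt

grades = [4, 3, 2, 1, 0]
probabilities = [0.1, 0.2, 0.35, 0.25, 0.1]

# Mean of a discrete variable: sum of x * P(x).
mean = sum(x * p for x, p in zip(grades, probabilities))

# Variance: sum of (x - mean)^2 * P(x); standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in zip(grades, probabilities))
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 4), round(std_dev, 6))
```

Unlike the sample variance, no n - 1 divisor appears here, because the probabilities already weight each squared deviation.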