
Midterm

The document presents statistical analyses including chi-square tests, regression analysis, and confidence intervals related to various datasets. It concludes that there is a significant relationship between student population and quarterly sales, with a regression equation indicating that an increase in student population correlates with increased sales. Additionally, the chi-square tests suggest that certain attitudes are not independent of age, and various p-values indicate the acceptance or rejection of hypotheses based on significance levels.

Uploaded by

Fatima Faisal

                 A    B    C    D    F
observed freq    18   30   40   22   10
expected freq    12   36   48   18    6

p-value = CHITEST(observed freq range, expected freq range)
p-value = 0.06393753 > 0.05
do not reject the null hypothesis

chi-square value = 8.88888889
critical value = CHIINV(probability, degrees of freedom)
critical value = 9.48772904 (probability = 0.05, degrees of freedom = 5 - 1 = 4)
8.88888889 < 9.48772904
do not reject the null hypothesis
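The goodness-of-fit calculation above can be reproduced outside the spreadsheet. The following is a minimal sketch in plain Python; the function names are illustrative and not part of the workbook, and the closed-form p-value used here only holds for an even number of degrees of freedom (4 in this case), so it is not a general replacement for CHITEST.

```python
import math

def chi_square_gof(observed, expected):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

observed = [18, 30, 40, 22, 10]
expected = [12, 36, 48, 18, 6]

stat, df = chi_square_gof(observed, expected)
p_value = chi_square_sf_even_df(stat, df)
print(stat, df, p_value)
```

The statistic and p-value should agree with the spreadsheet values 8.88888889 and 0.06393753 up to rounding.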
observed freq

                     18-30   31-44   45-58   over 58   total
Jessica Chastain     51      50      41      42        184
Jennifer Lawrence    63      55      37      50        205
Emmanuelle Riva      15      44      56      74        189
Quvenzhane Wallis    48      25      22      31        126
Naomi Watts          36      65      62      33        196
total                213     239     218     230       900

expected freq = (row total)(column total)/(grand total)

                     18-30       31-44       45-58       over 58     total
Jessica Chastain     43.546667   48.862222   44.568889   47.022222   184
Jennifer Lawrence    48.516667   54.438889   49.655556   52.388889   205
Emmanuelle Riva      44.73       50.19       45.78       48.3        189
Quvenzhane Wallis    29.82       33.46       30.52       32.2        126
Naomi Watts          46.386667   52.048889   47.475556   50.088889   196
total                213         239         218         230         900

difference = observed freq - expected freq

                     18-30       31-44       45-58       over 58     total
Jessica Chastain     7.4533333   1.1377778   -3.568889   -5.022222   0
Jennifer Lawrence    14.483333   0.5611111   -12.65556   -2.388889   0
Emmanuelle Riva      -29.73      -6.19       10.22       25.7        0
Quvenzhane Wallis    18.18       -8.46       -8.52       -1.2        0
Naomi Watts          -10.38667   12.951111   14.524444   -17.08889   0
total                0           0           0           0           0

squared difference

                     18-30       31-44       45-58       over 58     total
Jessica Chastain     55.552178   1.2945383   12.736968   25.222716   0
Jennifer Lawrence    209.76694   0.3148457   160.16309   5.7067901   0
Emmanuelle Riva      883.8729    38.3161     104.4484    660.49      0
Quvenzhane Wallis    330.5124    71.5716     72.5904     1.44        0
Naomi Watts          107.88284   167.73128   210.95949   292.03012   0
total                0           0           0           0           0

squared difference divided by expected freq

                     18-30       31-44       45-58       over 58     total
Jessica Chastain     1.275693    0.0264936   0.2857816   0.5363999   2.1243681
Jennifer Lawrence    4.3236059   0.0057835   3.2254817   0.1089313   7.6638024
Emmanuelle Riva      19.760181   0.763421    2.2815291   13.674741   36.479872
Quvenzhane Wallis    11.083581   2.1390197   2.3784535   0.0447205   15.645775
Naomi Watts          2.3257296   3.2225718   4.4435391   5.8302376   15.822078
total                38.768791   6.1572896   12.614785   20.195031   77.735896

degrees of freedom = (5-1)(4-1) = 12

critical chi-square value at 0.05 = 21.026 < 77.74

reject the null hypothesis
the attitude is not independent of age
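The test of independence above follows the usual recipe: expected frequency = (row total)(column total)/(grand total), then sum (observed - expected)^2/expected over all cells. A minimal sketch in plain Python, assuming the 5 x 4 table of observed counts listed in this section (the function name is illustrative):

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # expected count under independence of rows and columns
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# rows: the five names; columns: age groups 18-30, 31-44, 45-58, over 58
observed = [
    [51, 50, 41, 42],
    [63, 55, 37, 50],
    [15, 44, 56, 74],
    [48, 25, 22, 31],
    [36, 65, 62, 33],
]

stat, df = chi_square_independence(observed)
print(stat, df)
```

The statistic should be close to 77.74 with 12 degrees of freedom, well above the 21.026 critical value.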
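proportion (row total / 900)
Jessica Chastain     0.2044444444
Jennifer Lawrence    0.2277777778
Emmanuelle Riva      0.21
Quvenzhane Wallis    0.14
Naomi Watts          0.2177777778

p-value = 7.7760998E-08 < 0.05
reject the null hypothesis

M&M candy
                 blue
observed freq    105
overall prop     0.24
expected freq    120
chi-square       1.875

p-value = 0.3208841 > 0.05
degrees of freedom = (6-1) = 5
chi-square value = 5.85
do not reject the null hypothesis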

weekly demand at markets

18   20   22   27
25   22   27   25
26   23   20   24
27   25   19   21
26   25   31   29
25   28   26   28

mean = 24.5
standard deviation = 3.0143336

n = 30
k = n/5 = 30/5 = 6
expected freq per interval = 30/6 = 5
interval limits: (1/k)100% = (5/n)100% = 16.67%
expected freq = n(5/n) = 30(5/30) = 5
z-score for the upper 16.67% = 0.97
z-score for the lower 16.67% = -0.97

percentage   z-score   test score = mean + stdev(z-score)
16.67%       -0.97     21.576096
33.34%       -0.43     23.203837
50.01%       0         24.5
66.68%       0.43      25.796163
83.35%       0.97      27.423904

intervals        <21.6   21.6-23.2   23.2-24.5
observed freq    5       4           3
expected freq    5       5           5
chi-square       0       0.2         0.8

degrees of freedom = 6-2-1 = 3
critical chi-square value at 0.1 = 6.251
do not reject the null hypothesis

p-value = 0.7307865 > 0.1 (this value corresponds to 6-1 = 5 degrees of freedom; with 3 degrees of freedom the p-value is about 0.42, which is also > 0.1)
do not reject the null hypothesis
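M&M candy (remaining colors; the totals include blue)
                 brown       green   orange   red         yellow      total
observed freq    72          89      84       70          80          500
overall prop     0.13        0.2     0.16     0.13        0.14        1
expected freq    65          100     80       65          70          500
chi-square       0.7538462   1.21    0.2      0.3846154   1.4285714   5.852033

p-value > 0.05
degrees of freedom = 5
chi-square value 5.852033 < 11.070 (critical value)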

weekly demand at markets (fifth data column)
22
24
26
25
25
24

intervals        24.5-25.8   25.8-27.4   >27.4   total
observed freq    7           7           4       30
expected freq    5           5           5       30
chi-square       0.8         0.8         0.2     2.8 < 6.251
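The normal goodness-of-fit test for the weekly demand data can be sketched with the Python standard library. This is an illustration under stated assumptions: it uses all 30 demand values listed in this section, and it takes the interval limits from exact normal quantiles rather than the rounded z-scores (0.97, 0.43), so the limits differ slightly from the table while the observed counts stay the same.

```python
from statistics import NormalDist, mean, stdev

demand = [
    18, 20, 22, 27, 22,
    25, 22, 27, 25, 24,
    26, 23, 20, 24, 26,
    27, 25, 19, 21, 25,
    26, 25, 31, 29, 25,
    25, 28, 26, 28, 24,
]

n = len(demand)
k = n // 5                                   # 6 intervals, expected frequency 5 each
fitted = NormalDist(mean(demand), stdev(demand))

# interval limits at the 1/6, 2/6, ..., 5/6 quantiles of the fitted normal
cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]

observed = [0] * k
for x in demand:
    observed[sum(x >= c for c in cuts)] += 1  # index of the interval containing x

expected = n / k
chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = k - 2 - 1                               # two parameters (mean, stdev) estimated
print(observed, chi2, df)
```

The chi-square statistic is then compared with the critical value for 3 degrees of freedom (6.251 at the 0.1 level).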
Least Squares Method

Calculation for the least squares estimated regression equation

worksheet columns: Xi, Yi, Xi-meanX, Yi-meanY, (Xi-meanX)(Yi-meanY), (Xi-meanX)^2, with column totals

restaurant   student population (1000s) Xi   quarterly sales ($1000s) Yi
1            2                               58
2            6                               105
3            8                               88
4            8                               118
5            12                              117
6            16                              137
7            20                              157
8            20                              169
9            22                              149
10           26                              202

trend line shown on the scatter diagram: f(x) = 5x + 60

least squares method

yi^ = b0 + b1xi
yi^ = predicted value of quarterly sales
b0 = the y-intercept of the estimated regression line
b1 = the slope of the estimated regression line
xi = size of the student population for the restaurant

restaurant   xi    yi     xi-mean x
1            2     58     -12
2            6     105    -8
3            8     88     -6
4            8     118    -6
5            12    117    -2
6            16    137    2
7            20    157    6
8            20    169    6
9            22    149    8
10           26    202    12
total        140   1300
mean         14    130

b1 = sum of (xi-mean x)(yi-mean y)/sum of (xi-mean x)^2
b1 = 2840/568 = 5

b0 = mean y - b1*mean x
b0 = 130 - 5(14) = 60

the estimated regression equation is

y^ = 60 + 5x
The slope of the estimated regression equation is positive, implying that as the student population increases, sales increase.
In conclusion, an increase in the student population of 1000 is associated with an increase of $5000 in expected sales, which means quarterly sales are expected to increase by $5 per student.
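The slope and intercept formulas can be checked directly against the ten restaurants. A minimal sketch in plain Python, with an illustrative function name that is not part of the workbook:

```python
def least_squares(x, y):
    """Return (b0, b1) of the least squares line y^ = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1

student_population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]          # 1000s
quarterly_sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]  # $1000s

b0, b1 = least_squares(student_population, quarterly_sales)
print(b0, b1)
```

The result should match the estimated regression equation y^ = 60 + 5x.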

Mean square error (estimate of variance)

s^2 = MSE = SSE/(n-2) = 1530/(10-2) = 191.25

standard error of the estimate

s = sqrt(MSE) = sqrt(SSE/(n-2)) = 13.8293166859393

t Test
estimated standard deviation of b1
s(b1) = s/sqrt(sum(xi-mean x)^2) = 13.8293/sqrt(568) = 0.580265238041082

a = 0.01 level of significance

t = b1/s(b1) = 5/0.580265 = 8.61674915574731 > 2.896 (t distribution table at df = 8, upper-tail area 0.01; the two-tailed critical value at a = 0.01, t = 3.355, is also exceeded)
reject the null hypothesis
conclude that beta1 is not equal to zero
A significant relationship exists between student population & quarterly sales.

F test
MSR = SSR/regression degrees of freedom = SSR/# of independent variables = 14200/1 = 14200

F = MSR/MSE = 14200/191.25 = 74.2483660130719

p-value = 2.54886628529355E-05 < 0.01, so reject the null hypothesis

ANOVA
source of variation   SS            DF                               MS
regression            SSR = 14200   # of independent variables = 1   MSR = SSR/1 = 14200
error                 SSE = 1530    n-2 = 8                          MSE = SSE/(n-2) = 191.25
total                 SST = 15730   n-1 = 9
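The t and F statistics follow from the sums of squares by simple arithmetic. A minimal sketch, assuming the values SSE = 1530, SSR = 14200, sum of (xi-mean x)^2 = 568, and b1 = 5 from this section (variable names are illustrative):

```python
import math

n = 10          # restaurants
sse = 1530.0    # sum of squares due to error
ssr = 14200.0   # sum of squares due to regression
sxx = 568.0     # sum of (xi - mean x)^2
b1 = 5.0        # estimated slope

mse = sse / (n - 2)            # estimate of the error variance
s = math.sqrt(mse)             # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
t_stat = b1 / s_b1

msr = ssr / 1                  # one independent variable
f_stat = msr / mse

print(mse, s, s_b1, t_stat, f_stat)
```

With a single independent variable the two tests are equivalent: F equals t squared, which is why both report the same p-value.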

Confidence interval for the mean value of y

s(y^*) = s*sqrt((1/n)+(x*-mean x)^2/sum(x-mean x)^2) = 4.9509922180975

confidence interval: y^* ± t(a/2)*s(y^*), with t(a/2) = 2.306
margin of error = 2.306*4.951 = 11.4169880549328
110 ± 11.417

Prediction interval for an individual value of y

s^2(pred) = s^2 + s^2(y^*)
s(pred) = s*sqrt(1+1/n+(x*-mean x)^2/sum(x-mean x)^2)
s(pred) = 14.6888503274988

prediction interval: y^* ± t(a/2)*s(pred)
margin of error = 2.306*14.69 = 33.87514
110 ± 33.875
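Both intervals share the same building block, 1/n + (x*-mean x)^2/sum(x-mean x)^2. The sketch below assumes x* = 10 (1000s of students), which is inferred from y^* = 60 + 5x* = 110 rather than stated in the worksheet, and uses t(a/2) = 2.306 for 8 degrees of freedom as given above. The function name is hypothetical.

```python
import math

def interval_half_widths(x_star, n, mean_x, sxx, s, t_crit):
    """Return (confidence, prediction) margins of error at x_star."""
    h = 1 / n + (x_star - mean_x) ** 2 / sxx
    s_mean = s * math.sqrt(h)        # std. dev. of the estimated mean of y
    s_pred = s * math.sqrt(1 + h)    # std. dev. for an individual y
    return t_crit * s_mean, t_crit * s_pred

s = math.sqrt(1530 / 8)              # standard error of the estimate
y_hat = 60 + 5 * 10                  # point estimate at the assumed x* = 10

ci_margin, pi_margin = interval_half_widths(10, 10, 14, 568, s, 2.306)
print(y_hat, ci_margin, pi_margin)
```

The prediction margin is always the wider of the two because it adds the variance of an individual observation to the variance of the estimated mean.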
[Scatter diagram of student population & quarterly sales for Armand's pizza parlors; x-axis: student population (1000s) Xi]

restaurant   yi-mean y   (xi-mean x)(yi-mean y)   (xi-mean x)^2   predicted sales (y^ = 60+5x)
1            -72         864                      144             70
2            -25         200                      64              90
3            -42         252                      36              100
4            -12         72                       36              100
5            -13         26                       4               120
6            7           14                       4               140
7            27          162                      36              160
8            39          234                      36              160
9            19          152                      64              170
10           72          864                      144             190
total                    2840                     568

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9501229552
R Square            0.90273363001
Adjusted R Square   0.89057533376
Standard Error      13.8293166859
Observations        10

ANOVA
             df   SS      MS       F           Significance F
Regression   1    14200   14200    74.248366   2.54887E-05
Residual     8    1530    191.25
Total        9    15730

                     Coefficients   Standard Error      t Stat       P-value       Lower 95%
Intercept            60             9.22603480970342    6.50333553   0.00018744    38.7247256
student population   5              0.580265238041082   8.61674916   2.54887E-05   3.66190596

restaurant   error (yi-y^)   squared error (yi-y^)^2   squared deviation (yi-mean y)^2   squared deviation due to regression (y^-mean y)^2
1            -12             144                       5184                              3600
2            15              225                       625                               1600
3            -12             144                       1764                              900
4            18              324                       144                               900
5            -3              9                         169                               100
6            -3              9                         49                                100
7            -3              9                         729                               900
8            9               81                        1521                              900
9            -21             441                       361                               1600
10           12              144                       5184                              3600
total                        SSE = 1530                SST = 15730                       SSR = SST-SSE = 14200

coefficient of determination
r^2 = SSR/SST = 14200/15730
r^2 = 0.90273363001
90.27% of the total sum of squares can be explained by using the estimated regression equation to predict quarterly sales.

correlation coefficient
rxy = (sign of b1)sqrt(r^2)
rxy = 0.9501229552
A strong positive linear association exists between x and y.
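The three sums of squares and the two coefficients can be recomputed from the data and the fitted line y^ = 60 + 5x. A minimal sketch in plain Python (variable names are illustrative):

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = 60, 5
y_hat = [b0 + b1 * xi for xi in x]
mean_y = sum(y) / len(y)

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # error
sst = sum((yi - mean_y) ** 2 for yi in y)               # total
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # regression

r_squared = ssr / sst
# the correlation coefficient takes the sign of the slope b1
r_xy = math.copysign(math.sqrt(r_squared), b1)
print(sse, sst, ssr, r_squared, r_xy)
```

The identity SST = SSR + SSE holds for a least squares fit, which gives a quick consistency check on the table.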
SUMMARY OUTPUT (remaining columns)
                     Upper 95%    Lower 95.0%   Upper 95.0%
Intercept            81.2752744   38.7247256    81.2752744
student population   6.33809404   3.66190596    6.33809404
R^2 in shoe sales prediction

y^ = 25 + 10x1 + 8x2

n = 10
SST = 16000
SSR = 12000

d)
R^2 = SSR/SST = 12000/16000 = 0.75

e)
Ra^2 = 1-(1-R^2)*((n-1)/(n-p-1)) = 1-(0.25)(9/7) = 0.67857143

f)
Since R^2 = 0.75, 75% of the variability in y is explained by the estimated multiple regression equation with x1 and x2 as the independent variables.
After adjusting for the number of independent variables in the model, only 67.86% of the variability in y is accounted for.
The estimated regression equation may not provide a good fit.
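The adjusted coefficient depends on how the ratio (n-1)/(n-p-1) is grouped, so it helps to write it out explicitly. A minimal sketch with illustrative function names, using p = 2 independent variables (x1 and x2):

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination."""
    return ssr / sst

def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for the number of independent variables p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = r_squared(12000, 16000)
ra2 = adjusted_r_squared(r2, n=10, p=2)
print(r2, ra2)
```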

y^ = -18.37 + 2.01x1 + 4.74x2

SST = 15182.9
SSR = 14052.2
s(b1) = 0.2471
s(b2) = 0.9484

a)
MSR = SSR/p = 14052.2/2 = 7026.1
SSE = SST-SSR = 1130.7
MSE = SSE/(n-p-1) = 1130.7/7 = 161.528571
F = MSR/MSE = 43.4975679

The F value in the table at a = 0.05 (df = 2 and 7) is 4.74 < 43.5

The test statistic value is greater than the critical value, so reject the null hypothesis.
Therefore, it can be concluded that a significant relationship is present between y and the 2 independent variables.

b)
t = b1/s(b1) = 2.01/0.2471 = 8.13435856
df = n-p-1 = 7
a = 0.05
t critical value: t = 2.365 < 8.134

The test statistic value is greater than the critical value, so reject the null hypothesis.
Therefore, it can be concluded that the parameter beta1 is statistically significant.

c)
t = b2/s(b2) = 4.74/0.9484 = 4.99789119
df = 7
a = 0.05
t critical value: t = 2.365 < 4.998
reject the null hypothesis
the parameter beta2 is statistically significant.
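The overall F test and the two individual t tests can be reproduced from the summary figures. A minimal sketch, assuming n = 10 observations (implied by df = n-p-1 = 7 with p = 2) and the critical values 4.74 and 2.365 quoted above; the function name is illustrative.

```python
def overall_f(sst, ssr, n, p):
    """F statistic for the overall significance of a multiple regression."""
    sse = sst - ssr
    msr = ssr / p
    mse = sse / (n - p - 1)
    return msr / mse

f_stat = overall_f(sst=15182.9, ssr=14052.2, n=10, p=2)
t_b1 = 2.01 / 0.2471   # b1 / s(b1)
t_b2 = 4.74 / 0.9484   # b2 / s(b2)

f_crit, t_crit = 4.74, 2.365   # table values at a = 0.05
print(f_stat > f_crit, abs(t_b1) > t_crit, abs(t_b2) > t_crit)
```

All three comparisons come out in favor of rejecting the null hypothesis, in line with parts a), b), and c).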
Estimated regression equation: y^ = 20385.25 - 0.03739x1 - 686.34x2

b)
The R square is 0.951, and the correlation coefficient is 0.9752.
The obtained correlation coefficient is greater than +0.7, so multicollinearity will be a problem for the model.

c)
F test statistic value = 290.8454
a = 0.05
df = 2 and 30
F critical value = 3.32 < 290.8454

The test statistic value is greater than the critical value, so reject the null hypothesis.
Therefore, a significant relationship exists between y and the two independent variables.

d)
t = b1/s1 = #REF! < -2.042
t = b2/s2 = #REF! < -2.042

a/2 = 0.025
df = 30
t critical value = 2.042

The absolute value of each test statistic is greater than the critical value, so reject the null hypothesis.
Therefore, it can be concluded that the mileage and age variables are both statistically significant.

b)
F test at a = 0.05
F = 6.2519
df = 2 and 4
the critical value is 6.94 > 6.25

The test statistic value is less than the critical value, so do not reject the null hypothesis.
Therefore, there is NO significant relationship between y and the two independent variables.

c)
a = 0.05
df = 4
the critical t value is 2.776

t = b1/s1 = #DIV/0! < 2.776: do not reject the null hypothesis; the x1 independent variable is NOT significant
t = b2/s2 = #DIV/0! > -2.776: reject the null hypothesis; the x2 independent variable (the average number of yards given up per game on defense) is significant
