CS 700 Midterm

The document contains a midterm exam for CS 700, authored by Saad Muhammad Abdul Ghani, which includes various statistical calculations and analyses related to performance metrics of vendors, confidence intervals, hypothesis testing, and regression analysis. Each section provides detailed calculations, results, and conclusions based on the data analyzed. The document demonstrates the application of statistical methods to evaluate and compare different vendors and server performance metrics.
CS 700 – Midterm

Saad Muhammad Abdul Ghani G01374351

All numerical answers are rounded to 4 decimal places

1a)
V1 vs V2 = (40/50 + 50/60 + 60/35)/3 = 1.1159
V1 vs V3 = (40/60 + 50/38 + 60/50)/3 = 1.0608
V2 vs V1 = (50/40 + 60/50 + 35/60)/3 = 1.0111
V2 vs V3 = (50/60 + 60/38 + 35/50)/3 = 1.0374
V3 vs V1 = (60/40 + 38/50 + 50/60)/3 = 1.0311
V3 vs V2 = (60/50 + 38/60 + 50/35)/3 = 1.0873

Each vendor would conclude that its own performance is better than
everyone else's, as each of its mean ratios is greater than 1.
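
As a cross-check, the six mean ratios can be reproduced with a short Python sketch. The score lists are read off the fractions above; the names `scores` and `mean_ratio` are illustrative only.

```python
# Scores for the three workloads, read off the fractions in 1a.
scores = {
    "V1": [40, 50, 60],
    "V2": [50, 60, 35],
    "V3": [60, 38, 50],
}

def mean_ratio(own, other):
    """Arithmetic mean of the per-workload ratios own/other."""
    return sum(a / b for a, b in zip(own, other)) / len(own)

# One entry per ordered pair of distinct vendors.
ratios = {
    (a, b): mean_ratio(scores[a], scores[b])
    for a in scores
    for b in scores
    if a != b
}

for (a, b), r in ratios.items():
    print(f"{a} vs {b} = {r:.4f}")
```

Every ratio comes out above 1 regardless of which vendor is placed in the numerator, which is why the arithmetic mean of ratios cannot rank the vendors.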

1b)
The harmonic mean is used because the values are rate-based metrics.

Calculation of performance metrics:

Performance metric for V1: 3/((1/40) + (1/50) + (1/60)) = 48.6486
Performance metric for V2: 3/((1/50) + (1/60) + (1/35)) = 45.9854
Performance metric for V3: 3/((1/60) + (1/38) + (1/50)) = 47.6323

Vendor V1 has the highest performance and therefore is better than the
other two.
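
The harmonic means can be checked with Python's standard `statistics.harmonic_mean`. This is only a sketch: the score lists are taken from the calculations above, and the variable names are illustrative.

```python
from statistics import harmonic_mean

# Rate-based scores per vendor, as used in the calculations above.
scores = {
    "V1": [40, 50, 60],
    "V2": [50, 60, 35],
    "V3": [60, 38, 50],
}

# Harmonic mean = n / sum(1/x_i) for each vendor.
hm = {vendor: harmonic_mean(values) for vendor, values in scores.items()}
best = max(hm, key=hm.get)

for vendor, value in hm.items():
    print(f"{vendor}: {value:.4f}")
print("best:", best)
```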
2) For this question, I have calculated A-B (and not B-A).

2a) mean (A-B) = -24.6667
sample standard deviation (A-B) = 35.2598

2b) α = 0.05
n = 9
t[1-α/2; n-1] = 2.3060

Confidence interval for the mean QoS difference:

(-51.7697, 2.4364)
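
The interval can be reproduced from the summary statistics with the sketch below. Note the assumption: the sample standard deviation is taken as 35.2598, the value that reproduces the reported bounds, and the critical value 2.306 is used as quoted rather than recomputed.

```python
import math

mean_diff = -24.6667  # mean of A-B from 2a
s = 35.2598           # assumed: sample std dev that reproduces the reported interval
n = 9
t_crit = 2.306        # t[0.975; 8], as quoted in 2b

# Half-width of the two-sided t interval: t * s / sqrt(n)
half_width = t_crit * s / math.sqrt(n)
ci = (mean_diff - half_width, mean_diff + half_width)

print(f"({ci[0]:.4f}, {ci[1]:.4f})")
```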

2c) No, we cannot say which one is better at the 95% confidence level because
the confidence interval contains 0.

Therefore we must change the significance level (α).

When we increase α, the interval narrows. Therefore, α should be large
enough that the upper bound of the confidence interval is less
than 0 (as I am calculating A-B. If I were to calculate B-A, then the lower
bound of the confidence interval should be greater than 0).

Calculation:

$$\bar{x} + t_{[1-\alpha/2;\,n-1]} \frac{s}{\sqrt{n}} < 0$$

Plugging in the values $\bar{x} = -24.6667$, $s = 35.2598$, $n = 9$ we get:

$$t_{[1-\alpha/2;\,n-1]} < 2.0987075$$

Using Excel's T.DIST.2T function we can find α:

=T.DIST.2T(2.0987075, 8) = 0.06908

α must exceed 0.06908, so rounding up gives:

α = 0.07
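
For readers without Excel, the two-tailed probability can be approximated in plain Python by integrating the Student's t density numerically. This is a sketch, not the method used above: `t_pdf` and `t_two_tailed` are hypothetical helper names, Simpson's rule stands in for T.DIST.2T, and s = 35.2598 is the assumed standard deviation that matches the threshold 2.0987075.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tailed(t, df, steps=2000):
    """P(|T| > t) via composite Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

x_bar, s, n = -24.6667, 35.2598, 9

# Largest t for which the upper bound x_bar + t*s/sqrt(n) stays below 0.
t_max = -x_bar / (s / math.sqrt(n))
# Smallest alpha whose critical value is below t_max.
alpha_min = t_two_tailed(t_max, n - 1)

print(f"t_max = {t_max:.4f}, alpha_min = {alpha_min:.5f}")
```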

2d) Using trial and error, we can find the value of B required to ensure
that B is better than A. I calculate the mean, sample standard deviation
and 95% confidence interval at different values of B, with n = 10 and
t[0.975; 9] = 2.2622.

                      B = 400   B = 450   B = 420   B = 415   B = 416   B = 417
mean                  -22.2     -27.2     -24.2     -23.7     -23.8     -23.9
sample std deviation  34.1461   34.1949   33.2760   33.3835   33.3560   33.3315
Lower CI              -46.6266  -51.6615  -48.0042  -47.5811  -47.6614  -47.7439
Upper CI              2.2266    -2.7385   -0.3958   0.1811    0.0614    -0.0561

We can see that at B = 417 the confidence interval no longer contains
0 (its upper bound is below 0), and therefore 417 is the minimum QoS value
needed by B to be better than A.
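
The upper bounds in the table can be re-derived from the tabulated means and standard deviations. The sketch below rests on an assumption inferred from the numbers rather than stated in the text: the bounds are reproduced by n = 10 and t[0.975; 9] ≈ 2.2622. The names `trials` and `upper_bound` are illustrative.

```python
import math

n = 10           # assumed: sample size that reproduces the tabulated bounds
t_crit = 2.2622  # assumed: t[0.975; 9]

# (mean, sample std dev) of A-B for each trial value of B, from the table.
trials = {
    400: (-22.2, 34.1461),
    450: (-27.2, 34.1949),
    420: (-24.2, 33.2760),
    415: (-23.7, 33.3835),
    416: (-23.8, 33.3560),
    417: (-23.9, 33.3315),
}

def upper_bound(mean, s):
    """Upper end of the two-sided confidence interval for the mean of A-B."""
    return mean + t_crit * s / math.sqrt(n)

# B beats A once the whole interval for A-B lies below 0.
winning = [b for b, (m, s) in trials.items() if upper_bound(m, s) < 0]
smallest_b = min(winning)
print("smallest tried B with upper bound < 0:", smallest_b)
```

Only the tried values of B are compared here; the search itself still needs the raw observations, which are not reproduced in this section.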
3a) p_a = 0.7
n = 100
α = 0.05

To use the normal approximation, we need np > 10.

Since n·p_a = 70, which is greater than 10, we can use the normal distribution
to find the confidence interval.

Formula for finding confidence intervals for proportions:

$$p_a \pm z_{1-\alpha/2} \sqrt{\frac{p_a(1-p_a)}{n}}$$

$z_{1-\alpha/2} = 1.96$ for α = 0.05

Plugging in values we find the confidence intervals:


(0.6102, 0.7898)
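
A minimal Python sketch of the same calculation, using the standard library's `statistics.NormalDist` for the z quantile. The function name `proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_ci(p, n, alpha=0.05):
    """Two-sided normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lo_a, hi_a = proportion_ci(0.7, 100)
print(f"({lo_a:.4f}, {hi_a:.4f})")
```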

3b) To ensure that B is better than A, the lower bound of B's confidence
interval must be greater than the upper bound of A's confidence
interval.

Calculation:

$$p_b - z_{1-\alpha/2} \sqrt{\frac{p_b(1-p_b)}{n}} > 0.7898$$

Plugging in values for n and the z-statistic, isolating the square root and
squaring both sides, we get the quadratic inequality:

$$1.038416\,p_b^2 - 1.618016\,p_b + 0.62378 > 0$$

Then using the quadratic formula we can solve for the boundary values of p_b.
The values we get are:

p_b = 0.858203 or p_b = 0.699955

Since p_b must be greater than 0.7898 (and hence greater than p_a), the latter
value is an artifact of squaring, so we reject it and accept the former.
Therefore:

p_b = 0.8582
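
The quadratic can be set up and solved in a few lines. This sketch assumes z = 1.96 and the rounded bound 0.7898 from 3a; the coefficient names `a`, `b`, `c` are illustrative, and the last digits of the roots depend on how the constant term is rounded.

```python
import math

z, n = 1.96, 100
upper_a = 0.7898  # upper bound of A's interval from 3a
k = z * z / n     # 0.038416

# Squaring p - upper_a > z*sqrt(p(1-p)/n) gives
# (1 + k) p^2 - (2*upper_a + k) p + upper_a^2 > 0.
a = 1 + k
b = -(2 * upper_a + k)
c = upper_a ** 2

disc = b * b - 4 * a * c
roots = sorted(
    [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]
)

# Only the root above upper_a satisfies the unsquared inequality.
p_b_min = max(roots)
print(f"roots = {roots[0]:.4f}, {roots[1]:.4f}; p_b = {p_b_min:.4f}")
```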
3c) To ensure that A is better than C, the lower bound of A's confidence
interval must be greater than the upper bound of C's confidence
interval.

Calculation:

$$p_c + z_{1-\alpha/2} \sqrt{\frac{p_c(1-p_c)}{n}} < 0.6102$$

Plugging in values for n and the z-statistic, isolating the square root and
squaring both sides, we get the quadratic inequality:

$$1.038416\,p_c^2 - 1.258816\,p_c + 0.3723 > 0$$

Then using the quadratic formula we can solve for the boundary values of p_c.
The values we get are:

p_c = 0.512004 or p_c = 0.700243

Since p_c must be less than 0.6102 (and hence less than p_a), the latter
value is an artifact of squaring, so we reject it and accept the former.
Therefore:

p_c = 0.5120
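
A quick way to sanity-check the answers to 3b and 3c is to plug them back into the interval formula. The sketch below assumes z = 1.96 and n = 100 as above; `bounds` is an illustrative helper name.

```python
import math

z, n = 1.96, 100

def bounds(p):
    """Lower and upper ends of the normal-approximation interval around p."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lower_b, _ = bounds(0.8582)           # 3b: should sit at A's upper bound, 0.7898
_, upper_c = bounds(0.5120)           # 3c: should sit at A's lower bound, 0.6102
lower_rejected, _ = bounds(0.699955)  # the root rejected in 3b

print(f"lower bound for p_b = 0.8582: {lower_b:.4f}")
print(f"upper bound for p_c = 0.5120: {upper_c:.4f}")
print(f"lower bound for rejected root: {lower_rejected:.4f}")
```

The rejected root gives a lower bound far below 0.7898, which shows it only solves the squared equation and not the original inequality.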
4a) Null Hypothesis: mean response time (µ) ≥ 6.4
Alternate Hypothesis: mean response time (µ) < 6.4

α = 0.1
n = 64

We choose Z-statistic as n > 30

The region of rejection is:


Z-statistic < -1.2815516

To find the critical mean response time threshold (Rc), we apply the region
of rejection to Rc:

Formula:

$$\frac{R_c - \mu}{\sigma/\sqrt{n}} < -1.2815516$$

Plugging in values of 𝜎, n and µ we get:

Rc < 6.2077

Therefore, Rc = 6.208

4b) F1 and F4 will be accepted for deployment as F1's value (6.1) and F4's
value (6.2) are less than Rc (6.208). The other two - F2 (6.3) and F3 (6.25) -
have values greater than Rc and so are not deployed.
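
The threshold in 4a and the decisions in 4b can be sketched in Python as follows. One assumption is made loudly here: σ is not stated in this section, and σ = 1.2 is the value implied by Rc = 6.2077 (and by Z = -1 for a mean of 6.25).

```python
import math
from statistics import NormalDist

mu0, n, alpha = 6.4, 64, 0.1
sigma = 1.2  # assumed: not stated in this section, implied by Rc = 6.2077

# Left-tail critical value and the corresponding response-time threshold.
z_crit = NormalDist().inv_cdf(alpha)
r_c = mu0 + z_crit * sigma / math.sqrt(n)

# Mean response times of the four servers, as listed in 4b.
servers = {"F1": 6.1, "F2": 6.3, "F3": 6.25, "F4": 6.2}
deployed = [name for name, mean in servers.items() if mean < r_c]

print(f"z_crit = {z_crit:.7f}, Rc = {r_c:.4f}, deployed = {deployed}")
```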

4c) To find the p-value we first find the Z-statistic:

Formula:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Here $\bar{x}$ refers to F3's value, i.e. $\bar{x} = 6.25$. The rest is the same, and
so:

Z = -1

Then using Excel's NORMDIST function we find the p-value:

=NORMDIST(-1, 0, 1, TRUE) = 0.15866
p-value = 0.1587

Since p-value > α, we fail to reject the null hypothesis for F3, and therefore
F3 is not deployed.

4d) To ensure all servers are accepted we can use a similar formulation as in
4a.

$$\frac{R_c - \mu}{\sigma/\sqrt{n}} < \mathrm{norminv}(\alpha, 0, 1)$$

We set Rc to the highest of the servers' values, i.e. 6.3 (for F2), and the
rest is the same. We get:

-0.66667 < norminv(α, 0, 1)

The inverse of NORMINV in Excel is NORMDIST, and therefore we can solve
for α by:

normdist(-0.6667, 0, 1, TRUE) < α

Which gives us:

0.2525 < α

i.e. α = 0.26
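
The p-value view of 4c and 4d can be sketched together: each server's p-value is the left-tail probability of its Z-statistic, and all servers are accepted once α exceeds the largest p-value. As before, σ = 1.2 is an assumption implied by the numbers rather than quoted in this section.

```python
import math
from statistics import NormalDist

mu0, n = 6.4, 64
sigma = 1.2  # assumed: implied by Z = -1 for a sample mean of 6.25

servers = {"F1": 6.1, "F2": 6.3, "F3": 6.25, "F4": 6.2}
std_err = sigma / math.sqrt(n)

# Left-tail p-value for H0: mu >= 6.4 versus H1: mu < 6.4.
p_values = {
    name: NormalDist().cdf((mean - mu0) / std_err)
    for name, mean in servers.items()
}

# Every server is accepted only if alpha exceeds the largest p-value.
alpha_min = max(p_values.values())

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
print(f"alpha must exceed {alpha_min:.4f}")
```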
5a)
[Figure: Error (E) vs Rate (kHz), with linear trend line y = -1.8776x + 3.6766, R² = 0.5112]

Clearly the trend is not linear. Plotting the linear trend line and
calculating the R² value (0.5112) further supports the conclusion that the
trend is not linear.

5b)
[Figure: H (1/E) vs Rate (kHz), with linear trend line y = 1.4397x - 0.2493, R² = 0.9922]

The trend is very close to linear. Plotting the linear trend line and
calculating the R² value (0.9922) further supports the conclusion that the
trend is linear.
5c)
To calculate the values of k0 and k1, the values below are calculated
(x = Rate (kHz), y = H (1/E)):

∑x      14.3       ∑y      17.59621
∑(x^2)  21.87      ∑(y^2)  35.89133
x_bar   1.19167    y_bar   1.46635
∑xy     27.92150   n       12

The formulae for k0 and k1 are:

$$k_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}, \qquad k_0 = \bar{y} - k_1\bar{x}$$

After plugging in values from the table above we get:

k0 = -0.2493
k1 = 1.4397
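
The coefficients can be recomputed directly from the tabulated sums. A minimal sketch, with variable names chosen for illustration:

```python
# Sums from the table in 5c (x = Rate, y = H = 1/E).
n = 12
sum_x, sum_y = 14.3, 17.59621
sum_x2, sum_xy = 21.87, 27.92150

x_bar = sum_x / n
y_bar = sum_y / n

# Least-squares slope and intercept for y = k0 + k1 * x.
k1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
k0 = y_bar - k1 * x_bar

print(f"k0 = {k0:.4f}, k1 = {k1:.4f}")
```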

5d)
To calculate the coefficient of determination (R²) we calculate SSE and SST in
advance:

SST  10.0891
SSE  0.0791

Formulae for SSE, SST and R² are:

$$SSE = \sum y^2 - k_0 \sum y - k_1 \sum xy$$

$$SST = \sum y^2 - n\bar{y}^2$$

$$R^2 = 1 - \frac{SSE}{SST}$$

Then:
R² = 0.9922

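To calculate the confidence intervals for k0 and k1 the values below are
calculated:

se^2   0.00791   se                 0.08896
sb0    0.05465   sb1                0.04048
alpha  0.05      t[1-alpha/2; n-2]  2.22814

The formulae for se, sb0 and sb1 are:

$$s_e = \sqrt{\frac{SSE}{n-2}}, \qquad s_{b_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}}, \qquad s_{b_1} = \frac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$$

And the formulae for the confidence intervals of k0 and k1 are
respectively:

$$k_0 \pm t_{[1-\alpha/2;\,n-2]}\, s_{b_0}$$

$$k_1 \pm t_{[1-\alpha/2;\,n-2]}\, s_{b_1}$$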

Finally this gives us the confidence intervals:


(-0.3711, -0.1276) for k0
(1.3495, 1.5299) for k1
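
The whole of 5d can be reproduced from the sums in 5c. The sketch below uses the standard simple-regression formulas for the standard errors and takes the critical value 2.22814 as quoted rather than recomputing it; variable names are illustrative.

```python
import math

# Sums from 5c and the quoted critical value t[0.975; 10].
n = 12
sum_x, sum_y = 14.3, 17.59621
sum_x2, sum_y2, sum_xy = 21.87, 35.89133, 27.92150
t_crit = 2.22814

x_bar, y_bar = sum_x / n, sum_y / n
s_xx = sum_x2 - n * x_bar ** 2
k1 = (sum_xy - n * x_bar * y_bar) / s_xx
k0 = y_bar - k1 * x_bar

# Goodness of fit.
sse = sum_y2 - k0 * sum_y - k1 * sum_xy
sst = sum_y2 - n * y_bar ** 2
r2 = 1 - sse / sst

# Standard errors of the regression and of each coefficient.
se = math.sqrt(sse / (n - 2))
sb0 = se * math.sqrt(1 / n + x_bar ** 2 / s_xx)
sb1 = se / math.sqrt(s_xx)

ci_k0 = (k0 - t_crit * sb0, k0 + t_crit * sb0)
ci_k1 = (k1 - t_crit * sb1, k1 + t_crit * sb1)

print(f"R^2 = {r2:.4f}")
print(f"k0 in ({ci_k0[0]:.4f}, {ci_k0[1]:.4f})")
print(f"k1 in ({ci_k1[0]:.4f}, {ci_k1[1]:.4f})")
```

Because neither interval contains 0, both coefficients are significant at the 95% level under these assumptions.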

5e)
SST and SSE are given above. SSR is calculated as follows:
SSR = 10.01

The formula for SSR is:


SSR = SST - SSE
5f) To calculate the confidence interval for yp (the predicted value of y given
x = 0.55) we calculate the values below using the se of 5d:

symp  0.03653
yp    0.5425

We calculate yp using the given x (0.55) and our previous values of k0
and k1.

The formulae for calculating symp and the confidence interval around
yp are:

$$s_{\hat{y}_{mp}} = s_e \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$$

$$y_p \pm t_{[1-\alpha/2;\,n-2]}\, s_{\hat{y}_{mp}}$$

Then the confidence interval for yp:

(0.4611, 0.6239)
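
The prediction interval can be checked with the sketch below. It assumes the interval is for the mean predicted response at x = 0.55, which is the standard-error form that reproduces symp = 0.03653, and it reuses the rounded k0, k1, se and t values reported earlier.

```python
import math

# Rounded values reported in 5c and 5d.
n = 12
sum_x, sum_x2 = 14.3, 21.87
k0, k1 = -0.2493, 1.4397
se = 0.08896
t_crit = 2.22814  # t[0.975; 10]

x_p = 0.55
x_bar = sum_x / n
s_xx = sum_x2 - n * x_bar ** 2

# Predicted value and standard error of the mean predicted response.
y_p = k0 + k1 * x_p
s_ymp = se * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)

ci_yp = (y_p - t_crit * s_ymp, y_p + t_crit * s_ymp)
print(f"yp = {y_p:.4f}, symp = {s_ymp:.5f}")
print(f"({ci_yp[0]:.4f}, {ci_yp[1]:.4f})")
```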
