0% found this document useful (0 votes)
72 views

Assignment#4: A) Draw The Scatter Plot of The Data, If You Can Plot Via R Would Also Be Acceptable

The document discusses fitting linear and quadratic regression models to data relating house area (x) and price (y). [1] A scatter plot of the data is generated and linear and quadratic regression equations are derived. [2] Solving the normal equations gives coefficients a = 96.15 and b = 0.0633 for the linear model, and coefficients a = 294.9714, b1 = -0.20338, and b2 = 0.0000834 for the quadratic model. [3] The quadratic model is a better fit as it accounts for the nonlinear relationship between area and price. The linear model assumes a direct proportional relationship that does not reflect the curvature in the data

Uploaded by

Anooshay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
72 views

Assignment#4: A) Draw The Scatter Plot of The Data, If You Can Plot Via R Would Also Be Acceptable

The document discusses fitting linear and quadratic regression models to data relating house area (x) and price (y). [1] A scatter plot of the data is generated and linear and quadratic regression equations are derived. [2] Solving the normal equations gives coefficients a = 96.15 and b = 0.0633 for the linear model, and coefficients a = 294.9714, b1 = -0.20338, and b2 = 0.0000834 for the quadratic model. [3] The quadratic model is a better fit as it accounts for the nonlinear relationship between area and price. The linear model assumes a direct proportional relationship that does not reflect the curvature in the data

Uploaded by

Anooshay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 10

Assignment#4

4/25/2020

---------------------------------------------------------------------------------------------------------------------

a) Draw the scatter plot of the data, if you can plot via R would also be
acceptable.

# Reading data from Excel CSV file


xyData<-read.csv(file.choose(),header = T)

attach(xyData)
plot(Area_Sq_Feet,Price_Dollars_1000,main = "Scatter Plot: Area and Selling

Price")

*SCATTER PLOT USING EXCEL*


A
r
e
a
S
q.
F
e Pric
e e($ x-
t 100 sq
b) Derive the normal (x 0)- ua equation by least square fit
method. ) (y) xy re

1
6
Normal equations by least 1 9 10 square fit method are given
below; 0 0 00
0 0 00
0 169 0 0
To form above two equations, we need the
following; 1
9
1 0 12
Sum of x, Sum of y, Sum of xy, Sum of x-square
1 3 10
0 0 00
Here: 0 173 0 0

y = Price (Dependent 2 variable) and x = Area


(Independent variable) 1
1 1 14
2 2 40
0 0 00
0 176 0 0

2
4
1 9 19
4 2 60
0 0 00
0 178 0 0

2
9
1 7 25
6 6 60
0 0 00
0 186 0 0

3
5
1 4 32
8 6 40
0 0 00
0 197 0 0

3
8
1 5 36
9 7 10
0 0 00
0 203 0 0

3
9
1 5 36
Formulate above (1) and (2) Normal equations;

11*a + b*18300 = 2216 ..................................(1)

18300 * a + b * 32290000 = 38033400 ................ (2)

c) Solve the normal equations derived from step 2 simultaneously and


compute the unknown parameters.

By solving (1) and (2), we get unknown parameters 'a' and 'b'.

To cancel parameter 'a' in the above equations, we do the following operations;

(1) x 18300

(2) x 11

we get:

''a'' ''b''
Coefficient Coefficient RHS

1st
Equation 201300 334890000 40552800

2nd
Equation 201300 355190000 41837400

(2) - (1) 0 20300000 1284600

Solving the table, we get:

b = 0.0633

Substitute this b in (1) we get,

a = 96.15
d) Find sum of square error (SSE): ∑ (Y − Ŷ )

Area
Sq.Feet
(x) Price (y) xy x-square Y-hat (Y - Y-hat)

1000 169 169000 1000000 159.45 9.55

1100 173 190300 1210000 165.78 7.22

1200 176 211200 1440000 172.11 3.89

1400 178 249200 1960000 184.77 -6.77

1600 186 297600 2560000 197.43 -11.43

1800 197 354600 3240000 210.09 -13.09

1900 203 385700 3610000 216.42 -13.42

1900 208 395200 3610000 216.42 -8.42

2100 229 480900 4410000 229.08 -0.08

2100 237 497700 4410000 229.08 7.92

2200 260 572000 4840000 235.41 24.59

Total 18300 2216 3803400 32290000 -0.04

∑ (Y − Ŷ) = -0.04

e) Predict the price which you have an 1850 square feet area plot.
To predicted value of Y when X=1850, we need to find the regression equation:
Y=a1+b1X+b2X^2

Table for calculation:

Observati
ons X Y X^2 X^3 X^4 XY X^2*Y

10000000 10000000 16900000


1 1000 169 1000000 00 00000 169000 0

13310000 14641000 20933000


2 1100 173 1210000 00 00000 190300 0

17280000 20736000 25344000


3 1200 176 1440000 00 00000 211200 0

27440000 38416000 34888000


4 1400 178 1960000 00 00000 249200 0

40960000 65536000 47616000


5 1600 186 2560000 00 00000 297600 0

58320000 10497600 63828000


6 1800 197 3240000 00 000000 354600 0

68590000 13032100 73283000


7 1900 203 3610000 00 000000 385700 0

68590000 13032100 75088000


8 1900 208 3610000 00 000000 395200 0

92610000 19448100 10098900


9 2100 229 4410000 00 000000 480900 00

92610000 19448100 10451700


10 2100 237 4410000 00 000000 497700 00

10648000 23425600 12584000


11 2200 260 4840000 000 000000 572000 00

59619000 11381650 68922600


Total 18300 2216 32290000 000 0000000 3803400 00
The normal equations for the quadratic equation are as follows

b2 ∑xi^4 + b1 ∑xi^3 + a∑xi^2 = ∑xi^2yi

b2 ∑xi^3 + b1 ∑xi^2 + a∑xi = ∑xiyi

b2 ∑xi^2 + b1 ∑xi + an = ∑yi

So, putting the values we have:

113816500000000 b2 +59619000000 b1+ 32290000 a =6892260000 ...... (1)

59619000000 b2+32290000 b1 + 18300 a = 3803400......(2)

32290000 b2 + 18300 b1 + 11 a = 2216......(3)

Dividing both sides of (1) by 10000 we get,

11381650000 b2 + 5961900 b1 + 3229 a = 689226...... (4)

Dividing both sides of (2) by 100 we get,

596190000 b2 + 322900 b1 +183 a = 38034......(5)

So, solving the final equations (3), (4) and (5)

32290000 b2 + 18300 b1 + 11 a = 2216......(3)

11381650000 b2 + 5961900 b1 + 3229 a = 689226...... (4)

596190000 b2 + 322900 b1 +183 a = 38034......(5)

Solving we get,

Solving a, b and c we have,

b2 = 1523/18260000=0.0000834

b1 = -215397/1059080=-0.20338

a = 7809958/26477 = 294.9714

So, the regression equation is:

Y = 294.97 - 0.20338X + 0.0000834X^2

The predicted value of Y when X = 1850 is


Y = 294.97 - 0.20338 * 1850 + 0.0000834 * 1850^2 = 204.1535

So, Y=204 approximately (in $1000)

f) Verify that you get the following equation at the end.


Y = 294.97 − 203.38X + 83.40X^2

As Y's units was 1000, we have the modified equation as:

Y = 294.9714 - 0.20338 * 1000X + 0.0000834 * 1000^2 * X^2

Y = 294.9714 - 203.38X + 83.40X^2 is the final equation

g) In this case, what will be the impact, when you fit a linear regression
instead of polynomial? Justify your answer in points.

Now, we fit a linear regression equation: Y = a + b * X

b = slope of the regression = r * SD(y) / SD(X)

a = ybar – b * xbar = intercept

ybar = mean of Y = ∑yi / n

xbar = mean of X = ∑xi / n

r = correlation coefficient = ∑(x-xbar) (y-ybar) /√ {(∑(x-xbar) ^2*∑ (y-ybar) ^2}

SD(x) = Standard deviation of x = √(∑(x-xbar) ^2/(n-1)

SD(y) = Standard deviation of y = √(∑(y-ybar) ^2/(n-1)

Table for calculation:

Observa X Y (X-Xbar) (Y-Ybar) (X- (X- (Y-


tions Xbar)*(Y Xbar)^2 Ybar)^2
-Ybar)

21537.9 440412. 1053.29


1 1000 169 -663.636 -32.4545 7456 7405 457

16037.9 317685. 809.658


2 1100 173 -563.636 -28.4545 8056 5405 5703

11801.6 214958. 647.931


3 1200 176 -463.636 -25.4545 2256 3405 5703

6183.45 69503.9 550.113


4 1400 178 -263.636 -23.4545 0562 405 5703

983.462 4049.54 238.841


5 1600 186 -63.636 -15.4545 562 0496 5703

-
607.433 18595.1 19.8425
6 1800 197 136.364 -4.4545 438 405 7025

365.300 55867.9 2.38857


7 1900 203 236.364 1.5455 562 405 025

1547.12 55867.9 42.8435


8 1900 208 236.364 6.5455 0562 405 7025

12019.8 190413. 758.754


9 2100 229 436.364 27.5455 6456 5405 5703

15510.7 190413. 1263.48


10 2100 237 436.364 35.5455 7656 5405 257

31401.6 287686. 3427.57


11 2200 260 536.364 58.5455 9856 3405 557

Xbar=16 Ybar=20 116781. 1845454 8814.72


Total 18300 2216 63.636 1.4545 8182 .545 7273

So b=r.sy/sx

Now r =∑(x-xbar)(y-ybar)/√{( ∑(x-xbar)^2*∑ (y-


ybar)^2}=116781.8181/√(1845454.5455*8814.7273)

=0.9156
Sy=SD(y) =√(∑(y-ybar)^2/(n-1) =√(8814.7273/10)=29.6896

Sx= SD (x)=√(∑(x-xbar)^2/(n-1)=√(1845454.545/10)=429.5875

b=0.9156*429.5875/29.6896=0.06327

a=ybar-b*xbar=201.4545-0.06327*1663.636=96.1963

So, the Linear Regression is

Y=96.1963+0.06327*X
The predicted value of Y for X=1850 in the linear regression is:
Y=96.1963+0.06327*1850=213.2458=213 approximately (in $1000)

You might also like