Assignment-1 (2)

The document presents a regression analysis using data from 'hprice3.csv' to model house prices based on various factors such as area, number of rooms, bathrooms, and age. It includes statistical results from the regression, tests for significance of coefficients, confidence intervals, and normality tests for residuals. The analysis concludes that certain variables significantly affect house prices and provides methods for testing hypotheses related to the regression model.
Copyright © All Rights Reserved
ASSIGNMENT 1

Using the data in the file hprice3.csv, we have the following variables:

- price: house selling price;
- area: square footage of the house (ft²);
- rooms: number of rooms;
- baths: number of bathrooms;
- age: age of the house.

The regression results obtained in R are as follows:

> hq1 <- lm(log(price) ~ log(area) + rooms + baths + age, data=hprice)


> summary(hq1)

Call:
lm(formula = log(price) ~ log(area) + rooms + baths + age, data = hprice)

Residuals:
Min 1Q Median 3Q Max
-1.3856 -0.1901 0.0122 0.1992 0.8413

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.7588037  0.4649094  14.538  < 2e-16 ***
log(area)   0.5288392  0.0694604   7.614 3.11e-13 ***
rooms       0.0593313  0.0231439   2.564 0.010822 *
baths       0.1190959  0.0348483   3.418 0.000715 ***
age        -0.0037630  0.0005464  -6.887 3.09e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2846 on 316 degrees of freedom
Multiple R-squared: 0.5834, Adjusted R-squared: 0.5781
F-statistic: 110.6 on 4 and 316 DF, p-value: < 2.2e-16

1) Present the sample regression function.


2) What is the meaning of the slope coefficient of log(area)?
3) Does the number of bathrooms affect house price at the 3% significance level?
4) Find the 98% confidence interval for the population slope coefficient of rooms.
5) Do you believe that the older a house is, the lower its price, holding all other factors
fixed? Give a conclusion at the 2% level.
6) It is believed that when we compare two houses with the same square footage, the same
number of bathrooms and the same age, but house A has one more room than house B,
we can predict that house A’s price is 8% higher than house B’s price. Justify your
answer at the 1% level.
7) Is either log(area) or rooms individually significant at the 1% level?
8) To test whether log(area) and rooms are jointly significant, which restricted model
should be used? Suppose that estimating the restricted model gives
$R^2_R = 0.4847$. Are log(area) and rooms jointly significant at the 1% level?

9) Test the null hypothesis H0: none of the explanatory variables has an effect on the
dependent variable, at the 1% level.
INSTRUCTIONS
1) The sample regression function:
$$\widehat{\log(price)} = 6.7588 + 0.5288\,\log(area) + 0.0593\,rooms + 0.1191\,baths - 0.0038\,age$$
or, equivalently, with the OLS residual $\hat{u}$:
$$\log(price) = 6.7588 + 0.5288\,\log(area) + 0.0593\,rooms + 0.1191\,baths - 0.0038\,age + \hat{u}$$
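As a quick check of the formula, the base-R sketch below plugs a house into the sample regression function. The house characteristics (2,000 ft², 7 rooms, 2 bathrooms, 20 years old) are hypothetical values chosen only for illustration, and the coefficients are the rounded ones above, so the result is approximate. With the fitted model in memory, predict(hq1, newdata = ...) would do the same job without rounding.

```r
# Rounded estimates from summary(hq1)
b <- c(intercept = 6.7588, log_area = 0.5288, rooms = 0.0593,
       baths = 0.1191, age = -0.0038)

# Fitted value of log(price) from the sample regression function
predict_log_price <- function(area, rooms, baths, age) {
  unname(b["intercept"] + b["log_area"] * log(area) + b["rooms"] * rooms +
           b["baths"] * baths + b["age"] * age)
}

# Hypothetical house (illustrative values, not an observation from hprice3.csv)
lp_hat <- predict_log_price(area = 2000, rooms = 7, baths = 2, age = 20)
print(lp_hat)
```

Adding one room to the same hypothetical house raises the fitted log(price) by exactly the rooms coefficient, 0.0593.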
2) $\hat\beta_1 = 0.5288$ is the estimated elasticity of price with respect to area: holding all other factors fixed, if the square footage increases by 1%,
the predicted house price increases by approximately 0.5288%.
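Because the model is log-log, 0.5288% is a first-order approximation for a 1% change; the exact change implied by a 1% increase in area is 100·(1.01^0.5288 − 1)%. The short sketch below compares the two using the rounded estimate.

```r
beta_log_area <- 0.5288   # rounded estimate of beta_1

# Approximation: %change in price ~= beta_1 * %change in area
approx_pct <- beta_log_area * 1

# Exact: price is multiplied by 1.01^beta_1 when area rises by 1%
exact_pct <- 100 * (1.01^beta_log_area - 1)

print(c(approx = approx_pct, exact = exact_pct))
```

The two numbers differ only in the third decimal place, which is why the approximate reading is standard for small changes.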
3) Test H0: $\beta_{baths} = 0$ against H1: $\beta_{baths} \neq 0$
Method 1: $t_{baths} = 3.418$
Given a significance level $\alpha = 3\% \Rightarrow \alpha/2 = 0.015$.
Since $n-(k+1) = 316$ is large, $t_{\alpha/2}(n-(k+1)) \approx z_{\alpha/2} = z_{0.015} = 2.17$.
Because $|t| = 3.418 > z_{\alpha/2} = 2.17$, we reject H0.
So, the number of bathrooms influences the house price.
Method 2: p-value(baths) $= 0.000715 < \alpha = 3\% \Rightarrow$ Reject H0.
So, the number of bathrooms influences the house price.
Method 3:
$$\beta_{baths} \in \left(\hat\beta_{baths} - t_{\alpha/2}(n-(k+1))\cdot se(\hat\beta_{baths});\ \hat\beta_{baths} + t_{\alpha/2}(n-(k+1))\cdot se(\hat\beta_{baths})\right)$$
Given a confidence level $1-\alpha = 1 - 0.03 = 0.97 \Rightarrow \alpha/2 = 0.015$.
Since $n-(k+1) = 316$ is large, $t_{\alpha/2}(n-(k+1)) \approx z_{\alpha/2} = z_{0.015} = 2.17$.
Therefore:
$$\beta_{baths} \in (0.1191 - 2.17 \times 0.0348;\ 0.1191 + 2.17 \times 0.0348)$$
So: $\beta_{baths} \in (0.0436;\ 0.1946)$.
Since $0 \notin (0.0436;\ 0.1946)$, we reject H0.
So, the number of bathrooms influences the house price.
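The three methods can be checked in base R from the reported estimate and standard error. This is a sketch that hard-codes the numbers from summary(hq1) rather than reading them from the fitted model; it also shows that the exact t critical value, qt(0.985, 316), is only slightly larger than the normal value 2.17 used above.

```r
est <- 0.1190959   # estimate of beta_baths
se  <- 0.0348483   # standard error of beta_baths
dof <- 316         # n - (k + 1)
alpha <- 0.03

t_stat <- est / se                        # Method 1: t-ratio for H0: beta_baths = 0
z_crit <- qnorm(1 - alpha / 2)            # large-sample critical value (about 2.17)
t_crit <- qt(1 - alpha / 2, dof)          # exact t critical value (about 2.18)
p_val  <- 2 * pt(-abs(t_stat), dof)       # Method 2: two-sided p-value

ci <- est + c(-1, 1) * z_crit * se        # Method 3: 97% interval, normal approximation

# All three methods lead to the same decision
reject <- abs(t_stat) > t_crit && p_val < alpha && !(ci[1] < 0 && ci[2] > 0)
print(list(t = t_stat, p = p_val, ci = ci, reject = reject))
```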
4) The confidence interval is:
$$\beta_{rooms} \in \left(\hat\beta_{rooms} - t_{\alpha/2}(n-(k+1))\cdot se(\hat\beta_{rooms});\ \hat\beta_{rooms} + t_{\alpha/2}(n-(k+1))\cdot se(\hat\beta_{rooms})\right)$$
Given a confidence level $1-\alpha = 98\% \Rightarrow \alpha/2 = 0.01$.
Since $n-(k+1) = 316$ is large, $t_{0.01}(316) \approx z_{0.01} = 2.326$.
Therefore:
$$\beta_{rooms} \in (0.0593 - 2.326 \times 0.0231;\ 0.0593 + 2.326 \times 0.0231)$$
So: $\beta_{rooms} \in (0.0056;\ 0.1130)$.
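The interval can be sketched in base R from the reported estimate and standard error. The sketch computes both the normal-approximation interval used above and the exact t-based interval, which is what confint(hq1, "rooms", level = 0.98) would return for the fitted model.

```r
est <- 0.0593313   # estimate of beta_rooms
se  <- 0.0231439   # standard error of beta_rooms
dof <- 316         # n - (k + 1)
level <- 0.98

z_crit <- qnorm(1 - (1 - level) / 2)       # about 2.326
t_crit <- qt(1 - (1 - level) / 2, dof)     # about 2.34

ci_z <- est + c(-1, 1) * z_crit * se       # normal approximation (as in the text)
ci_t <- est + c(-1, 1) * t_crit * se       # exact t-based interval
print(rbind(normal = ci_z, t = ci_t))
```

The exact interval is marginally wider, but both exclude zero.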
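5) Test H0: $\beta_{age} = 0$ against H1: $\beta_{age} < 0$
Method 1: $t_{age} = -6.887$
Given a significance level $\alpha = 2\%$: since $n-(k+1) = 316$ is large, $t_{\alpha}(n-(k+1)) \approx z_{\alpha} = z_{0.02} = 2.054$.
Because $t = -6.887 < -z_{0.02} = -2.054$, we reject H0.
So, holding all other factors fixed, the older a house is, the lower its price.
Method 2: the one-sided p-value is $3.09\times10^{-11}/2 \approx 1.5\times10^{-11} < \alpha = 2\%$ (halving the two-sided p-value is valid because $\hat\beta_{age} < 0$ has the sign stated in H1) $\Rightarrow$ Reject H0.
So, holding all other factors fixed, the older a house is, the lower its price.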
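A base-R sketch of the one-sided test, again hard-coding the reported estimate and standard error. Note that pt(t, dof) gives the left-tail probability directly, which equals half of the two-sided p-value printed by summary() when the estimate has the sign stated in H1.

```r
est <- -0.0037630   # estimate of beta_age
se  <- 0.0005464    # standard error of beta_age
dof <- 316          # n - (k + 1)
alpha <- 0.02

t_stat <- est / se              # about -6.887
z_crit <- qnorm(alpha)          # left-tail normal critical value, about -2.054
t_crit <- qt(alpha, dof)        # exact left-tail t critical value
p_one  <- pt(t_stat, dof)       # one-sided p-value P(T < t_stat)

reject <- t_stat < t_crit && p_one < alpha
print(c(t = t_stat, p_one_sided = p_one))
```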
6) Holding square footage, bathrooms and age fixed, one more room changes the predicted price by approximately $100\cdot\beta_{rooms}$ percent, so "8% higher" corresponds to $\beta_{rooms} = 0.08$.
Test H0: $\beta_{rooms} = 0.08$ against H1: $\beta_{rooms} \neq 0.08$
Method 1: $t = (0.0593 - 0.08)/0.0231 = -0.896$
Given a significance level $\alpha = 1\% \Rightarrow \alpha/2 = 0.005$.
Since $n-(k+1) = 316$ is large, $t_{0.005}(316) \approx z_{0.005} = 2.576$.
Since $|t| = 0.896 < 2.576$, we cannot reject H0.
Hence, at the 1% level, the data are consistent with the statement: when we compare two houses with the same square footage, number of
bathrooms and age, but house A has one more room than house B, we can predict
that house A’s price is about 8% higher than house B’s price. (We cannot reject this statement.)
Method 2: The confidence interval of $\beta_{rooms}$ is: ….
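Method 1 can be reproduced in base R. The sketch also checks a slightly different reading of the claim: an exact 8% price difference corresponds to exp(β_rooms) − 1 = 0.08, i.e. β_rooms = log(1.08) ≈ 0.077, rather than β_rooms = 0.08. Both versions are tested against the exact t critical value; with the unrounded inputs the statistic is about −0.893 instead of the −0.896 obtained from rounded numbers.

```r
est <- 0.0593313   # estimate of beta_rooms
se  <- 0.0231439   # standard error of beta_rooms
dof <- 316         # n - (k + 1)
alpha <- 0.01

t_crit <- qt(1 - alpha / 2, dof)   # exact two-sided critical value, about 2.59

# Reading used in the text: 8% higher <=> beta_rooms = 0.08 (log approximation)
t_approx <- (est - 0.08) / se
# Exact reading: exp(beta_rooms) - 1 = 0.08 <=> beta_rooms = log(1.08)
t_exact <- (est - log(1.08)) / se

reject_approx <- abs(t_approx) > t_crit
reject_exact  <- abs(t_exact) > t_crit
print(c(t_approx = t_approx, t_exact = t_exact, t_crit = t_crit))
```

Neither reading leads to rejection at the 1% level, so the conclusion does not depend on the approximation.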
7) Test H0: $\beta_{\log(area)} = 0$ against H1: $\beta_{\log(area)} \neq 0$
Test H0: $\beta_{rooms} = 0$ against H1: $\beta_{rooms} \neq 0$
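The decision for question 7 follows from comparing each two-sided p-value with α = 1%. The sketch below recomputes the p-values from the printed t values, so they match the summary(hq1) output only up to the rounding of t.

```r
alpha <- 0.01
dof <- 316
t_values <- c(log_area = 7.614, rooms = 2.564)   # from summary(hq1)

p_values <- 2 * pt(-abs(t_values), dof)          # two-sided p-values
significant_1pct <- p_values < alpha
print(significant_1pct)
```

With these numbers, log(area) is individually significant at the 1% level, while rooms is not (p ≈ 0.0108 > 0.01), although rooms is significant at the 5% level.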
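8) Testing exclusion restrictions: H0: $\beta_{\log(area)} = \beta_{rooms} = 0$.
H1: $\beta_{\log(area)} \neq 0$ or $\beta_{rooms} \neq 0$.
The restricted model drops log(area) and rooms: $\log(price) = \beta_0 + \beta_3\,baths + \beta_4\,age + u$.
We have:
$$F = \frac{(R^2_{UR} - R^2_R)/q}{(1 - R^2_{UR})/(n-(k+1))} = \frac{(0.5834 - 0.4847)/2}{(1 - 0.5834)/316} = 37.433$$
Given a significance level $\alpha = 1\% \Rightarrow F_{\alpha}(q;\ n-(k+1)) = F_{0.01}(2;\ 316) \approx 4.61$.
Since $F = 37.433 > 4.61$, we reject H0.
So, log(area) and rooms are jointly significant.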
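The F statistic in its R-squared form, together with the exact critical value and p-value, can be computed in base R from the numbers given in the question. The exact critical value qf(0.99, 2, 316) is a little above the large-sample value 4.61 used above, which does not change the conclusion.

```r
r2_ur <- 0.5834   # R-squared of the unrestricted model hq1
r2_r  <- 0.4847   # R-squared of the restricted model (given in the question)
q     <- 2        # number of restrictions
dof   <- 316      # n - (k + 1) in the unrestricted model

f_stat <- ((r2_ur - r2_r) / q) / ((1 - r2_ur) / dof)
f_crit <- qf(0.99, q, dof)                        # exact 1% critical value
p_val  <- pf(f_stat, q, dof, lower.tail = FALSE)  # upper-tail p-value
print(c(F = f_stat, F_crit = f_crit, p = p_val))
```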
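9) Test H0: $R^2 = 0$ against H1: $R^2 > 0$ (test of the overall significance of a regression; equivalently H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$)

Method 1:
$$F = \frac{R^2/k}{(1 - R^2)/(n-(k+1))} = \frac{0.5834/4}{(1 - 0.5834)/316} = 110.63$$
Given a significance level $\alpha = 1\% \Rightarrow F_{\alpha}(k;\ n-(k+1)) = F_{0.01}(4;\ 316) \approx 3.32$.
Since $F = 110.63 > 3.32$, we reject H0.
Thus, the explanatory variables are jointly significant: at least one of them helps explain the
variation in log(price).
Method 2:
p-value(F) $< 2.2\times10^{-16} < \alpha = 1\% \Rightarrow$ Reject H0.
Thus, the explanatory variables are jointly significant: at least one of them helps explain the
variation in log(price).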
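The overall F test can be sketched the same way. Since the R-squared is rounded to four digits, the statistic agrees with the printed 110.6 only up to rounding.

```r
r2  <- 0.5834   # R-squared of hq1
k   <- 4        # number of slope coefficients
dof <- 316      # n - (k + 1)

f_stat <- (r2 / k) / ((1 - r2) / dof)
f_crit <- qf(0.99, k, dof)                       # exact 1% critical value
p_val  <- pf(f_stat, k, dof, lower.tail = FALSE) # upper-tail p-value
print(c(F = f_stat, F_crit = f_crit, p = p_val))
```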
PRACTICE IN R

- Import data into R:

Use the function read.csv() and name the data frame hprice (replace "link" with the location of hprice3.csv):
> hprice <- read.csv("link", header = TRUE)

- Find the regression:


> hq1 <- lm(log(price) ~ log(area) + rooms + baths + age, data=hprice)
> summary(hq1)

Call:
lm(formula = log(price) ~ log(area) + rooms + baths + age, data = hprice)

Residuals:
Min 1Q Median 3Q Max
-1.3856 -0.1901 0.0122 0.1992 0.8413

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.7588037  0.4649094  14.538  < 2e-16 ***
log(area)   0.5288392  0.0694604   7.614 3.11e-13 ***
rooms       0.0593313  0.0231439   2.564 0.010822 *
baths       0.1190959  0.0348483   3.418 0.000715 ***
age        -0.0037630  0.0005464  -6.887 3.09e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2846 on 316 degrees of freedom   (316 = n - (k+1))
Multiple R-squared: 0.5834, Adjusted R-squared: 0.5781
F-statistic: 110.6 on 4 and 316 DF, p-value: < 2.2e-16

- Test whether the error term $u_i$ is normally distributed:

Method 1: Using the Jarque-Bera test:
> #Take the residuals of the above regression:
> phandu1 <- resid(hq1)

> #Download and install package “fBasics”.


> install.packages("fBasics")
> library(fBasics)
Loading required package: timeDate
Loading required package: timeSeries

> #Conduct Jarque-Bera test:


> jarqueberaTest(phandu1)

Title:
Jarque - Bera Normalality Test

Test Results:
STATISTIC:
X-squared: 33.3374
P VALUE:
Asymptotic p Value: 5.766e-08
Null hypothesis H0: the error term $u_i$ follows a normal distribution.
We have: p-value $= 5.766\times10^{-8} < \alpha = 0.05 \Rightarrow$ Reject H0.
Thus, the error term $u_i$ is not normally distributed.

Method 2: Using the Shapiro-Wilk test:


> shapiro.test(phandu1)

Shapiro-Wilk normality test

data: phandu1
W = 0.9838, p-value = 0.001114
Instruction: look at p-value to conclude.

Method 3: Using the Anderson-Darling test:


> #Download and install package “nortest”.
> install.packages("nortest")

> #Conduct Anderson-Darling test:


> library(nortest)
> ad.test(phandu1)

Anderson-Darling normality test

data: phandu1
A = 0.58913, p-value = 0.1236
Instruction: look at p-value to conclude.

Method 4: Using the Lilliefors (Kolmogorov-Smirnov) test:


> lillie.test(phandu1)

Lilliefors (Kolmogorov-Smirnov) normality test

data: phandu1
D = 0.041139, p-value = 0.2066
Instruction: look at p-value to conclude.
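The four tests share the null hypothesis that $u_i$ is normally distributed, so each decision reduces to comparing its p-value with α. The sketch below hard-codes the p-values printed above and applies the rule at α = 5% (an assumed level, matching the one used for the Jarque-Bera test). It makes visible that the tests do not agree here: Jarque-Bera and Shapiro-Wilk reject normality, while Anderson-Darling and Lilliefors do not.

```r
alpha <- 0.05
p_values <- c(jarque_bera = 5.766e-08, shapiro_wilk = 0.001114,
              anderson_darling = 0.1236, lilliefors = 0.2066)

# Same rule for every test: reject H0 (normality) when p-value < alpha
decision <- ifelse(p_values < alpha, "Reject H0", "Do not reject H0")
print(decision)
```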

- Find the confidence interval:


> confint(hq1,level = 0.95)
2.5 % 97.5 %
(Intercept) 5.844094645 7.673512770
log(area) 0.392175878 0.665502615
rooms 0.013795727 0.104866814
baths 0.050531961 0.187659796
age -0.004838062 -0.002687915
The 95% confidence interval of the population coefficient of rooms is (0.0138; 0.1049).
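For a linear model, confint() uses the t distribution with the residual degrees of freedom, so the rooms row above can be reproduced by hand. This sketch hard-codes the estimate and standard error from summary(hq1).

```r
est <- 0.0593313   # estimate of beta_rooms
se  <- 0.0231439   # standard error of beta_rooms
dof <- 316         # residual degrees of freedom

# estimate +/- t_{0.025}(316) * se, as confint(hq1, level = 0.95) computes
ci_95 <- est + c(-1, 1) * qt(0.975, dof) * se
print(ci_95)
```

Changing 0.975 to 0.99 gives the 98% interval asked for in question 4.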

- Test exclusion restrictions (Question 8):


> #Download and install package “car”.
> install.packages("car")

> #Perform the F test:


> library(car)
> linearHypothesis(hq1,c("log(area)=0","rooms=0"))
Linear hypothesis test

Hypothesis:
log(area) = 0
rooms = 0

Model 1: restricted model
Model 2: log(price) ~ log(area) + rooms + baths + age

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    318 31.663
2    316 25.595  2    6.0678 37.457 2.521e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Instruction: look at p-value to conclude.
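The F statistic in this table can be recomputed from the two residual sums of squares. It differs from the R-squared version for question 8 (F = 37.433) only because the R-squared values there are rounded. A sketch with the printed numbers:

```r
rss_r  <- 31.663     # RSS of the restricted model (318 residual df)
rss_ur <- 25.595     # RSS of the unrestricted model hq1 (316 residual df)
dof    <- 316        # residual df of the unrestricted model
q      <- 318 - dof  # number of restrictions

f_stat <- ((rss_r - rss_ur) / q) / (rss_ur / dof)
p_val  <- pf(f_stat, q, dof, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```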
