Leslie Salt Property Project Report

Q1. What is the nature of each of the variables? Which variable is the dependent variable and which are the independent variables in the model? - Price is the dependent variable and all other variables are independent.

Uploaded by

Agnish Kar

Advanced Statistics Project – Problem 2

Leslie Salt Data Set

Q2. Check whether the variables require any transformation individually - The independent
variables Flood and County are stored as numeric 0/1 codes but are categorical, so they should be
factor variables. They were converted to factors for this project.
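
A minimal sketch of this conversion, using a small made-up data frame rather than the Leslie Salt data (the name `toy` and its values are assumptions for illustration only). For a two-level 0/1 variable, the factor coding gives the same fitted effect as the numeric coding; the practical gain is readable labels in `summary()` and in the model output.

```r
# Toy data: hypothetical values, not taken from the Leslie Salt dataset
toy <- data.frame(
  Price = c(4.5, 10.6, 1.7, 5.0, 5.7, 6.2),
  Flood = c(0, 0, 1, 0, 1, 0)
)

# Convert the 0/1 indicator to a labelled factor
toy$FloodF <- factor(toy$Flood, levels = c(0, 1), labels = c("No", "Yes"))

# Fit the same simple model with both codings
fit_num <- lm(Price ~ Flood, data = toy)
fit_fac <- lm(Price ~ FloodF, data = toy)

# The estimated effect of flooding is the same under either coding
slope_num <- unname(coef(fit_num)["Flood"])
slope_fac <- unname(coef(fit_fac)["FloodFYes"])
print(c(numeric = slope_num, factor = slope_fac))
```

The coefficient name `FloodFYes` follows R's usual convention of appending the non-reference level to the variable name, which is also why the report's output shows `FloodYes` and `CountySanta Clara`.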

Q3. Set up a regression equation, run the model and discuss your results

Predicted price of the property if sold in the next 3 months:

1st month - $17688.22/acre

2nd month - $17819.05/acre

3rd month - $17949.89/acre
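
These figures can be checked by hand against the coefficient table of the third model in the output. With Elevation = 0, Sewer = 0 and Flood = "No", only the intercept, the Santa Clara county term and the Date term contribute, so the fitted equation reduces to Price = 22.0187525 - 4.4613706 + 0.1308357 * Date. The sketch below is not the author's code: it hard-codes those three estimates as printed in the summary (so it inherits their rounding), and the variable names are illustrative.

```r
# Estimates copied from the printed summary of the third model
intercept     <- 22.0187525
b_santa_clara <- -4.4613706
b_date        <- 0.1308357

# Months 1 to 3; the report multiplies by 1000 to express the result in $/acre
date <- 1:3
predicted <- (intercept + b_santa_clara + b_date * date) * 1000

print(round(predicted, 2))
```

The rounded results agree with the three prices listed above.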

Source Code:

##Leslie Salt Data Set

##Load and analyse the structure of the dataset

library(readxl)

LSdata = read_excel("Dataset_LeslieSalt.xlsx")

str(LSdata)

summary(LSdata)

#Nature of variables - Price is the dependent variable and all other variables are
#independent.

#Transformation - The independent variables Flood and County are read in as numeric 0/1
#codes and should be factor variables.

#Converting Flood & County variables to factor

LSdata$County = factor(LSdata$County,

levels=c("0","1"),

labels=c("San Mateo", "Santa Clara"))

LSdata$Flood = factor(LSdata$Flood,

levels=c("0","1"),

labels=c("No", "Yes" ))

str(LSdata)

summary(LSdata)

# Check each column for missing (NA) values

sapply(LSdata,function(x){sum(is.na(x))})

#Analyse Price using plots for identifying outliers and correlations

boxplot(LSdata$Price)

#Removing the outlier

LSdata = LSdata[-26,]

boxplot(LSdata$Price)

#Checking the correlation (corrplot must be loaded before it is called)

library(corrplot)

LSmatrix <- as.matrix(dplyr::select_if(LSdata, is.numeric))

corrplot(cor(LSmatrix), method = "circle",
         type = "full",
         order = "hclust",
         tl.col = "black")

#Correlation observations:

##Price has a positive correlation with Elevation and Date.

##Price has a negative correlation with Sewer.

##Price has negligible correlation with Size and Distance.

#First Model with all independent variables

LSdatamodel1 = lm(Price ~., data = LSdata)

summary(LSdatamodel1)

##The overall F-test p-value is very small (1.493e-06), so the model as a whole is significant.

##Looking at the individual p-values, County, Size, Sewer and Distance are not significant
##at the 5% level. Therefore, we will drop these variables in the next regression model.

#Second model without County, Size, Sewer and Distance variables

LSdatamodel2 = lm(Price ~.-County-Size-Sewer-Distance, data = LSdata)

summary(LSdatamodel2)

##The overall F-test p-value is very small (1.57e-06), so this model is also significant.

##However, R-squared has dropped from 0.8069 to 0.6751 (adjusted: 0.7454 to 0.6376) compared
##to the previous model, therefore this model is rejected.

##Distance and Size variables will be removed from the model as they are correlated as per
##the corrplot, and this creates a problem of multicollinearity.

#Third model without Size and Distance variables.

LSdatamodel3 = lm(Price ~.-Distance -Size, data = LSdata)

summary(LSdatamodel3)

##The overall F-test p-value is very small (4.372e-07), so this model is significant. Its
##R-squared (0.7747) is also higher than that of the second model (0.6751).

##Hence we will go with this model.

#Predicting the price of the Leslie Salt property

##County = Santa Clara, Size = 246.8, Elevation = 0 (property at sea level), Sewer = 0 (no data
##provided), Date = 1, 2, 3 (assuming the property is sold in each of the next 3 months),
##Flood = No (property diked), Distance = 0 (as distance is relative to the Leslie Salt property)

LeslieProperty = data.frame(Price = 0, County = "Santa Clara", Size = 246.8, Elevation = 0,

Sewer = 0, Date = c(1,2,3), Flood = "No" , Distance = 0)

LeslieProperty$PredictedPrice <- predict(LSdatamodel3,LeslieProperty) * 1000

LeslieProperty$PredictedPrice
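
A point prediction on its own says nothing about its uncertainty, which can be considerable with only about 30 observations. As a sketch, `predict()` on an `lm` fit accepts `interval = "prediction"` and then returns lower and upper bounds alongside the fitted value. The example below runs on simulated data so that it is self-contained; the data frame `sim` and the coefficients used to generate it are assumptions, not the Leslie Salt data.

```r
set.seed(1)

# Simulated stand-in data: hypothetical, not the Leslie Salt dataset
sim <- data.frame(Date = seq(-100, -5, by = 5))
sim$Price <- 20 + 0.13 * sim$Date + rnorm(nrow(sim), sd = 3)

fit <- lm(Price ~ Date, data = sim)

# 95% prediction intervals for three future months
new_obs <- data.frame(Date = c(1, 2, 3))
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

# Columns: fit (point prediction), lwr and upr (interval bounds)
print(pred)
```

Applied to the report's objects, the same arguments could be passed as `predict(LSdatamodel3, LeslieProperty, interval = "prediction")`, which would show how wide the plausible price range is around each monthly estimate.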

Output:

> getwd()
[1] "E:/Analytics/R/Advance Statistics Project"
> ##Leslie Salt Data Set
> ##Load and analyse the structure of the dataset
> LSdata <- read_excel("Dataset_LeslieSalt.xlsx")
Error in read_excel("Dataset_LeslieSalt.xlsx") :
could not find function "read_excel"
> ##Leslie Salt Data Set
> ##Load and analyse the structure of the dataset
> library(readxl)
> LSdata <- read_excel("Dataset_LeslieSalt.xlsx")
> str(LSdata)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 31 obs. of 8 variables:
$ Price : num 4.5 10.6 1.7 5 5 3.3 5.7 6.2 19.4 3.2 ...
$ County : num 1 1 0 0 0 1 1 1 1 1 ...
$ Size : num 138.4 52 16.1 1695.2 845 ...
$ Elevation: num 10 4 0 1 1 2 4 4 20 0 ...
$ Sewer : num 3000 0 2640 3500 1000 10000 0 0 1300 6000 ...
$ Date : num -103 -103 -98 -93 -92 -86 -68 -64 -63 -62 ...
$ Flood : num 0 0 1 0 1 0 0 0 0 0 ...
$ Distance : num 0.3 2.5 10.3 14 14 0 0 0 1.2 0 ...
> summary(LSdata)
     Price           County            Size           Elevation          Sewer            Date             Flood           Distance
 Min.   : 1.70   Min.   :0.0000   Min.   :   6.90   Min.   : 0.000   Min.   :    0   Min.   :-103.00   Min.   :0.0000   Min.   : 0.000
 1st Qu.: 5.35   1st Qu.:0.0000   1st Qu.:  20.35   1st Qu.: 2.000   1st Qu.:    0   1st Qu.: -63.50   1st Qu.:0.0000   1st Qu.: 0.850
 Median :11.70   Median :1.0000   Median :  51.40   Median : 4.000   Median :  900   Median : -59.00   Median :0.0000   Median : 4.900
 Mean   :11.95   Mean   :0.6129   Mean   : 139.97   Mean   : 4.645   Mean   : 1981   Mean   : -58.65   Mean   :0.1613   Mean   : 5.132
 3rd Qu.:16.05   3rd Qu.:1.0000   3rd Qu.: 104.10   3rd Qu.: 7.000   3rd Qu.: 3450   3rd Qu.: -51.00   3rd Qu.:0.0000   3rd Qu.: 5.500
 Max.   :37.20   Max.   :1.0000   Max.   :1695.20   Max.   :20.000   Max.   :10000   Max.   :  -4.00   Max.   :1.0000   Max.   :16.500
> # Verify the data for Null Values
> sapply(LSdata,function(x){sum(is.na(x))})
Price County Size Elevation Sewer Date Flood Distance
0 0 0 0 0 0 0 0
> #Nature of Variables - Price is the dependent variable and all other variables are independent.
> #Transformation - The independent variables Flood and County should be factor variables and not integer.
> LSdata$County <- factor(LSdata$County,
+ levels=c("0","1"),
+ labels=c("San Mateo", "Santa Clara"))
> LSdata$Flood <- factor(LSdata$Flood,
+ levels=c("0","1"),
+ labels=c("No", "Yes" ))
> str(LSdata)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 31 obs. of 8 variables:
$ Price : num 4.5 10.6 1.7 5 5 3.3 5.7 6.2 19.4 3.2 ...
$ County : Factor w/ 2 levels "San Mateo","Santa Clara": 2 2 1 1 1 2 2 2 2 2 ...
$ Size : num 138.4 52 16.1 1695.2 845 ...
$ Elevation: num 10 4 0 1 1 2 4 4 20 0 ...
$ Sewer : num 3000 0 2640 3500 1000 10000 0 0 1300 6000 ...
$ Date : num -103 -103 -98 -93 -92 -86 -68 -64 -63 -62 ...
$ Flood : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 1 1 1 1 1 ...
$ Distance : num 0.3 2.5 10.3 14 14 0 0 0 1.2 0 ...
> summary(LSdata)
     Price               County        Size           Elevation          Sewer            Date          Flood       Distance
 Min.   : 1.70   San Mateo  :12   Min.   :   6.90   Min.   : 0.000   Min.   :    0   Min.   :-103.00   No :26   Min.   : 0.000
 1st Qu.: 5.35   Santa Clara:19   1st Qu.:  20.35   1st Qu.: 2.000   1st Qu.:    0   1st Qu.: -63.50   Yes: 5   1st Qu.: 0.850
 Median :11.70                    Median :  51.40   Median : 4.000   Median :  900   Median : -59.00            Median : 4.900
 Mean   :11.95                    Mean   : 139.97   Mean   : 4.645   Mean   : 1981   Mean   : -58.65            Mean   : 5.132
 3rd Qu.:16.05                    3rd Qu.: 104.10   3rd Qu.: 7.000   3rd Qu.: 3450   3rd Qu.: -51.00            3rd Qu.: 5.500
 Max.   :37.20                    Max.   :1695.20   Max.   :20.000   Max.   :10000   Max.   :  -4.00            Max.   :16.500
> #Analyse Price using plots for identifying outliers and correlations
> boxplot(LeslieSaltData$Price)
Error in boxplot(LeslieSaltData$Price) :
object 'LeslieSaltData' not found
> #Analyse Price using plots for identifying outliers and correlations
> boxplot(LSdata$Price)
> LSData <- LSData[-26,]
Error: object 'LSData' not found
> LSdata = LSdata[-26,]
> boxplot(LSdata$Price)
> #Checking the corelation
> LSmatrix <- as.matrix(dplyr::select_if(LSdata, is.numeric))
> corrplot(cor(LSmatrix), method = "circle",
+ type="full",
+ order = "hclust",
+ tl.col = "black")
Error in corrplot(cor(LSmatrix), method = "circle", type = "full", order = "hclust", :
could not find function "corrplot"
> library(corrplot)
corrplot 0.84 loaded
> corrplot(cor(LSmatrix), method = "circle",
+ type="full",
+ order = "hclust",
+ tl.col = "black")
> #Corelation observations:
> ##Price has a positive correlation with Elevation and Date.
> ##Price has a negative correlation with Sewer.
> ##Price has negligible correlation with Size and Distance.
> #First Model with all independent variables
> LSdatamodel1 = lm(Price ~., data = LSdata)
> summary(LSdatamodel1)

Call:
lm(formula = Price ~ ., data = LSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7059 -2.6043 -0.3876  2.2315  4.7774

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)       18.6267827  2.9067195   6.408  1.9e-06 ***
CountySanta Clara -2.6365930  2.8842949  -0.914  0.37056
Size              -0.0034320  0.0025420  -1.350  0.19070
Elevation          0.5407713  0.1693998   3.192  0.00421 **
Sewer             -0.0005078  0.0003100  -1.638  0.11563
Date               0.1279277  0.0356334   3.590  0.00163 **
FloodYes          -7.8400025  2.2885764  -3.426  0.00242 **
Distance           0.4097406  0.2453188   1.670  0.10904
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.145 on 22 degrees of freedom
Multiple R-squared: 0.8069, Adjusted R-squared: 0.7454
F-statistic: 13.13 on 7 and 22 DF, p-value: 1.493e-06

> ##As p-value is very less, this model is a valid one.

> ##When analyzing the p-values, it is observed that the variables County, Size, Sewer and Distance
> ##have a high p-value. Therefore, we will ignore these variables in next regression model.
> #second Model without County, Size, Sewer and Distance variables
> LSdatamodel2 = lm(Price ~.-County-Size-Sewer-Distance, data = LSdata)
> summary(LSdatamodel2)

Call:
lm(formula = Price ~ . - County - Size - Sewer - Distance, data = LSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5172 -2.8233 -0.2048  2.6765  6.6460

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  19.2331     2.0181   9.530 5.72e-10 ***
Elevation     0.5477     0.1698   3.226  0.00338 **
Date          0.1696     0.0283   5.994 2.50e-06 ***
FloodYes     -3.6172     1.9813  -1.826  0.07941 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.752 on 26 degrees of freedom
Multiple R-squared: 0.6751, Adjusted R-squared: 0.6376
F-statistic: 18.01 on 3 and 26 DF, p-value: 1.57e-06

> ##As p-value is very less, this model is a valid one.

> ##However, R-squared value has decreased compared to the previous model, therefore this model is rejected.
> ##Distance and Size variables will be removed from the model as they are correlated as per the corrplot
> ##and this creates a problem of Multicollinearity.
> #Third Model without Size and Distance variables.
> LSdatamodel3 = lm(Price ~.-Distance -Size, data = LSdata)
> summary(LSdatamodel3)

Call:
lm(formula = Price ~ . - Distance - Size, data = LSdata)

Residuals:
    Min      1Q  Median      3Q     Max
-5.0186 -2.2651 -0.3114  2.1549  5.1596

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)       22.0187525  1.9634490  11.214 5.01e-11 ***
CountySanta Clara -4.4613706  1.8189990  -2.453  0.02183 *
Elevation          0.5086667  0.1726287   2.947  0.00704 **
Sewer             -0.0006846  0.0002789  -2.455  0.02173 *
Date               0.1308357  0.0276699   4.728 8.28e-05 ***
FloodYes          -7.6795702  2.1524916  -3.568  0.00156 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.252 on 24 degrees of freedom
Multiple R-squared: 0.7747, Adjusted R-squared: 0.7278
F-statistic: 16.51 on 5 and 24 DF, p-value: 4.372e-07

> ##As p-value is very less, this model is a valid one. Also the R square value is higher than Second Model.
> ##Hence we will go with this model.
> #Predicting the price of the Leslie Salt property
> ##County = Santa Clara, Size = 246.8, Elevation = 0 (property at sea level),Sewer = 0(no data provided),
> ##Date = 6 (assuming property will be sold in next 6 months), Flood = 0 (property diked), Distance = 0
> ##(as distance is relative to Leslie Salt property)
> leslie_salt = data.frame(0,"Santa Clara",246.8,0,0,6,"No",0)
> colnames(leslie_salt) = c("Price", "County", "Size", "Elevation", "Sewer", "Date", "Flood", "Distance")
> data=rbind(data,leslie_salt)
Error in rep(xi, length.out = nvar) :
attempt to replicate an object of type 'closure'
> leslie_salt_price = predict(LSdatamodel3, newdata = data[32,])
Error in data[32, ] : object of type 'closure' is not subsettable
> leslie_salt_price
Error: object 'leslie_salt_price' not found
> data = rbind(data,leslie_salt)
Error in rep(xi, length.out = nvar) :
attempt to replicate an object of type 'closure'
> LSdata = rbind(LSdata,leslie_salt)
> leslie_salt_price = predict(LSdatamodel3, newdata = data[32,])
Error in data[32, ] : object of type 'closure' is not subsettable
> leslie_salt_price = predict(LSdatamodel3, newdata = LSdata[32,])
> leslie_salt_price
1
NA
> leslie_salt_price
1
NA
> LeslieProperty = data.frame(Price = 0, County = "Santa Clara", Size = 246.8, Elevation = 0,
Sewer = 0, Date = c(1,2,3), Flood = "No" , Distance = 0)
> LeslieProperty$PredictedPrice <- predict(LSdatamodel3,LeslieProperty) * 1000
> LeslieProperty$PredictedPrice
[1] 17688.22 17819.05 17949.89
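
The three models differ in the number of predictors, and plain R-squared never decreases when predictors are added, so adjusted R-squared is the fairer yardstick for comparing them. As a check, the sketch below recomputes adjusted R-squared from the printed multiple R-squared values with the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where n = 30 observations remain after row 26 is removed and p is the number of predictors excluding the intercept. The helper name `adj_r2` is illustrative and not part of the original script.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30  # 31 rows minus the removed row 26

models <- data.frame(
  model = c("Model 1", "Model 2", "Model 3"),
  r2    = c(0.8069, 0.6751, 0.7747),  # printed Multiple R-squared values
  p     = c(7, 3, 5)                  # predictors, excluding the intercept
)
models$adj_r2 <- adj_r2(models$r2, n, models$p)

print(models)
```

Any difference in the last digit against the printed adjusted values comes from starting with R-squared rounded to four decimals.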
