Project

The descriptive statistics provide information about the variables in the dataset, including the number of bedrooms, bathrooms, lot size, house size, and property price, showing the distribution of each variable through measures of central tendency, spread, and shape. The statistics indicate that the dataset contains 70 observations with variables that range in value and have differing levels of variability and distribution shapes. The descriptive analysis helps understand the characteristics of the variables in the real estate property dataset.

Uploaded by

codeofinwe

Introduction

The real estate markets in Massachusetts and Connecticut have long been sought-after destinations for
homeownership, investment, and business opportunities. As property values continue to fluctuate,
understanding the factors that influence prices is crucial for both buyers and sellers in these states. This
report presents a comprehensive study analyzing the various factors that contribute to property prices in
Massachusetts and Connecticut.
By examining a dataset encompassing properties from multiple cities, we aim to delve into the
relationships between independent variables such as the number of bedrooms, bathrooms, lot size, and
house size, and the dependent variable, which is the price of the property. Understanding these
relationships can provide valuable insights into market trends, buyer preferences, and investment
strategies.
The report will begin with an overview of the real estate markets in Massachusetts and Connecticut,
highlighting key features and market dynamics. It will then introduce the dataset, describing the variables
and their significance in the context of property pricing. Following that, we will explore the correlations
between the independent variables and the dependent variable, seeking to identify any significant patterns
or trends.
To facilitate a comprehensive analysis, we will utilize statistical techniques and visualizations to present
the findings effectively. This will include examining the relationships between the number of bedrooms,
bathrooms, lot size, and house size, and how they impact property prices in both states.
Furthermore, we will discuss the implications of our findings for various stakeholders, including real
estate professionals, investors, and prospective homebuyers. The insights gained from this study can assist
these individuals in making informed decisions regarding property investments, pricing strategies, and
market analysis.
It is important to note that this study is limited to the dataset provided and represents a snapshot of the real
estate markets in Massachusetts and Connecticut. However, the findings can serve as a foundation for
further research and analysis, contributing to a deeper understanding of the factors that shape property
prices in these states.
In short, this report analyzes the factors influencing property prices in Massachusetts and
Connecticut. By exploring the relationships between the independent variables (bedrooms, bathrooms,
lot size, and house size) and the dependent variable (price), we seek to provide useful insights for
stakeholders in the real estate industry and to support informed decisions about property investments
and transactions.
Descriptives

The dataset contains 70 observations. The descriptive statistics summarize the variables in the dataset:
Bed, Bath, Lot_Acre, House_Size, and Price. Here is an interpretation of the key statistics:

1. N: This represents the number of observations used in the analysis. In this case, there are
70 valid observations.
2. Minimum, Maximum, Sum, Mean: These statistics describe the distribution of each
variable. The minimum and maximum values indicate the range of values observed. The
sum represents the total sum of the variable across all observations, and the mean
represents the average value.
3. Std. Deviation, Variance: These statistics measure the dispersion or variability of each
variable. The standard deviation indicates how much the values deviate from the mean,
while the variance is the square of the standard deviation.
4. Skewness, Kurtosis: These statistics describe the shape of the distribution of each
variable. Skewness measures the asymmetry of the distribution. A positive skewness
value indicates a longer tail on the right side, while a negative skewness value indicates a
longer tail on the left side. Kurtosis, reported here relative to a normal distribution (for
which the value is 0), reflects how heavy the tails are. Positive kurtosis indicates heavier
tails and a sharper peak than a normal distribution, while negative kurtosis indicates
lighter tails and a flatter shape.
These statistics are commonly used in descriptive analysis to summarize and understand the
characteristics of a dataset. They provide insights into the central tendency, spread, and shape
of the variables. In this dataset every variable is positively skewed, and Price is the most
extreme case (skewness 6.608, kurtosis 48.760), which points to a small number of very
expensive properties.
Descriptive Statistics

Variable            N   Minimum   Maximum      Sum          Mean         Std. Error (Mean)
Bed                 70  1.00      10.00        267.00       3.8143       .22503
Bath                70  1         7            197          2.81         .171
Lot_Acre            70  0         6            53           .76          .122
House_Size          70  659       8369         167509       2392.99      189.637
Price               70  $160,000  $15,300,000  $61,065,499  $872,364.27  $228,181.565
Valid N (listwise)  70

Descriptive Statistics (continued)

Variable    Std. Deviation  Variance           Skewness  Std. Error (Skewness)  Kurtosis  Std. Error (Kurtosis)
Bed         1.88274         3.545              1.631     .287                   2.731     .566
Bath        1.427           2.037              .984      .287                   .812      .566
Lot_Acre    1.023           1.047              3.002     .287                   11.120    .566
House_Size  1586.619        2517358.652        1.534     .287                   2.298     .566
Price       $1,909,103.943  3644677865441.216  6.608     .287                   48.760    .566
Valid N (listwise)
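
The standard errors in the descriptive tables can be checked from N and the standard deviations alone. The sketch below is a minimal illustration, assuming the usual large-sample formulas for the standard error of the mean, skewness, and kurtosis. The function names are illustrative and are not part of the original analysis.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def se_skewness(n: int) -> float:
    """Standard error of sample skewness; depends only on N."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample (excess) kurtosis; depends only on N."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 70
# Standard deviations copied from the descriptive statistics table.
std_devs = {
    "Bed": 1.88274,
    "Bath": 1.427,
    "Lot_Acre": 1.023,
    "House_Size": 1586.619,
    "Price": 1909103.943,
}

for name, sd in std_devs.items():
    print(f"{name:<11} SE(mean) = {se_mean(sd, n):.5f}")

print(f"SE(skewness) = {se_skewness(n):.3f}")
print(f"SE(kurtosis) = {se_kurtosis(n):.3f}")
```

Because the skewness and kurtosis standard errors depend only on N, they are identical for every variable, which is why .287 and .566 repeat down the table.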
Graph
Correlations

The table presents the correlation matrix for the variables Bed, Bath, Lot_Acre, House_Size, and
Price. The correlations are calculated using the Pearson correlation coefficient. Here is an
interpretation of the correlations:

• Bed and Bath: The correlation between the number of bedrooms and bathrooms is 0.650**,
indicating a moderate positive relationship. This correlation is statistically significant at the 0.01
level (2-tailed).
• Bed and Lot_Acre: The correlation between the number of bedrooms and lot size in acres is 0.085,
indicating a very weak positive relationship. The correlation is not statistically significant.
• Bed and House_Size: The correlation between the number of bedrooms and house size is 0.720**,
indicating a moderately strong positive relationship. The correlation is statistically significant at the
0.01 level.
• Bed and Price: The correlation between the number of bedrooms and the price of the house is 0.219,
indicating a weak positive relationship. The correlation is not statistically significant (p = 0.069).
• Bath and Lot_Acre: The correlation between the number of bathrooms and lot size is 0.231,
indicating a weak positive relationship. The correlation is not statistically significant (p = 0.054).
• Bath and House_Size: The correlation between the number of bathrooms and house size is 0.848**,
indicating a strong positive relationship. The correlation is statistically significant at the 0.01 level.
• Bath and Price: The correlation between the number of bathrooms and the price of the house is
0.497**, indicating a moderate positive relationship. The correlation is statistically significant at the
0.01 level.
• Lot_Acre and House_Size: The correlation between lot size and house size is 0.200, indicating a
weak positive relationship. The correlation is not statistically significant.
• Lot_Acre and Price: The correlation between lot size and the price of the house is 0.023, indicating
essentially no linear relationship. The correlation is not statistically significant.
• House_Size and Price: The correlation between house size and the price of the house is 0.401**,
indicating a moderate positive relationship. The correlation is statistically significant at the 0.01
level.
The Pearson correlation coefficient measures the linear relationship between two variables. A
coefficient closer to 1 indicates a stronger positive linear relationship, while a coefficient closer to -1
indicates a stronger negative linear relationship. The significance value (Sig.) is the probability of
observing a correlation at least this large if the true correlation were zero, so small values indicate
that the observed correlation is unlikely to be due to random chance alone.
Descriptive Statistics

Variable    Mean         Std. Deviation  N
Bed         3.8143       1.88274         70
Bath        2.81         1.427           70
Lot_Acre    .76          1.023           70
House_Size  2392.99      1586.619        70
Price       $872,364.27  $1,909,103.943  70

Correlations

Variable    Statistic            Bed     Bath    Lot_Acre  House_Size  Price
Bed         Pearson Correlation  1       .650**  .085      .720**      .219
            Sig. (2-tailed)              .000    .484      .000        .069
            N                    70      70      70        70          70
Bath        Pearson Correlation  .650**  1       .231      .848**      .497**
            Sig. (2-tailed)      .000            .054      .000        .000
            N                    70      70      70        70          70
Lot_Acre    Pearson Correlation  .085    .231    1         .200        .023
            Sig. (2-tailed)      .484    .054              .097        .849
            N                    70      70      70        70          70
House_Size  Pearson Correlation  .720**  .848**  .200      1           .401**
            Sig. (2-tailed)      .000    .000    .097                  .001
            N                    70      70      70        70          70
Price       Pearson Correlation  .219    .497**  .023      .401**      1
            Sig. (2-tailed)      .069    .000    .849      .001
            N                    70      70      70        70          70

**. Correlation is significant at the 0.01 level (2-tailed).
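
The two-tailed significance values in the matrix follow from each correlation and N. The sketch below is a minimal illustration, assuming the standard t test for a Pearson correlation, t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom. The function names are illustrative, and the t distribution is integrated numerically with Simpson's rule instead of calling a statistics library. Because the reported correlations are rounded to three decimals, the results match the table only approximately.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value: 1 - 2 * integral of the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def pearson_significance(r: float, n: int) -> tuple[float, float]:
    """Return (t statistic, two-tailed p-value) for a Pearson r with sample size n."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, two_tailed_p(t, df)


# Correlations with Price, copied from the matrix (N = 70).
for name, r in [("Bed", 0.219), ("Bath", 0.497),
                ("Lot_Acre", 0.023), ("House_Size", 0.401)]:
    t, p = pearson_significance(r, 70)
    print(f"{name:<11} r = {r:.3f}  t = {t:6.3f}  p = {p:.3f}")
```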

Regression
The tables in this section present the results of a regression analysis with the dependent variable
"Price" and independent variables "House_Size," "Lot_Acre," "Bed," and "Bath." The analysis method
used is the Enter method, where all the requested variables were entered simultaneously.
Model Summary:
• R: The multiple correlation coefficient between the predicted values and the actual values is 0.527.
This indicates a moderate positive relationship between the set of independent variables and the
dependent variable.
• R Square: The coefficient of determination is 0.278, meaning that approximately 27.8% of the
variability in the dependent variable can be explained by the independent variables.
• Adjusted R Square: The adjusted R Square is 0.233. It penalizes R Square for the number of
predictors relative to the sample size, so it is lower than R Square and gives a more conservative
estimate of the model's explanatory power.
• Std. Error of the Estimate: The standard error of the estimate is $1,671,814.224. It represents the
typical distance between the actual values and the predicted values. It is almost twice the mean
price of $872,364.27, so individual predictions from this model are imprecise.
Change Statistics:
• R Square Change: Because this is the only model, the change is measured against a model with no
predictors and therefore equals R Square, 0.278.
• F Change: The F statistic is 6.244 with a significance reported as .000 (p < .001), indicating that the
overall model is statistically significant.
• df1, df2: The numerator degrees of freedom of the F statistic is 4, the number of predictors in the
model. The denominator degrees of freedom is 65 (70 observations minus 4 predictors minus 1).
In this case, the regression analysis shows that the combined effect of the independent variables explains
approximately 27.8% of the variability in the house prices. The overall model is statistically significant,
indicating that the independent variables as a group have a significant relationship with the house prices.
Variables Entered/Removed (a)

Model  Variables Entered                    Variables Removed  Method
1      House_Size, Lot_Acre, Bed, Bath (b)  .                  Enter

a. Dependent Variable: Price
b. All requested variables entered.

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .527 (a)  .278      .233               $1,671,814.224

Model Summary: Change Statistics

Model  R Square Change  F Change  df1  df2  Sig. F Change
1      .278             6.244     4    65   .000

a. Predictors: (Constant), House_Size, Lot_Acre, Bed, Bath



ANOVA
The ANOVA table provides information on the sources of variation in the regression
analysis with the dependent variable "Price."
ANOVA Table:
1. Regression: The regression (model) sum of squares is 69,810,190,781,586.875, with 4
degrees of freedom (df). The mean square is 17,452,547,695,396.719. The
F-statistic is 6.244, and the corresponding p-value is reported as .000 (p < .001).
This indicates that the regression model as a whole is a statistically significant
predictor of price.
2. Residual: The residual sum of squares is 181,672,581,933,856.970, with 65
degrees of freedom. The mean square is 2,794,962,798,982.415. The
residual sum of squares represents the unexplained variation in the model.
3. Total: The total sum of squares is 251,482,772,715,443.840, with 69 degrees
of freedom. It represents the total variation in the dependent variable.
Analysis: The ANOVA table allows us to assess the overall significance of
the regression model in predicting the price. In this case, the model is
statistically significant (F = 6.244, p < .001). This suggests that the predictors
(House_Size, Lot_Acre, Bed, and Bath), taken together, are related to price.
The sums of squares show how much variation is explained by the model
(regression sum of squares) and how much is left unexplained (residual sum
of squares). The regression sum of squares is considerably smaller than the
residual sum of squares: it is about 27.8% of the total, which matches the
R Square value, so the model explains only a portion of the total variation
in the dependent variable.
The ANOVA table tests the model as a whole and does not show which
predictors are responsible for the effect. Further analysis of the
coefficients and their significance is necessary to understand the specific
contributions of the predictors to the price prediction.

Model          Sum of Squares       df  Mean Square         F      Sig.
1  Regression  69810190781586.875   4   17452547695396.719  6.244  .000 (b)
   Residual    181672581933856.970  65  2794962798982.415
   Total       251482772715443.840  69

a. Dependent Variable: Price
b. Predictors: (Constant), House_Size, Lot_Acre, Bed, Bath
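
The mean squares, F statistic, R Square, adjusted R Square, and standard error of the estimate can all be derived from the two sums of squares, the sample size, and the number of predictors. The sketch below is a minimal illustration using the standard ordinary least squares identities. The variable names are illustrative, and the inputs are the values reported in the ANOVA table.

```python
import math

# Inputs copied from the ANOVA table.
ss_regression = 69_810_190_781_586.875
ss_residual = 181_672_581_933_856.970
n = 70  # observations
k = 4   # predictors: House_Size, Lot_Acre, Bed, Bath

df_regression = k
df_residual = n - k - 1
ss_total = ss_regression + ss_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"df (regression, residual) = ({df_regression}, {df_residual})")
print(f"F                  = {f_statistic:.3f}")
print(f"R Square           = {r_square:.3f}")
print(f"Adjusted R Square  = {adjusted_r_square:.3f}")
print(f"Std. Error of Est. = {std_error_estimate:,.3f}")
```

The same residual mean square feeds both the F statistic and the standard error of the estimate, so a large unexplained sum of squares lowers the first and raises the second.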

Coefficients: The table presents the unstandardized coefficients (B) with their standard errors, the
standardized coefficients (Beta), t-values, and their associated p-values.
• Constant: The constant term is -623,251.420 (p = 0.234), the estimated price when all
independent variables are zero. No property in the data has zero bedrooms, bathrooms,
lot size, and house size, so this value has no practical interpretation.
• Bed: The coefficient for the variable "Bed" is -211,642.157 with a p-value of 0.179. The
negative sign is not statistically significant.
• Bath: The coefficient for the variable "Bath" is 819,830.249 with a p-value of 0.003. Holding
the other variables constant, each additional bathroom is associated with an increase of
about $819,830 in the estimated price. This is the only statistically significant predictor.
• Lot_Acre: The coefficient for the variable "Lot_Acre" is -207,771.033 with a p-value of
0.310, which is not statistically significant.
• House_Size: The coefficient for the variable "House_Size" is 64.191 with a p-value of
0.809, which is not statistically significant.
Residuals Statistics: The table provides descriptive statistics for the predicted values and
residuals.
• Predicted Value: The predicted values range from -$502,461.16 to $4,350,080.50, with a
mean of $872,364.27 and a standard deviation of $1,005,853.814.
• Residual: The residuals range from -$2,667,797.000 to $11,432,442.000, with a mean of
$0.000 and a standard deviation of $1,622,632.420.
• Std. Predicted Value: The standardized predicted values range from -1.367 to 3.457, with
a mean of 0.000 and a standard deviation of 1.000.
• Std. Residual: The standardized residuals range from -1.596 to 6.838, with a mean of
0.000 and a standard deviation of 0.971.
Analysis: The regression analysis suggests that the independent variables collectively have a
significant relationship with the dependent variable, "Price." The model's R-square value is 0.278,
indicating that approximately 27.8% of the variability in house prices can be explained by the
independent variables. The individual coefficients indicate the estimated change in the dependent
variable for a unit change in the corresponding independent variable, holding other variables
constant. Only "Bath" is statistically significant (p = 0.003). "Bed," "Lot_Acre," and "House_Size"
are not significant at conventional levels (p > 0.05). House size is significantly correlated with
price on its own (0.401), but it is also strongly correlated with the number of bathrooms (0.848),
which likely explains why it adds little once "Bath" is in the model.
The regression model used the Enter method, where all the independent variables were entered
simultaneously. The model aims to predict the house prices based on the provided predictors.
Coefficients (a)

Model          B (Unstandardized)  Std. Error  Beta (Standardized)  t       Sig.
1  (Constant)  -623251.420         518453.313                       -1.202  .234
   Bed         -211642.157         155739.335  -.209                -1.359  .179
   Bath        819830.249          269803.379  .613                 3.039   .003
   Lot_Acre    -207771.033         203196.636  -.111                -1.023  .310
   House_Size  64.191              263.794     .053                 .243    .809

a. Dependent Variable: Price
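
The t-values and standardized coefficients in the table can be recovered from the unstandardized coefficients: t is B divided by its standard error, and Beta is B scaled by the ratio of the predictor's standard deviation to the standard deviation of Price. The sketch below is a minimal illustration with values copied from the tables in this report. The function names and the example house are hypothetical, and the unit of House_Size is assumed to be whatever unit the dataset uses, since the report does not state it.

```python
# name -> (B, standard error), copied from the Coefficients table.
coefficients = {
    "(Constant)": (-623251.420, 518453.313),
    "Bed": (-211642.157, 155739.335),
    "Bath": (819830.249, 269803.379),
    "Lot_Acre": (-207771.033, 203196.636),
    "House_Size": (64.191, 263.794),
}

# Standard deviations copied from the descriptive statistics.
sd_predictors = {
    "Bed": 1.88274,
    "Bath": 1.427,
    "Lot_Acre": 1.023,
    "House_Size": 1586.619,
}
sd_price = 1909103.943


def t_value(name: str) -> float:
    """t statistic for a coefficient: B / SE(B)."""
    b, se = coefficients[name]
    return b / se


def standardized_beta(name: str) -> float:
    """Standardized coefficient: B * SD(predictor) / SD(Price)."""
    return coefficients[name][0] * sd_predictors[name] / sd_price


def predict_price(bed: float, bath: float, lot_acre: float, house_size: float) -> float:
    """Point prediction from the fitted equation."""
    return (coefficients["(Constant)"][0]
            + coefficients["Bed"][0] * bed
            + coefficients["Bath"][0] * bath
            + coefficients["Lot_Acre"][0] * lot_acre
            + coefficients["House_Size"][0] * house_size)


for name in coefficients:
    line = f"{name:<11} t = {t_value(name):7.3f}"
    if name in sd_predictors:
        line += f"  Beta = {standardized_beta(name):6.3f}"
    print(line)

# Hypothetical house, not taken from the dataset.
example = predict_price(bed=3, bath=2, lot_acre=0.5, house_size=2000)
print(f"Predicted price for the example house: ${example:,.2f}")
```

Given the standard error of the estimate reported in the Model Summary, any single prediction from this equation carries a very wide margin of error.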

Residuals Statistics:


1. Predicted Value: The predicted values range from -$502,461.16 to $4,350,080.50, with a
mean of $872,364.27 and a standard deviation of $1,005,853.814. These values represent
the estimated prices based on the regression model.
2. Std. Predicted Value: The standardized predicted values range from -1.367 to 3.457, with
a mean of 0.000 and a standard deviation of 1.000. These values are obtained by
standardizing the predicted values.
3. Standard Error of Predicted Value: The standard error of the predicted values ranges from
207,683.391 to 1,071,376.625, with a mean of 407,672.570 and a standard deviation of
184,195.442. It measures the uncertainty in each predicted mean price and is larger for
observations whose predictor values lie far from the sample means.
4. Adjusted Predicted Value: The adjusted predicted values range from -$535,552.94 to
$4,576,582.00, with a mean of $866,563.20 and a standard deviation of $1,047,086.990.
Each value is the prediction for an observation from a model fitted without that observation.
5. Residual: The residuals range from -$2,667,797.000 to $11,432,442.000, with a mean of
$0.000 and a standard deviation of $1,622,632.420. The residual represents the difference
between the actual values and the predicted values.
6. Std. Residual: The standardized residuals range from -1.596 to 6.838, with a mean of
0.000 and a standard deviation of 0.971. These values are obtained by standardizing the
residuals.
7. Stud. Residual: The studentized residuals range from -1.899 to 7.446, with a mean of
0.002 and a standard deviation of 1.053. These values are obtained by dividing the
residuals by their standard errors.
8. Deleted Residual: The deleted residuals range from -$3,778,283.750 to $13,555,203.000,
with a mean of $5,801.071 and a standard deviation of $1,913,800.223. These values
represent the residuals after removing each observation from the analysis.
9. Stud. Deleted Residual: The studentized deleted residuals range from -1.939 to 19.272,
with a mean of 0.170 and a standard deviation of 2.379. These values are obtained by
dividing the deleted residuals by their standard errors.
10. Mahal. Distance: The Mahalanobis distances range from 0.079 to 27.352, with a mean of
3.943 and a standard deviation of 5.052. These distances measure the multivariate
distance between each observation and the centroid of the data.
11. Cook's Distance: The Cook's distances range from 0.000 to 2.059, with a mean of 0.040
and a standard deviation of 0.248. These distances are used to assess the influence of each
observation on the regression model.
12. Centered Leverage Value: The centered leverage values range from 0.001 to 0.396, with a
mean of 0.057 and a standard deviation of 0.073. These values indicate the leverage of
each observation in the regression model.
These statistics allow the residuals and their properties to be assessed. The large maxima
(a standardized residual of 6.838, a studentized deleted residual of 19.272, and a Cook's
distance of 2.059) point to at least one outlying and highly influential observation.

Residuals Statistics (a)

Statistic             Minimum          Maximum          Mean         Std. Deviation  N
Predicted Value       -$502,461.16     $4,350,080.50    $872,364.27  $1,005,853.814  70
Residual              -$2,667,797.000  $11,432,442.000  $0.000       $1,622,632.420  70
Std. Predicted Value  -1.367           3.457            .000         1.000           70
Std. Residual         -1.596           6.838            .000         .971            70

a. Dependent Variable: Price
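
Several of the residual statistics are simple transformations of values reported elsewhere in this report. The sketch below is a minimal illustration, assuming that standardized residuals are residuals divided by the standard error of the estimate, that standardized predicted values are z-scores of the predicted values, and that the mean centered leverage equals k / N with the mean Mahalanobis distance equal to (N - 1) times that. The cutoffs of 3 for standardized residuals and 1 for Cook's distance are common rules of thumb, not thresholds taken from the report.

```python
# Values copied from the Model Summary and Residuals Statistics.
std_error_estimate = 1_671_814.224
n, k = 70, 4

residual_min, residual_max = -2_667_797.0, 11_432_442.0
predicted_min, predicted_max = -502_461.16, 4_350_080.50
predicted_mean, predicted_sd = 872_364.27, 1_005_853.814
max_cooks_distance = 2.059

# Standardized residual = residual / standard error of the estimate.
std_residual_min = residual_min / std_error_estimate
std_residual_max = residual_max / std_error_estimate

# Standardized predicted value = z-score of the predicted value.
std_predicted_min = (predicted_min - predicted_mean) / predicted_sd
std_predicted_max = (predicted_max - predicted_mean) / predicted_sd

# Expected averages of the leverage-based diagnostics.
mean_centered_leverage = k / n
mean_mahalanobis = (n - 1) * mean_centered_leverage

# Rule-of-thumb flags (assumed cutoffs, see the paragraph above).
has_outlier = max(abs(std_residual_min), abs(std_residual_max)) > 3
has_influential_case = max_cooks_distance > 1

print(f"Std. Residual range:        {std_residual_min:.3f} to {std_residual_max:.3f}")
print(f"Std. Predicted Value range: {std_predicted_min:.3f} to {std_predicted_max:.3f}")
print(f"Mean centered leverage:     {mean_centered_leverage:.3f}")
print(f"Mean Mahalanobis distance:  {mean_mahalanobis:.3f}")
print(f"Outlier flagged: {has_outlier}, influential case flagged: {has_influential_case}")
```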

Summary

Summary of the overall analysis and conclusion:

• The descriptive statistics show that every variable is positively skewed. Price is the most
extreme case (skewness 6.608, kurtosis 48.760, and a maximum of $15,300,000 against a mean
of $872,364.27), which indicates that a few very expensive properties are present.
• The correlations show moderate positive relationships between the price of the house and
the number of bathrooms (0.497) and the house size (0.401), both significant at the 0.01
level. The relationships between price and the number of bedrooms (0.219) and between
price and lot size (0.023) are weak and not statistically significant.
• The regression analysis shows that the combined effect of the independent variables
explains approximately 27.8% of the variability in the house prices. The overall model is
statistically significant, indicating that the independent variables as a group have a
significant relationship with the house prices.
• The individual coefficients indicate that the number of bathrooms is the only statistically
significant predictor (Beta = 0.613, p = 0.003). The number of bedrooms, the lot size, and
the house size do not have a significant effect once the other variables are held constant.
• The regression model is not perfect: about 72% of the variability in the house prices
cannot be explained by the independent variables. This variability may be due to
other factors, such as the location of the house, the condition of the house, or the market
conditions.
Overall, the analysis suggests that the number of bathrooms is the most important of the factors
studied. House size is correlated with price but adds little once the number of bathrooms is
accounted for. Factors not included in the model, such as location, the condition of the house,
and the market conditions, are likely to play a role as well.
Here are some additional thoughts on the analysis:
• The analysis is based on a relatively small dataset, so the results should be interpreted
with caution.
• The analysis only looks at a limited number of factors that may affect the price of a
house. Other factors, such as the condition of the house and the market conditions, may
also play a role.
• The analysis does not take into account the future trends in the housing market. The price
of a house may change in the future due to factors such as inflation, interest rates, and
economic growth.

Charts