Group 8 (A) Final Term Paper
Submitted to:
Md Abdul Hannan Miah
Professor
Department of Management Information Systems
University of Dhaka
Submitted by:
Md. Fajle Rabbi 029-13-037
Mir Yamin Uddin Zidan 029-13-039
Golam Mehbub Sifat 029-13-063
Afnanul Hoque 029-13-125
Muhammad Shakil Ahmed 029-13-251
, 2022
Abstract:
This study analyses the relationship between house price per square foot and several explanatory
factors. The dependent variable, house price per square foot, is modelled as a function of six
independent variables: transaction date, house age, distance to the nearest MRT station, number of
convenience stores, latitude, and longitude. Descriptive analysis summarises the characteristics of
each variable. Correlation analysis reveals relationships such as the negative association between
distance to the nearest MRT station and house price. Multiple regression analysis indicates that the
independent variables collectively explain 58.24% of the variation in house price per square foot.
The model's overall statistical significance is confirmed through ANOVA, and the coefficient
interpretation highlights the impact of each variable on house price.
Introduction:
In the realm of data analysis and statistical research, understanding the intricate relationships
between variables is essential to unravelling the complexities of various phenomena. This study
delves into a comprehensive analysis of a dataset centred around house pricing, utilizing a range
of techniques to explore the dynamics between dependent and independent variables. The core
objective is to discern how certain factors influence the House Price Per Square Foot, shedding
light on critical insights that can inform decisions in real estate and urban planning.
The study's foundation rests upon the concept of dependent and independent variables. Dependent
variables are the subjects of observation, responding to changes in independent variables. These
variables are the focus of predictions and modelling efforts. In this study, the House Price Per
Square Foot serves as the dependent variable, reflecting the outcome of interest. The independent
variables, including Transaction Date, House Age, Distance to the Nearest MRT Station, Number
of Convenience Stores, Latitude, and Longitude, collectively form the backdrop against which the
dependent variable's behaviour is examined.
Descriptive statistical analysis plays a pivotal role in unravelling the dataset's characteristics. By
calculating measures such as mean, median, standard deviation, and skewness, the study uncovers
the distribution and tendencies of each variable. Correlation analysis delves further into the
relationships between variables, highlighting connections like the impact of distance to MRT
stations on house prices and the spatial patterns defined by latitude and longitude.
The study culminates in a comprehensive regression analysis, where the interplay between the
independent variables and the dependent variable is investigated through multiple regression
models. The regression statistics provide insights into the strength of relationships, the explanatory
power of the independent variables, and the significance of coefficients. This analysis enables a
deeper understanding of how changes in independent variables translate to shifts in the dependent
variable.
Ultimately, this study's findings not only contribute to the understanding of house pricing
dynamics but also demonstrate the power of statistical techniques in unravelling intricate
relationships within complex datasets. The insights gained from this analysis lay the groundwork
for informed decision-making in domains ranging from real estate investment to urban planning,
showcasing the importance of robust data analysis in modern research and decision-making
processes.
Understanding the interplay between these variables can provide valuable insights for various
stakeholders, including homeowners, buyers, real estate professionals, and urban planners. For
potential homebuyers, insights into which factors influence pricing can help in making informed
decisions about property investments. Real estate professionals can benefit from identifying trends
and patterns that affect property values, enabling more accurate pricing strategies. Urban planners
can use this information to better allocate resources and plan infrastructure developments in ways
that align with housing market dynamics.
Furthermore, the study's results can contribute to the development of predictive models that
enhance the accuracy of property price estimations. This predictive capability can be invaluable
for guiding investment decisions, assessing the potential impact of urban development projects,
and supporting financial institutions in risk assessment.
Ultimately, by unravelling the intricate web of relationships between the dependent and
independent variables, this study aims to empower stakeholders with knowledge that can shape
housing market practices, guide decision-making processes, and foster a more transparent and
informed real estate landscape.
Objectives of the study:
The objective of the study is to analyse the relationship between various independent
variables, including Transaction Date, House Age, Distance to the Nearest MRT Station, Number
of Convenience Stores, Latitude, and Longitude, and the dependent variable, House Price Per
Square Foot. Through descriptive, correlation, and regression analyses, the study aims to:
1. Determine the extent to which the independent variables influence changes in House Price Per
Square Foot.
2. Identify any significant correlations between the independent variables and House Price Per
Square Foot.
3. Assess the impact of individual independent variables on House Price Per Square Foot while
accounting for the effects of other variables.
4. Provide insights into factors that contribute to variations in house prices and their spatial
distribution.
5. Validate the statistical significance of the relationships through ANOVA and regression
statistics.
6. Offer valuable information for informed decision-making in the housing and urban planning
domains.
By achieving these objectives, the study aims to enhance our understanding of the factors affecting
house prices and contribute to the development of predictive models for House Price Per Square
Foot based on the identified independent variables.
Methodology
Study Objective:
The objective of this study is to analyse the factors affecting house prices per square foot in a given
dataset. The primary focus is to understand the relationships between the dependent variable,
"House Price Per Square Foot," and various independent variables, including "Transaction Date,"
"House Age," "Distance to the Nearest MRT Station," "Number of Convenience Stores,"
"Latitude," and "Longitude."
Data Collection:
The dataset used in this study contains information about house pricing, including the specified
independent and dependent variables. The dataset is collected from relevant sources, ensuring that
the data is accurate and representative of the target population.
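Descriptive Analysis:
Descriptive statistics, including the mean, median, mode, standard deviation, skewness, and
kurtosis, are calculated for each variable to summarise its distribution and central tendency.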
Correlation Analysis:
A correlation analysis is conducted to quantify the relationships between the dependent variable
and independent variables. Correlation coefficients are calculated to measure the strength and
direction of linear relationships. The resulting correlation matrix provides insights into how
variables are correlated with each other.
Hypothesis Formulation:
Based on the correlation analysis and domain knowledge, hypotheses are formulated to test the
significance of the relationships between the dependent and independent variables. Null and
alternative hypotheses are stated to determine whether the relationships observed are statistically
significant.
Regression Analysis:
Multiple regression analysis is performed to model the impact of the independent variables on the
dependent variable. The regression model is developed using the method of least squares, aiming
to find the best-fitting linear equation that explains the variation in the dependent variable based
on the independent variables.
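The least-squares step described above can be sketched in a few lines. This is a minimal illustration only, not the procedure used in the study: the function name `fit_ols` and the toy data are hypothetical, and the coefficients are obtained by solving the normal equations with plain Gauss-Jordan elimination.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, ..., bk]."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented normal-equation system (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # After elimination the last column holds the solution.
    return [A[i][p] for i in range(p)]


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 (not the study's dataset).
toy_X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_y = [1 + 2 * a + 3 * b for a, b in toy_X]
coefficients = fit_ols(toy_X, toy_y)
print([round(c, 6) for c in coefficients])
```

Because the toy response is an exact linear function of the two predictors, the fitted coefficients should recover the intercept and slopes used to generate it.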
ANOVA Test:
An analysis of variance (ANOVA) test is conducted to assess the overall significance of the
regression model. This test evaluates whether the variation explained by the regression model is
significant compared to the unexplained variation.
Interpretation and Inference:
The results of the regression analysis, ANOVA test, and other statistical measures are interpreted
to draw meaningful conclusions. The coefficients of the independent variables provide insights
into their individual impacts on the dependent variable. The adjusted R-squared value helps
evaluate the goodness of fit of the regression model.
Discussion of Findings:
The findings of the study are discussed in the context of the research objective. The relationships
identified between the variables, along with their statistical significance, are highlighted. Insights
into factors influencing house prices per square foot are discussed, considering both practical and
theoretical implications.
The study acknowledges any limitations or assumptions made during the analysis. Suggestions for
future research directions are provided, including potential improvements in data collection,
methodology, and model refinement.
Contents
• Abstract
• Introduction
• Methodology
• Literature review
ANOVA
• Findings
• Conclusion
• Reference
Literature review
The average cost of homes in a nation's real estate market is shaped by an intricate interplay of
economic, supply and demand, regulatory, and socio-demographic elements. This literature review
has offered insights into the multifaceted character of these influences, underscoring the
significance of contemplating a comprehensive spectrum of factors when dissecting and
prognosticating shifts in average residential property prices. Speculators' actions have been shown
to contribute to housing cycles (d’Amato & Coskun, 2022). As real estate markets continue to
undergo transformation, continual research and analysis stand as imperatives to inform policies
and strategies that foster sustainable and equitable housing affordability.
Economic trends occupy a pivotal role in dictating the value of a nation's mean residential property
price. Research has consistently demonstrated that elements such as gross domestic product
(GDP), inflation levels, and interest rates wield a substantial influence on real estate markets.
Research by Wilson and others (2017) suggests that the efficiency of the market for single-family
homes affects their prices. An expanding economy often generates heightened demand for
housing, thereby exerting upward pressure on housing prices. Conversely, periods of economic
downturn can induce decreased demand, resulting in stagnation or decline in prices.
The rudimentary concepts of supply and demand are integral to comprehending fluctuations in
residential property prices. Studies have underscored the criticality of scarcities in housing supply
as a driver for elevated prices. Albouy, Ehrlich, & Liu (2016) found that the income elasticities of
housing demand play a significant role in determining house prices. Aspects such as scarcity of
land, zoning directives, and construction expenses can curtail the availability of housing units, thus
intensifying the rivalry among potential buyers and propelling prices upwards. The supply
elasticity of owner-occupied housing has been a subject of research (Stec, Kordana, & Słyś, 2017).
Trends in demographics, encompassing population growth and rates of household establishment,
also contribute to shifts in housing requisites, thereby subsequently impacting property prices.
Interest rates wield a direct impact on the affordability of housing for aspiring buyers. Research
indicates that lower mortgage interest rates can invigorate demand for housing, leading to
augmented residential property prices. Additionally, the availability and conditions of mortgage
financing influence the purchasing power of buyers and their potential to engage in the real estate
market. Building restrictions were shown to impact housing affordability significantly (Bramley,
2012). Stringent lending protocols have the potential to restrict access to credit, potentially
subduing demand and tempering the escalation of prices.
Psychological factors and prevailing market sentiment also contribute to oscillations in property
prices. Research has delved into the role of speculative investment, wherein buyers acquire
properties with the anticipation of capital appreciation rather than for primary habitation.
Speculative activity can introduce volatility into real estate markets, potentially inducing price
bubbles or crashes.
The determination of a nation's mean residential property cost is a complex and dynamic process,
shaped by the interplay of economic, societal, demographic, and regulatory elements.
Agglomeration economies have been recognized as influencing housing prices (Moreno-Monroy,
2012). This comprehensive analysis within the literature review examines the factors influencing
a nation's average residential property price, with a particular focus on a set of independent
variables comprising 'The Transaction Date', 'House Age', 'Distance to the Nearest MRT
Station', 'Number of Convenience Stores', 'Latitude', and 'Longitude'. Giuliano, Gordon, Pan, &
Park (2010) discussed the role of urban spatial structure in shaping housing market dynamics.
Through a meticulous examination of these variables within the broader context of economic
patterns, demographic transitions, and regulatory structures, a deeper understanding of the intricate
dynamics of the real estate market can be attained.
The variable 'Distance to the Nearest MRT Station' highlights the pivotal role of accessibility in
molding property valuations. Real estate situated near major transit nodes, such as MRT stations,
often commands elevated prices due to the convenience it presents to occupants. Network effects
and congestion externalities can influence air traffic delays (Mayer & Sinai, 2003). Ongoing
research consistently reveals that shorter distances to public transportation lead to amplified
housing demand, thereby instigating an upward force on residential property prices. This variable
underscores the prominence of location within the framework of urban mobility and illustrates the
direct influence of transportation infrastructure on housing market dynamics.
Furthermore, the variable 'Number of Convenience Stores' accentuates the significance of location
in formulating property values. Accessibility to vital amenities and services is a pivotal component
in the assessment of real estate. Areas endowed with a higher concentration of convenience stores
tend to entice potential buyers seeking ease of access to daily necessities. A greater abundance of
convenience stores near a property is indicative of heightened urbanization and residential allure,
thereby contributing to a surge in demand and subsequent appreciation of property values, as
stated by Giuliano, Gordon, Pan, & Park (2010).
Conversely, 'The Transaction Date' functions as a temporal variable that reflects market conditions
at a specific point in time. Fluctuations in economic conditions and seasonal trends can interact
with this variable, potentially generating price volatility (d’Amato & Coskun, 2022). A
comprehensive scrutiny of the interaction between the transaction date and other variables
augments our comprehension of short-term price oscillations.
The geographical coordinates, represented by 'Latitude' and 'Longitude', capture the spatial
dispersal of properties within a country's housing market. The tendency toward urbanization
frequently results in the clustering of properties within specific zones, thereby influencing housing
demand and subsequently affecting prices (Mayer & Sinai, 2003). Real estate holdings located in
prime urban locales, as denoted by precise coordinates, frequently command enhanced prices
owing to the convenience and accessibility they provide.
Lastly, the variable 'House Age' acts as a gauge of property condition and upkeep, impacting its
valuation (d’Amato & Coskun, 2022). Aged residences might incur lower prices due to the
perceived elevated expenses associated with maintenance.
In a nutshell, the determination of a country's average residential property price is a multifaceted
and evolving process that hinges on intricate interactions among economic, demographic,
regulatory, and geographic elements. Wheaton (1990) proposed a housing market matching model
to analyze the impact of vacancy and search on prices. This comprehensive literature review sheds
light on the interconnected nature of these influences and underscores the necessity of a holistic
approach in understanding and predicting shifts in average residential property prices. As the real
estate sector continues to evolve, persistent research and analysis remain pivotal in shaping
effective strategies that foster sustainable and equitable residential property affordability.
Description of variables
Dependent variables
A variable is considered dependent when its value changes in response to changes in the
independent variables. It is the variable that is measured or observed in order to ascertain how
changes in other variables affect it. Dependent variables are the outcomes of an experiment or
study, and their values respond to the manipulation of the independent variables. The only
dependent variable used in this study is the House Price Per Square Foot, which is the variable we
want to predict or model based on the other variables in the dataset. The factors
affecting the country's average home price will be looked into in this study.
Independent variables
In a statistical analysis, independent variables are those that affect how the dependent variable
changes. The Transaction Date, House Age, Distance to the Nearest MRT Station, Number of
Convenience Stores, Latitude, and Longitude are the independent variables in this study.
This study will show the correlation between the dependent and the independent
variables and their impact on the dependent variable House Price Per Square Foot.
The Descriptive analysis:
A fundamental technique for summarizing and comprehending a dataset's characteristics is
descriptive statistical analysis. We present a thorough analysis of a dataset containing data on
house pricing in this study. Transaction Date (X1), House Age (X2), Distance to the Nearest MRT
Station (X3), Number of Convenience Stores (X4), Latitude (X5), and Longitude (X6) are the
primary variables that are the focus of the analysis.
For each of the specified variables, the analysis involves calculating various descriptive statistical
measures. The mean, median, mode, standard deviation, standard error, sample variance, kurtosis,
skewness, range, minimum, maximum, sum, and count are some examples of these measurements.
Each of these measures provides insightful details about the characteristics of the dataset.
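The measures listed above can be computed with a short script. The sketch below is illustrative only: it assumes the sample-adjusted skewness and kurtosis formulas used by spreadsheet tools such as Excel (the original does not state which formulas were applied), and both the function name `describe` and the sample values are hypothetical.

```python
import math
import statistics


def describe(values):
    """Descriptive statistics for one variable (at least 4 observations)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = [(v - mean) / sd for v in values]
    # Sample-adjusted skewness and excess kurtosis (assumed spreadsheet-style formulas).
    skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),
        "variance": sd ** 2,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": n,
    }


# Hypothetical house ages in years (not taken from the study's dataset).
sample_ages = [5.0, 13.3, 13.3, 20.3, 32.0, 7.1]
for name, value in describe(sample_ages).items():
    print(name, value)
```

A kurtosis near zero under this definition corresponds to a normal distribution, which is why the interpretations below compare each value against zero.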
Transaction Date (X1):
Mean: The average transaction date is approximately 2013.149, indicating the central tendency of
the transaction dates.
Median: The median transaction date is 2013.167, which represents the middle value and is less
affected by extreme values.
Mode: The mode, around 2013.417, is the most frequently occurring transaction date.
Standard Deviation: With a value of approximately 0.282, the transaction dates exhibit a
moderate spread around the mean.
Skewness: The skewness of -0.151 indicates a slight leftward asymmetry in the distribution: the
tail extends towards earlier dates, so transactions are concentrated slightly later in the period.
Kurtosis: The negative kurtosis value of -1.232 indicates a flatter distribution compared to a normal
distribution.
House Age (X2):
Mean: The average house age is approximately 17.713 years, providing insight into the general
age of the houses.
Median: The median house age is 16.100 years, meaning that half of the houses are 16.1 years old
or younger.
Standard Deviation: A standard deviation of about 11.392 suggests significant variability in
house ages.
Skewness: A positive skewness of 0.383 suggests a slight tail towards older houses.
Kurtosis: The negative kurtosis value of -0.877 indicates a flatter distribution compared to a
normal distribution.
Distance to the Nearest MRT Station (X3):
Mean: The average distance to the nearest MRT station is approximately 1083.886 units,
indicating the typical proximity of houses to public transportation.
Median: The median distance is 492.231 units, highlighting the middle point in the distribution.
Mode: The mode, around 289.325, is the most frequently occurring distance.
Standard Deviation: The substantial standard deviation of 1262.110 reveals wide variability in
distances.
Skewness: The positive skewness of 1.889 indicates a longer tail towards farther distances.
Kurtosis: A positive kurtosis value of 3.208 indicates that the distribution of the variable being
analysed has a relatively more pronounced peak (leptokurtic) and heavier tails (outliers) compared
to a normal distribution.
Number of Convenience Stores (X4):
Mean: The average number of convenience stores is about 4.094, indicating a reasonable
availability of such amenities.
Median: With a median value of 4.00, close to the mean, the distribution is relatively symmetric.
Standard Deviation: The standard deviation of 2.946 suggests a moderate spread around
the mean.
Skewness: The skewness of 0.155 indicates a slight asymmetry towards more convenience stores.
Kurtosis: The negative kurtosis value of -1.066 indicates a flatter distribution compared to a
normal distribution.
Latitude (X5):
Mean: The mean latitude represents the average latitude value across all data points. In this
dataset, the average latitude is approximately 24.969, indicating the central location of the points
along the north-south axis.
Median: The median is the middle value in the ordered dataset. In this case, the median latitude
is 24.971, which is close to the mean and indicates a relatively balanced distribution.
Mode: The mode is the most frequent value in the dataset. Here, the mode of latitude is
approximately 24.974.
Standard Deviation: The standard deviation measures the spread of latitude values. A small
standard deviation of 0.012 suggests that the latitude values are closely clustered around the mean.
Skewness: A distribution with a negative skewness of -0.439 is slightly skewed to the left,
suggesting that some data points with lower latitude values may be pulling the distribution in that
direction.
Kurtosis: Kurtosis measures the tails of the distribution compared to a normal distribution. A
positive value of 0.269 indicates that the distribution has slightly heavier tails than a normal
distribution.
Longitude (X6):
Mean: The mean longitude represents the average longitude value across all data points. In this
dataset, the average longitude is approximately 121.533, indicating the central location of the
points along the east-west axis.
Median: The median longitude is the middle value in the ordered dataset. Here, the median is
approximately 121.539.
Mode: The mode is the most frequent longitude value in the dataset, which is around 121.543.
Standard Deviation: The standard deviation measures the spread of longitude values. A slightly
higher standard deviation of 0.015 suggests more variability compared to latitude.
Skewness: The skewness of -1.220 indicates that the distribution of longitude values is skewed to
the left: most data points lie at higher longitude values, with a longer tail towards lower values.
Kurtosis: The positive kurtosis value of 1.202 indicates that the distribution of longitude values
has slightly heavier tails than a normal distribution.
This dataset's descriptive statistical analysis gives a detailed rundown of its salient features. Our
comprehension of the properties of the dataset is improved by the analysis of transaction dates,
house ages, distances to MRT stations, convenience stores, and geographic coordinates. Making
informed decisions in a variety of housing and urban planning-related areas requires this
knowledge. This fundamental knowledge of the dataset's attributes can be the basis for additional
analyses or modelling.
House Price Per Square Foot (Y):
Mean (Average): The average house price per square foot in this dataset is approximately
$37.98.
Standard Error: The standard error is a measure of the variability of the sample mean. It's
approximately 0.67.
Median (Midpoint): The median house price per square foot is $38.45. This means that half of
the data points are below this value, and half are above it.
Mode (Most Frequent Value): The mode is $40.3, indicating that this value appears most
frequently in the dataset.
Standard Deviation: The standard deviation measures the spread or dispersion of the data. A
higher standard deviation indicates greater variability in house prices per square foot. In this
dataset, the standard deviation is approximately $13.61.
Kurtosis: The kurtosis value of approximately 2.18 suggests that the distribution of house prices
per square foot has slightly heavier tails compared to a normal distribution.
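As a quick consistency check on the figures above, the standard error of the mean equals the sample standard deviation divided by the square root of the number of observations. The snippet below uses only the values reported in this paper; the variable names are illustrative.

```python
import math

sample_std_dev = 13.61   # reported standard deviation of house price per square foot
n_observations = 414     # reported number of observations

# Standard error of the mean = s / sqrt(n).
standard_error = sample_std_dev / math.sqrt(n_observations)
print(round(standard_error, 2))  # 0.67
```

The result agrees with the reported standard error of approximately 0.67.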
The Correlation analysis:
Correlation analysis is a statistical method used to quantify the magnitude and direction of
the linear relationship between two or more variables. Here it describes the relationships
between the study's variables. The correlation matrix shows the correlations between
the variables in the house pricing dataset.
Each row and column of the matrix stands for a different variable, and the value in each cell is
the correlation coefficient for that pair of variables. The correlation coefficient ranges from
-1 to 1:
• A perfect positive correlation of 1 means that as one variable rises, the other rises
proportionally.
• A perfect negative correlation of -1 means that as one variable rises, the other falls in
proportion.
• Zero means there is no linear correlation: the variables do not have a linear relationship to
one another.
Here's how to interpret some of the correlations:
Distance to Nearest MRT Station (X3) vs. Number of Convenience Stores (X4): A moderate
negative correlation of approximately -0.60 suggests that houses farther from MRT stations tend to
have fewer nearby convenience stores, and houses closer to them tend to have more. This could be
due to a higher concentration of commercial areas around transit hubs.
Latitude (X5) vs. Longitude (X6): A moderate positive correlation of approximately 0.41
indicates that as latitude increases, longitude also tends to increase. This suggests a spatial pattern
where houses at higher latitudes tend to be situated farther east.
House Age (X2) vs. House Price Per Square Foot (Y): A weak negative correlation of
approximately -0.21 indicates that as the age of the house increases, its price per square foot tends
to decrease slightly. This suggests that buyers may be willing to pay less for older properties.
Number of Convenience Stores (X4) vs. House Price Per Square Foot (Y): A moderate positive
correlation of approximately 0.57 implies that houses located near more convenience stores tend
to have higher prices per square foot. This could be due to the added convenience and desirability
of such locations.
Distance to Nearest MRT Station (X3) vs. House Price Per Square Foot (Y): A moderate negative correlation of
approximately -0.67 suggests that houses located closer to MRT stations tend to have higher prices.
This reflects the importance of accessibility to public transportation in influencing housing prices.
Collinearity among the independent variables is a concern when the correlation between any pair
of them exceeds 70% in absolute value. No pair of independent variables exceeds this threshold.
Consequently, the independent variables, including Transaction Date, House Age, Distance to the
Nearest MRT Station, Number of Convenience Stores, Latitude, and Longitude, are not considered
collinear with one another.
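The correlation coefficient and the 70% collinearity screen described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `pearson_r` and `collinear_pairs` and the toy values are hypothetical, and the 0.70 cut-off is the rule of thumb quoted in this section rather than a formal test.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictor values (not taken from the study's dataset).
toy_predictors = {
    "X2_house_age": [10, 30, 5, 20, 15],
    "X3_mrt_distance": [100, 300, 500, 900, 2000],
    "X4_convenience_stores": [9, 7, 6, 3, 0],
}
for name_a, name_b, r in collinear_pairs(toy_predictors):
    print(name_a, name_b, round(r, 2))
```

Any pair printed by the loop would warrant a closer look before both variables are kept in the same regression model.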
The regression analysis
The impact of the independent variables Transaction Date, House Age, Distance to the Nearest
MRT Station, Number of Convenience Stores, Latitude, and Longitude on the dependent variable
House Price Per Square Foot was examined using multiple regression analysis.
Regression statistics
Multiple R is the multiple correlation coefficient. It reveals the strength of the linear
relationship: a value of 1 denotes a perfect relationship, while a value of 0 denotes the complete
absence of any linear relationship. It is the square root of R Square. Here, the value of
Multiple R (76.31%) represents a strong linear relationship between the independent and
dependent variables.
The R Square value (0.582385045) is a measure of how well the independent variables explain the
variation in the dependent variable. It gives the proportion of the dependent variable's variance
that can be accounted for by the independent variables. R Square ranges from 0 to 1, where 0 means
the independent variables do not explain any of the variation, and 1 means they explain all of it.
In our case, the independent variables account for about 58.24% of the variance in the dependent
variable.
The Adjusted R Square value (0.576228559) is a modified version of R Square that takes into
account the sample size and the number of independent variables. It penalizes the addition of
unnecessary independent variables that do not contribute significantly to the model's explanatory
power, which makes it useful for comparing models with different numbers of independent variables.
Like R Square, it cannot exceed 1, although it can fall slightly below 0 for a very poor model.
The standard error (8.857514581) measures the typical difference between the actual values and the
predicted values. It gives an idea of how much the dependent variable's values vary around the
regression line. A lower standard error indicates that the predictions are closer to the actual
values, while a higher standard error indicates more variability. The number of
observations (414) refers to the number of data points used in the regression analysis.
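The relationship between these regression statistics can be checked with a short calculation. The sketch below uses the standard adjusted R Square formula, 1 - (1 - R Square)(n - 1)/(n - k - 1), with the values reported in this paper; the function and variable names are illustrative.

```python
import math


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R Square for a model with an intercept and n_predictors regressors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


r_squared = 0.582385045   # reported R Square
n_obs = 414               # reported number of observations
n_predictors = 6          # X1 to X6

multiple_r = math.sqrt(r_squared)
adjusted = adjusted_r_squared(r_squared, n_obs, n_predictors)
print(round(multiple_r, 4), round(adjusted, 6))
```

Both results should agree with the reported Multiple R (about 0.7631) and Adjusted R Square (about 0.576229) up to rounding.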
ANOVA
ANOVA (Analysis of Variance) tests a null hypothesis by comparing sources of variation. Excel
provides it in single-factor and two-factor forms and also reports an ANOVA table as part of its
regression output. In the regression context, the null hypothesis is that all regression
coefficients are zero, so that the independent variables explain none of the variation in the
dependent variable. Degrees of freedom describe how many values in a statistic's final calculation
are free to vary, and each source of variation in the ANOVA table has its own degrees of freedom.
The sum of squares measures variability: the total sum of squares is the sum of the squared
differences between each data point and the overall mean, and it is split into a regression
(explained) part and a residual (unexplained) part. The mean square is the sum of squares divided
by its degrees of freedom, and represents the average squared variation for that source. The
F-statistic is the ratio of the regression mean square to the residual mean square. In a
regression analysis, it is used to determine whether the variation explained by the model is
significantly greater than the unexplained variation. Significance F is the p-value of the
F-statistic. Assuming the null hypothesis is true, it gives the probability of obtaining an
F-statistic at least as extreme as the one calculated. A very low p-value provides strong evidence
against the null hypothesis and suggests that at least one of the independent variables in the
regression model is significant.
Here, the Regression component captures the variance explained by the regression model. In our
case, the model has 6 degrees of freedom, an SS of 44529.96281, and an MS of 7421.660468. The
Residual component represents the unexplained variance, or error, in the model. It has 407
degrees of freedom, an SS of 31931.41478, and an MS of about 78.4556 (the SS divided by its
degrees of freedom). The F-statistic
is 94.5969927, and the p-value (Significance F) is 4.8291E-74. Since the p-value is extremely
small, it indicates that the regression model is statistically significant and provides evidence
against the null hypothesis that all regression coefficients are zero.
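The arithmetic behind the ANOVA table can be sketched in a few lines. This is a minimal illustration using the sums of squares and degrees of freedom reported above; the helper name is hypothetical, and the p-value is not recomputed because that would require an F-distribution routine from a statistics library.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
SS_REGRESSION, DF_REGRESSION = 44529.96281, 6
SS_RESIDUAL, DF_RESIDUAL = 31931.41478, 407


def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


ms_regression = mean_square(SS_REGRESSION, DF_REGRESSION)
ms_residual = mean_square(SS_RESIDUAL, DF_RESIDUAL)

# The F-statistic is the ratio of explained to unexplained mean squares.
f_statistic = ms_regression / ms_residual

print(f"MS regression: {ms_regression:.4f}")
print(f"MS residual:   {ms_residual:.4f}")
print(f"F-statistic:   {f_statistic:.4f}")
```

The resulting F-statistic should match the reported value of 94.5969927 up to rounding.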
Coefficients Table
The Coefficients column lists the estimated coefficient for each predictor variable in the
multiple linear regression model. Holding the other predictors constant, each coefficient gives
the change in the dependent variable resulting from a one-unit change in the respective predictor
variable. The Standard Error column gives the standard error of each coefficient estimate, which
measures the precision of that estimate. The t-statistic is the ratio of the estimated
coefficient to its standard error, so it expresses how many standard errors the estimate lies
from zero. The larger the absolute value of the t-statistic, the more likely the coefficient is
statistically significant. The p-value associated with the t-statistic expresses the probability
of observing a t-statistic at least this extreme if the true coefficient were zero. A small
p-value (typically less than 0.05) suggests that the coefficient is statistically significant.
The Lower 95% and Upper 95% columns provide the lower and upper bounds of the 95% confidence
interval for each coefficient. This interval gives a range of values within which we can be
reasonably confident the true population parameter lies. The Lower 95.0% and Upper 95.0% columns
repeat these bounds for the confidence level chosen when the regression was run in Excel. Because
that level was also 95%, they are identical to the previous two columns. They are confidence
intervals for the coefficients, not prediction intervals for individual future observations.
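The relationship between a coefficient, its standard error, the t-statistic, and the confidence interval can be illustrated as follows. This is only a sketch: the standard error of 0.04 is a hypothetical value chosen for illustration (it is not taken from the paper's output), and the critical value 1.966 is an approximate two-sided 95% t value for 407 residual degrees of freedom.

```python
def coefficient_summary(estimate, std_error, t_critical=1.966):
    """Return (t-statistic, lower bound, upper bound) for one coefficient.

    t_critical defaults to an approximate two-sided 95% critical value
    for 407 degrees of freedom (an assumption, not read from the output).
    """
    t_statistic = estimate / std_error
    margin = t_critical * std_error
    return t_statistic, estimate - margin, estimate + margin


# House-age coefficient from the paper; the standard error is HYPOTHETICAL.
t_stat, lower_95, upper_95 = coefficient_summary(estimate=-0.270, std_error=0.04)

print(f"t-statistic: {t_stat:.2f}")
print(f"95% confidence interval: [{lower_95:.4f}, {upper_95:.4f}]")

# If the interval excludes zero, the coefficient is significant at the 5% level.
print("Interval excludes zero:", not (lower_95 <= 0.0 <= upper_95))
```

With these illustrative inputs the interval lies entirely below zero, which corresponds to a p-value below 0.05.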
Coefficients Table
The intercept represents the estimated value of the dependent variable when all predictor variables
are zero. X1 transaction date: For each unit increase in the transaction date, the estimated value of
the dependent variable increases by 5.149. This predictor appears to be statistically significant
(low p-value). X2 house age: For each unit increase in house age, the estimated value of the
dependent variable decreases by 0.270. This predictor is also statistically significant. X3 distance
to the nearest MRT station: For each unit increase in distance, the estimated value of the dependent
variable decreases by 0.0045. This predictor is statistically significant. X4 number of convenience
stores: For each unit increase in the number of convenience stores, the estimated value of the
dependent variable increases by 1.133. This predictor is statistically significant. X5 latitude: For
each unit increase in latitude, the estimated value of the dependent variable increases by 225.470.
This predictor is statistically significant. X6 longitude: For each unit increase in longitude,
the estimated value of the dependent variable decreases by 12.429.
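To make the "holding other predictors constant" interpretation concrete, the sketch below computes the change in the predicted value implied by a given change in one or more predictors. The coefficients are the rounded values quoted above; the dictionary keys and the function name are our own labels, not part of the original output.

```python
# Rounded coefficients as quoted in the text (keys are illustrative labels).
COEFFICIENTS = {
    "transaction_date": 5.149,
    "house_age": -0.270,
    "distance_to_mrt": -0.0045,
    "convenience_stores": 1.133,
    "latitude": 225.470,
    "longitude": -12.429,
}


def predicted_change(changes):
    """Change in the predicted dependent variable for the given predictor
    changes, with every predictor not listed held constant."""
    unknown = set(changes) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"Unknown predictors: {sorted(unknown)}")
    return sum(COEFFICIENTS[name] * delta for name, delta in changes.items())


# A house that is 10 units older but has 2 more convenience stores nearby.
change = predicted_change({"house_age": 10, "convenience_stores": 2})
print(f"Predicted change in house price of unit area: {change:.3f}")
```

Because the model is linear, the effects simply add: 10 × (−0.270) + 2 × 1.133.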
Regression Model
The regression function from our analysis is shown below:
Limitation of the study
The study was conducted to identify the factors that affect how a real estate firm determines its
house prices. Data for the study were obtained from the websites of several institutions,
including the Global Party of Taiwan, the National Taiwan Real Estate Bureau, and Kaggle.
Nonetheless, the relatively small number of observations may make it difficult to draw firm
conclusions. Because convenience sampling was employed due to cost and time considerations, the
sample may not be representative, which limits how far the findings can be generalized. As a
result, the study's findings cannot be applied directly to real estate sectors in Bangladesh or
elsewhere, and any generalization should stay within the scope of the data analysed here, even
though Bangladesh's real estate industry is comparable to that of other emerging nations in terms
of atmosphere and culture.
Findings
Based on our analysis of the regression model, the findings are as follows:
• The regression model is able to explain about 58.24% of the variation in the dependent
variable, which is the house price of unit area.
• The regression model is statistically significant, meaning that there is a linear relationship
between the dependent variable and at least one of the predictor variables.
• The predictor variables with the largest positive coefficients and low p-values are latitude,
transaction date, and number of convenience stores. This means that these variables have a
positive effect on the house price of unit area, although the size of each coefficient also
depends on the units in which the variable is measured.
• Longitude has a negative coefficient and a low p-value. This means that the house price of
unit area decreases as longitude increases, and the effect is statistically significant.
• The intercept term indicates that the estimated house price of unit area is 40.578 when all
predictor variables are zero. However, this value may not be meaningful, as some of the
predictor variables cannot be zero in reality.
• The F-statistic and the p-value show that the regression model is statistically significant at
the 0.05 level, meaning that there is sufficient evidence to reject the null hypothesis that all
regression coefficients are zero. This implies that at least one of the predictor variables has
a non-zero effect on the house price.
Conclusion
The conclusion of the study is that the house price of unit area in Taiwan depends on several
factors, such as the transaction date, the house age, the distance to the nearest MRT station, the
number of convenience stores, and the latitude. Among these factors, latitude has the strongest
positive impact on the house price of unit area, followed by transaction date and number of
convenience stores, whereas longitude has a negative impact. Therefore, anyone who wants to buy
or sell a house in Taiwan should consider these factors carefully.
References:
1. Albouy, D., Ehrlich, G., & Liu, Y. (2016). Housing demand, cost-of-living inequality, and the
affordability crisis (No. w22816). National Bureau of Economic Research.
2. Bramley, G. (2012). Affordability, poverty and housing need: triangulating measures and
standards. Journal of Housing and the Built Environment, 27, 133-151.
3. Wilson, E. J., Christensen, C. B., Horowitz, S. G., Robertson, J. J., & Maguire, J. B.
(2017). Energy efficiency potential in the US single-family housing stock (No. NREL/TP-5500-
68670). National Renewable Energy Lab.(NREL), Golden, CO (United States).
5. Leyk, S., Balk, D., Jones, B., Montgomery, M. R., & Engin, H. (2019). The heterogeneity and
change in the urban structure of metropolitan areas in the United States, 1990–2010. Retrieved
from https://www.nature.com/articles/s41597-019-0329-6
6. Giuliano, G., Gordon, P., Pan, Q., & Park, J. (2010). Accessibility and residential land values:
Some tests with new measures. Urban studies, 47(14), 3103-3130.
7. d’Amato, M., & Coskun, Y. (2022). Property Valuation in Uncertain and Cyclical Market
Condition. In Property Valuation and Market Cycle (pp. 165-178). Cham: Springer International
Publishing.
8. Wheaton, W. C. (1990). Vacancy, search, and prices in a housing market matching model.
Journal of Political Economy, 98(6), 1270-1292.
9. Stec, A., Kordana, S., & Słyś, D. (2017). Analysing the financial efficiency of use of water and
energy saving systems in single-family homes. Journal of Cleaner Production, 151, 193-205.
10. Mayer, C., & Sinai, T. (2003). Network effects, congestion externalities, and air traffic delays:
Or why not all delays are evil. American Economic Review, 93(4), 1194-1215.