Lecture 9 - Geographically Weighted Regression II
Lecture 8 Week 8
What is Geographically Weighted Regression?
• Geographically Weighted Regression (GWR) was developed in response to the finding that a regression
model estimated over the entire area of interest may not adequately address local variations.
• The fairly simple principle on which it is based consists in estimating local models by least squares, each
observation being weighted by a decreasing function of its distance from the estimation point.
• GWR can be used to identify where local coefficients deviate the most from the overall coefficients, to build
tests that assess whether the phenomenon is non-stationary, and to characterize that non-stationarity.
• In addition to this descriptive use, we present a more predictive approach, showing how taking non-
stationarity into account makes it possible to improve an estimator over a spatial area. The example is based
on a model linking the poor population and the number of beneficiaries of supplementary universal health
coverage (CMU-C) in Rennes.
To identify the nature of relationships between variables, linear
regression models the dependent variable y as a linear function of
explanatory variables x1,..., xp. With n observations, the model is
written:

yi = β0 + β1 xi1 + β2 xi2 + ... + βp xip + εi,   i = 1, ..., n

• Where β0, β1,..., βp are the parameters and ε1, ε2,..., εn are the error
terms. In this model, the coefficients βk are considered identical
across the study area. However, the hypothesis that the effect of the
explanatory variables on the dependent variable is spatially uniform is
often unrealistic (Brunsdon et al. 1996).
• If the parameters vary significantly in space, a global estimator will
hide the geographical richness of the phenomenon
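The global model above can be sketched in a few lines. This is a minimal illustration with synthetic, hedonic-style data (the variable names, coefficient values, and sample size are invented for the example, not taken from the lecture):

```python
# Global OLS fit of y = b0 + b1*x1 + b2*x2 + e using numpy.
# All data below are synthetic, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(50, 200, n)        # e.g. floor area in m^2
x2 = rng.integers(1, 6, n)          # e.g. number of rooms
eps = rng.normal(0, 10, n)          # error terms
y = 30 + 1.5 * x1 + 8.0 * x2 + eps  # "true" global model

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # estimates of (b0, b1, b2), one value for the whole area
```

Note that the fitted beta is a single vector: the same coefficients apply everywhere, which is exactly the assumption GWR relaxes.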
Example 9.1 — Use of a hedonic model to
study real estate prices in Lyon
• Imagine you're shopping for a house and you notice that houses in the city
center are more expensive than those on the outskirts. You might think this is
simply because of the location, but there's a chance that houses in the center
are just generally nicer. So, how can you tell how much of the price is due to the
location and how much is due to the house itself being better?
• This is where the "hedonic model" comes into play. Think of it like a recipe
where the final price of a house is the finished dish, and the ingredients are
things like location, size, number of rooms, age of the house, and other features.
Just like how every ingredient adds a bit of flavor to a dish, each feature of a
house adds a bit of value to its price
Residuals
• Residuals in the context of a regression model, such as the hedonic model above, are essentially the
"leftovers": the differences between the observed values and the values predicted by the model.
• Imagine you are throwing darts at a dartboard where the bullseye represents the actual price of a house.
Every time you throw a dart (predict a price using your model), you aim for the bullseye, but you might
not hit it exactly every time. The distance between where your dart lands and the bullseye is like the
residual for that prediction.
• If your dart lands exactly on the bullseye, your predicted price is exactly right, and the residual is zero —
no leftovers! If your dart lands a bit off to the side, you have a residual that shows how much and in which
direction your prediction was off — the size and direction of your "leftover."
• In real estate, if the model predicts a house to be worth $300,000 based on its characteristics (like location,
size, number of bathrooms), but the house actually sells for $310,000, the residual for that house would be
$10,000. This means your prediction was $10,000 less than the actual selling price.
• Residuals are important because they can tell you how well your model is performing. If the residuals are
small and randomly distributed, your model is doing a good job. If they're large or show a pattern (like
always being too high or too low), it might mean your model is missing some important factor that affects
house prices.
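The residuals described above are straightforward to compute once a model is fitted. A minimal sketch with synthetic data (the numbers are illustrative):

```python
# Residuals e_i = y_i - y_hat_i from a fitted OLS model; synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(50, 200, n)                 # e.g. house size
y = 30 + 1.5 * x + rng.normal(0, 10, n)     # observed prices

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                            # predicted prices
residuals = y - y_hat                       # positive: model under-predicted

# Quick diagnostics: with an intercept, OLS residuals average to zero;
# what matters is whether they are small and free of spatial pattern.
print(residuals.mean())
print(np.abs(residuals).max())
```

In the $300,000 vs $310,000 example from the slide, the residual would be y - y_hat = +$10,000, a positive "leftover" because the model under-predicted.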
Models with variable coefficients
Explanation: Let's take the house price example again. Normally, in a standard regression model, we'd say that being close to a park adds a certain
amount to a house's value, no matter where the house is. But what if the value of being near a park changes depending on whether you're in the city
center or out in the suburbs? Maybe in the city center, where parks are rare, being near one is super valuable, but in the suburbs, where there are lots
of parks, it doesn't make as much difference.
Geographically Weighted Regression (GWR) is like having a special set of glasses that lets you see how the value of being near a park changes as
you move from place to place. Instead of one rule for the whole city, you get a bunch of local rules.
In GWR, the impact of a feature like proximity to a park on house prices isn't just one number. It's a bunch of numbers that can go up and down
depending on where on the map you are. So, as you move your glasses around the map, the rule for "added value of being near a park" changes —
it's a surface of values rather than a single flat line.
The "regression coefficients" — the numbers that tell us how much each feature like size, number of bathrooms, or proximity to a park changes the
house price — are not set in stone. They vary depending on where the house is located. GWR gives us a more detailed, location-specific picture of
what affects house prices. It's like a tailor-made outfit for each neighborhood instead of a one-size-fits-all approach.
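In symbols, the standard GWR formulation makes each coefficient a function of the coordinates (ui, vi) of observation i (this is the usual textbook notation, added here for reference):

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
```

Estimating the model at a given point yields one local coefficient vector per location, which is what produces the "surface of values" described above.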
• In a regular regression model, you'd estimate the price of a house
based on features like its size, number of rooms, age, etc., and you'd
assume these factors have the same effect on house prices
everywhere. However, the hypothesis here is that the location of a
house changes how much these features affect its price. So, a large
house might be worth a lot more in the city center than in the
countryside.
• Geographically Weighted Regression (GWR) is a technique used
when the effect of the explanatory variables on the dependent variable
changes over space. For example, being close to a subway station
might significantly increase a property's value in a city but might not
matter at all in a rural area.
• In GWR, you give more importance to the data points (observations)
that are geographically closer to the location you're interested in. It's
like saying, "Tell me what my neighbors' houses sold for, and I'll
guess my house price based on that." The further away the data, the
less it counts.
• To manage this, you use weights that decrease the further away the
data point is from the location you're looking at. So you're not just
looking at how, say, the size of a house affects its price; you're
looking at how the size of houses near each other affects their price,
with an emphasis on the ones closest to the location of interest.
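The weighting scheme described in these bullets can be sketched directly: at each target location, fit a weighted least-squares regression in which nearby observations get large weights and distant ones get small weights. The sketch below assumes a Gaussian distance-decay kernel, a one-dimensional spatial coordinate, and an arbitrary bandwidth of 2.0; all data and parameter choices are illustrative, not from the lecture:

```python
# Minimal GWR sketch: local weighted least squares with a Gaussian kernel.
import numpy as np

rng = np.random.default_rng(2)
n = 150
u = rng.uniform(0, 10, n)            # spatial coordinate of each observation
x = rng.uniform(50, 200, n)          # explanatory variable, e.g. house size
# The true slope varies smoothly over space: strong effect near u=0,
# weak effect near u=10 -- the non-stationarity GWR is meant to detect.
true_slope = 2.0 - 0.15 * u
y = 20 + true_slope * x + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), x])
bandwidth = 2.0                      # illustrative Gaussian kernel bandwidth

def gwr_at(u0):
    """Local coefficients at location u0 via weighted least squares."""
    d = np.abs(u - u0)                        # distance to the target point
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # closer points count more
    W = np.diag(w)
    # Solve the weighted normal equations (X'WX) beta = X'Wy
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

b_near = gwr_at(0.5)   # local slope should sit well above the slope far away
b_far = gwr_at(9.5)
print(b_near[1], b_far[1])
```

Repeating `gwr_at` over a grid of locations traces out the coefficient surface; in practice one would also choose the bandwidth by cross-validation rather than fixing it by hand.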
Another Example
• https://www.youtube.com/watch?v=YDG3LAijWHQ&t=136s