5 3-2 Spatial Environmental Data Model Selection Long-Range Dependencies
2. Model Selection
…to fit Gaussian processes on a variety of data without even that much prior knowledge. It's still good to know what these different kernels do, so that you can already come up with a good set of candidate kernels. But other than that, you can actually fit the rest to the data at hand.
Initially, let's recall our setup, where we have a pair of multivariate Gaussian random variables $\mathbf{X}_1 \in \mathbb{R}^{d}$ and $\mathbf{X}_2 \in \mathbb{R}^{N-d}$. These two random variables represent the temperature at two sets of cities: $\mathbf{X}_1$ corresponds to the cities for which we do not have temperature measurements, and $\mathbf{X}_2$ to the cities for which we do have temperature measurements. In addition, we also have access to the means of both of these random variables, denoted by $\mu_1$ and $\mu_2$ respectively; these are the mean temperatures at each of the cities.
The random variables are associated with physical locations represented by the variables $\mathbf{Z}_1 \in \mathbb{R}^{M \times d}$ and $\mathbf{Z}_2 \in \mathbb{R}^{M \times (N-d)}$, where $M$ is the dimension of the spatial data. Further, we have selected a covariance function $k(z_i, z_j)$ that serves as a proxy for the relation between two random variables as a function of their spatial locations. We use this kernel function to construct a covariance matrix, so that $\Sigma_{ij} = \mathrm{cov}(X_i, X_j) = k(z_i, z_j)$. Thus, we build the matrix

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

whose blocks collect the kernel values within and across the two sets of locations.
In the previous sections we have shown that the distribution of the random variable $\mathbf{X}_1$ conditioned on $\mathbf{X}_2 = \mathbf{x}_2$ is again Gaussian, with mean and covariance

$$\mu_{\mathbf{X}_1 \mid \mathbf{X}_2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{x}_2 - \mu_2),$$

$$\Sigma_{\mathbf{X}_1 \mid \mathbf{X}_2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
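As a concrete illustration, the conditional mean and covariance can be computed directly with NumPy. Everything below — the locations, temperatures, prior means, and the choice of a squared-exponential kernel with $\ell = 1$ — is a made-up toy example, not data from the course:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel k(z_i, z_j) = exp(-|z_i - z_j|^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * ell**2))

# Toy 1-D locations and temperatures (hypothetical values).
z2 = np.array([0.0, 1.0, 2.0])      # observed locations
x2 = np.array([15.0, 17.0, 16.0])   # observed temperatures
z1 = np.array([0.5, 1.5])           # unobserved locations
mu1 = np.full(len(z1), 16.0)        # prior mean at unobserved cities
mu2 = np.full(len(z2), 16.0)        # prior mean at observed cities

# Blocks of the covariance matrix built from the kernel.
S11 = rbf(z1, z1)
S12 = rbf(z1, z2)
S22 = rbf(z2, z2) + 1e-8 * np.eye(len(z2))  # small jitter for numerical stability

# mu_{X1|X2} = mu1 + S12 S22^{-1} (x2 - mu2)
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
# Sigma_{X1|X2} = S11 - S12 S22^{-1} S21
S_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)
```

Note that the conditional variances on the diagonal of `S_cond` are smaller than the prior variance of 1: observing nearby cities reduces our uncertainty about the unobserved ones.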
The main running assumption in this process is to model the variables to be measured, like temperature, as jointly normally distributed random variables, with correlations determined as a function of location through the kernel function $k(z_i, z_j)$. Once the means have been specified, we may predict the unobserved random variables using the conditional formulas above. The remaining modeling choice is the kernel itself; consider, for example, the squared-exponential kernel with length scale $\ell$:

$$k(y_i, y_j) = \exp\left( -\frac{\|y_i - y_j\|^2}{2\ell^2} \right).$$
We can say that $\theta = \{\ell\}$, and our objective is to find the “best” $\theta$ in some particular sense that will be defined later.
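To build intuition for what $\ell$ controls, here is a small sketch (pure NumPy, with toy numbers chosen for illustration) showing that a larger length scale keeps points at a fixed distance more strongly correlated:

```python
import numpy as np

def rbf(yi, yj, ell):
    # k(y_i, y_j) = exp(-|y_i - y_j|^2 / (2 ell^2))
    return np.exp(-np.abs(yi - yj)**2 / (2 * ell**2))

# Correlation between two points 1 unit apart, for several length scales.
for ell in (0.5, 1.0, 2.0):
    print(f"ell = {ell}: k = {rbf(0.0, 1.0, ell):.4f}")
# prints k ≈ 0.1353, 0.6065, 0.8825
```

Small $\ell$ makes correlations decay quickly with distance (wiggly functions); large $\ell$ makes distant points nearly perfectly correlated (smooth, slowly varying functions).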
The two approaches we will explore are:
1. Estimate the generalization error: cross-validation, leave-one-out, or k-fold. This defines a “good model” as one that best predicts data that we have not seen before, i.e., one that generalizes. This approach corresponds to the classical tension between having a model that fits the data well and, at the same time, generalizes to unobserved data.
2. Maximize the log marginal likelihood of the data, $p(y \mid X, \theta)$, with respect to $\theta$. Here we assume we have a probabilistic model, where we compute how likely the data we have seen is under the chosen model; in short, how well the model fits the data as measured by a normalized probability. This approach balances fitting power against the simplicity of the model.
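Both selection criteria can be sketched in a few lines of NumPy. The dataset, the candidate grid of length scales, and the kernel below are illustrative assumptions rather than the course's data; only the two criteria themselves follow the text above:

```python
import numpy as np

def rbf(a, b, ell):
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * ell**2))

def log_marginal_likelihood(z, x, mu, ell, jitter=1e-6):
    """log p(x | z, ell) for x ~ N(mu, K), with K built from the RBF kernel."""
    n = len(z)
    K = rbf(z, z, ell) + jitter * np.eye(n)
    r = x - mu
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * r @ np.linalg.solve(K, r) - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

def loo_error(z, x, mu, ell, jitter=1e-6):
    """Mean squared leave-one-out prediction error, using the GP conditional mean."""
    errs = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        K22 = rbf(z[keep], z[keep], ell) + jitter * np.eye(keep.sum())
        k12 = rbf(z[i:i + 1], z[keep], ell)
        pred = mu[i] + k12 @ np.linalg.solve(K22, x[keep] - mu[keep])
        errs.append((pred[0] - x[i]) ** 2)
    return float(np.mean(errs))

# Toy data: a smooth signal plus a little noise (illustrative only).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 4.0, 9)
x = 16.0 + np.sin(z) + 0.1 * rng.standard_normal(len(z))
mu = np.full(len(z), 16.0)

candidates = [0.1, 0.5, 1.0, 2.0, 5.0]
best_by_lml = max(candidates, key=lambda ell: log_marginal_likelihood(z, x, mu, ell))
best_by_loo = min(candidates, key=lambda ell: loo_error(z, x, mu, ell))
```

In practice one would rarely grid-search by hand: libraries such as scikit-learn (`GaussianProcessRegressor`) fit kernel hyperparameters by maximizing the log marginal likelihood with a gradient-based optimizer.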
https://learning.edx.org/course/course-v1:MITx+6.419x+1T2021/block-v1:MITx+6.419x+1T2021+type@sequential+block@gp_lec3/block-v1:MITx+6.419x+1T2021+type@vertical+block@gp_lec3-tab2 2/3
5/16/2021 Sensing and Analyzing global patterns of dependence | Module 5: Environmental Data and Gaussian Processes | Data Analysis: Statistical Modeling and Computation in Applications | edX