[Figure: simple kernel regression vs. locally weighted (linear) regression fits to the data, Y against X]
Locally weighted regression (also called local polynomial regression) is a form of nonparametric regression that addresses this boundary problem. Locally weighted regression uses weighted least squares (WLS) regression⁵ to fit a d-th degree polynomial to the data, where d is an integer; e.g., d = 1 is local linear regression, d = 2 is local quadratic regression, etc. The weights assigned to the observations are calculated via the kernel function, as above. These weights are then used to estimate the coefficients of a local polynomial fit. The simple kernel regression, as described previously, is just a special form of locally weighted regression, with d = 0. Apart from the boundary problem of kernel regression mentioned earlier, locally weighted regression also addresses the problem of potentially inflated bias and variance in the interior of the data set if the points are not uniformly densely distributed or if substantial curvature is present in the underlying, though unspecified, regression function. The figure above illustrates these points: the locally linear regression fit seems more accurate than the kernel regression fit, especially towards the boundaries and at points of curvature.

⁵ Weighted least squares is a method of regression, similar to least squares in that it minimizes the sum of squared residuals. However, instead of weighting all the residuals equally, they are weighted such that points with a greater weight contribute more to the sum.

We now sketch the procedure for locally weighted regression. Consider, as before, fitting y_j at the point x_j. First, the weights w_ij are obtained for the i = 1, 2, ..., m points in the memory set. This results in the vector of kernel weights $\mathbf{w}_j$:
$\mathbf{w}_j = \left( w_{1j} \;\; w_{2j} \;\; \cdots \;\; w_{mj} \right)$
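As a concrete illustration, here is a minimal sketch of this weight computation. The kernel and bandwidth are not restated in this section, so a Gaussian kernel with bandwidth h is assumed, and the weights are normalised to sum to one, as in the simple kernel regression estimate recalled below.

```python
import numpy as np

def kernel_weight_vector(x_query, x_mem, h=1.0):
    """Vector w_j = (w_1j, ..., w_mj) of kernel weights of the m memory
    points for a single query point x_j.

    Assumptions: Gaussian kernel, bandwidth h, weights normalised to sum to one.
    """
    u = (x_mem - x_query) / h
    k = np.exp(-0.5 * u ** 2)      # K((x_i - x_j) / h) for each memory point
    return k / k.sum()             # normalise so the weights sum to one

# Example: weights of five memory points for the query point x_j = 2.0
x_mem = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
w_j = kernel_weight_vector(2.0, x_mem, h=1.5)
```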
Recall that the simple (zero-order/Nadaraya-Watson) KR estimate of y_j is a weighted sum of the y_i's:

14. $\hat{y}_j = \hat{m}(x_j) = \sum_{i=1}^{m} w_{ij}\, y_i$
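Equation 14 is then just a weighted sum, i.e., a dot product of the weight vector with the memory responses. A minimal sketch under the same Gaussian-kernel assumption:

```python
import numpy as np

def nw_estimate(x_query, x_mem, y_mem, h=1.0):
    """Zero-order (Nadaraya-Watson) kernel regression fit at x_query (eq. 14)."""
    u = (x_mem - x_query) / h
    w = np.exp(-0.5 * u ** 2)      # Gaussian kernel weights (an assumption)
    w /= w.sum()                   # normalise: w_1j + ... + w_mj = 1
    return w @ y_mem               # y_hat_j = sum_i w_ij * y_i

x_mem = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_mem = np.array([1.0, 2.1, 3.2, 3.9, 5.1])
y_hat_j = nw_estimate(2.5, x_mem, y_mem, h=1.5)
```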
Following the procedure of weighted least squares, the estimated coefficients for the locally weighted regression fit at x_j are then found via:

17. $\hat{\boldsymbol{\beta}}_j = \left( \mathbf{X}'\mathbf{W}_j\mathbf{X} \right)^{-1} \mathbf{X}'\mathbf{W}_j\,\mathbf{y}$
where $\hat{\boldsymbol{\beta}}_j = \left( \hat{\beta}_{0j} \;\; \hat{\beta}_{1j} \;\; \cdots \;\; \hat{\beta}_{dj} \right)'$ is the column vector of regression coefficients, $\mathbf{W}_j$ is the m×m diagonal matrix with the weights $w_{1j}, \ldots, w_{mj}$ on its diagonal, and $\mathbf{X}$ is the matrix
$\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^d \end{pmatrix}$

for locally weighted regression, determined by the degree d of the polynomial. Note that a column of constants (1s) is the first column; this corresponds to the constant term $\hat{\beta}_{0j}$ in the equation below. Thus, provided $\left( \mathbf{X}'\mathbf{W}_j\mathbf{X} \right)^{-1}$ exists, the fit at x_j is obtained as:

19. $\hat{y}_j = \mathbf{x}_j \hat{\boldsymbol{\beta}}_j = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_j + \hat{\beta}_{2j} x_j^{2} + \cdots + \hat{\beta}_{dj} x_j^{d}$
where $\mathbf{x}_j$ is the j-th row of the X matrix. Note that a separate regression on all the memory points has to be carried out for every query point, i.e., the coefficients have to be re-estimated for every x_j (though they are used to estimate y_j only for the j-th point). This makes local polynomial regression even more computationally intensive than simple kernel regression for sizeable memory and query sets. Authors generally agree that for the majority of cases a first-order fit (local linear regression, d = 1) is an adequate choice: local linear regression balances computational ease with enough flexibility to reproduce the patterns present in the data. Nonetheless, local linear regression may fail to capture sharp curvature if it is present in the data. In such cases, local quadratic regression (d = 2) may be needed to provide an adequate fit. Most authors agree there is usually no need for polynomials of order d > 2.
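To make the procedure concrete, here is a minimal sketch of the locally weighted polynomial fit at a single query point, implementing eqs. 17 and 19 and then looping over a set of query points, since the coefficients must be re-estimated for every x_j. The Gaussian kernel, the bandwidth h and the degree d used here are illustrative assumptions, not the note's prescriptions.

```python
import numpy as np

def local_poly_fit(x_query, x_mem, y_mem, d=1, h=1.0):
    """Locally weighted polynomial fit of degree d at one query point.

    Computes beta_hat_j = (X'W_jX)^(-1) X'W_j y (eq. 17) and evaluates the
    local polynomial at x_query (eq. 19). Gaussian kernel is an assumption.
    """
    u = (x_mem - x_query) / h
    w = np.exp(-0.5 * u ** 2)                        # kernel weights w_ij
    X = np.vander(x_mem, N=d + 1, increasing=True)   # columns: 1, x_i, ..., x_i^d
    W = np.diag(w)                                   # diagonal weight matrix W_j
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_mem)   # eq. 17
    x_row = x_query ** np.arange(d + 1)              # (1, x_j, x_j^2, ..., x_j^d)
    return x_row @ beta                              # eq. 19

# A separate weighted regression is carried out for every query point:
x_mem = np.linspace(0.0, 14.0, 30)
y_mem = np.sin(x_mem / 2.0) + 0.1 * np.random.default_rng(0).normal(size=30)
x_queries = np.linspace(0.0, 14.0, 100)
y_fit = np.array([local_poly_fit(q, x_mem, y_mem, d=1, h=2.0) for q in x_queries])
```

Using np.linalg.solve rather than forming the inverse explicitly is numerically preferable, but it is mathematically equivalent to eq. 17.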
The data in memory will now take the form of pairs of vectors of values of the independent and dependent variables, $(\mathbf{X}_1, y_1), (\mathbf{X}_2, y_2), \ldots, (\mathbf{X}_m, y_m)$, where $\mathbf{X}_i$ is the i-th independent variable observation vector⁸, $\mathbf{X}_i = \left[ x_{1i} \;\; x_{2i} \;\; \cdots \;\; x_{ki} \right]'$, and the full matrix of independent-variable observations is

$\mathbf{X} = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ x_{1m} & x_{2m} & \cdots & x_{km} \end{pmatrix}$
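As a small illustration of this data layout (the array names and dimensions here are purely hypothetical), the memory can be held as an m-by-k matrix whose i-th row is $\mathbf{X}_i'$, alongside the vector of dependent-variable values:

```python
import numpy as np

m, k = 200, 3                       # m memory points, k independent variables
rng = np.random.default_rng(0)      # placeholder data for illustration only
X_mem = rng.normal(size=(m, k))     # row i of X_mem is X_i' = (x_1i, ..., x_ki)
y_mem = rng.normal(size=m)          # corresponding dependent-variable values y_i

X_1 = X_mem[0]                      # the first observation vector X_1
```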
⁷ Matrices and vectors are shown in bold type.
⁸ It is more convenient for later work to express this as a column vector, hence the transpose operator.