Lecture 16: Polynomial and Categorical Regression

1 Review
We predict a scalar random variable Y as a linear function of p different predictor variables
X1, . . . , Xp, plus noise:

Y = β0 + β1 X1 + . . . + βp Xp + ε

and assume that E[ε|X] = 0, Var[ε|X] = σ^2, with ε being uncorrelated across observations. In
matrix form,

Y = Xβ + ε

the design matrix X including an extra column of 1s to handle the intercept, and E[ε|X] = 0,
Var[ε|X] = σ^2 I.
If we add the Gaussian noise assumption, ε ∼ MVN(0, σ^2 I), independent of all the predictor
variables.
The least squares estimate of the coefficient vector, which is also the maximum likelihood
estimate if the noise is Gaussian, is

β̂ = (X^T X)^{-1} X^T Y
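As a quick sanity check, here is a minimal R sketch (with simulated data, not anything from the
lecture) verifying that lm() reproduces this closed-form estimate:

set.seed(1)
n = 100
x1 = rnorm(n); x2 = rnorm(n)
y = 2 + 3*x1 - 1*x2 + rnorm(n)             # simulated data with known coefficients
X = cbind(1, x1, x2)                       # design matrix with a column of 1s
beta.hat = solve(t(X) %*% X, t(X) %*% y)   # the closed-form (X^T X)^{-1} X^T Y
cbind(beta.hat, coef(lm(y ~ x1 + x2)))     # the two columns should agree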
2 Polynomial Regression

Nothing in this machinery requires Y to be a linear function of the original predictor variable.
With a single predictor X, for example, we could fit a cubic polynomial,

Y = β0 + β1 X + β2 X^2 + β3 X^3 + ε.

The same is true if there are many covariates. For example, we could fit a model where, instead
of Y being linearly related to X1, it's polynomially related, with the order of the polynomial
being d. We just add d − 1 columns to the design matrix X, containing x1^2, x1^3, . . . , x1^d,
and treat them just as we would any other predictor variables. With this expanded design matrix,
it's still true that β̂ = (X^T X)^{-1} X^T Y, that fitted values are HY (using the expanded X to
get H), etc. The number of degrees of freedom for the residuals will be n − q where, in this case,
q = p + 1 + (d − 1). Of course, there are many other such models we can fit. For example,

Y = β0 + Σ_{i=1}^{p} Σ_{j=1}^{d_i} β_{i,j} X_i^j + ε.
There are many mathematical and statistical points to make about polynomial regression, but let’s
take a look at how we’d actually estimate one of these models in R first.
2.1 R Practicalities
There are a couple of ways of doing polynomial regression in R.
The most basic is to manually add columns to the data frame with the desired powers, and
then include those extra columns in the regression formula.
xsq = x*x
out = lm(y ~ x + xsq)
The second way is to use the function I() inside the regression formula, for instance
out = lm(y ~ x + I(x^2)). The function I() is the identity function, which tells R “leave this
alone”. We use it here because the usual symbol for raising to a power, ^, has a special meaning
in linear-model formulas, relating to interactions.
The third method is to use the poly() function in the formula, as in out = lm(y ~ poly(x,3));
we will see it in action in the example below. By default, poly() uses orthogonal polynomials of
the given degree rather than the raw powers, which changes the individual coefficients but not
the fitted values.
2.2 Example
Look at the data in Figure 1. The first fit is linear. It does not look good. Then I tried a cubic
polynomial fit which works much better.
Here is the code.
> pdf("example.pdf")
> par(mfrow=c(2,2))
>
> ### linear regression
> out = lm(y ~ x)
> summary(out)
[Figure 1: Top left: data and fitted line. Top right: residuals. Bottom left: a cubic polynomial fit.
Bottom right: residuals.]

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.6823 0.1124 23.86 <2e-16 ***
x 4.1847 0.1981 21.12 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(x,y)
> abline(out)
> plot(x,resid(out))
> abline(h=0)
>
> ### cubic regression
> out = lm(y ~ poly(x,3))
> summary(out)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.63305 0.02976 88.48 <2e-16 ***
poly(x, 3)1 23.73862 0.29759 79.77 <2e-16 ***
poly(x, 3)2 7.22625 0.29759 24.28 <2e-16 ***
poly(x, 3)3 7.93886 0.29759 26.68 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(x,y)
> points(x,fitted(out),col="blue")
> plot(x,resid(out))
> abline(h=0)
>
> dev.off()
We can also use poly() for just one predictor among several. For instance, here is a regression,
on the mobility data, of Mobility on Commute, a quadratic polynomial in Latitude, and Longitude;
the output looks like this:
summary(out)
##
## Call:
## lm(formula = Mobility ~ Commute + poly(Latitude, 2) + Longitude,
## data = mobility)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.12828 -0.02384 -0.00691 0.01722 0.32190
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0261223 0.0121233 -2.155 0.0315
## Commute 0.1898429 0.0137167 13.840 < 2e-16
## poly(Latitude, 2)1 0.1209235 0.0475524 2.543 0.0112
## poly(Latitude, 2)2 -0.2596006 0.0484131 -5.362 1.11e-07
## Longitude -0.0004245 0.0001394 -3.046 0.0024
##
## Residual standard error: 0.04148 on 724 degrees of freedom
## Multiple R-squared: 0.3828,Adjusted R-squared: 0.3794
## F-statistic: 112.3 on 4 and 724 DF, p-value: < 2.2e-16
Over-fitting and wiggliness. A polynomial of degree d can exactly fit any d + 1 points. (Any two
points lie on a line, any three on a parabola, etc.) Using a high-order polynomial, or even summing
a large number of low-order polynomials, can therefore lead to curves which come very close to the
data we used to estimate them, but which predict very badly. In particular, high-order polynomials
can display very wild oscillations in between the data points. Plotting the fitted function in
between the data points (using predict) is a good way of checking for this. We will also look at
more formal checks when we cover cross-validation later in the course.
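Here is a sketch of that kind of check, fitting a deliberately high-order polynomial (degree 10,
chosen only for illustration) to the x and y from the example above, and evaluating it on a fine
grid between the observed points:

out.wiggly = lm(y ~ poly(x, 10))
x.grid = data.frame(x = seq(min(x), max(x), length.out = 500))
plot(x, y)
lines(x.grid$x, predict(out.wiggly, newdata = x.grid), col = "blue")  # fit evaluated between the data points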
Picking the polynomial order. The best way to pick the polynomial order is on the basis of
some actual scientific theory which says that the relationship between Y and Xi should, indeed, be
a polynomial of order di . Failing that, carefully examining the diagnostic plots is your next best
bet. Finally, the methods we’ll talk about for variable and model selection in forthcoming lectures
can also be applied to picking the order of a polynomial, though as we will see, you need to be very
careful about what those methods actually do, and whether that’s really what you want.
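For instance, one rough, informal comparison is to fit several candidate orders and look at the
residual plots side by side (a sketch, again using x and y from the example above):

par(mfrow = c(1, 3))
for (d in 1:3) {
  fit = lm(y ~ poly(x, d))                        # fit a polynomial of order d
  plot(x, resid(fit), main = paste("degree", d))  # residuals should look patternless
  abline(h = 0)
}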
3 Categorical Predictors
We often have variables which we think are related to Y but which are not real numbers; they are
qualitative rather than quantitative, answers to “what kind?” rather than to “how much?”. For
people, these might be things like gender, race and occupation.
Some of these are purely qualitative, coming in distinct types, but with no sort of order or rank-
ing implied; these are often specifically called “categorical”, and the distinct values “categories”.
(The values are also called “levels”, though that's not a good metaphor without an order.) Others
have distinct levels which can be put in a sensible order, but there is no real sense that the distance
between one level and the next is the same — they are ordinal but not metric. When it is nec-
essary to distinguish non-ordinal categorical variables, they are often called nominal, to indicate
that their values have names but no order.
In R, categorical variables are represented by a special data type called factor, which has a
sub-type for ordinal variables, the data type ordered.
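A quick sketch of both types, with made-up values:

occupation = factor(c("teacher", "farmer", "farmer", "doctor"))    # nominal: no ordering
education = factor(c("HS", "BA", "MA", "HS"),
                   levels = c("HS", "BA", "MA"), ordered = TRUE)   # ordinal: HS < BA < MA
class(occupation)   # "factor"
class(education)    # "ordered" "factor"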
In this section, we’ll see how to include both categorical and ordinal variables in multiple linear
regression models, by coding them as numerical variables, which we know how to handle.
The simplest case is a binary categorical variable. We pick one of its two values as the reference
category, code the variable as an indicator X1, equal to 0 for the reference category and 1 for the
other, and include it in the model along with the other predictors as usual:

Y = β0 + β1 X1 + . . . + βp Xp + ε.
The coefficient β1 is the expected difference in Y between two units which are identical, except
that one of them has X1 = 0 and the other has X1 = 1. That is, it’s the expected difference in the
response between members of the reference category and members of the other category, all else
being equal. For this reason, β1 is often called the contrast between the two classes.
In R. If a data frame has a column which is a two-valued factor already, and it’s included in
the right-hand side of the regression formula, lm handles creating the column of indicator variables
internally.
Here, for instance, we use a classic data set to regress the weight of a cat’s heart on its body
weight and its sex. (If it worked, such a model would be useful in gauging doses of veterinary heart
medicines.)
library(MASS)
data(cats)
out = lm(Hwt ~ Sex + Bwt,data=cats)
summary(out)
##
## Call:
## lm(formula = Hwt ~ Sex + Bwt, data = cats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5833 -0.9700 -0.0948 1.0432 5.1016
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4149 0.7273 -0.571 0.569
## SexM -0.0821 0.3040 -0.270 0.788
## Bwt 4.0758 0.2948 13.826 <2e-16
##
## Residual standard error: 1.457 on 141 degrees of freedom
## Multiple R-squared: 0.6468,Adjusted R-squared: 0.6418
## F-statistic: 129.1 on 2 and 141 DF, p-value: < 2.2e-16
Sex is coded as F and M, and R’s output indicates that it chose F as the reference category.
Diagnostics. The mean of the residuals within each category is guaranteed to be zero, but the
residuals should also have the same variance, and otherwise the same distribution, in both
categories, so there is still some point in plotting residuals against X1.
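A minimal sketch of such a check for the cats fit above:

boxplot(resid(out) ~ cats$Sex, xlab = "Sex", ylab = "residuals")  # compare spread across categories
abline(h = 0, lty = 2)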
Inference. There is absolutely nothing special about the inferential statistics for the estimated
contrast β̂1. It works just like inference for any other regression coefficient.
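For example, a confidence interval for the contrast comes from confint(), just as for any other
coefficient (continuing with the cats fit):

confint(out, "SexM", level = 0.95)   # 95% confidence interval for the Sex contrast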
Why not two columns? It’s natural to wonder why we have to pick out one level as the
reference, and estimate a contrast. Why not add two columns to X, one indicating each class? The
problem is that those two columns, together with the intercept's column of 1s, would be linearly
dependent (the two indicators always add up to one), so the design matrix would be collinear and
the model inestimable.
Why not two slopes? The model we’ve specified has two parallel regression surfaces, with the
same slopes but different intercepts. We could also have a model with the same intercept across
categories, but different slopes for each variable. Geometrically, this would mean that the regression
surfaces weren’t parallel, but would meet at the origin (and elsewhere). We’ll see how to make that
work when we deal with interactions in a few lectures. If we wanted different slopes and intercepts,
we might as well just split the data.
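Here is a sketch of the “just split the data” option for the cats example, giving each sex its own
intercept and slope:

out.F = lm(Hwt ~ Bwt, data = cats, subset = (Sex == "F"))
out.M = lm(Hwt ~ Bwt, data = cats, subset = (Sex == "M"))
rbind(F = coef(out.F), M = coef(out.M))   # separate intercepts and slopes for the two sexes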
If you have a variable x in R and you want R to treat it as a factor, use this command:
x = as.factor(x)
Why not k columns? For a categorical variable with k levels, we pick one level as the reference and
add k − 1 indicator columns, one for each of the other levels; each estimated coefficient is then a
contrast with the reference level. We do not add a column for every level because, just like in the
binary case, that would make all k columns for that variable sum to 1, causing problems with
collinearity.
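The coding is easy to inspect directly; a small sketch with a made-up three-level factor:

f = factor(c("a", "b", "c", "a", "b"))
model.matrix(~ f)   # an intercept column plus indicators for "b" and "c"; "a" is the reference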
There are two common ways of handling ordinal variables in a regression:
1. Ignore the ordering and treat them like nominal categorical variables.
2. Ignore the fact that they're only ordinal, assign them numerical codes (say 1, 2, 3, . . . ), and
treat them like ordinary numerical variables.
The first procedure is unbiased, but can end up dealing with a lot of distinct coefficients. It
also has the drawback that if the relationship between Y and the categorical variable is monotone,
that may not be respected by the coefficients we estimate. The second procedure is very easy, but
usually without any substantive or logical basis. It implies that each step up in the ordinal variable
will predict exactly the same difference in Y , and why should that be the case? If, after treating an
ordinal variable like a nominal one, we get contrasts which are all (approximately) equally spaced,
we might then try the second approach.
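Here is a sketch of the two strategies side by side; the data frame df, its response column y, and
the ordinal variable z with levels "low" < "medium" < "high" are all hypothetical:

df$z.nominal = factor(df$z)                                                # strategy 1: nominal factor
df$z.code = as.numeric(factor(df$z, levels = c("low", "medium", "high")))  # strategy 2: codes 1, 2, 3
fit1 = lm(y ~ z.nominal, data = df)   # one contrast per non-reference level
fit2 = lm(y ~ z.code, data = df)      # forces each step up to predict the same change in y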
3.6 R Example
Let's revisit the mobility data and add State to the model.
nlevels(mobility$State)
## [1] 51
levels(mobility$State)
## [1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA" "ID"
## [15] "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC"
## [29] "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD"
## [43] "TN" "TX" "UT" "VA" "VT" "WA" "WI" "WV" "WY"
There are 51 levels for State, as there should be, corresponding to the 50 states and the District
of Columbia.
Running a model with State and Commute as the predictors, we therefore expect to get 52
coefficients (1 intercept, 1 slope, and 51-1 = 50 contrasts). R will calculate contrasts from the first
level, which here is AK, or Alaska.
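A sketch of that fit (assuming the mobility data frame from the earlier example is loaded):

out = lm(Mobility ~ State + Commute, data = mobility)
length(coef(out))   # 52: the intercept, 50 State contrasts, and the Commute slope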
When a categorical variable has so many levels, you might want to simplify it. For example, you
could replace State with a new variable that takes just two values, North and South. Doing so
involves something called a bias-variance tradeoff, which we will discuss later.
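A hypothetical sketch of such a simplification; the assignment of states to regions here is invented
purely for illustration:

south = c("AL", "AR", "FL", "GA", "KY", "LA", "MS", "NC", "SC", "TN", "TX", "VA", "WV")
mobility$Region = factor(ifelse(mobility$State %in% south, "South", "North"))
out.region = lm(Mobility ~ Region + Commute, data = mobility)   # one contrast instead of fifty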