
Stat 324: Lecture 22

Nonlinear regression
Moo K. Chung
mchung@stat.wisc.edu
April 26, 2005

1. Example. Data showing nonlinearity.

> xmp13.07
   days yield
1    16  2508
2    18  2518
...
15   44  3103
16   46  2776

> polyfit <- lm(yield ~ days + I(days^2))
> beta <- coef(polyfit)
> beta
 (Intercept)        days   I(days^2)
-1070.397689  293.482948   -4.535802

> plot(days, yield)   # scatterplot of the data; lines() below overlays the fit
> x <- 15:45
> y <- beta[1] + beta[2]*x + beta[3]*x^2
> lines(x, y)

> summary(polyfit)

Call: lm(formula = yield ~ days + I(days^2))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1070.3977   617.2527  -1.734    0.107
days          293.4829    42.1776   6.958 9.94e-06 ***
I(days^2)      -4.5358     0.6744  -6.726 1.41e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 203.9 on 13 degrees of freedom
Multiple R-Squared: 0.7942, Adjusted R-squared: 0.7625
F-statistic: 25.08 on 2 and 13 DF, p-value: 3.452e-05

2. We will consider a general model that includes the linear model as a special case. For n given paired measurements
(x_i, y_i), we fit the model

        Y_i = φ(x_i) + ε_i,

where ε_i ~ N(0, σ²). Possible model choices are φ(x) = β0 + β1 x (linear) and φ(x) = β0 + β1 x + β2 x² (quadratic).
The k-th degree polynomial regression model is given by

        φ(x_i) = β0 + β1 x_i + · · · + βk x_i^k.
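In R, a degree-k model can be written out with I() terms, as in the transcript above, or fit with poly(); a minimal sketch using hypothetical data values (not the xmp13.07 data):

```r
# Sketch with hypothetical data: two equivalent ways to fit a degree-k polynomial.
x <- c(16, 18, 20, 22, 24, 26, 28, 30)
y <- c(2508, 2518, 3304, 3423, 3057, 3190, 3500, 3883)
k <- 2
fit.raw  <- lm(y ~ poly(x, k, raw = TRUE))   # raw powers x, x^2
fit.expl <- lm(y ~ x + I(x^2))               # the same model, written out
max(abs(unname(coef(fit.raw)) - unname(coef(fit.expl))))  # ~ 0
```

Both calls build the same design matrix, so the coefficient estimates agree; poly() merely saves typing for larger k.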
[Two scatterplots of yield (2600-3800) against days (15-45), each with a fitted curve.]

Figure 1: Grain yield vs. the number of days after flowering. Left: quadratic model. Right: cubic model. Since we do not
see much difference, the cubic model is not suitable for this data.

In general, φ(x) is expressed as

        φ(x) = Σ_{i=0}^{m} β_i φ_i(x),

where the φ_i are basis functions such as φ_i(x) = x^i, and we estimate the β_i by least squares. We need to solve
the system of linear equations

        [ φ0(x1)  φ1(x1)  · · ·  φm(x1) ] [ β0 ]   [ y1 ]
        [ φ0(x2)  φ1(x2)  · · ·  φm(x2) ] [ β1 ] = [ y2 ]
        [   ···     ···    · · ·   ···  ] [ ···]   [ ···]
        [ φ0(xn)  φ1(xn)  · · ·  φm(xn) ] [ βm ]   [ yn ]

It can be written as y = Xβ, where X is a rectangular n × (m+1) matrix. Multiplying both sides by X' gives the
normal equations

        X'Xβ = X'y.

If there are more data points than basis functions, i.e. n ≥ m + 1, and X has full column rank, then X'X is
invertible, so we get

        β̂ = (X'X)^{-1} X'y.
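A minimal numerical sketch (with hypothetical data values) that solves the normal equations directly and checks the result against lm():

```r
# Sketch: least squares via the normal equations X'X beta = X'y,
# using the monomial basis phi_i(x) = x^i with m = 2 (quadratic).
x <- c(16, 18, 20, 22, 24, 26)               # hypothetical predictor values
y <- c(2508, 2518, 3304, 3423, 3057, 3190)   # hypothetical responses
X <- cbind(1, x, x^2)                        # n x (m+1) design matrix
beta.hat <- solve(t(X) %*% X, t(X) %*% y)    # solves X'X beta = X'y
fit <- lm(y ~ x + I(x^2))
max(abs(beta.hat - coef(fit)))               # agrees with lm() up to rounding
```

In practice lm() uses a QR decomposition rather than forming X'X, which is numerically safer, but the fitted coefficients are the same.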
A popular choice for the φ_i is the Hermite polynomials H_i(x), which form an orthogonal polynomial basis with
respect to the weight e^{-x²} on ℝ, i.e.

        ∫_{-∞}^{∞} H_i(x) H_j(x) e^{-x²} dx = δ_ij 2^j j! √π.

The first few Hermite polynomials are H0(x) = 1, H1(x) = 2x, H2(x) = 4x² − 2.
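The orthogonality relation can be checked numerically; a quick sketch using R's integrate():

```r
# Numerical check of Hermite orthogonality: the integral of H_i H_j exp(-x^2)
# is 0 for i != j, and 2^j * j! * sqrt(pi) for i = j.
H0 <- function(x) rep(1, length(x))
H1 <- function(x) 2 * x
H2 <- function(x) 4 * x^2 - 2
w  <- function(x) exp(-x^2)                                   # weight function
integrate(function(x) H1(x) * H2(x) * w(x), -Inf, Inf)$value  # ~ 0
integrate(function(x) H1(x)^2 * w(x), -Inf, Inf)$value        # ~ 2 * sqrt(pi)
2 * sqrt(pi)
```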
3. For inference on β_j, we estimate σ̂² = SSE / (n − (m+1)). Also, R² is now called the coefficient of multiple
determination, and it measures the goodness of fit of the polynomial model. If SSE_m is the SSE of the m-th degree
model, we can show that SSE_{m+1} ≤ SSE_m and R²_{m+1} ≥ R²_m. If S_{β̂_j} is the estimator for σ_{β̂_j},

        T = (β̂_j − β_j) / S_{β̂_j} ~ t_{n−(m+1)}.

Determining the degree of a polynomial model is done by performing a test of H0: β_m = 0 vs. H1: β_m ≠ 0.
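This test can be read off the t-value of the highest-order term in summary(), or done equivalently with an F-test on nested fits via anova(); a sketch on simulated (hypothetical) data:

```r
# Sketch with simulated data: choose the degree by testing the highest term.
set.seed(1)
x <- seq(15, 45, length.out = 16)
y <- -1000 + 290 * x - 4.5 * x^2 + rnorm(16, sd = 200)  # truth is quadratic
fit2 <- lm(y ~ x + I(x^2))
fit3 <- lm(y ~ x + I(x^2) + I(x^3))
summary(fit3)$coefficients["I(x^3)", ]   # t-test of H0: beta3 = 0
anova(fit2, fit3)                        # equivalent F-test on nested models
```

For a single added term, the anova() F-statistic is the square of the t-statistic, so the two tests give the same p-value.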

> polyfit <- lm(yield ~ days + I(days^2) + I(days^3))  # cubic fit (implied by the Call: below)
> beta <- coef(polyfit)
> y <- beta[1] + beta[2]*x + beta[3]*x^2 + beta[4]*x^3
> lines(x, y)


> summary(polyfit)

Call: lm(formula = yield ~ days + I(days^2) + I(days^3))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -203.60852 2285.13020  -0.089    0.930
days         199.07674  242.92513   0.819    0.428
I(days^2)     -1.32071    8.16843  -0.162    0.874
I(days^3)     -0.03457    0.08751  -0.395    0.700

Residual standard error: 210.8 on 12 degrees of freedom
Multiple R-Squared: 0.7968, Adjusted R-squared: 0.746
F-statistic: 15.68 on 3 and 12 DF, p-value: 0.0001876

4. Logistic regression. Consider the model Y_j = β0 + β1 x_j + ε_j with Eε_j = 0, Vε_j = σ². If the response variable
Y_j is distributed as Bernoulli(p) with probability of failure p, the above linear model is no longer appropriate, since

        EY_j = p(x_j) = β0 + β1 x_j

but β0 + β1 x_j may not be in the range [0, 1]. To address this problem, we introduce a nonlinear model for the
probability of failure p:

        p(x) = e^{β0 + β1 x} / (1 + e^{β0 + β1 x}).

This is the logistic function; its inverse, log(p/(1 − p)) = β0 + β1 x, is called the logit. Note that the odds are
given by

        p(x) / (1 − p(x)) = e^{β0 + β1 x}.

We estimate the parameters via maximum likelihood. For Y_j ~ Bernoulli(p(x_j)),

        L(β0, β1) = Π_{j=1}^{n} P(Y_j = y_j)
                  = Π_{j=1}^{n} ( e^{β0 + β1 x_j} / (1 + e^{β0 + β1 x_j}) )^{y_j} ( 1 / (1 + e^{β0 + β1 x_j}) )^{1 − y_j}.
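Taking logs of this likelihood and maximizing numerically gives the same estimates as glm(); a sketch on simulated (hypothetical) data:

```r
# Sketch with simulated data: maximize the Bernoulli log-likelihood directly
# and compare with glm(). The data and parameter values are hypothetical.
set.seed(2)
x <- runif(200, 40, 90)
y <- rbinom(200, 1, plogis(10 - 0.15 * x))
# log L = sum_j [ y_j (b0 + b1 x_j) - log(1 + exp(b0 + b1 x_j)) ]
negloglik <- function(b)
  -sum(y * (b[1] + b[2] * x) - log(1 + exp(b[1] + b[2] * x)))
mle <- optim(c(0, 0), negloglik, control = list(maxit = 2000, reltol = 1e-12))$par
fit <- glm(y ~ x, family = binomial(link = logit))
mle; coef(fit)                             # should agree closely
```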

> xmp13.06
  Temperature Failure
1          53       Y
2          56       Y
3          57       Y
...
> oring <- glm(Failure ~ Temperature, family = binomial(link = logit))
> beta <- coef(oring)
> x <- 40:90
> p <- exp(beta[1] + beta[2]*x) / (1 + exp(beta[1] + beta[2]*x))
> plot(x, p, type = "l")

> summary(oring)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 11.74641    6.02142   1.951   0.0511 .
Temperature -0.18843    0.08909  -2.115   0.0344 *

Assigned problems. Exercise 13.32. For Exercise 13.25, use R to generate a similar computer output. HW 6 is due April
28th. Solve the assigned problems in Lectures 19-22.
