Lecture 2 Regression Approach
New data
2020 60 3.2 5.5 ???
Revision on Regression Analysis
• Goal: Guess Y for the new data.
New data:
2021 180 ???
Revision on Simple Linear Regression
• Linear model: 𝑌𝑡 = 𝑎 + 𝑏𝑋𝑡 + 𝐸𝑟𝑟𝑜𝑟 (a=intercept, b=slope)
• Making predictions in R, where result is the fitted model object and newdata is a data frame of new predictor values:
predict(result, newdata)
predict(result, newdata, interval="prediction")
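The two predict() calls above can be tried end to end with a small sketch. The data values below are made up for illustration only, and the names dat, point_pred and pred_int are not from the lecture; result and newdata follow the slide's naming.

```r
# Illustrative data only (not from the lecture)
dat <- data.frame(
  X = 1:10,
  Y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)
)

# Fit the linear model Y = a + b*X + error by least squares
result <- lm(Y ~ X, data = dat)

# newdata must be a data frame whose column names match the predictors
newdata <- data.frame(X = 11)

# Point prediction only
point_pred <- predict(result, newdata)

# Point prediction (fit) with lower and upper prediction limits (lwr, upr)
pred_int <- predict(result, newdata, interval = "prediction")

print(point_pred)
print(pred_int)
```

With interval="prediction", predict() returns a matrix with columns fit, lwr and upr; the default level is 95%.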
• Simple linear regression: find $\hat{a}$ and $\hat{b}$ so that the sum of squared errors is the smallest:
$$SS(a, b) = \sum_{t=1}^{n} (Y_t - a - bX_t)^2$$
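As a rough numerical check of the least-squares idea, the sketch below writes the sum of squared errors as an R function and minimises it with a general-purpose optimiser. The data are simulated, and the names SS, a_hat, b_hat and opt are chosen here for illustration; this is not necessarily how lm() computes its answer internally.

```r
set.seed(1)
X <- 1:20
Y <- 3 + 0.5 * X + rnorm(20)  # simulated data, for illustration only

# Sum of squared errors as a function of the intercept a and the slope b
SS <- function(a, b) sum((Y - a - b * X)^2)

# Least-squares fit from lm()
fit <- lm(Y ~ X)
a_hat <- unname(coef(fit)[1])
b_hat <- unname(coef(fit)[2])

# Numerical minimisation of SS, starting from (a, b) = (0, 0)
opt <- optim(c(0, 0), function(p) SS(p[1], p[2]), method = "BFGS")

print(c(a_hat, b_hat))  # estimates from lm()
print(opt$par)          # estimates from direct minimisation of SS
```

The two sets of estimates should agree up to the optimiser's tolerance, and moving either coefficient away from the fitted value increases SS.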
• Mathematical details:
• Set $\frac{d}{da} SS(a, b) = \frac{d}{db} SS(a, b) = 0$.
• After some algebraic manipulations, one gets
$$b = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} \quad \text{and} \quad a = \bar{Y} - b\bar{X},$$
where $\bar{X}$ and $\bar{Y}$ are the sample averages.
• To avoid confusion, the symbols $\hat{a}$ and $\hat{b}$ are usually used for the above solution.
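The closed-form solution can be checked directly against lm(). This is a minimal sketch with made-up numbers; the names a_hat and b_hat are chosen here to mirror the hatted symbols.

```r
# Illustrative data only (not from the lecture)
X <- c(1, 2, 3, 4, 5, 6)
Y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.0)

# Slope: sum of cross-deviations divided by the sum of squared deviations of X
b_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)

# Intercept: the fitted line passes through the point of sample averages
a_hat <- mean(Y) - b_hat * mean(X)

# Compare with the coefficients reported by lm()
fit <- lm(Y ~ X)
print(c(a_hat, b_hat))
print(coef(fit))
```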
Revision on Multiple Linear Regression
• Problem: Predict Y from X1, X2, X3, …, Xp
• 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + ⋯ + 𝛽𝑝 𝑋𝑝 + 𝑒𝑟𝑟𝑜𝑟
• Data:
• savings.csv
• Data of 50 countries collected over the period 1960-1970
• Belsley, Kuh, and Welsch (1980) Regression Diagnostics. Wiley.
• Response: Y = sr, the variable on the left-hand side of the R model formula
• Predictors
• X1 = pop15 = percentage of the population under age 15
• X2 = pop75 = percentage of the population over age 75
• X3 = dpi = per capita disposable income in US dollars
• X4 = ddpi = percentage growth rate of dpi
• savings = read.csv("savings.csv", header=TRUE)
• result = lm(sr ~ pop15 + pop75 + dpi + ddpi, data=savings)
• newdata = data.frame(country="Erehwon",
pop15=30, pop75=3, dpi=2000, ddpi=5)
• predict(result, newdata)
• newdata = data.frame(
pop15=c(30,25), pop75=c(3,4), dpi=c(2000,2000),
ddpi=c(5,5))
• The mathematical details are NOT the focus of this course and are
omitted. Interested students may consult textbooks on regression.
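The multiple-regression workflow above can be run without savings.csv by using R's built-in LifeCycleSavings data, which has the same five variables for 50 countries. Treating it as equivalent to savings.csv is an assumption made here, not something the lecture states. The sketch predicts for the two-row newdata shown above.

```r
# Assumption: the built-in LifeCycleSavings data stands in for savings.csv
savings <- LifeCycleSavings

# Fit sr on the four predictors
result <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)

# Two new observations, one per row, with the same column names as the predictors
newdata <- data.frame(
  pop15 = c(30, 25), pop75 = c(3, 4),
  dpi = c(2000, 2000), ddpi = c(5, 5)
)

# One row of output per new observation: fit, lwr, upr
pred <- predict(result, newdata, interval = "prediction")
print(pred)
```

Extra columns in newdata, such as country in the one-row example, are ignored by predict() because they do not appear in the model formula.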
Time Series Analysis – Regression Approach
• Trend models:
• Linear: $m_t = a + bt$
• Quadratic: $m_t = a + bt + ct^2$
• Exponential: $m_t = ab^t$ or $\log m_t = \alpha + \beta t$
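Each of the three trend models can be fitted with lm() by regressing on the time index. The sketch below uses a simulated positive series; the names tt, x and the three fit objects are chosen here for illustration.

```r
set.seed(2)
tt <- 1:30
# Simulated positive series, for illustration only
x <- 5 + 0.3 * tt + 0.02 * tt^2 + rnorm(30, sd = 0.5)

linear_fit    <- lm(x ~ tt)            # m_t = a + b t
quadratic_fit <- lm(x ~ tt + I(tt^2))  # m_t = a + b t + c t^2
exp_fit       <- lm(log(x) ~ tt)       # log m_t = alpha + beta t

# Back-transform the exponential fit: a = exp(alpha), b = exp(beta)
a <- unname(exp(coef(exp_fit)[1]))
b <- unname(exp(coef(exp_fit)[2]))
exp_trend <- a * b^tt                  # fitted m_t on the original scale

print(coef(linear_fit))
print(coef(quadratic_fit))
print(c(a = a, b = b))
```

The exponential model is fitted on the log scale, so it requires a strictly positive series; I(tt^2) tells the formula to square tt arithmetically.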
• In R, dim(gtemp)[1] and dim(gtemp)[2] give the number of rows and the number of columns of the dataset, respectively.
• gtemp = read.csv("gtemp.csv", header=TRUE)
• n = dim(gtemp)[1]
• year = 1:n
• gtemp = data.frame(gtemp,
time=year, time2=year^2)
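Once the time and time2 columns exist, a quadratic trend can be fitted and extended one step ahead. The sketch below replaces gtemp.csv with a simulated data frame; the column name temp and the simulated values are assumptions made for illustration, since the file's actual columns are not shown.

```r
set.seed(3)
# Hypothetical stand-in for gtemp.csv; the column name temp is an assumption
gtemp <- data.frame(temp = 0.01 * (1:50) + rnorm(50, sd = 0.1))

n <- dim(gtemp)[1]   # number of rows; nrow(gtemp) gives the same value
year <- 1:n
gtemp <- data.frame(gtemp, time = year, time2 = year^2)

# Quadratic trend: temp = a + b*time + c*time2 + error
result <- lm(temp ~ time + time2, data = gtemp)

# One-step-ahead forecast: the new row must supply both time and time2
newdata <- data.frame(time = n + 1, time2 = (n + 1)^2)
forecast <- predict(result, newdata, interval = "prediction")
print(forecast)
```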
• Exercise: Annual revenues of a company
Year 2007 2008 2009 2010 2011 2012 2013
• Log-linear: $\log m_t = a + bt$
• Log-quadratic: $\log m_t = a + bt + ct^2$
• Try to predict the revenue in Year 2023
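One possible way to set up the exercise is sketched below. The revenue figures are placeholders invented for illustration, not the exercise data, and should be replaced with the actual values. The time index is taken as t = 1 for 2007, which is also an assumption, so 2023 corresponds to t = 17.

```r
year <- 2007:2013
# Placeholder values only; replace with the revenues given in the exercise
revenue <- c(10, 12, 15, 18, 22, 27, 33)

tt <- year - 2006   # t = 1, ..., 7 (assumed indexing)
dat <- data.frame(tt = tt, tt2 = tt^2, revenue = revenue)

loglin  <- lm(log(revenue) ~ tt, data = dat)        # log m_t = a + b t
logquad <- lm(log(revenue) ~ tt + tt2, data = dat)  # log m_t = a + b t + c t^2

# Year 2023 on the same time scale
new <- data.frame(tt = 2023 - 2006, tt2 = (2023 - 2006)^2)

# predict() works on the log scale, so exponentiate to get revenue
pred_loglin  <- exp(predict(loglin, new))
pred_logquad <- exp(predict(logquad, new))
print(c(loglin = unname(pred_loglin), logquad = unname(pred_logquad)))
```

Year 2023 lies ten years beyond the last observation, so the two models can give very different answers; comparing them is part of the point of the exercise.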
• Causal regression models:
• salesdata = data.frame(
year = year[2:6],
adv.lag = advertisement[1:5],
sales = sales[2:6]
)
• newdata = data.frame(
year = 2021, adv.lag = advertisement[6]
)
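The lagged data frame above pairs each year's sales with the previous year's advertisement, so the latest advertisement figure can be used to predict next year's sales. The sketch below fills in the missing vectors with made-up numbers; the values of year, advertisement and sales, and the model formula sales ~ adv.lag, are assumptions made for illustration.

```r
# Made-up data for six years; not from the lecture
year <- 2015:2020
advertisement <- c(10, 12, 11, 15, 14, 18)
sales <- c(100, 105, 112, 110, 125, 121)

# Pair sales in years 2..6 with advertisement in years 1..5 (lag of one year)
salesdata <- data.frame(
  year = year[2:6],
  adv.lag = advertisement[1:5],
  sales = sales[2:6]
)

# Assumed causal model: sales depend on last year's advertisement
result <- lm(sales ~ adv.lag, data = salesdata)

# The advertisement in the final observed year drives the 2021 prediction
newdata <- data.frame(year = 2021, adv.lag = advertisement[6])
pred <- predict(result, newdata, interval = "prediction")
print(pred)
```

Lagging costs one observation: six years of data leave only five rows for fitting.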