Lec 1 Course Overview

STAT 4130J Applied Regression Analysis is a course focused on understanding relationships between variables using regression techniques, including simple and multiple linear regression. The course covers model diagnostics, variable transformations, and various regression models, with an emphasis on practical applications in fields like science, finance, and social science. Grading includes homework, quizzes, a midterm, and a final project, with resources provided for further study.

Uploaded by

jorokangjoestar

STAT 4130J Applied Regression Analysis

Lecture 1: Course overview


Agenda
● Meet your teaching team
● Course Introduction
● Course Logistics
● Meet your classmates
Meet your teaching team

Ailin Zhang
- Background: PhD in Geophysics from UCLA (2019). Formerly a data scientist at ExxonMobil.
- Research Interests: Data-driven solutions to geoscience problems: earthquake rupture, seismic signal processing.
- OH: Wed @ Longbin building 437B by appointment
- I am teaching the following courses this semester:
- STAT 4710: Data Science and Analytics with Python
- STAT 4130: Applied Regression Analysis
- STAT 1000: Data Science for Future Engineers
Related courses:
In JI:
STAT 4130 Applied Regression Analysis
STAT 4710 Data Science and Analytics using Python
STAT 4060 Computational Methods for Statistics and Data Science
STAT 4510 Bayesian Analysis

In UCB:
STAT 150 Stochastic Processes
MATH 128A Numerical Analysis
CS 182 Designing, Visualizing and Understanding Deep Neural Networks

Yihong Chen
- Email: [email protected]
- MBTI: INTJ
- Areas of interest: ML/DL (time series prediction and LLM), quantitative finance (especially HFT)

You can contact me through Feishu


Regression Analysis

● Goal: Infer relationships between

○ a single variable (called the response variable) and
○ one or more other variables (often called predictors or explanatory variables) from data.

● Broad applications in
○ Science and engineering
○ Social Science
○ Finance
○ Epidemiology
○ Psychology and Education
Course Introduction

● Regression analysis is used to estimate the relationships between variables and includes many
different techniques.
● We will determine whether one or more predictor (independent) variables have a significant effect on the response (dependent) variable.
● Linear regression serves as the basis for other types of regression.

● This course focuses on the topic of regression, covering


o simple and multiple linear regression,
o use of categorical variables in regression,
o model diagnostics,
o variable transformations,
o nonlinear regression techniques.
Let’s start with an example

● Consider the following dataset, which contains the weights and heights of 507 physically active individuals: 247 men and 260 women.
(Data source: http://www.amstat.org/publications/jse/datasets/body.dat)

Just by looking at the figure, we can conclude:


● There seems to be roughly a linear relationship between
the two variables, weight (y) and height (x).
● However, there is also uncertainty in y at each value of x.
Let’s start with an example (Continued)

Mathematically, we can write down the following equation:

$y = \beta_0 + \beta_1 x + \epsilon$

Where:
• $x$: predictor, regressor, or independent variable
• $y$: response, or dependent variable
• $\epsilon$: error accounting for the variability in $y$
• $\beta_0, \beta_1$: parameters (to be determined)

This is called a simple linear regression model!


Let’s start with an example (Continued)

Let’s keep adding statistical assumptions:

● Suppose that for each fixed height $x$, the error follows the same normal distribution:
$\epsilon \sim N(0, \sigma^2)$

This implies that:

○ $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$, with

• $E(y \mid x) = \beta_0 + \beta_1 x$
• $\mathrm{Var}(y \mid x) = \sigma^2$
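
To make the conditional distribution concrete, here is a minimal simulation sketch. It is written in Python with only the standard library so that it is self-contained (the course itself uses R). The parameter values and the function name `simulate_weights` are hypothetical, chosen only for illustration. For each fixed height, the sample mean of the simulated weights should land close to $\beta_0 + \beta_1 x$ and the sample variance close to $\sigma^2$.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = -100.0, 1.0, 9.0


def simulate_weights(height, n, rng):
    """Draw n responses y = BETA0 + BETA1 * height + eps, with eps ~ N(0, SIGMA^2)."""
    return [BETA0 + BETA1 * height + rng.gauss(0.0, SIGMA) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
for height in (160.0, 175.0, 190.0):
    ys = simulate_weights(height, 20000, rng)
    # Compare the empirical mean and variance with the model values.
    print(
        "height:", height,
        "| model mean:", BETA0 + BETA1 * height,
        "| sample mean:", round(statistics.fmean(ys), 2),
        "| model variance:", SIGMA ** 2,
        "| sample variance:", round(statistics.pvariance(ys), 2),
    )
```

Note that the variance stays the same at every height while the mean shifts along the line; this is exactly what the assumption on $\epsilon$ says.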
Let’s start with an example (Continued)

Parameter estimation

● Once a regression model is specified, the next step is to choose the values of the unknown parameters (e.g., $\beta_0, \beta_1$ in the simple linear regression model) based on a set of observations $(x_1, y_1), \dots, (x_n, y_n)$.

● This process is called fitting the model to the data.

● There are different ways to find the "optimal" values of the parameters:

o Method of Least Squares


o Maximum Likelihood Estimation
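
As a preview of the method of least squares, the sketch below picks the intercept and slope that minimize the sum of squared residuals $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$, using the standard closed-form solution for simple linear regression. Under the normal error assumption made earlier, maximum likelihood gives the same estimates of $\beta_0$ and $\beta_1$. The code is Python (standard library only), and the function name `fit_simple_ols` and the five data points are made up for illustration; they are not taken from the course dataset.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing sum((y - b0 - b1 * x) ** 2) over the data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx           # slope estimate
    b0 = y_bar - b1 * x_bar  # intercept estimate
    return b0, b1


# Made-up heights (cm) and weights (kg), for illustration only.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 69.0, 77.0, 90.0]
b0, b1 = fit_simple_ols(heights, weights)
print("intercept:", b0, "slope:", b1)
```

In R, the built-in `lm()` function performs this fit; writing it out by hand here is only meant to show what "fitting the model to the data" computes.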
Let’s start with an example (Continued)

Model adequacy checking


● The major assumptions that we have made for regression analysis are
• The relationship between the response y and the regressor x is linear, at
least approximately.
• The errors are iid Gaussian with zero mean and constant variance (at
different values of x)

● We should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model we have tentatively built.
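
One simple way to start examining these assumptions is to look at the residuals (observed minus fitted values). The sketch below, in Python with only the standard library, fits a line to simulated data whose error spread deliberately grows with $x$, then compares the residual standard deviation in the lower and upper halves of the $x$ range. The helper names (`fit_line`, `residuals`, `spread_by_half`) and the simulated data are assumptions made for illustration; this is a rough check, not a formal test.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys, b0, b1):
    """Observed response minus fitted value for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def spread_by_half(xs, res):
    """Residual standard deviation for the lower and upper half of x values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    mid = len(order) // 2
    low = [res[i] for i in order[:mid]]
    high = [res[i] for i in order[mid:]]
    return statistics.pstdev(low), statistics.pstdev(high)


# Simulated data whose error spread grows with x (violates constant variance).
rng = random.Random(0)
xs = [150.0 + 0.2 * i for i in range(200)]
ys = [-100.0 + x + rng.gauss(0.0, 0.2 * (x - 149.0)) for x in xs]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
sd_low, sd_high = spread_by_half(xs, res)
print("mean residual:", statistics.fmean(res))
print("residual SD, lower half of x:", sd_low)
print("residual SD, upper half of x:", sd_high)
```

A clearly larger spread in one half suggests that the constant-variance assumption is doubtful for that dataset. In practice, a plot of residuals against fitted values shows the same thing more directly.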
Multiple linear regression

● Sometimes, a response variable may depend linearly on more than one explanatory variable,
leading to the task of multiple linear regression.

● For example, in the weight-height example, we may add age and gender to the regression model:

○ $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$

where $y$ is weight, $x_1$ is height, $x_2$ is age, and $x_3$ is gender.

● Note that $x_3$ is a categorical variable (male/female).
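
The sketch below shows one way such a model could be fitted, assuming the categorical gender variable is coded as a 0/1 dummy variable (1 for male, 0 for female; the choice of which level is 0 is arbitrary). It is written in Python with only the standard library and solves the normal equations $X^\top X b = X^\top y$ by Gaussian elimination. The function names and the six observations are hypothetical, made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def encode_gender(label):
    """Dummy coding (an arbitrary choice): 1.0 for 'male', 0.0 for 'female'."""
    codes = {"male": 1.0, "female": 0.0}
    return codes[label]


def fit_ols(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bp]; an intercept column is added."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up observations: (height in cm, age in years, gender), weight in kg.
people = [
    (160.0, 25.0, "female", 55.0),
    (170.0, 30.0, "male", 72.0),
    (180.0, 35.0, "female", 68.0),
    (175.0, 40.0, "male", 80.0),
    (165.0, 22.0, "male", 66.0),
    (185.0, 50.0, "female", 74.0),
]
rows = [(h, a, encode_gender(g)) for h, a, g, _ in people]
weights = [w for _, _, _, w in people]
coefficients = fit_ols(rows, weights)
print("b0, b1 (height), b2 (age), b3 (gender):", coefficients)
```

With this coding, the coefficient on the dummy variable is read as the difference in expected weight between the two groups at the same height and age.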
Topics to be discussed

● Simple linear regression


● Linear Algebra
● Multiple linear regression
● Dummy variables
● Analysis of variance and covariance
● Model building techniques, evaluating model fit, and dealing with violations of model assumptions.
● Model Selection
● Collinearity
● Other Regression models
○ Generalized linear models
○ Nonlinear regression
○ Time-series regression
○ Nonparametric regression
● Data analysis with R
The bigger picture

● Regression analysis (linear models) is the gateway to statistics and machine learning.

● Foundational ideas of statistical and machine learning thinking, viewed through the lens of linear models.
● https://medium.com/@_asabovesobelow_/remembering-ludwig-wittgenstein-in-the-age-of-ai-3364cc3dc92d
Commonly Asked Questions
● What is the difference between STAT 4130 and STAT 4060?
○ In STAT 4060, we focus more on the computational aspects of different algorithms, e.g., coding up linear regression from scratch or numerically computing the inverse of a matrix.
○ In STAT 4130, we use programming languages as a tool to help us explain the relationships between variables: you are welcome to use all built-in libraries! But you need to be very precise about the interpretation of models.

● I am interested in machine learning. Can I learn machine learning in STAT 4130?

○ Yes, but this course is probably not the best fit. We will provide a high-level introduction to some modern regression methods (machine learning models) in the second half of the course, but that is not our primary focus.
○ We will cover 60% linear regression and 40% other regression models in this course.

● Prerequisites
○ Probability and statistics
○ Linear algebra
○ Coding
Course Logistics (Things are evolving!)

● References: We don’t have an official textbook, but here are some references
○ Abraham, B., and J. Ledolter. Introduction to Regression Modeling. Duxbury Press, 2006.
○ Fox, John. Applied Regression Analysis and Generalized Linear Models. Sage Publications, 2015.
○ Weisberg, Sanford. Applied Linear Regression. Vol. 528. John Wiley & Sons, 2005.
○ Hadi, Ali S., and Samprit Chatterjee. Regression Analysis by Example. John Wiley & Sons, 2015.
○ James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013. (Recommended for the second half)
Study notes and Slides

● Lecture notes will be updated regularly on Canvas


Grading
● I guarantee that more than 30% of students will receive an A or A+.
○ 5% Homework
○ 5% R demo
○ 30% Quiz (Biweekly)
○ 40% Midterm
■ Around week 8
■ Closed book, closed note
○ 20% Final Project
■ Due by the end of week 11
○ 3%* Extra Credit
■ 1 pt for course evaluation
■ 2 pt for proofreading lecture notes

Using AI tools in your homework and project is fine, but you need to state it somewhere in your work.
R demo

● We are looking for R examples for each chapter in the handout.


● Please work in a group to perform a comprehensive data analysis that includes important topics discussed in the chapter.
● We will assign chapters to each team later in the semester.
● Nice work will be selected and presented in the handout (Your contribution will be acknowledged!)
Quiz
● We will have biweekly quizzes on Thursdays of even-numbered weeks.
● The quiz will be distributed at the beginning of the lecture (you need to bring your laptop).
● Open-book, open-note, no internet access, 15-20 min.
● You can only work on the quiz in the classroom, and you must turn in your work on paper.
● We don’t allow online participation for the quiz. Work that is not submitted on paper will receive no credit.
● No make-up quiz
Project
● Groups of 2-3 students find a real-world problem to analyze and summarize.
● We will use the last week for group presentations.
● Submit a final report, due a week before the end of the class.
● Peer review: each class member will receive a report from another group, and evaluate the
analysis and conclusions of the project.
● Final grade for the project will be based on your presentation (graded by TA and instructor),
and peer-reviewed report (graded by peers and instructor).
Meet you!

• Name, Year, Major


• What do you expect from this course? (Why STAT 4130?)
• Your background in R and Linear Algebra
• Any other questions?
