Lec 1 Course Overview
Ailin Zhang
- Background: PhD in Geophysics from UCLA (2019). Formerly data scientist @
ExxonMobil
- Research Interests: Data-driven solutions to geoscience problems: earthquake rupture,
seismic signal processing.
- OH: Wed @ Longbin building 437B by appointment
- I am teaching the following courses this semester:
- STAT 4710: Data Science and Analytics with Python
- STAT 4130: Applied Regression Analysis
- STAT 1000: Data Science for Future Engineers
Related courses:
In JI:
STAT 4130 Applied Regression Analysis
STAT 4710 Data Science and Analytics using Python
STAT 4060 Computational Methods for Statistics and Data Science
STAT 4510 Bayesian Analysis
In UCB:
STAT 150 Stochastic Processes
MATH 128A Numerical Analysis
CS 182 Designing, Visualizing and Understanding Deep Neural Networks
Yihong Chen
- Email: [email protected]
- MBTI: INTJ
- Areas of Interest:
- ML/DL (Time series prediction and LLM)
- Quantitative Finance (especially HFT)
● Broad applications in
○ Science and engineering
○ Social Science
○ Finance
○ Epidemiology
○ Psychology and Education
Course Introduction
● Regression analysis is used to estimate the relationships between variables and includes many
different techniques.
● We will determine whether one or more predictor/independent variables have a significant effect on the
response/dependent variable.
● Linear regression serves as the basis for other types of regression.
$y = \beta_0 + \beta_1 x + \epsilon$
Where:
• $x$ : predictor, regressor, or independent variable
• $y$ : response, or dependent variable
• $\epsilon$ : error accounting for the variability in $y$
• $\beta_0, \beta_1$ : parameters (to be determined)
● Suppose that for each fixed height $x$, the error follows the same normal
distribution
$\epsilon \sim N(0, \sigma^2)$
○ $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with
• $E(y \mid x) = \beta_0 + \beta_1 x$
• $\mathrm{Var}(y \mid x) = \sigma^2$
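As a rough illustration of the conditional distribution above, the following sketch simulates many responses at one fixed value of x and compares the sample mean and variance with E(y | x) and Var(y | x). The parameter values, the fixed x, and the function name `simulate_response` are made up for this example and do not come from the course data.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = -100.0, 1.0, 5.0

rng = np.random.default_rng(seed=0)

def simulate_response(x, n_draws):
    """Draw n_draws values of y at a fixed x from y = beta0 + beta1*x + eps."""
    eps = rng.normal(loc=0.0, scale=sigma, size=n_draws)  # eps ~ N(0, sigma^2)
    return beta0 + beta1 * x + eps

x_fixed = 170.0  # an assumed height, in arbitrary units
y_draws = simulate_response(x_fixed, n_draws=100_000)

cond_mean = beta0 + beta1 * x_fixed   # theoretical E(y | x)
sample_mean = y_draws.mean()          # should be close to cond_mean
sample_var = y_draws.var(ddof=1)      # should be close to sigma**2

print(f"E(y|x) = {cond_mean:.2f}, sample mean = {sample_mean:.2f}")
print(f"Var(y|x) = {sigma**2:.2f}, sample variance = {sample_var:.2f}")
```

With a large number of draws, the sample quantities should settle near the theoretical ones, which is all the model assumption claims: the spread around the line is the same at every x.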
Let’s start with an example (Continued)
Parameter estimation
● Once a regression model is specified, the next step is to choose the values of the
unknown parameters (e.g., $\beta_0, \beta_1$ in the simple linear regression model) based on
a set of observations $(x_1, y_1), \ldots, (x_n, y_n)$.
● There are different ways to find the "optimal" values of the parameters:
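One widely used criterion, shown here only as an example and not necessarily the one this lecture goes on to list, is ordinary least squares: choose the parameters that minimize the sum of squared residuals. For simple linear regression this has a closed-form solution, sketched below. The function name `fit_simple_ols` and the four data points are hypothetical.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxy = np.sum((x - x_bar) * (y - y_bar))  # sum of cross-deviations
    sxx = np.sum((x - x_bar) ** 2)           # sum of squared deviations of x
    b1 = sxy / sxx                           # slope estimate
    b0 = y_bar - b1 * x_bar                  # intercept estimate
    return b0, b1

# Made-up points that lie exactly on the line y = 1 + 2x.
x_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.0, 3.0, 5.0, 7.0]

b0_hat, b1_hat = fit_simple_ols(x_obs, y_obs)
print(b0_hat, b1_hat)
```

Because the points were placed on a line with no noise, the estimates recover its intercept and slope; with real data the fitted line would only approximate the observations.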
● Sometimes, a response variable may depend linearly on more than one explanatory variable,
leading to the task of multiple linear regression.
● For example, in the weight-height example, we may add age and gender to the regression
model:
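A plausible form of such a model, written here as an assumption rather than the exact equation from the slide, is weight = β0 + β1·height + β2·age + β3·gender + error, with gender coded as a 0/1 indicator. The sketch below fits a model of this form by least squares on synthetic, noiseless data. All coefficient values, variable ranges, and the 0/1 coding are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 50

# Synthetic predictors (not real measurements).
height = rng.uniform(150.0, 190.0, size=n)
age = rng.uniform(18.0, 60.0, size=n)
gender = rng.integers(0, 2, size=n).astype(float)  # assumed 0/1 indicator coding

# Hypothetical coefficients: intercept, height, age, gender.
true_beta = np.array([-90.0, 0.9, 0.1, 5.0])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), height, age, gender])

# Noiseless response, so the fit should recover true_beta up to rounding.
weight = X @ true_beta

# Least-squares solution of X @ beta = weight.
beta_hat, *_ = np.linalg.lstsq(X, weight, rcond=None)
print(beta_hat)
```

Each entry of `beta_hat` is read as the change in the response for a one-unit change in that predictor with the other predictors held fixed.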
● Regression Analysis (Linear models) is the gateway to statistics and machine learning
● Foundational ideas of statistical and machine learning thinking, viewed through the lens of linear
models.
● https://medium.com/@_asabovesobelow_/remembering-ludwig-wittgenstein-in-the-age-of-ai-3364cc3dc92d
Commonly Asked Questions
● What is the difference between STAT 4130 and STAT 4060?
○ In STAT 4060, we focus more on the computational aspects of different algorithms, e.g., coding
up linear regression from scratch or computing the inverse of a matrix numerically.
○ In STAT 4130, we use programming languages as a tool to help us explain the relationship
between variables: you are welcome to use all built-in libraries! But you need to be very precise about
the interpretation of models.
● Prerequisites
○ Probability and statistics
○ Linear algebra
○ Coding
Course Logistics (Things are evolving!)
● References: We don’t have an official textbook, but here are some references
○ B. Abraham and J. Ledolter, Introduction to Regression Modeling. Duxbury Press, 2006
○ Fox, John. Applied regression analysis and generalized linear models. Sage Publications, 2015.
○ Weisberg, Sanford. Applied linear regression. Vol. 528. John Wiley & Sons, 2005.
○ Hadi, Ali S., and Samprit Chatterjee. Regression analysis by example. John Wiley & Sons, 2015.
○ James, Gareth, et al. An introduction to statistical learning. Vol. 112. New York: Springer, 2013. (Recommended
for the second half)
Study notes and Slides
Using AI tools in your homework and project is fine, but you need to state it somewhere in your work.
R demo