
Simple Regression
Qiyu Wang 2025-03-05
Contents
• Introduction to Simple Regression
• Basic Concepts and Formulas
• Data Preparation & Analysis
• Model Building & Evaluation Metrics
• Interpretation & Inference Techniques
• Practical Applications & Case Studies
• Challenges & Limitations in Simple Regression
01
Introduction to Simple Regression
Definition and Purpose

Definition
Simple Regression is a statistical method that examines the relationship between two variables, typically denoted as X and Y, where one variable (X) is the predictor (independent variable) and the other (Y) is the response (dependent variable).

Purpose
The main purpose of simple regression is to find a mathematical function that best describes the relationship between the predictor and the response variables, enabling predictions and understanding of the data.

Importance
Simple regression is a fundamental tool in statistical analysis, providing insights into the nature of relationships between variables and allowing for predictions and decision-making based on data.
Types of Simple Regression

Simple regression can be classified into two main types based on the nature of the relationship
between the variables:

Linear Regression: This type of regression assumes a linear relationship between the predictor
and the response variables, i.e., a straight line can be fitted to the data points. It is the most
commonly used type of simple regression.

Non-linear Regression: In this type of regression, the relationship between the predictor and the
response variables is not linear. Various forms of non-linear functions can be used to describe the
relationship, such as polynomial, exponential, or logarithmic functions.
Applications in Various Fields

Business and Economics


Sales Prediction: Simple regression can be used to predict sales based on various factors such as advertising
expenditure, pricing strategies, or market trends.
Economic Forecasting: Economists use simple regression to forecast economic indicators such as GDP,
inflation, or unemployment rates based on historical data and relevant predictors.
Applications in Various Fields

Social Sciences
Behavioral Analysis: Simple regression can help researchers understand the relationship between
different behaviors, such as the impact of education on income or the relationship between age and
voting patterns.
Survey Research: In survey research, simple regression can be used to analyze the relationship
between survey responses and various demographic variables, such as age, gender, or income.
Science and Engineering
Experimental Data Analysis: Scientists and engineers often use simple regression to analyze
experimental data and understand the relationship between different variables, such as the effect of
temperature on chemical reactions or the relationship between pressure and volume in a gas.
Quality Control: Simple regression can be used in quality control processes to identify relationships
between different variables and to predict the quality of products based on certain characteristics.
02
Basic Concepts and Formulas
Independent and Dependent Variables
01 Independent Variable
A variable whose variation does not depend on the variation
of another variable; it is the presumed cause.

02 Dependent Variable
A variable whose variation is dependent on the variation of
another variable; it is the presumed effect.

03 Examples
In a study on the relationship between hours of study and
test scores, hours of study would be the independent variable
and test scores would be the dependent variable.
Scatter Plot Diagram
Scatter Plot
A graph that displays the relationship between
two variables; each data point is plotted as a
dot on the graph.
Trend Line
A line that summarizes the direction of the
relationship between two variables; it is often
used to identify patterns in scatter plots.
Positive/Negative Correlation
A positive correlation indicates that as one
variable increases, the other variable also
increases; a negative correlation indicates that
as one variable increases, the other variable
decreases.
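A minimal sketch of how such a scatter plot and trend line could be drawn in Python, assuming a small, made-up dataset of study hours and test scores (the data values here are purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: hours of study (X) and test scores (Y)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

# Fit a first-degree polynomial (straight line) to summarize the trend
slope, intercept = np.polyfit(hours, scores, deg=1)

plt.scatter(hours, scores, label="Observed data")        # each observation as a dot
plt.plot(hours, slope * hours + intercept, label="Trend line")
plt.xlabel("Hours of study (independent variable)")
plt.ylabel("Test score (dependent variable)")
plt.legend()
plt.show()
```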
Linear Equation & Regression Line

Linear Equation: An equation that represents a straight line; in the context of regression, it is used to model the relationship between two variables.

Regression Line: A line that is calculated to best fit the data points in a scatter plot; it is used to predict the value of the dependent variable based on the value of the independent variable.

Slope: The measure of the steepness of the regression line; it indicates the change in the dependent variable for a one-unit change in the independent variable.

Intercept: The point where the regression line crosses the Y-axis; it represents the value of the dependent variable when the independent variable is zero.
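For reference, these quantities correspond to the standard least-squares formulas (standard textbook notation, not reproduced from the slides), where b1 is the slope and b0 the intercept:

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```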
Residual Analysis

01 Residual: The difference between the observed value of the dependent variable and the value predicted by the regression line.

02 Residual Plot: A graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis; it is used to check for patterns or outliers in the data.

03 Patterns in Residuals: If the residuals show a pattern, it suggests that the relationship between the variables is not linear; this may indicate that a more complex model is needed.

04 Outliers: Data points that lie far away from the regression line; they can have a large impact on the slope and intercept of the regression line and should be carefully examined.
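A brief sketch of computing residuals and drawing a residual plot, reusing the hypothetical study-hours data from the earlier scatter-plot example:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data (same illustrative values as before)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

slope, intercept = np.polyfit(hours, scores, deg=1)
predicted = slope * hours + intercept
residuals = scores - predicted            # observed minus predicted

# Residual plot: independent variable on the x-axis, residuals on the y-axis
plt.scatter(hours, residuals)
plt.axhline(0, linestyle="--")            # reference line at zero
plt.xlabel("Hours of study")
plt.ylabel("Residual")
plt.show()
```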
03
Data Preparation & Analysis
Data Collection Methods

Surveys and questionnaires


Collecting data through surveys or questionnaires can provide a large
amount of information about the variables of interest.

Experimental data
Controlled experiments can provide high-quality data for regression
analysis, allowing for the manipulation of independent variables.

Observational data
Observational data, collected through natural or real-world settings,
can provide insights into relationships between variables.
Data Cleaning & Processing Techniques

Missing data handling
Techniques such as imputation or deletion can be used to address missing data.

Data normalization
Scaling data to a common range can help to improve the accuracy of the regression model.

Data deduplication
Removing duplicate data points can help to ensure the accuracy of the regression analysis.
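As a rough illustration of these cleaning steps with pandas, using a small made-up table with a predictor x and response y:

```python
import pandas as pd

# Hypothetical raw dataset with a predictor 'x' and response 'y'
df = pd.DataFrame({"x": [1.0, 2.0, None, 4.0, 4.0],
                   "y": [10.0, 12.0, 15.0, 20.0, 20.0]})

# Missing data handling: impute missing predictor values with the column mean
df["x"] = df["x"].fillna(df["x"].mean())

# Data deduplication: drop exact duplicate rows
df = df.drop_duplicates()

# Data normalization: scale the predictor to a common 0-1 range
df["x_scaled"] = (df["x"] - df["x"].min()) / (df["x"].max() - df["x"].min())
```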
Outliers Detection & Treatment

Visual inspection
Outliers can be identified through visual inspection of scatterplots or other
graphical representations of the data.

Statistical methods
Statistical methods, such as the Z-score or IQR method, can be used to detect
outliers.

Treatment options
Outliers can be removed, transformed, or kept in the dataset depending on their
nature and impact on the regression model.
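A short sketch of the Z-score and IQR checks on a hypothetical series containing one extreme value; the cutoffs of 2.5 standard deviations and 1.5 × IQR are common conventions, not values taken from the slides:

```python
import numpy as np

y = np.array([10, 12, 11, 13, 12, 40, 11, 12])   # hypothetical values with one extreme point

# Z-score method: flag points far from the mean in standard-deviation units
z_scores = (y - y.mean()) / y.std()
z_outliers = np.abs(z_scores) > 2.5

# IQR method: flag points outside 1.5 * IQR beyond the quartiles
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
iqr_outliers = (y < q1 - 1.5 * iqr) | (y > q3 + 1.5 * iqr)

print(y[z_outliers], y[iqr_outliers])
```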
Data Transformation Strategies

Linear transformation
Techniques such as log transformation or square root transformation can help to linearize
relationships between variables.

Polynomial transformation
When the relationship between variables is non-linear, polynomial transformation can help to
model the curvature.

Dummy variables
For categorical variables, dummy variables can be created to represent the different
categories in the regression model.
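A compact illustration of these transformations with pandas and NumPy, using made-up income and region columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [20000, 45000, 80000, 150000],        # hypothetical skewed numeric variable
    "region": ["north", "south", "south", "east"],  # hypothetical categorical variable
})

# Linearizing transformation: log transform to straighten a curved relationship
df["log_income"] = np.log(df["income"])

# Polynomial transformation: add a squared term to capture curvature
df["income_sq"] = df["income"] ** 2

# Dummy variables: one indicator column per category for use in a regression model
df = pd.get_dummies(df, columns=["region"], drop_first=True)
```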
04
Model Building & Evaluation Metrics
Simple Linear Regression Model Building

Data Preparation
Collecting, cleaning, and preparing the data for analysis.

Model Specification
Defining the relationship between the dependent and independent variables.

Parameter Estimation
Calculating the coefficients (slope and intercept) that minimize the error between the predicted and actual values.

Model Implementation
Using the model to make predictions on new data.
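One possible way to carry out the estimation and implementation steps with scikit-learn, on hypothetical advertising-spend and sales figures:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend (X) and sales (Y)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # predictor must be 2-D for scikit-learn
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Parameter estimation: fit the slope and intercept that minimize squared error
model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)

# Model implementation: predict the response for a new predictor value
print(model.predict(np.array([[6.0]])))
```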
Goodness-of-Fit Test

01 R-Squared
Measuring the proportion of the variance in the dependent variable that is predictable from the independent variable.

02 Residual Analysis
Examining the residuals (differences between observed and predicted values) to check for patterns or non-linearity.

03 Normality of Residuals
Checking if the residuals follow a normal distribution.

04 Visual Inspection
Using plots to visually assess the fit of the model.
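A small sketch of how R-squared and the normality of the residuals might be checked, assuming hypothetical observed and predicted values; the Shapiro-Wilk test is one common choice for the normality check, not a method prescribed by the slides:

```python
import numpy as np
from scipy import stats

# Hypothetical observed and predicted values from a fitted simple regression
y_obs = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_pred = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
residuals = y_obs - y_pred

# R-squared: proportion of variance in y explained by the model
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Normality of residuals: Shapiro-Wilk test (small p-value suggests non-normality)
stat, p_value = stats.shapiro(residuals)
print(r_squared, p_value)
```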
Evaluation Metrics: RMSE, MAE, etc.

01 RMSE (Root Mean Squared Error)
Measuring the typical magnitude of the errors as the square root of the average squared difference between predicted and actual values.

02 MAE (Mean Absolute Error)
Measuring the average absolute difference between predicted and actual values.

03 MSE (Mean Squared Error)
Similar to RMSE, but without taking the square root.

04 Accuracy
Measuring the proportion of predictions that are within a certain range of the actual values.
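These metrics can be computed directly with NumPy; the example below uses hypothetical actual and predicted values, and the tolerance used for the "accuracy" measure is an arbitrary illustration:

```python
import numpy as np

y_true = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical actual values
y_pred = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # hypothetical predictions

errors = y_true - y_pred
mse = np.mean(errors ** 2)          # Mean Squared Error
rmse = np.sqrt(mse)                 # Root Mean Squared Error
mae = np.mean(np.abs(errors))       # Mean Absolute Error

# "Accuracy" in the slide's sense: share of predictions within a chosen tolerance
tolerance = 0.15
accuracy = np.mean(np.abs(errors) <= tolerance)
print(rmse, mae, mse, accuracy)
```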


Model Selection Criteria

Parsimony
Selecting the simplest model that explains the data well.

Predictive Accuracy
Choosing the model that performs best on a test dataset.

Interpretability
Selecting a model that is easy to understand and explain.

Domain Knowledge
Considering the knowledge and understanding of the domain from which the data is collected.
05
Interpretation & Inference Techniques
Coefficient Interpretation

Slope coefficient
Represents the average change in the dependent variable for a one-
unit increase in the independent variable.

Intercept coefficient
Represents the value of the dependent variable when the
independent variable is zero.

Coefficient of determination (R-squared)
Indicates the proportion of the variance in the dependent variable
that is predictable from the independent variable.
Confidence Intervals & Hypothesis Testing

Confidence intervals
Ranges of values that are likely to contain the true parameter values.

Hypothesis testing
A statistical method used to test the validity of a claim or hypothesis about a population
parameter.

Null and alternative hypotheses
The null hypothesis assumes no difference or no effect, while the alternative hypothesis assumes a difference or effect.
P-value and Statistical Significance

P-value
The probability of observing the test statistic as extreme as or more extreme than
the one observed, given that the null hypothesis is true.

Statistical significance
A measure of the evidence against the null hypothesis, usually expressed in terms
of a p-value.

Alpha level
The pre-determined threshold for statistical significance, typically set at 0.05.
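A brief sketch tying these ideas together with statsmodels, on made-up x and y data: the fitted output exposes the coefficient estimates, their 95% confidence intervals, and the p-values for testing each coefficient against zero.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data for a simple regression of y on x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 4.1, 5.8, 8.2, 9.9, 12.1, 14.2, 15.8])

X = sm.add_constant(x)               # adds the intercept term
results = sm.OLS(y, X).fit()

print(results.params)                # intercept and slope estimates
print(results.conf_int(alpha=0.05))  # 95% confidence intervals for the coefficients
print(results.pvalues)               # p-values for testing each coefficient against zero
```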
Prediction & Forecasting Methods

01 Point prediction
A single estimated value for the dependent variable.

02 Interval prediction
A range of values within which the dependent variable is expected to fall.

03 Holdout method
Splitting the data into training and testing sets to evaluate the model's performance.

04 Cross-validation
A technique that partitions the data into multiple subsets and uses each subset for both training and testing.
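A minimal sketch of the holdout method and cross-validation with scikit-learn, using synthetic data generated purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic predictor/response data (illustrative only)
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + np.random.default_rng(0).normal(0, 1.5, size=20)

# Holdout method: fit on a training split, evaluate on the held-out test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("holdout R^2:", model.score(X_test, y_test))

# Cross-validation: each fold serves once as the test set
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print("cross-validation R^2 per fold:", scores)
```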
06
Practical Applications & Case Studies
Predictive Modeling in Business

Customer Churn Prediction


Using regression to predict which customers are most likely
to leave.

Sales Forecasting
Estimating future sales based on historical data.

Employee Turnover
Identifying factors that lead to employee retention or
turnover.
Trend Analysis in Economics

Housing Prices
Analyzing factors that influence housing prices.

Stock Market Predictions
Using regression to forecast stock market trends.

Economic Indicator Analysis
Identifying relationships between various economic indicators.
Quality Control in Manufacturing

Defective Products
Identifying factors that lead to defective products.

Process Control
Monitoring and controlling manufacturing processes to ensure quality.

Supplier Evaluation
Assessing the quality of suppliers based on their performance.
Other Real-World Applications
Marketing
Determining the effectiveness
of marketing campaigns.

Healthcare
Predicting patient outcomes
based on various health
indicators.
Social Sciences
Examining relationships
between variables in social
science research.
07
Challenges & Limitations in Simple Regression
Multicollinearity Issue

Multicollinearity leads to high variance in the regression coefficients
When independent variables are highly correlated (a concern once a simple regression is extended to include more than one predictor), the standard errors of the coefficients increase, making it difficult to determine the individual effect of each variable on the dependent variable.

Reduced accuracy of predictions
Multicollinearity can lead to overfitting of the regression model, reducing its ability to generalize to new data.

Instability of the model
Multicollinearity can cause the regression coefficients to change drastically with small changes in the data, leading to an unstable model.
Outliers Impact on Model Accuracy

Outliers can have a large impact on the regression line
The regression line is sensitive to extreme values, and outliers can disproportionately influence the slope and intercept of the line.

Distorted measures of central tendency
Outliers can distort measures such as mean and variance, which are used to describe the data and can affect the interpretation of the regression results.

Reduced model accuracy
The presence of outliers can lead to a reduction in the overall accuracy of the regression model, as the model is not able to capture the true relationship between the variables.
Non-linear Relationships between Variables

Inability to capture non-linear relationships


Simple regression assumes a linear relationship between the dependent and independent
variables, which may not always be the case in real-world scenarios.

Misleading results
If the true relationship between the variables is non-linear, the regression coefficients may
be misleading, leading to incorrect conclusions about the relationship between the variables.

Need for transformation


In some cases, a non-linear relationship can be transformed into a linear one through a non-
linear transformation of the data, but this adds complexity to the model.
Model Assumptions Violations

Violations of the assumptions underlying regression
Simple regression relies on several assumptions, such as the linearity of the relationship, the independence of the errors, and the constancy of the variance of the errors. Violations of these assumptions can lead to biased and inefficient estimates of the regression coefficients.
Model Assumptions Violations

Reduced reliability of statistical tests
Violations of the assumptions can also affect the reliability of statistical tests used to assess the significance of the regression coefficients, such as t-tests and F-tests.

Limited generalizability
If the assumptions are not met, the regression model may not be generalizable to other populations or situations, limiting its usefulness.
Thanks
Presenter: XXX 2025-03-05
