
Regression

Prof. Satanik Mitra


BITS Pilani
Regression
• Regression models predict a continuous target variable rather than a categorical target variable.
• In some ways, this is a more difficult problem to solve than classification.
• Regression belongs to the supervised learning category.
• The most common algorithm predictive modelers use for regression problems is linear regression.
• Types of regression –
• Simple linear regression
• One dependent variable (interval or ratio)
• One independent variable (interval or ratio or dichotomous)
• Multiple linear regression
• One dependent variable (interval or ratio)
• Two or more independent variables (interval or ratio or dichotomous)
• Logistic regression
• One dependent variable (binary)
• Two or more independent variables (interval or ratio or dichotomous)
• Ordinal regression
• One dependent variable (ordinal)
• One or more independent variables (nominal or dichotomous)
• Multinomial regression
• One dependent variable (nominal)
• One or more independent variables (interval or ratio or dichotomous)
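
To make the distinction between the first two types concrete, here is a minimal sketch fitting both a simple and a multiple linear regression. The synthetic data and coefficient values are illustrative assumptions, not from the notes; scikit-learn is used as one common choice of library.

```python
# A minimal sketch, assuming synthetic data, contrasting simple linear
# regression (one independent variable) with multiple linear regression
# (two or more independent variables) via scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Simple linear regression: one predictor.
X_simple = rng.uniform(0, 10, size=(100, 1))
y_simple = 3.0 + 2.0 * X_simple[:, 0] + rng.normal(0, 1, size=100)
simple = LinearRegression().fit(X_simple, y_simple)
print("simple:   intercept =", simple.intercept_, "slope =", simple.coef_[0])

# Multiple linear regression: two predictors.
X_multi = rng.uniform(0, 10, size=(100, 2))
y_multi = 1.0 + 2.0 * X_multi[:, 0] - 0.5 * X_multi[:, 1] + rng.normal(0, 1, size=100)
multi = LinearRegression().fit(X_multi, y_multi)
print("multiple: intercept =", multi.intercept_, "coefficients =", multi.coef_)
```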
Linear Regression
• Linear regression is the most well-known of all algorithms used in regression modeling.
• The linear regression algorithm fits a straight line to the data, estimating the slope of the output with respect to the input.
• The constituent parts of the equation Y = β0 + β1X + ε:
• Y: the dependent variable, also known as the response variable or outcome variable. It is the variable you want to predict or explain based on the independent variable.

• X: the independent variable, also known as the predictor variable or explanatory variable. It is the variable used to predict or explain the variation in the dependent variable.

• β0: the y-intercept, representing the expected value of Y when X is zero.

• β1: the slope coefficient, representing the change in Y for a one-unit change in X.

• ε: the error term. It represents the difference between the observed value of Y and the value predicted by the regression equation, and captures all factors that influence Y aside from X and the intercept. (Its sample counterpart is the residual, discussed below.)

• Courtesy: https://gbhat.com/machine_learning/linear_regression.html
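
As a small illustration, the sketch below estimates β0 and β1 with the standard ordinary least-squares closed-form solution, β1 = cov(X, Y) / var(X) and β0 = mean(Y) − β1 · mean(X). The data and the "true" coefficients are made-up assumptions for demonstration only.

```python
# A small sketch, assuming synthetic data with made-up true coefficients,
# that estimates beta0 and beta1 for Y = beta0 + beta1*X + eps using the
# ordinary least-squares closed-form solution:
#   beta1 = cov(X, Y) / var(X),   beta0 = mean(Y) - beta1 * mean(X)
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=200)
eps = rng.normal(0, 0.5, size=200)   # the error term epsilon
Y = 1.5 + 0.8 * X + eps              # true beta0 = 1.5, beta1 = 0.8

beta1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
beta0 = Y.mean() - beta1 * X.mean()
print(f"estimated beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```

The same estimates can be cross-checked with np.polyfit(X, Y, 1), which returns the slope and intercept of the degree-1 least-squares fit.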
Residual and Regression Error
• Residuals: Residuals, often loosely called "errors," are the differences between the observed values of the dependent variable and the values predicted by the regression model. In other words, the residual for each data point is the vertical distance between the observed value and the value predicted by the regression line. Mathematically, the residual for the i-th observation is given by:
• Residualᵢ = Observedᵢ − Predictedᵢ

• Regression Errors: Regression errors, on the other hand, are the differences between the observed values of the dependent variable and the true underlying relationship between the variables. These errors are inherent to that relationship and cannot be reduced by improving the model.

• Residuals are a key diagnostic tool in regression analysis. They are used to assess the goodness-of-fit of
the model, detect outliers, check for violations of assumptions, and identify patterns in the data that may
not be captured by the model.
• Regression errors represent the discrepancy between the observed data and the true underlying
relationship between the variables. While residuals are specific to the sample data and the fitted model,
regression errors are conceptual and relate to the population from which the sample is drawn.
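
A minimal sketch of the residual computation follows. The observed and predicted values here are hypothetical, chosen purely to illustrate the formula above.

```python
# A minimal sketch of the residual computation. The observed and predicted
# values here are hypothetical, purely for illustration.
import numpy as np

y_observed  = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
y_predicted = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Residual_i = Observed_i - Predicted_i
residuals = y_observed - y_predicted
print("residuals:", residuals)

# Two quantities commonly built from residuals:
print("mean residual (= 0 for an OLS fit with an intercept):", residuals.mean())
print("residual sum of squares:", np.sum(residuals ** 2))
```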
Assumptions
• Linearity: The relationship between the independent variables and the dependent variable is linear. This means that changes in the independent variables result in a constant change in the dependent variable.

• Independence of Errors: The errors (residuals) should be independent of each other. This assumption ensures that there is no systematic pattern in the residuals that could affect the validity of the estimates.

• Homoscedasticity: The variance of the errors should be constant across all levels of the independent variables. In other words, the spread of the residuals should be consistent throughout the range of predicted values.

• Normality of Errors: The errors should be normally distributed. While this assumption is less critical for large sample sizes due to the Central Limit Theorem, it is important for smaller sample sizes to ensure the validity of statistical tests and confidence intervals.

• No Multicollinearity: There should be no exact linear relationships among the independent variables. High multicollinearity can lead to unstable estimates and inflated standard errors.

• No Outliers or Influential Observations: The data should not contain outliers or influential observations that distort the fit. Outliers are data points that deviate significantly from the rest of the data, while influential observations have a disproportionately large impact on the estimated regression coefficients.
Citation: https://www.superdatascience.com/blogs/assumptions-of-linear-regression
