The document discusses various predictive analytic techniques, focusing on regression analysis, including simple linear regression, multiple linear regression, and logistic regression. It highlights the importance of model adequacy, including the significance of regression coefficients, the coefficient of determination (R²), and the need for residual analysis to ensure the model's assumptions are met. Additionally, it addresses multicollinearity and methods for assessing model performance through metrics like PRESS and adjusted R².
Lecture Week 9 - Regression
LO4: Investigate a range of predictive analytic techniques to discover new knowledge for forecasting future events

Agenda
• Linear regression
• Multiple linear regression
• Categorical regression
• Logistic regression

Simple Linear Regression and Correlation
• Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process operating temperature. Regression analysis can be used to build a model to predict yield at a given temperature level.
• This model can also be used for process optimization, such as finding the temperature that maximizes yield, or for process control purposes.

HYPOTHESIS TESTS IN SIMPLE LINEAR REGRESSION (Individual Coefficients)

Analysis of Variance Approach to Test Significance of Regression

ADEQUACY OF THE REGRESSION MODEL: Coefficient of Determination (R²)
• A model may have a high R² but still be inadequate if its assumptions are violated: linearity, normality of residuals, homoscedasticity, independence of errors, and no multicollinearity (Variance Inflation Factor: VIF < 5 low to moderate, VIF > 5 moderate to high).

Sample Correlation Coefficient

MULTIPLE LINEAR REGRESSION

Matrix Approach to Multiple Linear Regression

Multicollinearity
• Variance Inflation Factor: VIF < 5 low to moderate, VIF > 5 moderate to high.
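The VIF rule of thumb above can be illustrated with a minimal sketch. It assumes the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The function name `vif` and the synthetic data are hypothetical and are not taken from the lecture.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    X has shape (n samples, k predictors). For each predictor j, regress it
    on the other predictors (plus an intercept) and use VIF_j = 1 / (1 - R_j^2),
    which equals SST_j / SSE_j.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        sse = resid @ resid
        sst = np.sum((target - target.mean()) ** 2)
        out[j] = sst / sse
    return out

# Synthetic, made-up predictors: x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

In this made-up data, x1 and x3 should land well above the VIF > 5 threshold while x2 stays close to 1, which is the pattern the slide's rule of thumb is meant to flag.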
• R-sqr: measures fit on the training data.
• R-sqr adj: guards against overfitting by penalizing additional predictors.
• R-sqr pred: estimates fit on new (testing) data.
• PRESS (Prediction Sum of Squares): assesses how well a regression model will predict new, unseen data. It is a leave-one-out cross-validation (LOOCV): for each observation i, the model is fitted without that observation and then used to predict y_i. PRESS is the sum of the resulting squared prediction errors.
• Note: R-sqr pred = 1 - PRESS/SST

Test for Significance of Regression (Overall Model Significance)
• n: total number of observations (samples)
• p: total number of parameters estimated, including the intercept
• k: number of predictor variables (independent variables), so p = k + 1

R² and Adjusted R²

MODEL ADEQUACY CHECKING: Residual Analysis
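The R², adjusted R², PRESS, and predicted R² definitions above can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the function names and the synthetic data are hypothetical. It assumes ordinary least squares with an intercept, adjusted R² = 1 - (SSE/(n - p)) / (SST/(n - 1)), and the standard identity that a leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. PRESS is also computed by explicit refitting so the two routes can be compared.

```python
import numpy as np

def ols_fit(A, y):
    """Least-squares coefficients for design matrix A (intercept column included)."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def press_loocv(A, y):
    """PRESS by definition: refit n times, leaving one observation out each time."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta = ols_fit(A[keep], y[keep])
        total += (y[i] - A[i] @ beta) ** 2  # squared error on the held-out point
    return total

def regression_metrics(X, y):
    """Return R^2, adjusted R^2, PRESS, and predicted R^2 for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    p = k + 1                                  # parameters, including intercept
    A = np.column_stack([np.ones(n), X])
    resid = y - A @ ols_fit(A, y)
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    # Diagonal of the hat matrix H = A (A^T A)^{-1} A^T = A pinv(A).
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    press = np.sum((resid / (1.0 - h)) ** 2)   # shortcut, no refitting needed
    return {
        "r2": 1.0 - sse / sst,
        "r2_adj": 1.0 - (sse / (n - p)) / (sst / (n - 1)),
        "press": press,
        "press_loocv": press_loocv(A, y),
        "r2_pred": 1.0 - press / sst,
    }

# Synthetic, made-up data: two predictors with known coefficients plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=40)
metrics = regression_metrics(X, y)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The two PRESS values should agree up to floating-point error. Because PRESS is never smaller than SSE, predicted R² cannot exceed the training R², so a large gap between the two is a hint of overfitting.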