QTA 18-04-2013 Logistic Regression
Logistic Regression is the type of regression used to predict, forecast or model a categorical variable. There are two types of Logistic Regression:
1. Binary Logistic Regression (Logit)
2. Multinomial Logistic Regression (called Tobit in these notes; strictly, Tobit refers to a censored regression model, not a multinomial one)

Logit: used when the variable to be predicted has two categories (dichotomous). E.g.: Gender, Purchase Intention, Default.
Multinomial (Tobit in these notes): used when the variable to be predicted has more than two categories (multichotomous). E.g.: Education category, Designation.
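LOGIT
Requirements:
1. The dependent variable should be dichotomous.
2. Independent variables may be of any number and any measurement level.
3. No restriction on sample size.

Model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

Here β does not tell us the rate of change, as it would in linear regression.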
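Coding of the dependent variable:
Smallest code → 0 → failure
Largest code → 1 → success

Interpretation of Exp(β), the multiplicative change in the odds P/(1 − P) for a one-unit increase in x:
Exp(β) > 1 → success is more likely to occur
Exp(β) < 1 → failure is more likely to occur
Exp(β) = 1 → failure and success are equally likely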
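The three cases above can be sketched in a few lines of Python. This is only an illustration: the function names `odds` and `interpret_exp_b` are made up for this note and do not come from SPSS.

```python
def odds(p: float) -> float:
    """Odds of success, P / (1 - P), for a probability 0 <= p < 1."""
    return p / (1.0 - p)


def interpret_exp_b(exp_b: float, tol: float = 1e-9) -> str:
    """Read an Exp(B) value the way the notes do: compare it with 1."""
    if abs(exp_b - 1.0) <= tol:
        return "failure and success equally likely"
    if exp_b > 1.0:
        return "success more likely"
    return "failure more likely"


# Probabilities chosen only for illustration.
for p in (0.25, 0.5, 0.75):
    value = odds(p)
    print(p, round(value, 3), interpret_exp_b(value))
```

A probability of 0.5 gives odds of exactly 1, which is why 1 is the dividing line between the two outcomes.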
Objective: Determine a model for predicting default based on age, education, income, years at current address and years with current employer.
Default → Dependent → Categorical (dichotomous)
Age, Income, Education, Years at current address and Years with current employer → Independent

SPSS
1. Open bankloan.sav and switch to Variable View.
2. To check whether default is dichotomous or multichotomous, find default in the first column and look at the number of values in the same row: 2 values means dichotomous, more than 2 means multichotomous.
3. Go to Analyze → Regression → Binary Logistic.
4. Put "Previously defaulted" in the Dependent box.
5. Put age, education, employ, address and income in the Covariates box.
6. Open Options and check the Hosmer-Lemeshow and "At last step" boxes, then click Continue.
7. Open Save and check Probabilities and Unstandardized, then click Continue and OK.
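The dichotomy check described above (count the distinct values of the dependent variable) can also be done outside SPSS. A minimal Python sketch, assuming the column has already been read into a list; the helper name `dependent_type` and the sample values are hypothetical:

```python
def dependent_type(values):
    """Classify a dependent variable by its number of distinct codes."""
    distinct = {v for v in values if v is not None}  # ignore missing values
    if len(distinct) < 2:
        raise ValueError("need at least two distinct categories")
    return "dichotomous" if len(distinct) == 2 else "multichotomous"


# Illustrative codes only, not taken from bankloan.sav.
default_codes = [0, 1, 0, 0, 1, None, 0]
education_codes = [1, 2, 3, 2, 4, 1]

print(dependent_type(default_codes))    # dichotomous
print(dependent_type(education_codes))  # multichotomous
```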
Block 0 → fluke → prediction without using independent variables
After applying logistic regression, Block 0 shows that default can be predicted with 73.9% accuracy by fluke alone, without using any of the independent variables. 4 of the 5 variables show that they can significantly increase the accuracy of prediction if they are included in the regression model.
0 → failure → No Default
The constant-only model is more likely to report that the respondent is not a defaulter.
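The Block 0 figure is what you get by always predicting the most frequent category. A small Python sketch of that idea follows; the counts (739 non-defaults out of 1000) are hypothetical and chosen only so that the result matches the 73.9% quoted above, and `baseline_accuracy` is a made-up helper name.

```python
from collections import Counter


def baseline_accuracy(outcomes):
    """Return the majority category and the accuracy of always predicting it."""
    counts = Counter(outcomes)
    label, n = counts.most_common(1)[0]
    return label, n / len(outcomes)


# Hypothetical sample: 0 = no default, 1 = default.
outcomes = [0] * 739 + [1] * 261

label, accuracy = baseline_accuracy(outcomes)
print(label, round(accuracy * 100, 1))  # 0 73.9
```

Because the majority category here is 0 (no default), a model with only a constant predicts "no default" for everyone.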
Block 1 (prediction involving independent variables)
Model sig. < 0.05 → the model is significant for forecasting, i.e. the dependent variable depends on the independent variables.
Cox & Snell R² → Adj. R² → unbiased accuracy
Nagelkerke R² → R² → biased accuracy
In Logistic Regression, R² tells us the expected increase over the fluke accuracy from involving the independent variables.

The Hosmer-Lemeshow test checks the fit of Logistic Regression to the given data in the given scenario. If its p-value is greater than or equal to 0.05, the model is considered fitted:
p-value ≥ 0.05 → fit
p-value < 0.05 → unfit

Default = -0.0165 (employ) - 0.061 (address) + 0.012 (income)
Employ (Exp(β) = 0.848): as years with the current employer increase, non-default becomes more likely. (An Exp(β) of 0.848 corresponds to a coefficient of about -0.165, so the -0.0165 above may be missing a digit.)
Address (Exp(β) = 0.941): as years at the current address increase, non-default becomes more likely.
Income (Exp(β) = 1.012): as income increases, default becomes more likely.

E.g.: Emp = 17, Add = 12, Income = 176 → Y = -0.2524
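To connect the numbers above, the sketch below converts two of the coefficients to Exp(β) and turns the example score Y = -0.2524 into a probability with the logistic function. The helper names and the 0.5 classification cutoff are assumptions made for this illustration, not something stated in the notes.

```python
import math


def odds_ratio(b: float) -> float:
    """Exp(B): multiplicative change in the odds per one-unit increase."""
    return math.exp(b)


def probability(y: float) -> float:
    """Logistic function: turn a logit score Y into a probability."""
    return 1.0 / (1.0 + math.exp(-y))


# Coefficients as given in the notes for address and income.
for name, b in {"address": -0.061, "income": 0.012}.items():
    print(name, round(odds_ratio(b), 3))  # address 0.941, income 1.012

y = -0.2524           # score from the worked example in the notes
p = probability(y)
CUTOFF = 0.5          # assumed classification cutoff
predicted = 1 if p >= CUTOFF else 0
print(round(p, 3), predicted)  # 0.437 0
```

A negative Y always gives a probability below 0.5, so under this assumed cutoff the example case would be classified as 0 (no default).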
Quiz Assignment → 3 files (minimum 1 to be solved)
AML Survival → Chemo → Dependent (Yes, No)
Auto accidents → Gender → Dependent (Male, Female)
Ceral.sav → Marital status → Dependent (Married, Unmarried)