This document outlines a seminar on multinomial logistic regression. It begins with an introduction defining multinomial logistic regression as an extension of logistic regression for dependent variables with more than two categories. It then compares multinomial logistic regression to simple and binary logistic regression. The document provides an example of using multinomial logistic regression to study factors influencing different types of diabetes. It presents the model, assumptions, and uses of multinomial logistic regression and includes an outline of the seminar topics.
Multinomial Logistic Regression-1
Seminar Topic
Multinomial Logistic Regression
Name: Asma Parveen
Roll# 8403
MSc 3rd Semester
Supervisor Name: Muhammad Aftab
GC University Faisalabad

OUTLINE
- Introduction
- Definition
- Comparison with simple regression and binary logistic regression
- Model
- Assumptions
- Applications
- Example

Introduction:
The logistic regression model can be extended to the situation where the response variable assumes more than two categories that are not ordinal (i.e., they have no natural ordering). In that case we use multinomial logistic regression to model the dependent variable. Multinomial logistic regression is also known as multiclass logistic regression, the multinomial logit model, etc.

Definition:
Multinomial logistic regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. It is thus an extension of binary logistic regression, which analyzes dichotomous (binary) dependent variables. For example:
- Choice of color (red, yellow, green)
- Choice of profession (doctor, engineer, lawyer)
- Choice of undergraduate program (physics, history, chemistry)

Comparison with regression analysis and binary logistic regression:
In regression analysis we estimate the effect of one or more explanatory variables on a continuous response variable. When the response variable assumes only two values (for example, when we study whether a customer will buy a product or not), the dependent variable is binary (1 = yes, 0 = no) and we use binary logistic regression. But when the response variable assumes more than two unordered values, we use multinomial logistic regression. For example, in a study of the mode of transportation to work, the response variable may be private automobile, car, bicycle, public transport, or walking (no natural ordering).

Assumptions:
- The dependent variable should be nominal.
- Independent variables can be continuous or categorical.
- Independence of observations.
- No multicollinearity.
- A linear relationship between any continuous independent variables and the logit transformation of the dependent variable.
- No outliers or highly influential points.
- Kerlinger and Pedhazur recommended that at least thirty observations per variable should be used.

Uses of multinomial logistic regression:
Medicine, marketing, engineering, and the social sciences.

Model:
For a response with K categories (the K-th taken as the reference category) and p explanatory variables, the model is

log(πj / πK) = βj0 + βj1 x1 + βj2 x2 + ··· + βjp xp,    j = 1, 2, …, K−1

where
- πj is the probability that the response falls in category j, with πK the probability of the reference category,
- βj0 is the intercept of the j-th equation, j = 1, 2, …, K−1,
- βj1, βj2, …, βjp are the regression coefficients,
- x1, x2, …, xp are the explanatory variables.

Example:
We want to study the factors that influence diabetes. Our response variable is diabetes, and its categories are:
1) Chemical diabetes  2) Overt diabetes  3) Normal

Many variables influence diabetes, but we take only three of them:
1) Insulin response (IR)  2) Steady-state plasma glucose (SSPG)  3) Relative weight (RW)

These measurements were taken on 145 volunteers who were subjected to the same regimen.
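To make the baseline-category model concrete, the sketch below fits a three-category multinomial logistic regression by gradient ascent on the log-likelihood. It is a minimal pure-Python illustration: the one-variable toy data stand in for the three diabetes categories and are NOT the actual 145-volunteer measurements.

```python
import math

def fit_softmax(X, y, n_classes, lr=0.1, epochs=2000):
    """Fit a multinomial logistic regression by stochastic gradient
    ascent on the log-likelihood (pure Python, illustration only)."""
    n_feat = len(X[0])
    # One row of (intercept, coefficients) per class.  This is
    # over-parameterised relative to the baseline-category form
    # above, but it is the simplest version to write down.
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Linear predictor z_k = b_k0 + b_k1*x1 + ... per class.
            z = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
                 for w in W]
            m = max(z)  # subtract the max for numerical stability
            e = [math.exp(zk - m) for zk in z]
            s = sum(e)
            p = [ek / s for ek in e]  # softmax probabilities
            # Gradient of the log-likelihood: (1[y=k] - p_k)*(1, x).
            for k in range(n_classes):
                g = (1.0 if k == yi else 0.0) - p[k]
                W[k][0] += lr * g
                for j in range(n_feat):
                    W[k][j + 1] += lr * g * xi[j]
    return W

def predict(W, xi):
    z = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for w in W]
    return z.index(max(z))

# Toy one-variable data with three well-separated groups, standing in
# for the three diabetes categories (0, 1, 2):
X = [[0.0], [0.2], [1.0], [1.2], [2.0], [2.2]]
y = [0, 0, 1, 1, 2, 2]
W = fit_softmax(X, y, n_classes=3)
preds = [predict(W, xi) for xi in X]
print(preds)
```

In practice one would of course fit such a model with standard statistical software rather than hand-rolled gradient ascent; the sketch only illustrates the estimation idea behind the output discussed next.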
Solution:

Case processing summary:
- N: the number of observations.
- Marginal percentage: the proportion of valid observations found in each of the outcome variable's groups, calculated by dividing each group's N by the total valid N.
- Valid: the number of observations in the dataset where the outcome variable and all predictor variables are non-missing.
- Missing: the number of observations where data are missing on the outcome variable or any of the predictor variables.
- Total: the total number of observations in the dataset, i.e., the sum of the observations with missing data and the observations with valid data.

Interpretation of results: Out of 145 people, 33 have overt diabetes, 36 have chemical diabetes, and 76 are normal. Thus the marginal percentage for overt diabetes is (33/145) × 100 = 22.8%.

Model fitting information: The p-value of 0.000 is less than 0.05, indicating that the regression coefficients are not all equal to zero, i.e., the predictors are jointly significant.

Goodness of fit: This table contains the chi-square statistic, which tests whether the observed data are consistent with the fitted model. The p-value of 1.000 is greater than 0.05, so we conclude that the observed data are consistent with the fitted model.

Pseudo R²: There are three R² measures that describe the variation explained by the model.
- Cox and Snell: based on the log-likelihood of the fitted model compared with the log-likelihood of the intercept-only model; its maximum value is less than one.
- Nagelkerke: adjusts Cox and Snell's R² so that the range of possible values extends to one.
- McFadden: depends on the estimated likelihood; its range is 0 to 1 but it never reaches 1.
Here R² = 0.667, 0.767, and 0.539, indicating that about 66.7%, 76.7%, and 53.9% of the variation in the response variable is accounted for by the explanatory variables.

Interpretation of parameter estimates:
Intercepts: −1.903 and −7.611 are the log-odds of the corresponding diabetes categories, relative to the reference category, when all explanatory variables are zero.
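The arithmetic above can be checked directly. In the sketch below the group counts come from the case-processing summary (33 overt, 36 chemical, 76 normal), while the log-likelihood values fed to the pseudo-R² formulas are hypothetical, since the slides report only the final R² figures.

```python
import math

# Group counts from the case-processing summary:
counts = {"overt": 33, "chemical": 36, "normal": 76}
total = sum(counts.values())  # 145 valid observations
marginal = {k: round(100 * n / total, 1) for k, n in counts.items()}
print(total)              # 145
print(marginal["overt"])  # 22.8

# The three pseudo-R² measures, written out from their usual
# definitions.  ll0 = log-likelihood of the intercept-only model,
# llm = log-likelihood of the fitted model, n = sample size.
def pseudo_r2(ll0, llm, n):
    mcfadden = 1 - llm / ll0
    cox_snell = 1 - math.exp((2.0 / n) * (ll0 - llm))
    nagelkerke = cox_snell / (1 - math.exp((2.0 / n) * ll0))
    return mcfadden, cox_snell, nagelkerke

# Hypothetical log-likelihoods (the underlying likelihoods behind the
# reported 0.667, 0.767 and 0.539 are not given in the slides):
mcf, cs, nag = pseudo_r2(ll0=-150.0, llm=-69.0, n=145)
```

Note the ordering the formulas guarantee: Nagelkerke's value is always at least as large as Cox and Snell's, because it divides by a quantity smaller than one.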
The SSPG coefficient of 0.046 means that the log-odds of the corresponding diabetes category, relative to the reference category, increase by 0.046 for a one-unit increase in SSPG when all the other variables are held constant.

Sig: The p-value of SSPG is 0.000, which is less than our significance level of 5%, so we reject the null hypothesis and conclude that SSPG has a significant effect on diabetes.

95% confidence interval: This is the confidence interval for an individual regression coefficient in the model. For a given predictor, with a level of 95% confidence, we would say that we are 95% confident that the "true" population regression coefficient lies between the lower and upper limits of the interval.

References:
www.research.gate.net
Regression Analysis by Example, 5th edition, by
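As an illustration of how the 95% confidence interval described above is formed, the sketch below computes a Wald interval from a coefficient and its standard error, together with the matching interval for the odds ratio. The SSPG coefficient 0.046 is from the output above, but the standard error of 0.010 is a hypothetical value, since the slides do not report it.

```python
import math

def wald_ci(b, se, z=1.96):
    """95% Wald confidence interval for a regression coefficient,
    plus the corresponding interval for the odds ratio exp(b)."""
    lo, hi = b - z * se, b + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi))

# SSPG coefficient 0.046 from the output; the standard error 0.010 is
# hypothetical (not reported in the slides):
(b_lo, b_hi), (or_lo, or_hi) = wald_ci(0.046, se=0.010)
print(round(b_lo, 4), round(b_hi, 4))  # 0.0264 0.0656
```

Because the interval for the coefficient excludes zero (equivalently, the odds-ratio interval excludes one), it agrees with the significant p-value reported for SSPG.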