- The document describes using qualitative and quantitative variables to predict credit card balance through regression analysis.
- It shows how to create dummy variables to represent qualitative predictors with two or more levels, such as gender and ethnicity.
- Interaction terms between qualitative and quantitative predictors allow the effect of a quantitative variable like income to vary between groups.
Qualitative Predictors
Credit Data Set
• Balance: Average credit card debt
• Age
• Cards: Number of credit cards
• Education: Years of education
• Income (in thousands of dollars)
• Limit: Credit limit
• Rating: Credit rating
• Gender
• Student
• Married
• Ethnicity: Asian, African American, Caucasian

Qualitative Predictors with Two Levels
• Consider only "Balance" (response variable) and "Gender" (qualitative predictor).
• For a qualitative predictor with two levels, we simply create an indicator (dummy) variable that takes on two possible values.
• Based on the "Gender" variable, we can define the following dummy variable:

$$x_i = \begin{cases} 1, & \text{if the $i$th person is female} \\ 0, & \text{if the $i$th person is male} \end{cases}$$

• Fit the following model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = \begin{cases} \beta_0 + \beta_1 + \varepsilon_i, & \text{if the $i$th person is female} \\ \beta_0 + \varepsilon_i, & \text{if the $i$th person is male} \end{cases}$$

• So $\beta_0$ is the average credit card balance among males.
• And $\beta_0 + \beta_1$ is the average credit card balance among females.
• Therefore $\beta_1$ is the average difference in credit card balance between females and males.

Regression Table

| | Coefficient | Std. Error | t-statistic | P-value |
|---|---|---|---|---|
| Intercept | 509.80 | 33.13 | 15.39 | <0.0001 |
| Gender[Female] | 19.73 | 46.05 | 0.43 | 0.67 |

Interpretation
• From the table above, the estimated average credit card debt for males is $509.80.
• Females are estimated to carry $19.73 in additional debt, for a total of $509.80 + $19.73 = $529.53.
• However, the p-value for the dummy variable is very high (0.67). This indicates that there is no statistical evidence of a difference in average credit card balance between the genders.

Qualitative Predictors with More than Two Levels
• Consider the variable "Ethnicity" as a predictor.
• Ethnicity has three levels, so we need to create two dummy variables.
• The first dummy variable may be

$$x_{i1} = \begin{cases} 1, & \text{if the $i$th person is Asian} \\ 0, & \text{if the $i$th person is not Asian} \end{cases}$$

• Similarly, the second dummy variable could be

$$x_{i2} = \begin{cases} 1, & \text{if the $i$th person is Caucasian} \\ 0, & \text{if the $i$th person is not Caucasian} \end{cases}$$

• A person for whom both dummy variables are zero is African American, so that level serves as the baseline.

Regression Table

| | Coefficient | Std. Error | t-statistic | P-value |
|---|---|---|---|---|
| Intercept | 531.00 | 46.32 | 11.46 | <0.0001 |
| Ethnicity[Asian] | −18.69 | 65.02 | −0.29 | 0.7740 |
| Ethnicity[Caucasian] | −12.50 | 56.68 | −0.22 | 0.8260 |

Interpretation
• From the table above, the estimated average balance for the African American category (the baseline) is $531.00.
• The Asian category is estimated to have $18.69 less debt than the African American category.
• The Caucasian category is estimated to have $12.50 less debt than the African American category.
• However, the p-values associated with the coefficient estimates for the two dummy variables are very large.
• This suggests that there is no statistical evidence of a real difference in credit card balance between the ethnicities.

Extension to Quantitative and Qualitative Variable
• Consider the predictors "Income" (quantitative predictor) and "Student" (qualitative predictor), with "Balance" as the response variable.
• Suppose the model takes the following form:

$$\text{balance}_i \approx \beta_0 + \beta_1 \times \text{income}_i + \begin{cases} \beta_2, & \text{if the $i$th person is a student} \\ 0, & \text{if the $i$th person is not a student} \end{cases}$$

$$= \beta_1 \times \text{income}_i + \begin{cases} \beta_0 + \beta_2, & \text{if the $i$th person is a student} \\ \beta_0, & \text{if the $i$th person is not a student} \end{cases}$$

| | Coefficient | Std. Error | t-statistic | P-value |
|---|---|---|---|---|
| Intercept | 211.14 | 32.46 | 6.51 | <0.0001 |
| Income | 5.98 | 0.56 | 10.75 | <0.0001 |
| Student[Yes] | 382.67 | 65.31 | 5.86 | <0.0001 |

• This model fits two parallel lines: students and non-students have different intercepts ($\beta_0 + \beta_2$ versus $\beta_0$) but share the same slope $\beta_1$.
• It therefore assumes that the average effect on balance of a one-unit increase in income does not depend on whether or not the individual is a student.
• This is a potentially serious limitation of the model, since in fact a change in income may have a very different effect on the credit card balance of a student versus a non-student.

Inclusion of Interaction Term
• This limitation can be resolved by adding an interaction variable, created by multiplying income by the dummy variable for student.
• So the new model is

$$\text{balance}_i \approx \beta_0 + \beta_1 \times \text{income}_i + \begin{cases} \beta_2 + \beta_3 \times \text{income}_i, & \text{if the $i$th person is a student} \\ 0, & \text{if the $i$th person is not a student} \end{cases}$$

$$= \begin{cases} (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{income}_i, & \text{if the $i$th person is a student} \\ \beta_0 + \beta_1 \times \text{income}_i, & \text{if the $i$th person is not a student} \end{cases}$$

| | Coefficient | Std. Error | t-statistic | P-value |
|---|---|---|---|---|
| Intercept | 200.62 | 33.70 | 5.95 | <0.0001 |
| Income | 6.22 | 0.59 | 10.50 | <0.0001 |
| Student[Yes] | 476.68 | 104.35 | 4.57 | <0.0001 |
| Income × Student[Yes] | −2.00 | 1.73 | −1.16 | 0.25 |

• The estimated slope for students (6.22 − 2.00 = 4.22) is lower than the estimated slope for non-students (6.22).
• This suggests that increases in income are associated with smaller increases in credit card balance among students than among non-students.
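
As a minimal sketch, the fitted interaction model can be evaluated directly from the estimates reported in the table above. The function names (`design_row`, `predicted_balance`, `group_line`) are illustrative and do not come from the original slides; income is assumed to be in thousands of dollars, as in the Credit data description.

```python
# Coefficient estimates copied from the interaction-model regression table.
INTERCEPT = 200.62        # beta_0
INCOME = 6.22             # beta_1
STUDENT = 476.68          # beta_2
INCOME_X_STUDENT = -2.00  # beta_3


def design_row(income, is_student):
    """Return the predictors [1, income, student dummy, income * dummy]."""
    dummy = 1.0 if is_student else 0.0
    # The interaction variable is income multiplied by the student dummy.
    return [1.0, income, dummy, income * dummy]


def predicted_balance(income, is_student):
    """Fitted balance for a given income (assumed in thousands of dollars)."""
    coefs = [INTERCEPT, INCOME, STUDENT, INCOME_X_STUDENT]
    return sum(b * x for b, x in zip(coefs, design_row(income, is_student)))


def group_line(is_student):
    """Intercept and slope of the fitted line for one group."""
    intercept = predicted_balance(0.0, is_student)
    slope = predicted_balance(1.0, is_student) - intercept
    return intercept, slope
```

With these estimates, `group_line(True)` gives an intercept of about 677.30 and a slope of about 4.22 for students, while `group_line(False)` gives about 200.62 and 6.22 for non-students, matching the two cases of the model equation.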