Logistic regression double loop in R
Last Updated: 20 Jun, 2024
Performing logistic regression with a double loop in R is useful for tasks like cross-validation or nested model selection, where an outer loop iterates over different sets of predictors and an inner loop iterates over subsets or parameter settings within each set. This approach helps with fine-tuning the model and comparing candidate specifications systematically.
Steps to Perform Logistic Regression with a Double Loop
Here are the main steps to perform logistic regression with a double loop in the R programming language.
- Prepare the Data: Ensure your dataset is suitable for logistic regression.
- Define Predictor Sets and Subsets: Create lists of different sets of predictors and subsets or parameter settings.
- Iterate Over Predictor Sets: Use an outer loop to iterate over different sets of predictors.
- Iterate Over Subsets or Parameters: Use an inner loop to iterate over subsets or parameter settings within each predictor set.
- Fit Logistic Regression Models: Fit the logistic regression models within the inner loop.
- Store and Summarize Results: Collect and summarize the results from each model.
Let's walk through an example using a built-in dataset, such as mtcars, modified for binary classification.
Step 1: Prepare the Data
Load the mtcars dataset and create a binary response variable.
R
# Load the necessary dataset
data(mtcars)
# Create a binary response variable: 1 if mpg > 20, else 0
mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)
# Convert to factors where necessary
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
# View the dataset
head(mtcars)
Output:
mpg cyl disp hp drat wt qsec vs am gear carb mpg_binary
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 0
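Before fitting any models, it can help to check how balanced the new binary response is; a quick standalone sketch:

```r
# Recreate the binary response and count observations in each class
data(mtcars)
mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)
table(mtcars$mpg_binary)
#  0  1
# 18 14
```

With 18 zeros and 14 ones out of 32 rows, the classes are reasonably balanced, but the dataset is small, which matters for the separation issues discussed later.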
Step 2: Define Predictor Sets and Subsets
Create lists of different sets of predictors and subsets.
R
# Define the sets of predictors
predictor_sets <- list(
  c("hp"),
  c("hp", "wt"),
  c("hp", "wt", "qsec")
)

# Define subsets (or other parameters) to iterate over within each predictor set
# For this example, we'll iterate over different interaction terms
interaction_terms <- list(
  NULL,
  c("hp:wt"),
  c("hp:wt", "wt:qsec")
)
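To see how the two lists combine, here is the formula the loops below will build when the second predictor set is paired with the second interaction entry (a standalone sketch of the same `paste()`/`as.formula()` pattern):

```r
predictors <- c("hp", "wt")   # second predictor set
interactions <- c("hp:wt")    # second interaction entry
formula_parts <- c(predictors, interactions)
f <- as.formula(paste("mpg_binary ~", paste(formula_parts, collapse = " + ")))
f
# mpg_binary ~ hp + wt + hp:wt
```

Note that `c(predictors, NULL)` silently drops the `NULL`, which is exactly why `NULL` works as the "no interactions" entry in `interaction_terms`.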
Step 3: Iterate Over Predictor Sets and Subsets
Use nested loops to fit logistic regression models.
R
# Initialize a list to store models and results
results <- list()

# Outer loop: iterate over sets of predictors
for (i in seq_along(predictor_sets)) {
  predictors <- predictor_sets[[i]]
  # Inner loop: iterate over interaction terms
  for (j in seq_along(interaction_terms)) {
    interactions <- interaction_terms[[j]]
    # Build the model formula from the predictors and interaction terms
    formula_parts <- c(predictors, interactions)
    formula <- as.formula(paste("mpg_binary ~", paste(formula_parts, collapse = " + ")))
    # Fit the logistic regression model
    model <- glm(formula, data = mtcars, family = binomial)
    # Store the model summary in the results list
    results[[paste("Model", i, "Interaction", j)]] <- summary(model)
  }
}
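Since each `summary()` printout is long, it is often convenient to also collect a compact comparison table while looping. Here is a self-contained sketch of the same double loop that records only the AIC of each fit:

```r
data(mtcars)
mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)

predictor_sets <- list(c("hp"), c("hp", "wt"), c("hp", "wt", "qsec"))
interaction_terms <- list(NULL, c("hp:wt"), c("hp:wt", "wt:qsec"))

aic_table <- data.frame()
for (i in seq_along(predictor_sets)) {
  for (j in seq_along(interaction_terms)) {
    formula_parts <- c(predictor_sets[[i]], interaction_terms[[j]])
    f <- as.formula(paste("mpg_binary ~", paste(formula_parts, collapse = " + ")))
    # suppressWarnings(): several of these fits separate perfectly and warn
    fit <- suppressWarnings(glm(f, data = mtcars, family = binomial))
    aic_table <- rbind(aic_table,
                       data.frame(predictor_set = i, interaction = j, AIC = AIC(fit)))
  }
}
aic_table[order(aic_table$AIC), ]  # smallest AIC first
```

This gives one row per model (9 in total here), which is much easier to scan than nine full summaries. As discussed below, however, a low AIC alone should not be trusted when the fits show signs of separation.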
Step 4: Print and Summarize the Results
The models were already fitted and their summaries stored within the inner loop; now print the stored summaries.
R
# Print the results summaries
for (name in names(results)) {
  cat("\n", name, "\n")
  print(results[[name]])
}
Output:
Model 1 Interaction 1
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 24.4431 14.7518 1.657 0.0975 .
hp -0.2110 0.1309 -1.612 0.1069
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.8601 on 31 degrees of freedom
Residual deviance: 8.4922 on 30 degrees of freedom
AIC: 12.492
Number of Fisher Scoring iterations: 10
Model 1 Interaction 2
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 360.311 105224.092 0.003 0.997
hp 3.583 1144.003 0.003 0.998
hp:wt -2.080 606.438 -0.003 0.997
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 2.4373e-08 on 29 degrees of freedom
AIC: 6
Number of Fisher Scoring iterations: 25
Model 1 Interaction 3
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.481e+02 3.331e+05 0.002 0.998
hp -7.213e-01 2.222e+03 0.000 1.000
hp:wt -6.887e-01 6.110e+02 -0.001 0.999
wt:qsec -4.887e+00 3.386e+03 -0.001 0.999
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 1.2814e-08 on 28 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
Model 2 Interaction 1
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 894.228 365884.162 0.002 0.998
hp -2.021 858.062 -0.002 0.998
wt -202.865 84688.218 -0.002 0.998
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 1.1156e-08 on 29 degrees of freedom
AIC: 6
Number of Fisher Scoring iterations: 25
Model 2 Interaction 2
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1376.310 847009.895 0.002 0.999
hp -7.017 4404.678 -0.002 0.999
wt -384.770 240635.444 -0.002 0.999
hp:wt 1.847 1187.706 0.002 0.999
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 4.7507e-09 on 28 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
Model 2 Interaction 3
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1337.773 886478.329 0.002 0.999
hp -7.012 4486.808 -0.002 0.999
wt -336.040 580802.371 -0.001 1.000
hp:wt 1.781 1303.441 0.001 0.999
wt:qsec -1.525 18207.652 0.000 1.000
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 4.6651e-09 on 27 degrees of freedom
AIC: 10
Number of Fisher Scoring iterations: 25
Model 3 Interaction 1
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.109e+03 2.074e+06 0.001 1.000
hp -2.588e+00 5.439e+03 0.000 1.000
wt -1.745e+02 2.734e+05 -0.001 0.999
qsec -1.253e+01 1.176e+05 0.000 1.000
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 1.1077e-08 on 28 degrees of freedom
AIC: 8
Number of Fisher Scoring iterations: 25
Model 3 Interaction 2
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.434e+03 3.467e+06 0.000 1.000
hp -7.146e+00 8.625e+03 -0.001 0.999
wt -3.760e+02 5.690e+05 -0.001 0.999
qsec -3.498e+00 2.033e+05 0.000 1.000
hp:wt 1.835e+00 1.390e+03 0.001 0.999
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 4.6546e-09 on 27 degrees of freedom
AIC: 10
Number of Fisher Scoring iterations: 25
Model 3 Interaction 3
Call:
glm(formula = formula, family = binomial, data = mtcars)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.095e+03 7.677e+06 0 1
hp -9.549e-01 1.492e+04 0 1
wt 5.949e+02 2.465e+06 0 1
qsec 1.450e+02 3.546e+05 0 1
hp:wt 2.681e-01 4.309e+03 0 1
wt:qsec -4.168e+01 1.142e+05 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 2.4511e-09 on 26 degrees of freedom
AIC: 12
Number of Fisher Scoring iterations: 25
- Model Summaries: The output contains summaries for each model fitted using different sets of predictors and interaction terms.
- Coefficients: Each model summary includes coefficients, standard errors, z-values, and p-values for the predictors and interaction terms.
- Model Fit Statistics: The summaries also provide model fit statistics such as the AIC.
The summaries show logistic regression models fitted with different interaction terms. Most models exhibit extremely low residual deviance, enormous standard errors, and non-significant p-values for every coefficient. Together with the Fisher scoring iteration count hitting the default limit of 25, these are classic symptoms of (quasi-)complete separation: the predictors hp, wt, and qsec and their interactions can classify this small dataset perfectly, so the coefficient estimates diverge. The low AIC values therefore reflect overfitting rather than genuinely good models, and these fits are not reliable for prediction. This output highlights the importance of careful model selection and validation to avoid overfitting.
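The separation suspected above can be checked directly. A minimal standalone diagnostic sketch, refitting one of the degenerate models:

```r
data(mtcars)
mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)

# Refit the hp + hp:wt model; the "fitted probabilities numerically 0 or 1"
# warning is itself a separation symptom, suppressed here for clarity
model <- suppressWarnings(
  glm(mpg_binary ~ hp + hp:wt, data = mtcars, family = binomial)
)

range(fitted(model))  # fitted probabilities pushed to the 0/1 boundaries
model$iter            # iteration count at the default glm limit of 25
```

When separation occurs, penalized alternatives such as Firth's bias-reduced logistic regression (for example via the logistf package) are a common remedy.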
Conclusion
Using a double loop to perform logistic regression in R allows for systematic exploration of different sets of predictors and interaction terms. This approach is particularly useful for model selection and fine-tuning. By following the steps outlined in this guide, you can automate the fitting of multiple logistic regression models and efficiently compare their performance.