hw1 1
The first part of the assignment is designed to ensure you follow the tutorial on logistic regression. The
tasks are to be clearly laid out with references to the sections in the tutorial.
o Go through the Real Python tutorial on Logistic Regression. Pay special attention to the
sections on data preparation, logistic regression implementation, and model evaluation.
2. Dataset Selection:
o Use scikit-learn’s Iris dataset or the Titanic dataset (or any other dataset you prefer that
is suitable for classification tasks).
o Write Python code that loads the dataset and applies logistic regression as explained in
the tutorial.
o Train the logistic regression model and evaluate it using common metrics such as accuracy,
the confusion matrix, precision, recall, and F1 score.
4. Code Submission:
o Include a summary explaining each section of the code and what was learned from the
tutorial.
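As a starting point for the loading, training, and evaluation tasks above, the following is a minimal sketch rather than a reference solution. It assumes the Iris dataset reduced to a binary problem (virginica vs. the other two species), a 70/30 stratified split, and feature standardization; these choices are illustrative assumptions and are not prescribed by the tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load Iris and build a binary target (assumption: virginica = 1, others = 0).
iris = load_iris()
X = iris.data
y = (iris.target == 2).astype(int)

# Hold out 30% of the rows for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Common classification metrics on the held-out data.
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}

print("Confusion matrix (rows = true class, columns = predicted class):")
print(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same structure should carry over to the Titanic dataset, although that dataset would first need missing values handled and categorical columns encoded.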
This part expands the assignment to ensure you understand the fundamental aspects of logistic
regression, such as interpretation, feature importance, and model evaluation.
1. Coefficient Interpretation:
o After training the logistic regression model, interpret the coefficients (weights). Explain
what each coefficient means in the context of the data (e.g., what impact does each
feature have on the target variable?).
o Plot the ROC curve and compute the AUC score for your logistic regression model.
o Briefly describe what the AUC score represents and how the model’s performance can
be interpreted from it.
3. Hyperparameter Tuning:
4. Error Analysis:
o Generate a confusion matrix and discuss the types of errors your model makes (false
positives, false negatives).
o Suggest strategies for improving the model (e.g., adjusting the decision threshold,
feature engineering, or collecting more data).
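The coefficient, ROC/AUC, and error-analysis tasks above can be sketched as follows. This is one possible approach under stated assumptions: the binary Iris setup (virginica vs. the rest) and the thresholds 0.3, 0.5, and 0.7 are illustrative choices, and `plot_roc` is a hypothetical helper that needs matplotlib installed. Because the features are standardized, each coefficient is the change in log-odds per one standard deviation of that feature, and its exponential is the corresponding odds ratio.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same illustrative setup: virginica (1) vs. the other species (0).
iris = load_iris()
X, y = iris.data, (iris.target == 2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Coefficient interpretation: log-odds change per standard deviation.
coefs = model.named_steps["logisticregression"].coef_[0]
odds_ratios = np.exp(coefs)
for name, coef, ratio in zip(iris.feature_names, coefs, odds_ratios):
    print(f"{name:20s} coef={coef:+.3f}  odds ratio={ratio:.3f}")

# ROC curve points and AUC from the predicted probability of class 1.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc_score = roc_auc_score(y_test, scores)
print(f"AUC: {auc_score:.3f}")


def plot_roc(fpr, tpr, auc_score):
    """Hypothetical helper: draw the ROC curve (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"AUC = {auc_score:.3f}")
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    return fig


# Error analysis: false positives and false negatives at several thresholds.
errors = {}
for t in (0.3, 0.5, 0.7):
    y_hat = (scores >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_hat, labels=[0, 1]).ravel()
    errors[t] = {"fp": int(fp), "fn": int(fn)}
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

Calling `plot_roc(fpr, tpr, auc_score)` in a notebook cell should display the curve. Lowering the threshold can only reduce false negatives at the cost of more false positives, which is the trade-off the error analysis asks you to discuss.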
To ensure you have grasped key concepts, include the following tasks:
o Write a 500-word summary explaining what logistic regression is, how it works
mathematically, and where it is best applied.
o Expand your binary classification model to handle multiclass classification (using the
multinomial option in scikit-learn’s logistic regression).
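For the multiclass extension above, a minimal sketch on the full three-class Iris dataset might look like the following. It assumes a recent scikit-learn release, where `LogisticRegression` with its default `lbfgs` solver fits a multinomial (softmax) model whenever the target has more than two classes; older releases exposed this choice through the `multi_class` parameter, so check the documentation for your installed version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Keep all three Iris species as the target this time.
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: the default lbfgs solver uses the multinomial loss for 3+ classes.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# One row of coefficients per class, one column per feature.
coef = clf.named_steps["logisticregression"].coef_
print("Coefficient matrix shape:", coef.shape)

# Class probabilities for each test row sum to 1 across the three classes.
proba = clf.predict_proba(X_test)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The per-class precision, recall, and F1 values in the classification report replace the single binary figures used earlier.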
• A PDF report that includes answers to the interpretive and conceptual questions (Task 3).
• Visualizations (e.g., ROC curve, confusion matrix) included in both the notebook and report.
Grading Rubric
• Task on Regularization: Ask students to include both L1 and L2 regularization and compare the
results.
• Ethical Considerations Task: Provide students with a case study where logistic regression is
applied (e.g., in credit scoring or medical diagnoses) and have them discuss ethical implications
of the model's decisions.
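For the regularization task listed above, the comparison of L1 and L2 penalties could be sketched as below. This is only an illustration: the binary Iris setup, the `liblinear` solver, and the regularization strength `C=0.1` are assumptions chosen for the example, and newer scikit-learn releases may prefer a different way of selecting the penalty than the `penalty` argument used here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative binary problem: virginica (1) vs. the other species (0).
iris = load_iris()
X, y = iris.data, (iris.target == 2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Same solver and strength for both models so only the penalty differs.
models = {
    "l1": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "l2": LogisticRegression(penalty="l2", solver="liblinear", C=0.1),
}

results = {}
for name, estimator in models.items():
    pipe = make_pipeline(StandardScaler(), estimator)
    pipe.fit(X_train, y_train)
    coef = pipe.named_steps["logisticregression"].coef_[0]
    results[name] = {
        "accuracy": pipe.score(X_test, y_test),
        "coef": coef,
        "n_zero": int(np.sum(coef == 0)),  # L1 can set coefficients exactly to zero
    }

for name, res in results.items():
    print(f"{name}: accuracy={res['accuracy']:.3f}, zero coefficients={res['n_zero']}")
    print("   coefficients:", np.round(res["coef"], 3))
```

Repeating the loop over several values of `C` is a simple way to show how the strength of each penalty changes the coefficients and the test accuracy.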
Submission
Submit your Jupyter notebook and reports as described above. This homework is to be done individually.