Assignment 1 Predicting Loan Approval With Logistic Regression

The document outlines three assignments focused on predictive modeling using different datasets and techniques. Assignment 1 involves predicting loan approval using logistic regression, Assignment 2 focuses on classifying iris species with k-NN, and Assignment 3 classifies emails as spam or not using both logistic regression and k-NN. Each assignment includes tasks for data preprocessing, exploratory data analysis, model training and evaluation, and deliverables such as Jupyter notebooks and summary reports.

Uploaded by

sarvadnya mense
Copyright
© All Rights Reserved

Assignment 1: Predicting Loan Approval with Logistic Regression
Objective
Predict whether a loan will be approved based on applicant information, using logistic regression.
Dataset
Use the Loan Approval Dataset with features such as applicant income, loan amount, credit history, etc.
Sample Data (CSV):

ApplicantIncome, LoanAmount, CreditHistory, Education, LoanStatus
5000, 200, 1, Graduate, Y
3000, 150, 0, Not Graduate, N
4000, 180, 1, Graduate, Y
2500, 100, 0, Not Graduate, N
...
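
The sample rows above can be loaded and turned into numeric features as a first preprocessing step. The following is a minimal sketch using only the Python standard library; `load_rows` and `encode` are hypothetical helper names, and mapping `Graduate` to 1 and `Y` to 1 is an assumed label encoding, not something the assignment prescribes.

```python
import csv
import io

# The four sample rows from the assignment, with spaces after the commas as printed.
SAMPLE_CSV = """ApplicantIncome, LoanAmount, CreditHistory, Education, LoanStatus
5000, 200, 1, Graduate, Y
3000, 150, 0, Not Graduate, N
4000, 180, 1, Graduate, Y
2500, 100, 0, Not Graduate, N
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, ignoring spaces after delimiters."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key.strip(): value.strip() for key, value in row.items()} for row in reader]


def encode(rows):
    """Return (X, y): numeric feature rows and 0/1 labels."""
    X, y = [], []
    for row in rows:
        X.append([
            float(row["ApplicantIncome"]),
            float(row["LoanAmount"]),
            float(row["CreditHistory"]),
            # Assumed label encoding of a two-valued category.
            1.0 if row["Education"] == "Graduate" else 0.0,
        ])
        y.append(1 if row["LoanStatus"] == "Y" else 0)
    return X, y


rows = load_rows(SAMPLE_CSV)
X, y = encode(rows)
```

For a category with more than two values, one-hot encoding (one 0/1 column per value) avoids implying an order between the values.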
Tasks
1. Data Preprocessing:
   - Handle missing values.
   - Convert categorical variables to numeric using one-hot encoding or label encoding.
2. Exploratory Data Analysis:
   - Visualize the relationships between features and loan status.
   - Calculate descriptive statistics.
3. Model Training and Evaluation:
   - Split the data into training and testing sets.
   - Train a logistic regression model.
   - Evaluate the model using accuracy, precision, recall, and F1-score.
4. Interpretation and Visualization:
   - Plot the ROC curve and calculate the AUC.
   - Interpret the coefficients of the logistic regression model.
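
The evaluation metrics in task 3 and the coefficient interpretation in task 4 can be sketched in plain Python so the arithmetic is visible. This is a minimal sketch, not a prescribed method: `binary_metrics` and `odds_ratio` are hypothetical helper names, and labels are assumed to be encoded as 1 (approved) and 0 (not approved). In a notebook, a library such as scikit-learn would normally supply these metrics.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def odds_ratio(coefficient):
    """Multiplicative change in the odds of the positive class per one-unit feature increase."""
    return math.exp(coefficient)
```

A logistic regression coefficient acts on the log-odds, so exponentiating it gives an odds ratio: a value above 1 means the feature raises the odds of approval, a value below 1 means it lowers them, with the other features held fixed.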
Deliverables
- Jupyter notebook with the entire process and code.
- Summary report of findings and model performance metrics.
Assignment 2: Predicting Iris Species with k-NN
Objective
Classify iris species based on sepal and petal measurements using k-nearest neighbors (k-NN).
Dataset
Use the Iris Dataset with features such as sepal length, sepal width, petal length, and petal width.
Sample Data (CSV):

SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species
5.1, 3.5, 1.4, 0.2, Iris-setosa
7.0, 3.2, 4.7, 1.4, Iris-versicolor
6.3, 3.3, 6.0, 2.5, Iris-virginica
...
Tasks
1. Data Preprocessing:
   - Check for missing values and handle any that are found.
   - Normalize the feature values so that they are on the same scale.
2. Exploratory Data Analysis:
   - Visualize the data using pair plots to understand the distribution of the different species.
   - Calculate summary statistics for each species.
3. Model Training and Evaluation:
   - Split the data into training and testing sets.
   - Train a k-NN classifier with different values of k.
   - Evaluate the model using accuracy, a confusion matrix, and a classification report.
4. Hyperparameter Tuning:
   - Use cross-validation to find the optimal value of k.
   - Plot the accuracy for different values of k.
5. Interpretation and Visualization:
   - Visualize the decision boundaries for different values of k.
   - Plot the confusion matrix for the best k value.
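
The normalization, k-NN, and cross-validation steps above can be sketched from scratch to show how they fit together. This is a minimal standard-library sketch under stated assumptions: min-max scaling is used as the normalization, distance is Euclidean, folds are formed by interleaving row indices, and all function names are hypothetical. The query point in the usage lines is made up for illustration and is not part of the dataset.

```python
import math
from collections import Counter


def fit_min_max(X):
    """Return the (min, max) of each feature column."""
    return [(min(col), max(col)) for col in zip(*X)]


def apply_min_max(X, ranges):
    """Scale each feature with the given ranges; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in X
    ]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, the label seen first (the closer neighbor) wins.
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5):
    """Cross-validated accuracy for one value of k; assumes len(X) > folds."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [label for i, label in enumerate(y) if i not in test_idx]
        # Fit the scaler on the training fold only, to avoid leaking test information.
        ranges = fit_min_max(train_X)
        train_scaled = apply_min_max(train_X, ranges)
        for i in test_idx:
            query = apply_min_max([X[i]], ranges)[0]
            correct += int(knn_predict(train_scaled, train_y, query, k) == y[i])
    return correct / len(X)


# Usage with the three sample rows; the query measurements are hypothetical.
sample_X = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
sample_y = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
ranges = fit_min_max(sample_X)
query = apply_min_max([[5.0, 3.4, 1.5, 0.2]], ranges)[0]
prediction = knn_predict(apply_min_max(sample_X, ranges), sample_y, query, k=1)
```

To choose k, `cv_accuracy` would be called on the full dataset for each candidate value and the results plotted against k.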
Deliverables
- Jupyter notebook with the entire process and code.
- Summary report of findings, optimal k value, and model performance metrics.
Assignment 3: Spam Email Classification with Logistic Regression and k-NN
Objective
Classify emails as spam or not spam using logistic regression and k-nearest neighbors.
Dataset
Use the Spam Email Dataset with features extracted from the emails, such as word frequencies, email length, presence of special characters, etc.
Sample Data (CSV):

WordFreqMake, WordFreqAddress, WordFreqAll, WordFreq3D, WordFreqOur, ..., Spam
0.21, 0.28, 0.50, 0.00, 0.21, ..., 1
0.06, 0.00, 0.71, 0.00, 0.17, ..., 0
0.00, 0.14, 0.28, 0.00, 0.00, ..., 1
...
Tasks
1. Data Preprocessing:
   - Handle missing values, if any.
   - Normalize or standardize the feature values.
2. Exploratory Data Analysis:
   - Visualize the distribution of features for spam and non-spam emails.
   - Calculate summary statistics for both classes.
3. Model Training and Evaluation:
   - Split the data into training and testing sets.
   - Train both logistic regression and k-NN models.
   - Evaluate both models using accuracy, precision, recall, and F1-score.
4. Model Comparison:
   - Compare the performance of logistic regression and k-NN.
   - Discuss the strengths and weaknesses of each model in the context of this problem.
5. Interpretation and Visualization:
   - Plot ROC curves for both models and calculate AUC.
   - Visualize the confusion matrix for both models.
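
The ROC curve and AUC used to compare the two models can be computed directly from predicted scores. The following is a minimal sketch, assuming each model outputs a score per email where higher means more likely spam, that labels are 1 (spam) and 0 (not spam), and that both classes are present; `roc_points` and `auc` are hypothetical helper names. For k-NN, the fraction of spam neighbors is one possible score.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points, sweeping the threshold downward."""
    pos = sum(1 for label in y_true if label == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Running both functions on each model's test-set scores gives one curve and one AUC per model; a value of 1.0 means every spam email is scored above every non-spam email, and 0.5 corresponds to random ranking.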
Deliverables
- Jupyter notebook with the entire process and code.
- Comparative report of findings, model performance metrics, and recommendations for the best model.