Machine Learning Case Study
Machine Learning Case Study
Question 1
2. Classification
Predicts categorical labels based on the input features
The models learn decision boundaries to separate data points into
classes.
Common algorithms: Logistic Regression, Decision Trees, Random
Forest, Support Vector Machines (SVMs), Neural Networks.
Used for tasks like: Email spam filtering, image recognition,
sentiment analysis.
3.Clustering:
5.Random Forest:
6. Neural Networks:
Question 3
QUESTION What is Overfitting, and How Can You Avoid It?
ANSWER Overfitting occurs when the model fits the training data too closely
and leads to poor generalization.
Question 4
QUESTION What are some real-life applications of clustering algorithms?
ANSWER 1.Bio-information
2.Urban Planning
4.Astronomy
Loaded Dataset
Imported the dataset and explored initial records.
Data Cleaning:
Removed rows with zero tenure. Cleaned the data to ensure it was
ready for modeling.
Preprocessing:
Label Encoding, Feature Scaling
Split the dataset into training and test sets
Modeling:
o Trained multiple machine learning models:
Decision Trees, RandomForest, KNN, SVM, XGBoost,
CatBoost, and Neural Networks.
o Evaluated initial model performance using metrics such as
accuracy, precision, recall, and F1-score.
Hyperparameter Tuning:
o Hyperparameter tuning was done to improve model
performance by optimizing model settings that are not learned
during training
o GridSearchCV and RandomizedSearchCV to systematically
search for the best hyperparameter combinations.
o Performance Improvement: After tuning, the models’
performances were reevaluated using metrics such as
accuracy, precision, recall, and F1-score.
Evaluation:
o Evaluated the models using various metrics such as accuracy,
precision, recall, F1-score, and ROC curves.
o Compared the performance of tuned models with the original
ones to select the most accurate model for customer churn
prediction.
Preprocessing:
Handling Missing Values: dropped null values
Encoding Categorical Variables: Any categorical data encoded
into numerical format using
Feature Scaling.
Splitting Data: The dataset was divided into features (X) and
target (y) for house price prediction, followed by splitting into
training and test sets (train-test split).
Model Creation: A custom Linear() class was created and used to train
a linear regression model for predicting house prices.
Model Training: The data was split into training and testing sets using
train_test_split, and the model was trained on the training set.