Machine Learning Case Study

Uploaded by prasadgade469

AIM: Case Study: Machine Learning For Data Science

Question 1

QUESTION: Explain supervised, unsupervised, and reinforcement learning models.

ANSWER: Supervised Models

• A technique in which the machine learning model is trained on a labeled dataset.
• The dataset contains both the input features and the corresponding output labels.
• The output labels guide the model as it learns the patterns and relationships between input and output.
Unsupervised Models
• The model is trained on an unlabeled dataset.
• The model learns by identifying patterns and structures within the data itself.
Reinforcement Learning
• A technique in which an agent learns, by interacting with an environment, to choose actions that maximize its cumulative reward.
• The major components of reinforcement learning are:
   • Agent
   • Environment
   • State
   • Action
   • Policy
   • Reward
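
To make the components listed above concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning method. Everything in it is an assumption made for illustration: a five-cell corridor as the environment, two actions (left and right), a reward of 1 for reaching the last cell, and hand-picked learning settings.

```python
import random

N_STATES = 5        # environment: corridor cells 0..4, cell 4 is the goal
ACTIONS = (0, 1)    # actions: 0 = move left, 1 = move right

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-value per (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap the episode length
            # Epsilon-greedy: explore sometimes, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Policy: the greedy action in every non-goal state.
policy = [q_table[s].index(max(q_table[s])) for s in range(N_STATES - 1)]
print(policy)
```

Here the agent is the training loop, the state is the current cell, and the policy is read off the learned table by taking the best action in each state.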
Question 2
QUESTION: Explain the following machine learning techniques:
1. Regression
2. Classification
3. Clustering
4. Decision Trees
5. Random Forest
6. Neural Networks
7. Support Vector Machines
ANSWER: 1. Regression

• Establishes a relationship between the input features and a continuous numerical output.
• Fits a mathematical function to the data, aiming to minimize the difference between predicted and actual values.
• Predicts continuous numerical values.
• Common algorithms: Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression.
• Used for tasks like predicting house prices, stock prices, and temperature.
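
As a small illustration of fitting a function that minimizes the difference between predicted and actual values, the sketch below computes a least-squares line by hand. The house sizes and prices are made-up numbers, and the function names are chosen only for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between actual and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Made-up data: house size in square metres -> price in thousands.
sizes = [50, 80, 100, 120]
prices = [170, 260, 320, 380]

slope, intercept = fit_line(sizes, prices)
print(slope, intercept)
print(mean_squared_error(sizes, prices, slope, intercept))
print(slope * 90 + intercept)   # predicted price for a 90 square metre house
```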

2. Classification
• Predicts categorical labels based on the input features.
• The model learns decision boundaries that separate data points into classes.
• Common algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVMs), Neural Networks.
• Used for tasks like email spam filtering, image recognition, and sentiment analysis.
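
The idea of learning a decision boundary can be sketched with logistic regression trained by plain gradient descent. The data is invented for this example (number of flagged words per email, with label 1 meaning spam), and the learning rate and number of epochs are assumptions rather than recommended settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Made-up data: flagged words per email -> 1 (spam) or 0 (not spam).
flagged_words = [0, 1, 2, 3, 7, 8, 9, 10]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(flagged_words, is_spam)
print(predict(1, w, b), predict(9, w, b))
print(-b / w)   # the decision boundary: the x value where p = 0.5
```

With a single feature the decision boundary is just a threshold on x; with more features it becomes a line or a hyperplane.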
3. Clustering

• Groups similar data points together.
• Common algorithms: K-means Clustering, Hierarchical Clustering, DBSCAN.
• Used for tasks like customer segmentation, market basket analysis, and anomaly detection.
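
A minimal sketch of K-means, the first algorithm named above, is given below. The points are made up, and the centroids are initialized from the first k points purely to keep the example deterministic; library implementations usually use a smarter initialization.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]   # simplistic init (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, labels

# Made-up 2-D points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```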
4. Decision Trees

• Create a tree-like model that makes decisions based on a sequence of rules.
• Can be used for both classification and regression.
• Easy to interpret and visualize.
• Prone to overfitting, especially when the tree is deep.
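
The rule-based structure can be sketched with a small tree builder that picks each split by Gini impurity and stops at a maximum depth, which is one simple guard against overfitting. The rows (tenure in months, monthly charge) and their labels are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels agree, higher when they are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth):
    """Return a leaf label or a node (feature, threshold, left, right)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(node, row):
    """Follow the rules from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

# Made-up rows: (tenure in months, monthly charge).
rows = [(2, 80), (3, 90), (5, 85), (30, 40), (40, 50), (50, 45)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

tree = build(rows, labels, max_depth=2)
print(tree)
print(predict(tree, (4, 70)), predict(tree, (36, 60)))
```

The printed tree reads directly as a rule ("if feature 0 is at most the threshold, predict the left label"), which is what makes decision trees easy to interpret.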

5. Random Forest

• An ensemble of decision trees that reduces overfitting.
• Combines the predictions of multiple trees.
• More robust to noise and outliers than an individual decision tree.
• Can be computationally expensive to train.
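
The ensemble idea can be sketched as follows, under heavy simplifications: each "tree" is a one-split stump, each stump is trained on a bootstrap sample (rows drawn with replacement), and the forest predicts by majority vote. A real random forest also samples a random subset of features at each split, which has no effect here because the made-up data has a single feature.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Best rule 'x > t -> high_label, else the other label' by training error."""
    values = sorted(set(xs))
    if len(values) < 2:
        # Degenerate sample: always predict the majority class.
        return float("-inf"), Counter(ys).most_common(1)[0][0]
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for high_label in (0, 1):
            errors = sum((high_label if x > t else 1 - high_label) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, high_label)
    return best[1], best[2]

def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(high if x > t else 1 - high for t, high in forest)
    return votes.most_common(1)[0][0]

# Made-up one-feature data with binary labels.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(xs, ys)
print(predict_forest(forest, 0), predict_forest(forest, 10))
```

Because every stump sees a slightly different sample, their thresholds differ, and the vote smooths out the quirks of any single stump.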

6. Neural Networks

• Inspired by the human brain and made up of interconnected nodes called neurons.
• Can learn complex patterns and non-linear relationships.
• Used for tasks like image and speech recognition and natural language processing.
• Require careful tuning of hyperparameters and can be computationally expensive to train.
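
To show why layers of neurons can capture non-linear relationships, the sketch below runs a forward pass through a tiny two-layer network that computes XOR, a function no single linear boundary can represent. The weights are set by hand for illustration; in practice they would be learned during training.

```python
def relu(z):
    """Rectified linear activation: negative inputs become zero."""
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked weights (an assumption for this example, not trained values).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def xor_network(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```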

7. Support Vector Machines (SVMs)

• Find the hyperplane that separates the classes with the largest possible margin.
• Effective for high-dimensional data.
• Can be sensitive to outliers.
• Require careful selection of the kernel function.
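
The notion of a best separating hyperplane can be illustrated by comparing margins. The sketch below does not train an SVM; it only computes the geometric margin (the distance from the hyperplane to the closest point) for two hand-picked candidate hyperplanes on made-up data. Both candidates separate the classes, and a linear SVM would prefer the one with the larger margin.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w . x + b) / ||w|| over all points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))

# Made-up 2-D points with labels -1 and +1.
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Two hypothetical candidate hyperplanes of the form w . x + b = 0.
margin_a = geometric_margin((1.0, 1.0), -5.5, points, labels)
margin_b = geometric_margin((1.0, 0.0), -3.0, points, labels)
print(margin_a, margin_b)
```

A positive margin means every point is on the correct side; a negative margin would mean at least one point is misclassified.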

Question 3
QUESTION: What is overfitting, and how can you avoid it?

ANSWER: Overfitting occurs when the model fits the training data too closely, which leads to poor generalization.

As a result, the model performs extremely well on the training data (the data it has already seen) but poorly on new, unseen data.
This happens when the model learns the noise or random fluctuations in the training data rather than the underlying patterns and relationships.
Other causes include:
Complexity: a model with too many layers or parameters can fit the training data too closely.
Small dataset: the model may memorize a small dataset instead of generalizing from it.
Noise in the dataset: the model may learn the noise or random fluctuations rather than the underlying pattern.

How to avoid overfitting:

• Reduce model complexity: simplify the model by removing parameters, layers, or features.
• Increase the size of the dataset: gather more data to give the model more information.
• Regularization: add a penalty term to the model's objective function so that large parameter values are discouraged.
• Cross-validation: evaluate the model on multiple train/validation splits of the data to assess how well it generalizes.
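
Two of the remedies above, regularization and cross-validation, can be sketched together. The example assumes a one-parameter model y = w * x with an L2 (ridge) penalty, for which the best w has a simple closed form, and it uses made-up data. The penalty strength `lam` and the fold count are illustrative choices.

```python
def ridge_slope(xs, ys, lam):
    """w minimizing sum((y - w*x)^2) + lam * w^2 for the model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def cv_mse(xs, ys, lam, k=3):
    """Average validation MSE over k folds for a given penalty strength."""
    fold_errors = []
    for train, val in k_fold_indices(len(xs), k):
        w = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in val) / len(val))
    return sum(fold_errors) / k

# Made-up data lying roughly on y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

print(ridge_slope(xs, ys, 0.0), ridge_slope(xs, ys, 10.0))  # penalty shrinks w
for lam in (0.0, 1.0, 10.0):
    print(lam, cv_mse(xs, ys, lam))
```

The penalty pulls the parameter toward zero, and the cross-validated error gives a way to compare penalty strengths on data the model was not fitted to.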

Question 4
QUESTION: What are some real-life applications of clustering algorithms?

ANSWER: 1. Bioinformatics

• Group genes based on their expression patterns, which helps identify biological pathways.
• Group similar protein structures to understand their evolution.

2. Urban Planning

• Identify the various zones in a city, such as residential, commercial, and industrial.
• Group similar travel patterns to optimize public transport.

3. Art and Design

• Group images based on their artistic style, helping artists and designers identify trends and find inspiration.
• Generate color palettes for design projects.

4. Astronomy

• Group stars based on their spectral characteristics, helping astronomers classify different types of stars.
• Identify groups of galaxies, known as galaxy clusters.


Question 5
QUESTION: Interpret the given case study
a) Predicting customer churn
https://github.com/Pradnya1208/Telecom-Customer-Churn-prediction
ANSWER:
• Imported Libraries
Used libraries such as pandas, numpy, matplotlib, seaborn, plotly, and scikit-learn, along with classifiers such as RandomForest, XGBoost, and CatBoost.

• Loaded Dataset
Imported the dataset and explored the initial records.

• Exploratory Data Analysis (EDA):
   • Inspected the dataset for basic information such as the number of rows and columns.
   • Generated summary statistics for the numerical features.
   • Visualizations:
      o Visualized the distribution of the numerical features using histograms and boxplots.
      o Created a heatmap.
      o Analyzed the distribution of the target variable (Churn).
      o Plotted bar charts to visualize the relationships between categorical features.
   • Insights:
      o Identified key patterns and potential predictors of churn, such as higher churn rates among customers with month-to-month contracts or those paying by electronic check.

• Checked for Missing Data:
Checked the dataset for missing values.

• Data Cleaning:
   • Removed rows with zero tenure so that the data was ready for modeling.

• Preprocessing:
   • Applied label encoding and feature scaling.
   • Split the dataset into training and test sets.

• Modeling:
   o Trained multiple machine learning models: Decision Trees, RandomForest, KNN, SVM, XGBoost, CatBoost, and Neural Networks.
   o Evaluated the initial performance of each model using metrics such as accuracy, precision, recall, and F1-score.

• Hyperparameter Tuning:
   o Hyperparameter tuning was done to improve model performance by optimizing the settings that are not learned during training.
   o GridSearchCV and RandomizedSearchCV were used to search systematically for the best hyperparameter combinations.
   o Performance Improvement: after tuning, the models were re-evaluated using metrics such as accuracy, precision, recall, and F1-score.

• Evaluation:
   o Evaluated the models using metrics such as accuracy, precision, recall, F1-score, and ROC curves.
   o Compared the performance of the tuned models with the original ones to select the most accurate model for customer churn prediction.
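
The evaluation and tuning steps described above can be sketched in plain Python. This is not the repository's code and it does not call GridSearchCV; it only mimics the exhaustive-search idea by trying every value in a small grid for one setting (a hypothetical decision threshold on predicted churn scores) and keeping the value with the best F1-score. The labels and scores are made up.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churn)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

def grid_search_threshold(y_true, scores, grid):
    """Try every threshold in the grid and return the one with the best F1."""
    results = {}
    for threshold in grid:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        results[threshold] = classification_metrics(y_true, y_pred)
    best = max(results, key=lambda t: results[t]["f1"])
    return best, results

# Made-up true labels and predicted churn scores for eight customers.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

best_threshold, results = grid_search_threshold(y_true, scores, [0.3, 0.5, 0.7])
for threshold, metrics in results.items():
    print(threshold, metrics)
print("best threshold:", best_threshold)
```

Lowering the threshold raises recall at the cost of precision, which is why a churn model is usually judged on several metrics rather than on accuracy alone.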

QUESTION: Interpret the given case study
b) House Price Prediction
https://github.com/DASHANANT/Machine-learning-case-studies/blob/main/House%20Price%20Prediction/house-price-prediction.ipynb

ANSWER:
• Data Loading: The dataset was loaded.

• Preprocessing:
   • Handling Missing Values: null values were dropped.
   • Encoding Categorical Variables: categorical data was encoded into numerical format.
   • Feature Scaling.
   • Splitting Data: The dataset was divided into features (X) and target (y) for house price prediction and then split into training and test sets (train-test split).

• Exploratory Data Analysis (EDA):
   • Data Overview: A quick inspection of the dataset, including the feature types, shape, and basic statistics.
   • Visualizations.
   • Identified relationships between the independent variables and the target variable in order to select important features for the model.

• Model Creation: A custom Linear() class was created and used to train a linear regression model for predicting house prices.
• Model Training: The data was split into training and testing sets using train_test_split, and the model was trained on the training set.

• Multiple Iterations: Training and testing were repeated 10 times, and the parameters, score, and Mean Squared Error (MSE) of each iteration were stored.

• Evaluation Metrics: The model's performance was evaluated using the R-squared score and the MSE, and the iteration with the highest score was selected as the best result.

• Best Result Output: The best hyperplane parameters, score (0.819), and MSE (4429.39) were printed as the final output.
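
The repeat-and-select procedure described above can be sketched as follows. This is not the notebook's code: the internals of its Linear() class are not described here, so a closed-form least-squares line stands in for it, the data is synthetic, and a seeded shuffle stands in for train_test_split.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in for the custom model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_and_mse(xs, ys, slope, intercept):
    """R-squared score and mean squared error on the given data."""
    preds = [slope * x + intercept for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, ss_res / len(ys)

# Synthetic data: a straight line plus a small repeating disturbance.
xs = list(range(20))
ys = [3 * x + 5 + ((7 * x) % 5 - 2) for x in xs]

rng = random.Random(0)
results = []
for _ in range(10):                      # ten random train/test splits
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    test_idx, train_idx = idx[:5], idx[5:]
    slope, intercept = fit_line([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
    r2, mse = r2_and_mse([xs[i] for i in test_idx],
                         [ys[i] for i in test_idx], slope, intercept)
    results.append({"slope": slope, "intercept": intercept,
                    "r2": r2, "mse": mse})

best = max(results, key=lambda r: r["r2"])   # keep the highest-scoring run
print(best)
```

Because the best run is chosen after looking at its test score, the selected score tends to be optimistic; averaging the scores over all runs gives a steadier estimate.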

CONCLUSION: Understood the basic terminology of machine learning, including supervised, unsupervised, and reinforcement learning. Read about various supervised and unsupervised learning algorithms, such as regression, neural networks, clustering, SVM, and random forest. Read about overfitting, its causes, and how to overcome it. Interpreted the given machine learning projects, Predicting Customer Churn and House Price Prediction, and understood how the authors performed data cleaning and EDA and applied machine learning algorithms.
