Machine Learning Case Study

Uploaded by

prasadgade469

AIM: Case Study: Machine Learning For Data Science

Question 1

QUESTION: Explain supervised, unsupervised, and reinforcement learning models

ANSWER: Supervised Models

• A technique in which the machine learning model is trained on a labeled dataset.
• The dataset contains the input features as well as the output labels.
• The output labels guide the model in learning patterns and relationships between input and output.

Unsupervised Models
• The model is trained on an unlabeled dataset.
• The model learns by identifying patterns and structures within the data itself.

Reinforcement Learning
• A technique in which an agent learns to interact with an environment so as to maximize its cumulative reward.
• The major components of reinforcement learning are:
  ▪ Agent
  ▪ Environment
  ▪ State
  ▪ Action
  ▪ Policy
  ▪ Reward
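
The components listed above can be seen working together in a small sketch. It assumes a hypothetical five-cell corridor environment and uses tabular Q-learning, which is only one of many reinforcement learning methods; the names `step`, `train`, and all parameter values are illustrative choices, not something taken from the case study.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right

def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn a table of action values with Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the episode length
            # Policy: epsilon-greedy, choosing randomly when values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy action for every non-goal state (+1 means "move right").
policy = [ACTIONS[0 if left > right else 1] for left, right in q_table[:-1]]
print(policy)
```

Here the state is the cell index, the reward is 1.0 only on reaching the goal, and the policy is read off the learned table after training.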
Question 2
QUESTION: Explain machine learning techniques
1. Regression
2. Classification
3. Clustering
4. Decision Tree
5. Random Forest
6. Neural Network
7. Support Vector Machines
ANSWER: 1. Regression

• Establishes a relationship between the input features and a continuous numerical output.
• Fits a mathematical function to the data, aiming to minimize the difference between predicted and actual values.
• Predicts continuous numerical values.
• Common algorithms: Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression.
• Used for tasks like: predicting house prices, stock prices, temperature.
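
As a minimal sketch of "fitting a function to minimize the difference between predicted and actual values", the block below fits a straight line by ordinary least squares using only the standard library. The data values and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size vs. price, both in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 6.0 + intercept  # prediction for an unseen size
print(slope, intercept, predicted_price)
```

The closed-form solution works for a single feature; libraries such as scikit-learn generalize the same idea to many features.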

2. Classification
• Predicts categorical labels based on the input features.
• The models learn decision boundaries that separate data points into classes.
• Common algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVMs), Neural Networks.
• Used for tasks like: email spam filtering, image recognition, sentiment analysis.
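
To make the idea of a learned decision boundary concrete, here is a small sketch of logistic regression on one feature, trained with plain gradient descent. The spam-word counts, the learning rate, and the function names are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, labels, lr=0.1, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical data: count of suspicious words per email, 1 = spam.
counts = [0, 1, 2, 3, 7, 8, 9, 10]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

# Center the feature so gradient descent behaves well.
mean_count = sum(counts) / len(counts)
centered = [c - mean_count for c in counts]
w, b = train_logistic(centered, is_spam)

def predict(count):
    """Classify as spam when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * (count - mean_count) + b) >= 0.5 else 0

print(predict(2), predict(8))
```

The decision boundary is the count at which the predicted probability crosses 0.5; on this symmetric toy data it sits near the mean count.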
3. Clustering:

• Groups similar data points together without using labels.
• Common algorithms: K-means Clustering, Hierarchical Clustering, DBSCAN.
• Used for tasks like: customer segmentation, anomaly detection, and market basket analysis (although the latter is more commonly handled with association rule mining).
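
A minimal sketch of K-means on one-dimensional data follows. The spending figures and the starting centroids are hypothetical, and real implementations usually pick starting centroids randomly or with k-means++.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical monthly spend for six customers.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = kmeans_1d(spend, centroids=[10, 12])
print(centroids, clusters)
```

The two resulting groups correspond to low-spend and high-spend customers, which is the customer segmentation use case in miniature.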
4. Decision Trees:

• Create a tree-like model that makes decisions based on rules.
• Can be used for both classification and regression.
• Easy to interpret and visualize.
• Can be prone to overfitting, especially with deep trees.
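
A decision tree is built by repeatedly choosing the rule that best separates the classes. The sketch below shows a single such step: picking a threshold on one feature by weighted Gini impurity. The tenure data is invented, and Gini is only one of several common split criteria.

```python
def gini(labels):
    """Gini impurity for binary labels (0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best rule 'value <= t'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer tenure in months, 1 = churned.
tenure = [1, 2, 3, 10, 12, 15]
churned = [1, 1, 1, 0, 0, 0]

threshold, impurity = best_split(tenure, churned)
print(threshold, impurity)
```

A full tree applies the same search recursively to each side of the split, which is also why deep trees can end up memorizing the training data.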

5. Random Forest:

• An ensemble of decision trees that reduces overfitting.
• Combines the predictions of multiple trees.
• More robust to noise and outliers than an individual decision tree.
• Can be computationally expensive to train.

6. Neural Networks:

• Inspired by the human brain and made up of interconnected nodes called neurons.
• Can learn complex patterns and non-linear relationships.
• Used for tasks like: image and speech recognition, natural language processing.
• Require careful tuning of hyperparameters and can be computationally expensive to train.

7. Support Vector Machines (SVMs):

• Find the hyperplane that separates data points into different classes with the largest margin.
• Effective for high-dimensional data.
• Can be sensitive to outliers.
• Require careful selection of the kernel function.

Question 3
QUESTION: What is overfitting, and how can you avoid it?

ANSWER: Overfitting occurs when the model fits the training data too closely, which leads to poor generalization.

As a result, the model performs extremely well on the training data (data it has already seen) but poorly on unseen (new) data.
This happens when the model learns the noise or random fluctuations in the training data rather than the underlying patterns and relationships.
Common causes include:
Complexity: a model with too many layers and parameters can fit noise.
Small dataset: the model may simply memorize a small dataset.
Noise in the dataset: the noisier the data, the more random fluctuation there is for the model to learn in place of the underlying pattern.

How to avoid overfitting:

• Reduce model complexity: simplify the model by removing parameters, layers, or features.
• Increase the size of the dataset: gather more data to give the model more information.
• Regularization: add a penalty term to the model's objective function to discourage overly complex fits.
• Cross-validation: evaluate the model's performance on multiple subsets of the data to assess its generalization ability.
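
The cross-validation point can be sketched as follows. The generator splits sample indices into k folds so that every sample is used for testing exactly once; the name `k_fold_indices` is a hypothetical helper, and the sketch skips the shuffling that is normally done first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`; a large gap between training and test scores across folds is a sign of overfitting.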

Question 4
QUESTION: What are some real-life applications of clustering algorithms?

ANSWER: 1. Bioinformatics

• Group genes based on expression patterns, helping to identify biological pathways.
• Group similar protein structures to understand their evolution.

2. Urban Planning

• Identify zones in a city, such as residential, commercial, and industrial.
• Group similar travel patterns to optimize public transport.

3. Art and Design

• Group images based on their artistic style, helping artists and designers identify trends and find inspiration.
• Generate color palettes for design projects.

4. Astronomy

• Group stars based on their spectral characteristics, helping astronomers classify different types of stars.
• Identify groups of galaxies known as galaxy clusters.

Question 5
QUESTION: Interpret the given case study
a) Predicting customer churn
https://github.com/Pradnya1208/Telecom-Customer-Churn-prediction
ANSWER: • Imported Libraries
Used libraries such as pandas, numpy, matplotlib, seaborn, plotly, and scikit-learn, along with classifiers such as RandomForest, XGBoost, and CatBoost.

• Loaded Dataset
Imported the dataset and explored the initial records.

• Exploratory Data Analysis (EDA):
  ▪ Inspected the dataset for information such as the number of rows and columns.
  ▪ Generated summary statistics for the numerical features.
  ▪ Visualizations:
    o Visualized the distribution of numerical features using histograms and boxplots.
    o Created a heatmap.
    o Analyzed the distribution of the target variable (Churn).
    o Plotted bar charts to visualize the relationships between categorical features.
  ▪ Insights:
    o Identified key patterns and potential predictors of churn, such as higher churn rates among customers with month-to-month contracts or those paying by electronic check.

• Checked for Missing Data:
Checked the dataset for missing values.

• Data Cleaning:
  ▪ Removed rows with zero tenure and cleaned the data so that it was ready for modeling.

• Preprocessing:
  ▪ Label Encoding, Feature Scaling
  ▪ Split the dataset into training and test sets
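
The preprocessing steps can be illustrated with a standard-library sketch. It does not reproduce the repository's code: the helper names, the column values other than "Month-to-month", and the charges are hypothetical. One assumption worth noting is that the scaler is fitted on the training rows only, so that no information from the test set leaks into training.

```python
import random

def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def split_rows(rows, test_ratio=0.25, seed=42):
    """Shuffle the rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def fit_scaler(values):
    """Return the mean and standard deviation used for standardization."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean, std or 1.0  # avoid dividing by zero for constant columns

# Hypothetical columns: contract type and monthly charges.
contracts = ["Month-to-month", "Two year", "One year", "Month-to-month"]
charges = [20.0, 40.0, 60.0, 80.0]

contract_codes, mapping = label_encode(contracts)
train_rows, test_rows = split_rows(list(zip(contract_codes, charges)))

# Fit the scaler on training data only, then apply it to both splits.
mean, std = fit_scaler([charge for _, charge in train_rows])
train_scaled = [(code, (charge - mean) / std) for code, charge in train_rows]
test_scaled = [(code, (charge - mean) / std) for code, charge in test_rows]
print(mapping, train_scaled, test_scaled)
```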

• Modeling:
  o Trained multiple machine learning models: Decision Trees, RandomForest, KNN, SVM, XGBoost, CatBoost, and Neural Networks.
  o Evaluated initial model performance using metrics such as accuracy, precision, recall, and F1-score.

• Hyperparameter Tuning:
  o Hyperparameter tuning was done to improve model performance by optimizing the model settings that are not learned during training.
  o GridSearchCV and RandomizedSearchCV were used to search systematically for the best hyperparameter combinations.
  o Performance improvement: after tuning, the models' performance was re-evaluated using metrics such as accuracy, precision, recall, and F1-score.

• Evaluation:
  o Evaluated the models using metrics such as accuracy, precision, recall, F1-score, and ROC curves.
  o Compared the performance of the tuned models with the original ones to select the most accurate model for customer churn prediction.
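
The evaluation metrics named above can be computed by hand from the four confusion-matrix counts. The sketch below uses invented labels and a hypothetical helper, `binary_metrics`; it is not output from the churn notebook.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted churners, how many churned
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual churners, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions (1 = churned).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For churn data, where churners are usually the minority class, precision, recall, and F1 tend to be more informative than accuracy alone.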

QUESTION: Interpret the given case study

b) House Price Prediction
https://github.com/DASHANANT/Machine-learning-case-studies/blob/main/House%20Price%20Prediction/house-price-prediction.ipynb

ANSWER: • Data Loading: The dataset was loaded.

• Preprocessing:
  ▪ Handling Missing Values: null values were dropped.
  ▪ Encoding Categorical Variables: categorical data was encoded into numerical format.
  ▪ Feature Scaling.
  ▪ Splitting Data: the dataset was divided into features (X) and target (y) for house price prediction, and then split into training and test sets (train-test split).

• Exploratory Data Analysis (EDA):
  ▪ Data Overview: a quick inspection of the dataset, including feature types, shape, and basic statistics.
  ▪ Visualizations
  ▪ Identified relationships between the independent variables and the target variable in order to select important features for the model.

• Model Creation: A custom Linear() class was created and used to train a linear regression model for predicting house prices.
• Model Training: The data was split into training and testing sets using train_test_split, and the model was trained on the training set.

• Multiple Iterations: Training and testing were repeated 10 times, storing the parameters, score, and Mean Squared Error (MSE) for each iteration.

• Evaluation Metrics: The model's performance was evaluated using the R-squared score and MSE, with the best result selected based on the highest score.

• Best Result Output: The best hyperplane parameters, score (0.819), and MSE (4429.39) were printed as the final output.
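
The repeat-and-select procedure described above can be sketched as follows. This is not the notebook's custom Linear() class: it assumes a single-feature least-squares fit, synthetic data, and hypothetical helper names, and it only illustrates how R-squared and MSE are computed and how the best of 10 runs is chosen.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_and_mse(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot; MSE = SS_res / n."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, ss_res / len(y_true)

# Synthetic data (an assumption): a noisy linear relationship.
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 10.0 + rng.gauss(0, 5.0) for x in xs]

results = []
for run in range(10):
    # New random train/test split on every run.
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    test_idx, train_idx = idx[:10], idx[10:]
    slope, intercept = fit_line([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
    preds = [slope * xs[i] + intercept for i in test_idx]
    r2, mse = r2_and_mse([ys[i] for i in test_idx], preds)
    results.append({"slope": slope, "intercept": intercept,
                    "r2": r2, "mse": mse})

best = max(results, key=lambda r: r["r2"])
print(best)
```

One caveat about this design: choosing the run with the highest test score gives an optimistic estimate, so reporting the average score across the runs is usually the safer summary.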

CONCLUSION: Understood basic machine learning terminology such as supervised, unsupervised, and reinforcement learning. Read about various supervised and unsupervised learning algorithms, such as regression, neural networks, clustering, SVM, and random forest. Read about overfitting, its causes, and how to overcome it. Interpreted the given machine learning projects, Predicting Customer Churn and House Price Prediction, and understood how the authors performed data cleaning and EDA and applied machine learning algorithms.
