ML MAKAUT Unit-3
Model Selection
Model selection involves trying different machine learning tools, testing them out, picking the best one for your
problem, and making sure it works well before using it for your task. It's about finding the right key to unlock
the door to solving your problem.
Model selection is about picking the best machine learning tool for a job. Here's a simple breakdown:
1. Choose Different Tools: Think of trying out different tools (like decision trees, neural networks) to see
which one works best.
2. Test Them: Test each tool to see how well they work. It's like trying different keys to find the one that
unlocks a door.
3. Pick the Best: After testing, choose the tool that works the best for your specific problem.
4. Check It Again: Double-check that the chosen tool works well on a separate test to make sure it's really
good.
5. Consider What You Know: Sometimes, what you know about the problem can help decide which tool
is best.
6. Keep It Simple: Prefer simpler tools if they work as well as complex ones. Simple tools are often easier
to understand and use.
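The steps above can be sketched in a few lines. This is a toy illustration in pure Python, not a real library workflow: the two candidate "tools" (a mean predictor and a 1-nearest-neighbor predictor) and the data are made up, and the score is mean squared error on a held-out set.

```python
# Minimal model-selection sketch: try candidate models, score each on
# held-out data, and pick the one with the lowest error.

def mean_model(train_x, train_y):
    avg = sum(train_y) / len(train_y)
    return lambda x: avg  # always predicts the training mean

def nearest_neighbor_model(train_x, train_y):
    def predict(x):
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[i]  # copies the label of the closest training point
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: y is roughly 2*x
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0]
test_x, test_y = [1.5, 3.5], [3.0, 7.1]

candidates = {"mean": mean_model, "1-NN": nearest_neighbor_model}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the "tool" with the lowest held-out error
```

Step 4 ("Check It Again") would then mean confirming the winner's score on a third, untouched dataset rather than trusting this one comparison.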
Statistical Learning Theory
Statistical Learning Theory is a framework in machine learning that focuses on understanding and
analyzing the learning process from a statistical perspective. It explores the theoretical foundations and
principles underlying the behavior of machine learning algorithms.
Key Concepts in Statistical Learning Theory:
1. Generalization: It deals with a model's ability to perform well on unseen or new data after being trained
on a specific dataset. The aim is to create models that generalize well beyond the training data.
2. Bias-Variance Tradeoff: Balancing bias (errors from overly simple assumptions) and variance
(sensitivity to fluctuations in the dataset) to build models that have low prediction error.
3. Model Complexity: Understanding the impact of model complexity on its ability to fit the training data
and generalize to new data. Simple models might underfit, while overly complex ones might overfit.
Statistical Model
A statistical model is a mathematical representation or description of relationships between different
variables within a dataset. It's a tool used in statistics and machine learning to understand and make
predictions or inferences about the data.
Key Components of a Statistical Model:
1. Variables: It consists of variables (features, predictors) that are used to describe or predict the
behavior of another variable (target, outcome).
2. Parameters: These are the unknown quantities in the model that need to be estimated from the
data. They define the structure or characteristics of the model.
3. Assumptions: Models are built based on certain assumptions about the relationships between
variables. These assumptions can vary depending on the model type.
4. Predictive Power: The model's ability to accurately predict or explain outcomes or responses
in new or unseen data.
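The four components map neatly onto the simplest statistical model, a straight line y ≈ a·x + b. Here x and y are the variables, a and b are the parameters estimated from data, linearity is the assumption, and applying the fitted line to a new x is the predictive power. The data below is illustrative.

```python
# Tiny statistical model: simple linear regression fitted by least squares.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept from the means
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]           # exactly y = 2x, so a = 2, b = 0
a, b = fit_line(xs, ys)
predict = lambda x: a * x + b   # predictive power: apply to unseen x
print(round(a, 2), round(b, 2), round(predict(6), 2))  # 2.0 0.0 12.0
```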
Generalization:
• Definition: Generalization refers to a model's ability to perform well on new, unseen data,
beyond the data it was trained on.
• Avoiding Overfitting: A model that generalizes well does not overfit the training data, meaning
it can make accurate predictions on new, real-world data.
• Importance: The ultimate goal of building a statistical model is to achieve good generalization,
ensuring its applicability beyond the dataset used for training.
Validation:
• Purpose: Validation involves assessing a model's performance and generalization ability using
a separate dataset (not used during training), usually referred to as a validation set.
• Types of Validation:
• Train-Test Split: Dividing the dataset into training and testing sets. The model is trained
on the training set and evaluated on the testing set.
• Cross-Validation: Dividing the dataset into multiple subsets (folds) and performing
multiple train-test splits. It averages model performance across different subsets.
• Evaluation Metrics: In validation, various evaluation metrics (accuracy, precision, recall, etc.)
are used to measure how well the model performs on the validation data.
• Parameter Tuning: Validation helps in hyperparameter tuning and model selection by comparing
different models' performances on the validation set.
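Cross-validation is easy to hand-roll, which makes the mechanics concrete. The sketch below is pure Python with a made-up `evaluate` function standing in for "train on one split, score on the other"; here it scores a trivial majority-class classifier on a toy label list.

```python
# Hand-rolled k-fold cross-validation: each fold serves once as the
# validation set, the rest as training data; the k scores are averaged.

def cross_validate(data, k, evaluate):
    folds = [list(range(i, len(data), k)) for i in range(k)]
    scores = []
    for val_idx in folds:
        held_out = set(val_idx)
        val = [data[j] for j in val_idx]
        train = [data[j] for j in range(len(data)) if j not in held_out]
        scores.append(evaluate(train, val))
    return sum(scores) / k

# Toy "model": predict the majority training label; score = accuracy.
def evaluate(train, val):
    majority = max(set(train), key=train.count)
    return sum(1 for y in val if y == majority) / len(val)

labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
print(round(cross_validate(labels, 5, evaluate), 2))  # 0.7
```

Averaging over folds is what makes the estimate more stable than a single train-test split: every example is used for validation exactly once.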
Precision:
• Definition: Precision measures the accuracy of positive predictions made by the model. It
represents the ratio of correctly predicted positive observations to the total predicted positive
observations.
• Formula: Precision = TP / (TP + FP)
• Interpretation: It answers the question: "Of all the positive predictions made by the model,
how many were correct?"
Recall (Sensitivity):
• Definition: Recall (or sensitivity) measures the model's ability to correctly identify positive
instances. It's the ratio of correctly predicted positive observations to the total actual positive
observations.
• Formula: Recall = TP / (TP + FN)
• Interpretation: It answers the question: "Of all the actual positive instances, how many did the
model correctly identify?"
F1 Score:
• Definition: F1 score is the harmonic mean of precision and recall. It provides a balance
between precision and recall, especially when there is an imbalance between the classes.
• Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Interpretation: F1 score combines precision and recall into a single metric. It's useful when
both precision and recall are equally important, and achieving a balance between them is
necessary.
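The three formulas can be checked with a few lines of Python. The counts here are illustrative (8 true positives, 2 false positives, 4 false negatives), chosen to show precision and recall disagreeing and F1 landing between them.

```python
# Precision, recall, and F1 computed directly from the definitions above.

tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)  # correct positives / predicted positives
recall = tp / (tp + fn)     # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Note that F1 (0.727) sits closer to the lower of the two scores: the harmonic mean punishes imbalance between precision and recall.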
Training Set:
• Purpose: The training set is the portion of the dataset used to train or teach the machine
learning model. The model learns patterns, relationships, and features from this data.
• Usage: The training set comprises a significant part of the dataset, and the model is fitted or
trained on this data to minimize the training error.
• Model Building: The model learns from the training data by adjusting its parameters to make
accurate predictions on future, unseen data.
Validation Set:
• Purpose: The validation set is used to assess the performance of the model during the training
phase and tune hyperparameters.
• Usage: It's a separate dataset from the training set, and the model does not directly learn from it.
Instead, it helps in optimizing the model by evaluating its performance on data it hasn't seen
before.
Test Set:
• Purpose: The test set is reserved to evaluate the final performance of the trained model after
model selection and hyperparameter tuning.
• Usage: It's a completely unseen dataset by the model during training and validation. The test set
provides an unbiased estimate of the model's generalization and predictive ability on new,
real-world data.
• Performance Assessment: The model's performance metrics on the test set indicate how well it
generalizes to new, unseen data and avoids overfitting.
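A common way to produce the three sets is a 60/20/20 split after shuffling (the ratios are a convention, not a rule). The sketch below uses 100 dummy examples; in practice each element would be a labelled data point.

```python
# Sketch of a 60/20/20 train/validation/test split.
import random

data = list(range(100))   # stand-in for 100 labelled examples
random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(data)      # shuffle first to avoid ordering bias

n = len(data)
train = data[: int(0.6 * n)]             # 60%: the model learns from this
val = data[int(0.6 * n): int(0.8 * n)]   # 20%: tuning / model selection
test = data[int(0.8 * n):]               # 20%: final, untouched evaluation

print(len(train), len(val), len(test))   # 60 20 20
```

The key discipline is that the test slice is never consulted until training and tuning are completely finished; otherwise its error estimate stops being unbiased.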
Components of a Confusion Matrix:
A confusion matrix is a table used to evaluate the performance of a classification model. It summarizes
the performance of a machine learning model by presenting the counts of correct and incorrect
predictions:
1. True Positives (TP): The number of instances correctly predicted as positive by the model.
2. True Negatives (TN): The number of instances correctly predicted as negative by the model.
3. False Positives (FP): The number of instances incorrectly predicted as positive by the model
(actually negative).
4. False Negatives (FN): The number of instances incorrectly predicted as negative by the model
(actually positive)
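The four counts fall out of a single pass over paired actual/predicted labels. The label lists below are made up (1 = positive, 0 = negative).

```python
# Building the four confusion-matrix counts from actual vs predicted labels.

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # correctly flagged positive
tn = sum(a == 0 and p == 0 for a, p in pairs)  # correctly flagged negative
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false alarm
fn = sum(a == 1 and p == 0 for a, p in pairs)  # missed positive

print(tp, tn, fp, fn)  # 3 3 1 1
```

Precision and recall from the earlier section are just ratios of these counts: TP/(TP+FP) and TP/(TP+FN).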
Ensemble Methods
Ensemble methods are machine learning techniques that combine multiple individual models to create a
more robust and accurate predictive model. Three common ensemble methods are Boosting, Bagging,
and Random Forests.
Boosting:
• Idea: Boosting is a sequential learning technique where models are trained iteratively, and each
new model focuses on correcting the errors made by the previous ones.
• Process: At each iteration, the algorithm assigns higher weights to incorrectly predicted
instances, forcing subsequent models to pay more attention to those instances.
• Strength: Boosting often leads to highly accurate models by focusing on challenging instances.
Bagging:
• Idea: Bagging involves training multiple models independently on random subsets of the
training data and then combining their predictions.
• Process: Each model is trained on a bootstrapped sample (subset of the training data created by
random sampling with replacement).
• Popular Algorithm: Random Forest, which creates an ensemble of decision trees using
bagging.
• Strength: Reduces variance and helps in creating a more stable and robust model.
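The two bagging ingredients, bootstrapped samples and combined predictions, fit in a short pure-Python sketch. The base model here is a deliberately trivial majority-class predictor, chosen only to keep the example self-contained; a real ensemble would use decision trees or similar.

```python
# Bagging sketch: train several models on bootstrapped samples (drawn
# with replacement) and combine their outputs by majority vote.
import random

def bootstrap(data, rng):
    return [rng.choice(data) for _ in data]   # sample with replacement

def train_majority(sample):
    label = max(set(sample), key=sample.count)
    return lambda: label                      # always predicts one class

def bagged_predict(models):
    votes = [m() for m in models]
    return max(set(votes), key=votes.count)   # majority vote

rng = random.Random(42)
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
models = [train_majority(bootstrap(labels, rng)) for _ in range(5)]
print(bagged_predict(models))
```

Because each model sees a slightly different resampling of the data, their individual quirks tend to cancel out in the vote, which is where the variance reduction comes from.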
Random Forests:
• Idea: Random Forest is an ensemble method based on the concept of bagging. It creates
multiple decision trees during training and combines their predictions for more accurate results.
• Process: Each tree is trained on a bootstrapped sample of the data and considers only a
randomly selected subset of features at each split.
• Strength: Reduces overfitting and provides better accuracy by combining the predictions of
multiple decision trees.