FAM Unit5
TYPES OF LEARNING
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
SUPERVISED LEARNING
Supervised learning uses labeled datasets to
train algorithms to classify data or predict
outcomes accurately.
It relies on guidance and supervision.
Ex. Exit poll
• Supervised learning involves training a machine
from labeled data.
• Labeled data consists of examples with the
correct answer or classification.
• The machine learns the relationship between
inputs (fruit images) and outputs (fruit labels).
• The trained machine can then make predictions
on new, unlabeled data.
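The idea above can be sketched in plain Python with a toy 1-nearest-neighbor classifier; the fruit features (weight, diameter) and labels below are invented for illustration:

```python
# Toy supervised learning: learn from labeled examples, predict on new data.
# A 1-nearest-neighbor "model" simply memorizes the labeled training set.

def predict_1nn(train, new_point):
    """Return the label of the training example closest to new_point."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], new_point))
    return label

# Labeled data: (features = [weight_g, diameter_cm], label)
train = [
    ([150, 7.0], "apple"),
    ([120, 6.5], "apple"),
    ([30, 3.0], "plum"),
    ([25, 2.8], "plum"),
]

print(predict_1nn(train, [140, 6.8]))  # closest to the apples
```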
SUPERVISED LEARNING ALGORITHMS
1. Regression: Used for regression tasks, where the
output variable is a real value. It models the
relationship between a dependent variable and one or
more independent variables.
Examples of regression algorithms:
1. Linear Regression
2. Regression Trees
3. Non-Linear Regression
4. Bayesian Linear Regression
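As a sketch of how linear regression works, here is a closed-form ordinary least squares fit for one independent variable (toy data invented for illustration):

```python
# Simple linear regression fit by ordinary least squares (closed form).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]          # y = 2x exactly
slope, intercept = fit_line(xs, ys)
print(slope, intercept)    # 2.0 0.0
```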
CLASSIFICATION
Classification algorithms are used when the
output variable is categorical, e.g., spam
filtering (spam or not spam).
Examples of classification algorithms:
1. Random Forest
2. Decision Trees
3. Logistic Regression
4. Support Vector Machines
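One of the listed algorithms, logistic regression, can be sketched from scratch with gradient descent on a 1-D toy dataset (all data invented for illustration):

```python
import math

# Logistic regression on a 1-D toy problem, trained by gradient descent.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, lr=0.1, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of the average log loss
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Label is 1 when x > 3.5, else 0 (e.g., a single "spam score" feature).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(xs, ys)
print(sigmoid(w * 1 + b) < 0.5, sigmoid(w * 6 + b) > 0.5)  # True True
```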
DIFFERENCE BETWEEN
CLASSIFICATION AND REGRESSION
Basis                      Classification           Regression
Computational Complexity   Simpler method           Computationally complex
Model                      We can test our model    We cannot test our model
1. Accuracy
Accuracy is used to measure the performance of the
model. It is the ratio of Total correct instances to the total
instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision
Precision is a measure of how accurate a model’s positive
predictions are. It is defined as the ratio of true positive
predictions to the total number of positive predictions
made by the model.
Precision = TP / (TP + FP)
3. Recall
It is the ratio of the number of true positive (TP)
instances to the sum of true positive and false
negative (FN) instances.
It measures how many of the actual positive
instances the model predicted correctly.
Recall = TP / (TP + FN)
Recall should be as high as possible.
4. F1-Score
F1-score is used to evaluate the overall
performance of a classification model. It is the
harmonic mean of precision and recall.
F1-Score = 2 · (Precision · Recall) / (Precision + Recall)
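The four metrics above can be computed together from the confusion-matrix counts (the counts below are invented for illustration):

```python
# Accuracy, precision, recall, and F1 from confusion-matrix counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
# accuracy = 0.85, precision ≈ 0.889, recall = 0.8, f1 ≈ 0.842
print(acc, prec, rec, f1)
```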
AUC-ROC CURVE
The AUC-ROC curve, or Area Under the
Receiver Operating Characteristic curve, is a
graphical representation of the performance
of a binary classification model at various
classification thresholds.
It is commonly used in machine learning to
assess the ability of a model to distinguish
between two classes, typically the positive
class (e.g., presence of a disease) and the
negative class (e.g., absence of a disease).
RECEIVER OPERATING CHARACTERISTICS
(ROC) CURVE
ROC stands for Receiver Operating
Characteristics, and the ROC curve is the
graphical representation of the effectiveness of
the binary classification model. It plots the true
positive rate (TPR) vs the false positive rate
(FPR) at different classification thresholds.
Area Under the Curve (AUC):
AUC stands for the Area Under the Curve, and
the AUC curve represents the area under the
ROC curve. It measures the overall performance
of the binary classification model. Since both TPR
and FPR range between 0 and 1, the area will
always lie between 0 and 1.
A greater value of AUC denotes better model
performance.
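A minimal sketch of building the ROC points and computing AUC with the trapezoidal rule, without libraries (labels and scores invented for illustration):

```python
# Build a ROC curve and compute AUC from classifier scores.
def roc_auc(labels, scores):
    # Sweep thresholds from high to low; record (FPR, TPR) at each step.
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal area under the (FPR, TPR) curve.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
_, auc = roc_auc(labels, scores)
print(auc)  # 8/9 ≈ 0.889 for this toy data
```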
LOG LOSS
Logarithmic Loss, commonly known as Log
Loss or Cross-Entropy Loss, is a crucial metric
in machine learning, particularly in
classification problems. It quantifies the
performance of a classification model by
measuring the difference between predicted
probabilities and actual outcomes.
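A minimal log loss implementation, assuming binary labels and predicted probabilities (the values below are invented for illustration):

```python
import math

# Log loss (cross-entropy) for binary predictions: confident wrong
# probabilities are penalized much more heavily than mildly wrong ones.
def log_loss(y_true, y_prob, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # small loss: predictions match
print(log_loss([1, 0, 1], [0.1, 0.9, 0.2]))  # large loss: confident and wrong
```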
CROSS VALIDATION
Cross validation is a technique used in
machine learning to evaluate the performance
of a model on unseen data. It involves
dividing the available data into multiple folds
or subsets, using one of these folds as a
validation set, and training the model on the
remaining folds. This process is repeated
multiple times, each time using a different
fold as the validation set.
Finally, the results from each validation step
are averaged to produce a more robust
estimate of the model’s performance.
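The fold-splitting step described above can be sketched as follows (a simple round-robin split of indices; real implementations usually shuffle the data first):

```python
# k-fold cross validation: split indices into k folds; each fold serves once
# as the validation set while the remaining folds form the training set.
def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin split
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

for train_idx, val_idx in kfold_indices(n=6, k=3):
    print("train:", sorted(train_idx), "val:", val_idx)
```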
The main purpose of cross validation is to
prevent overfitting, which occurs when a
model is trained too well on the training data
and performs poorly on new, unseen data.
Advantages:
Data Efficient: Cross validation allows the use of all the available data
for both training and validation, making it a more data-efficient
method compared to traditional validation techniques.
Disadvantages:
Computationally Expensive: Cross validation can be computationally
expensive, especially when the number of folds is large or when the
model is complex and requires a long time to train.
Time-Consuming: Cross validation can be time-consuming, especially
when there are many hyperparameters to tune or when multiple
models need to be compared.