Unit IV

The document discusses various types of data used in machine learning, including training, validation, and test data, and emphasizes the importance of data quality, quantity, and diversity. It outlines different validation techniques such as hold-out, k-fold, and stratified k-fold cross-validation, along with their advantages and disadvantages. Additionally, it introduces bootstrapping as a resampling method for improving model accuracy and mentions Support Vector Machines as a supervised learning algorithm.




Data and DataSet
 Training data. This type of data builds up the machine learning algorithm.
The data scientist feeds the algorithm input data, which corresponds to an
expected output. The model evaluates the data repeatedly to learn more
about the data’s behavior and then adjusts itself to serve its intended
purpose.

 Validation data. During training, validation data infuses new data into the
model that it hasn’t evaluated before. Validation data provides the first test
against unseen data, allowing data scientists to evaluate how well the model
makes predictions based on the new data. Not all data scientists use
validation data, but it can provide some helpful information to optimize
hyperparameters, which influence how the model assesses data.

 Test data. After the model is built, testing data once again validates that it
can make accurate predictions. If training and validation data include labels
to monitor performance metrics of the model, the testing data should be
unlabeled. Test data provides a final, real-world check of an unseen dataset
to confirm that the ML algorithm was trained effectively.
Data and DataSet

There is some semantic ambiguity between validation data and testing data. Some
organizations call testing datasets “validation datasets.” Ultimately, if there are three
datasets to tune and check ML algorithms, validation data typically helps tune the
algorithm and testing data provides the final assessment.

Random noise (i.e. data points that make it difficult to see a pattern), low frequency of a certain categorical variable, low frequency of the target category (if the target variable is categorical), and incorrect numeric values are just some of the ways data can mess up a model.
Data and DataSet
 An ML algorithm is only as good as its training data — as the saying goes,
“garbage in, garbage out." Effective ML training data is built upon three key
components:
 Quantity. A robust ML algorithm needs lots of training data to
properly learn how to interact with users and behave within the
application.
 Quality. Volume alone will only take your ML algorithm so far. The
quality of the data is just as important. This means collecting real-
world data, such as voice utterances, images, videos, documents,
sounds and other forms of input on which your algorithm might rely.
 Diversity. The third piece of the pie is diversity of data, which is
essential to eliminate the dreaded problem of AI bias, where the
application works better for a certain segment of the population
than others. With AI bias, the ML algorithm delivers results that can
be seen as prejudiced against a certain gender, race, age group,
language or culture, depending on how it manifests.
Validation
 Developing a machine learning model is not enough; before relying on its predictions, we need to check its accuracy and validate it so that its results are dependable in real-life applications.
 Choosing the right validation method is especially important to ensure that the validation process itself is accurate and unbiased.
 In machine learning, several candidate models are typically built and compared to make algorithms work and support artificial-intelligence applications.
 Machine learning models are not always stable, so we have to evaluate how stable a model's performance is. That is where cross-validation comes into the picture.

Types of Validation Technique
 Validation techniques in machine learning are used to estimate the error rate of the ML model, which can be taken as being close to the true error rate on the population.
 If the data volume is large enough to be representative of the population, you may not need validation techniques. However, in real-world scenarios, we work with samples of data that may not be truly representative of the population. This is where validation techniques come into the picture.
 Different validation techniques:
◦ Non-exhaustive techniques
 Hold-out
 K-fold cross-validation
 Stratified k-fold cross validation
◦ Exhaustive techniques
 LOOCV
 Leave P out CV
 Bootstrapping

Hold-out Validation Method
 In this method, we randomly divide our data into two: Training
and Test/Validation set i.e. a hold-out set. We then train the
model on the training dataset and evaluate the model on the
Test/Validation dataset.
 The evaluation technique used on the validation dataset to compute the error depends on the kind of problem we are working with: MSE is typically used for regression problems, while metrics based on the misclassification rate are used to find the error for classification problems.
 Typically the training dataset is bigger than the hold-out dataset. Typical ratios used for splitting the dataset include 60:40, 80:20, etc.
 This method is typically used when we have only one model to evaluate and no hyperparameters to tune. A minimal sketch of such a split follows.
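As a rough illustration, the sketch below performs an 80:20 hold-out split with scikit-learn. The iris dataset and the logistic-regression model are placeholder choices made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# 80:20 split; the 20% hold-out set is never seen during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```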
Hold-out Validation Method

• The limitation of such a method is that the error measured on the test dataset can depend heavily on which observations happen to fall in the train and test sets.
• Also, if the train or test dataset is not representative of the complete data, the results from the test set can be skewed.

Hold-out Validation Method
 This method is not effective for comparing multiple models and tuning their hyperparameters, which leads us to another very popular form of the hold-out method: splitting the data into not two, but three separate sets (training, validation and test), as sketched below.
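A minimal sketch of such a three-way split, obtained by calling scikit-learn's train_test_split twice. The roughly 60:20:20 ratio is only an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off 20% as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

# Tune hyperparameters on (X_val, y_val); report the final error on (X_test, y_test) only once.
```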

Hold-out Validation Method
 When we tune the hyperparameters based on the validation set, we end up slightly overfitting our model to the validation set.
 The accuracy we obtain on the validation set is therefore not considered final; another hold-out dataset, the test dataset, is used to evaluate the final selected model, and the error found there is considered the generalization error.
 The hold-out method alone is not enough; we need a more advanced validation technique that is less biased and can save the model from overfitting. One such technique is k-fold cross-validation.

k Fold Validation Method
 In this method, the dataset is broken into k folds, where one fold is used as the test set and the remaining folds are used as the training set; this is repeated k times so that each fold serves as the test set once, with k specified by the user.
 In a regression problem, the average of the per-fold results (e.g. RMSE, R-squared) is used as the final result.
 In k-fold cross-validation, we make the assumption that all observations in the dataset are distributed in a way that the data are not biased.
 We randomly divide the dataset into k equal-sized parts, leave out one part, and fit the model on the other k−1 parts combined.

NEERAJ KHARYA, DEPARTMENT OF


COMPUTER APPLICATIONS, BIT DURG 13
k Fold Validation Method
 The model is evaluated on the left-out part, and this process is repeated k times so that each part is used as the testing set. The results from each fold are then combined and averaged to arrive at the final error.
 The advantage is that the entire data is used for both training and testing. The error rate of the model is the average of the error rates of the individual iterations. This technique can also be seen as a form of the repeated hold-out method.
 The procedure has a single parameter called k that refers to
the number of groups that a given data sample is to be split
into. As such, the procedure is often called k-fold cross-
validation.
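A small sketch of 5-fold cross-validation with scikit-learn; the iris dataset and logistic-regression model are placeholders used only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Each of the k = 5 folds is used once as the test set; the scores are averaged.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```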

k fold Validation Method
Advantages
 Checking model generalization: Cross-validation gives an idea of how the model will generalize to an unknown dataset.
 Checking model performance: Cross-validation helps determine a more accurate estimate of model prediction performance.
 Checking overfitting: Cross-validation can be used to check whether the model has been overfitted.
 Hyperparameter tuning: Cross-validation can be used to select the best set of hyperparameters (a sketch follows below).
Disadvantages
 Higher training time: With cross-validation, we need to train the model on multiple training sets.
 Expensive computation: Cross-validation is computationally expensive because we need to train on multiple training sets.
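Since hyperparameter tuning via cross-validation is listed above, here is a hedged sketch using scikit-learn's GridSearchCV; the SVC model and the parameter grid are illustrative assumptions only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each candidate (C, kernel) pair is scored with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```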
Random Vs Stratified Sampling
 Suppose you want to take a survey and decide to call 1000 people from a particular state. If you pick 1000 males, or 1000 females, or, say, 900 females and 100 males at random, and ask their opinion on a particular product, then based on these 1000 opinions you cannot judge the opinion of the entire state on your product. This is random sampling.
 In stratified sampling, suppose the population of that state is 52% male and 48% female. Then, to choose 1000 people from that state, you pick 520 males (52% of 1000) and 480 females (48% of 1000), i.e. 520 males + 480 females (total = 1000 people), and ask their opinion. These groups of people then represent the entire state. This is called stratified sampling.

Stratified k fold Validation Method
 Stratified k-fold cross-validation is the same as plain k-fold cross-validation, except that it uses stratified sampling instead of random sampling when forming the folds.
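A brief sketch using scikit-learn's StratifiedKFold; the breast-cancer dataset and logistic-regression classifier are placeholders chosen only so the example runs.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Each fold preserves (approximately) the class proportions of y.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print("Mean stratified-CV accuracy:", scores.mean())
```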

Stratified k fold Validation Method
 Cross-validation implemented using stratified sampling ensures
that the proportion of the feature of interest is the same
across the original data, training set and the test set.
 This ensures that no value is over/under-represented in the
training and test sets, which gives a more accurate estimate of
performance/error.

Leave-P-Out cross validation / LoOCV
 When using this exhaustive method, we take p points out of the total number of data points in the dataset (say n).
 While training the model we train it on the remaining (n − p) data points and test the model on the p held-out data points.
 We repeat this process for all possible combinations of p points from the original dataset.
 Then, to get the final accuracy, we average the accuracies from all these iterations.
 LOOCV (Leave-One-Out CV) is a simple variation of Leave-P-Out cross-validation in which p is set to one. This makes the method much less exhaustive, since for n data points and p = 1 there are only n combinations to evaluate.
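A minimal sketch of leave-one-out cross-validation with scikit-learn; the dataset and model are placeholders, and LeavePOut(p=2) could be substituted for the general leave-p-out case.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# One observation is held out per iteration, so the model is fit n times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("Number of fits:", len(scores))   # equals n
print("LOOCV accuracy:", scores.mean())
```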

Leave-P-Out cross validation / LoOCV
 LOOCV is the case of Cross-Validation where just a single
observation is held out for validation.
 The model is evaluated for every held out observation. The
final result is then calculated by taking the mean of all the
individual evaluations.
 There are two problems with LOOCV.
1. It can be computationally expensive to use LOOCV, particularly if the dataset is large or the model takes substantial time to train even once. This is because we fit the model n times, each time on almost the whole dataset.
2. The other problem with LOOCV is that it can be subject to high variance or overfitting, since we feed the model almost all the training data to learn from and just a single observation to evaluate on.
Bootstrapping
 A statistical concept, bootstrapping is a resampling method used to simulate samples out of a dataset by sampling with replacement. The process of bootstrapping allows one to infer properties of the population, derive standard errors, and ensure that data is tested efficiently.
 The Bootstrap Sampling Method is a very simple concept and is a
building block for some of the more advanced machine learning
algorithms like AdaBoost and GBoost.
 Technically speaking, the bootstrap sampling method is a resampling
method that uses random sampling with replacement.
 This technique involves repeatedly sampling a dataset with random
replacement. A statistical test that falls under the category of
resampling methods, this method ensures that the statistics
evaluated are accurate and unbiased as much as possible.

Bootstrapping
 Invented by Bradley Efron, the bootstrapping method is
known to generate new samples or resamples out of the
already existing samples in order to measure the accuracy of a
sample statistic.
 Using the replacement technique, the method creates new
hypothetical samples that help in the testing of an estimated
value.
 The observations that are chosen for the resample are referred to as the 'bootstrapped sample' (of the chosen bootstrap sample size). The observations that are not chosen are referred to as the 'out-of-bag' samples and serve as the testing dataset.

Bootstrapping
 Following are the steps involved in the bootstrapping method (a runnable sketch follows the list):
1. Randomly choose a sample size.
2. Pick an observation from the training dataset at random.
3. Add this observation to the sample chosen earlier.
4. Repeat steps 2 and 3 until the chosen sample size is reached; because observations are drawn with replacement, the same observation may appear more than once.
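A hedged sketch of bootstrap sampling with NumPy; the data values, the number of resamples, and the statistic (the mean) are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([2.3, 3.1, 4.8, 5.0, 6.2, 7.7, 8.4, 9.1])

# One bootstrap sample: draw len(data) observations with replacement.
idx = rng.choice(len(data), size=len(data), replace=True)
bootstrap_sample = data[idx]
out_of_bag = data[np.setdiff1d(np.arange(len(data)), idx)]  # observations never drawn

# Repeating the resampling many times gives a distribution of the statistic of interest.
boot_means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(1000)]
print("Bootstrap estimate of the mean:", np.mean(boot_means))
print("Bootstrap standard error of the mean:", np.std(boot_means))
```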

Support Vector Machine
 A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges.
 In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that differentiates the two classes best.
 The main goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH), which can be done in the following two steps −
1. First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
2. Then, it chooses the hyperplane that separates the classes correctly with the maximum margin.
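A short sketch of training an SVM classifier with scikit-learn's SVC; the iris dataset and the linear kernel are placeholder choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a maximum-margin classifier; the support vectors define the decision boundary.
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("Support vectors per class:", clf.n_support_)
print("Test accuracy:", clf.score(X_test, y_test))
```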
Support Vector Machine
 The following are important concepts in SVM −
 Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
 Hyperplane − A decision plane or space that divides a set of objects belonging to different classes.
 Margin − It may be defined as the gap between the two lines drawn through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.

Support Vector Machine
 The SVM algorithm is implemented with a kernel that transforms the input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.
 The following are some of the types of kernels used by SVM.
1. Linear kernel: It is the dot product between any two observations.
2. Polynomial kernel: It is a more generalized form of the linear kernel and can distinguish curved or nonlinear input spaces.
3. Radial Basis Function (RBF) kernel: The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space.
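As a hedged illustration, the sketch below compares these kernels on a nonlinear toy problem; the make_moons dataset is an arbitrary choice, and the accuracies will vary with the data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The same classifier with different kernels; RBF usually handles the curved boundary best here.
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", round(clf.score(X_test, y_test), 3))
```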

Support Vector Machine
 Pros:
◦ It works really well with a clear margin of separation
◦ It is effective in high dimensional spaces.
◦ It is effective in cases where the number of dimensions is
greater than the number of samples.
◦ It uses a subset of training points in the decision function
(called support vectors), so it is also memory efficient.
 Cons:
◦ It doesn't perform well when we have a large dataset, because the required training time is higher.
◦ It also doesn't perform very well when the dataset has more noise, i.e. when the target classes overlap.
◦ SVM doesn't directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (this is how the SVC class of the Python scikit-learn library provides them).
Confusion Matrix
 A confusion matrix is a table that is often used to describe
the performance of a classification model (or "classifier")
on a set of test data for which the true values are known.
 Confusion Matrix is a performance measurement for machine
learning classification.
 Classification accuracy alone can be misleading if you have an
unequal number of observations in each class or if you have
more than two classes in your dataset.
 Calculating a confusion matrix can give you a better idea of
what your classification model is getting right and what types
of errors it is making.

Confusion Matrix
 The following are the most basic terms; each is a whole number (a count), not a rate:
◦ True Positives (TP): Cases in which we predicted yes (they have the disease), and they do have the disease.
◦ True Negatives (TN): We predicted no, and they don't have the disease.
◦ False Positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
◦ False Negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
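A small sketch of obtaining these four counts with scikit-learn's confusion_matrix; the label vectors below are made-up values used only for illustration.

```python
from sklearn.metrics import confusion_matrix

# 1 = has the disease, 0 = does not (hypothetical labels and predictions).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels [0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```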

Confusion Matrix
 Let's understand TP, FP, FN and TN in terms of a pregnancy analogy.

Metrics derived from Confusion Matrix
 Accuracy: Accuracy is used to find the proportion of correctly classified values. It tells us how often our classifier is right. It is the sum of all true values divided by the total number of values.
 Precision: Precision is used to calculate the model's ability to classify positive values correctly. It is the true positives divided by the total number of predicted positive values.
 Recall / Sensitivity / True Positive Rate (TPR): It is used to calculate the model's ability to predict positive values. "How often does the model predict the correct positive values?" It is the true positives divided by the total number of actual positive values. Sensitivity tells us what proportion of the positive class got correctly classified.
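In terms of the confusion-matrix counts, these metrics are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall (TPR) = TP / (TP + FN)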

Metrics derived from Confusion Matrix

 False Negative Rate (FNR): The FNR tells us what proportion of the positive class got incorrectly classified by the classifier.
 A higher TPR and a lower FNR are desirable, since we want to correctly classify the positive class.
 Specificity / True Negative Rate (TNR): Specificity tells us what proportion of the negative class got correctly classified.
 False Positive Rate (FPR): The FPR tells us what proportion of the negative class got incorrectly classified by the classifier.
 A higher TNR and a lower FPR are desirable, since we want to correctly classify the negative class.
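Expressed with the confusion-matrix counts:

FNR = FN / (TP + FN)
Specificity (TNR) = TN / (TN + FP)
FPR = FP / (TN + FP)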

Metrics derived from Confusion Matrix
 F1-Score: It is the harmonic mean of Recall and Precision. It is useful when you
need to take both Precision and Recall into account.

 Compute all metrics for the confusion matrix made for a classifier that classifies people based on whether they speak English or Spanish, given the counts below (a worked computation follows).
◦ True Positives (TP) = 86
◦ True Negatives (TN) = 79
◦ False Positives (FP) = 12
◦ False Negatives (FN) = 10
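A worked sketch of the computation for these counts (values rounded to three decimal places):

```python
TP, TN, FP, FN = 86, 79, 12, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)                # 165 / 187 ≈ 0.882
precision   = TP / (TP + FP)                                 # 86 / 98  ≈ 0.878
recall      = TP / (TP + FN)                                 # 86 / 96  ≈ 0.896
specificity = TN / (TN + FP)                                 # 79 / 91  ≈ 0.868
fpr         = FP / (TN + FP)                                 # 12 / 91  ≈ 0.132
fnr         = FN / (TP + FN)                                 # 10 / 96  ≈ 0.104
f1          = 2 * precision * recall / (precision + recall)  # ≈ 0.887

print(f"Accuracy={accuracy:.3f}, Precision={precision:.3f}, Recall={recall:.3f}")
print(f"Specificity={specificity:.3f}, FPR={fpr:.3f}, FNR={fnr:.3f}, F1={f1:.3f}")
```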

Confusion Matrix for Multi-Class Classification
 Consider a confusion matrix for a multiclass problem where we have to predict whether a person loves Facebook, Instagram or Snapchat. The confusion matrix would be a 3 x 3 matrix like this:
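Since the 3 x 3 matrix itself is not reproduced here, a small sketch of how such a matrix could be computed with scikit-learn (the labels and predictions are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = ["Facebook", "Instagram", "Snapchat", "Facebook", "Snapchat", "Instagram"]
y_pred = ["Facebook", "Snapchat", "Snapchat", "Facebook", "Instagram", "Instagram"]

labels = ["Facebook", "Instagram", "Snapchat"]
# Rows are the true classes, columns the predicted classes, in the order of `labels`.
print(confusion_matrix(y_true, y_pred, labels=labels))
```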

Metrics derived from Confusion Matrix

https://www.inabia.com/learning/quiz/confusion-matrix-quiz/

AUC-ROC Curve
 The Receiver Operator Characteristic (ROC) curve is an
evaluation metric for binary classification problems.
 It is a probability curve that plots the TPR against FPR at
various threshold values and essentially separates the
‘signal’ from the ‘noise’.
 The Area Under the Curve (AUC) is the measure of the
ability of a classifier to distinguish between classes and is used
as a summary of the ROC curve.
 An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability.
 The higher the AUC, the better the performance of the model
at distinguishing between the positive and negative classes.

AUC-ROC Curve
 The ROC curve is plotted with TPR against the FPR where TPR
is on the y-axis and FPR is on the x-axis.

 In a ROC curve, a higher x-axis value indicates a higher number of false positives relative to true negatives, while a higher y-axis value indicates a higher number of true positives relative to false negatives. So the choice of the threshold depends on the ability to balance between false positives and false negatives.
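A hedged sketch of computing the ROC curve and AUC with scikit-learn; the dataset and the logistic-regression classifier are placeholders chosen only so the example runs.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]        # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # TPR and FPR at each threshold
print("AUC:", roc_auc_score(y_test, scores))
```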
AUC-ROC Curve
 When AUC = 1, then the classifier is able to perfectly distinguish
between all the Positive and the Negative class points correctly.

 The red distribution curve is of the positive class (patients with the disease) and the green distribution curve is of the negative class (patients with no disease).

 When the two distribution curves do not overlap at all, the model has an ideal measure of separability: it is perfectly able to distinguish between the positive class and the negative class.
 If, however, the AUC had been 0, the classifier would be predicting all negatives as positives and all positives as negatives.
AUC-ROC Curve
 When 0.5<AUC<1, there is a high chance that the classifier will be
able to distinguish the positive class values from the negative class
values. This is so because the classifier is able to detect more
numbers of True positives and True negatives than False negatives and
False positives.

 When two distributions overlap, we introduce type 1 and type 2


errors. Depending upon the threshold, we can minimize or maximize
them. When AUC is 0.7, it means there is a 70% chance that the
model will be able to distinguish between positive class and negative
class.
AUC-ROC Curve
 When AUC = 0.5, the classifier is not able to distinguish between positive and negative class points, meaning it is predicting either a random class or a constant class for all the data points.

 This is the worst situation. When AUC is approximately 0.5, the


model has no discrimination capacity to distinguish between
positive class and negative class.
 So, the higher the AUC value for a classifier, the better its
ability to distinguish between positive and negative classes.
AUC-ROC Curve
 When AUC is approximately 0, the model is actually
reciprocating the classes. It means the model is predicting a
negative class as a positive class and vice versa.

 Sensitivity and Specificity are inversely proportional to each


other. So when we increase Sensitivity, Specificity decreases,
and vice versa.

Naïve Bayesian Classifier
 The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi | y).
 Gaussian Naive Bayes classifier : continuous values
associated with each feature are assumed to be distributed
according to a Gaussian distribution.
 Multinomial Naive Bayes: Feature vectors represent the
frequencies with which certain events have been generated by
a multinomial distribution. This is the event model typically
used for document classification.
 Bernoulli Naive Bayes: In the multivariate Bernoulli event
model, features are independent booleans (binary variables)
describing inputs.
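A brief sketch using scikit-learn's GaussianNB for continuous features; the iris dataset is a placeholder, and MultinomialNB or BernoulliNB could be swapped in for count or binary features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each feature is modelled with a per-class Gaussian; prediction applies Bayes' theorem.
model = GaussianNB().fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```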

Advantages : Naïve Bayesian Classifier
 They require a small amount of training data to estimate the
necessary parameters.
 Naive Bayes learners and classifiers can be extremely fast
compared to more sophisticated methods.
 Naive Bayes has very low computation cost.
 It can efficiently work on a large dataset.
 It performs well in case of discrete response variable compared
to the continuous variable.
 It can be used with multiple class prediction problems.
 It also performs well in the case of text analytics problems.

Disadvantages : Naïve Bayesian Classifier
 The assumption of independent features. In practice, it is almost impossible that the model will get a set of predictors that are entirely independent.
 If there is no training tuple of a particular class, this causes a zero posterior probability. In this case, the model is unable to make a prediction. This problem is known as the Zero Probability/Frequency Problem (commonly mitigated with Laplace, i.e. add-one, smoothing).

Applications : Naïve Bayesian Classifier
 Real-time prediction: Naive Bayes is an eager learning classifier and it is very fast, so it can be used for making predictions in real time.
 Multi-class prediction: This algorithm is also well known for its multi-class prediction capability; we can predict the probability of multiple classes of the target variable.
 Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers, widely used in text classification (due to better results in multi-class problems and the independence assumption), have a higher success rate compared to many other algorithms.
 Recommendation systems: A Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
Discriminative Learning
 A discriminative model makes predictions on unseen data based on conditional probability and can be used for either classification or regression problem statements.
 Generative models are useful for unsupervised learning
 The discriminative model is used particularly for supervised
machine learning. Also called a conditional model, it learns the
boundaries between classes or labels in a dataset.
 Examples of Discriminative Models
• Logistic regression
• Support Vector Machines (SVMs)
• Traditional neural networks
• Nearest neighbor
• Conditional Random Fields (CRFs)
• Decision Trees and Random Forest
Generative Learning
 Generative model focuses on the distribution of a dataset to
return a probability for a given example.
 Generative models are useful for unsupervised learning
 Since these types of models often rely on Bayes' theorem to find the joint probability, generative models can tackle more complex tasks than analogous discriminative models.
 These models are used in unsupervised machine learning as a
means to perform tasks such as
• Probability and Likelihood estimation,
• Modeling data points,
• To describe the phenomenon in data,
• To distinguish between classes based on these probabilities

Generative Learning
 Some Examples of Generative Models
◦ Naïve Bayes
◦ Bayesian networks
◦ Markov random fields
◦ Hidden Markov Models (HMMs)
◦ Latent Dirichlet Allocation (LDA)
◦ Generative Adversarial Networks (GANs)
◦ Autoregressive Model
 Major drawback – If there is a presence of outliers in
the dataset, then it affects these types of models to a
significant extent.

Generative & Discriminative Learning
 Performance
 Generative models need fewer data to train compared with
discriminative models since generative models are more biased as
they make stronger assumptions i.e, assumption of conditional
independence.
 Based on Missing Data
 Generative models can work with missing data, while discriminative models cannot. This is because, with a generative model, we can still estimate the posterior by marginalizing over the unseen variables. However, for discriminative models, we usually require all the features X to be observed.
 Based on Accuracy Score
 If the assumption of conditional independence is violated, then generative models are less accurate than discriminative models.
