Unit1 ML NGP
Techniques
(KAI 601)
Recommended Books:
1. Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
5. Bishop, C., Pattern Recognition and Machine Learning. Berlin: Springer-Verlag.
Course Outcomes
• CO1. To understand the need for machine learning in solving various problems.
• CO2. To understand a wide variety of learning algorithms and how to evaluate models generated
from data.
• CO4. To design appropriate machine learning algorithms and apply them to real-world
problems.
• CO5. To optimize the learned models and report on the expected accuracy that can be achieved by
applying them.
Unit-1
• “Learning denotes changes in a system that ... enable a system to do the same task
more efficiently the next time.” - Herbert Simon
• Some learning is immediate, induced by a single event (e.g. being burned by a hot
stove), but much skill and knowledge accumulates from repeated experiences.
Types of Learning
1. Visual (Spatial): By representing information with
images, students are able to focus on meaning. Related fields
include architecture, engineering, project management, and design.
2. Aural (Auditory-Musical): If you need someone to
tell you something out loud to understand it, you are an
auditory learner. Related careers include musician, recording
engineer, speech pathologist, and language teacher.
3. Verbal (Linguistic): People who find it easier to express
themselves by writing or speaking can be regarded as verbal learners.
4. Physical (Kinesthetic): In this style, learning happens
when the learner carries out a physical activity, rather
than listening to a lecture or watching a demonstration.
5. Logical (Mathematical): If you like using your
brain for logical and mathematical reasoning,
you are a logical learner. You easily recognize patterns
and can connect seemingly meaningless concepts.
Related fields include scientific research, accountancy,
bookkeeping, and computer programming.
6. Social (Interpersonal): If you are at your best when socializing
and communicating with people, both verbally and
non-verbally, you are a social learner.
People often come to you to be heard and to ask for
advice. Related fields include counseling, teaching, training
and coaching, sales, politics, and human resources, among others.
Why is ML the future?
While designing a learning system, various design issues and approaches must be
considered.
Then, the confusion matrices of all the methods (K-nearest neighbor, Random Forest, Logistic Regression) were compared to
select one.
But sometimes two or more confusion matrices are very similar, which makes it hard to choose which machine learning
method is a better fit for the data.
So we have more sophisticated metrics, like sensitivity, specificity, ROC and AUC, that can help us make a decision.
• The size of the confusion matrix is determined by the number of
things we want to predict.
• In the first example, we were only trying to predict two things: if
someone had heart disease or not. So, that gave us a confusion
matrix with 2 rows and 2 columns.
• Now, if in the next example we have 3 things to choose from, then we
have a confusion matrix with 3 rows and 3 columns.
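As a minimal sketch of how the matrix size follows from the number of classes, the snippet below counts (actual, predicted) pairs for a three-class problem. The helper name `confusion_matrix_counts` and the animal labels are illustrative assumptions, not part of the course material.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Count (actual, predicted) pairs; rows are actual classes, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Hypothetical labels and predictions for a three-class problem.
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

cm3 = confusion_matrix_counts(y_true, y_pred, ["bird", "cat", "dog"])
for row in cm3:
    print(row)
```

With three labels the result has three rows and three columns; the diagonal holds the correctly classified examples.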
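from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# X, y and model (any scikit-learn classifier) are assumed to be defined earlier.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)  # hold out 25% for testing
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows: actual classes, columns: predicted classes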
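Sensitivity and specificity can be read directly off a 2x2 confusion matrix. The sketch below assumes the scikit-learn layout (rows are actual classes, columns are predicted classes, negative class first); the counts are made up for illustration.

```python
# Hypothetical 2x2 confusion matrix for the heart-disease example.
cm = [[50, 10],   # actual negative: 50 true negatives, 10 false positives
      [5, 35]]    # actual positive: 5 false negatives, 35 true positives

tn, fp = cm[0]
fn, tp = cm[1]

sensitivity = tp / (tp + fn)                  # true positive rate (recall)
specificity = tn / (tn + fp)                  # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)    # fraction of all predictions that are correct

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"accuracy    = {accuracy:.3f}")
```

Two models with similar overall accuracy can still differ in sensitivity and specificity, which is why these metrics help when the confusion matrices look alike.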
AUC (Area Under the Curve) - ROC (Receiver Operating Characteristic) curve.
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), with TPR on the y-axis and FPR on the x-axis.
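The construction of the curve can be sketched in plain Python: sweep a decision threshold over the predicted scores, record one (FPR, TPR) point per threshold, and integrate with the trapezoidal rule to get the AUC. The function names `roc_points` and `auc` and the four example scores are assumptions chosen for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical true labels (1 = positive) and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
roc_auc = auc(points)
print(points)
print(f"AUC = {roc_auc:.2f}")
```

An AUC of 1.0 corresponds to perfect ranking of positives above negatives, while 0.5 corresponds to random guessing.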
PR Curve
• The PR curve plots precision against recall for each threshold.
• A no-skill classifier is one that cannot discriminate between the
classes and would predict a random class or a constant class in all
cases.
• Ideally, the algorithm should have both high precision and
high recall. However, most machine learning algorithms involve
a trade-off between the two. A better PR curve has a larger AUC (area
under the curve).
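The points of a PR curve can be computed in the same threshold-sweeping way. The sketch below is a minimal illustration with made-up labels and scores; `precision_recall_points` is a hypothetical helper name. It also computes the precision of a no-skill classifier, which equals the fraction of positive examples in the data.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) triples from strictest to loosest threshold."""
    positives = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Labels of the examples predicted positive at this threshold.
        predicted_positive = [y for y, s in zip(y_true, scores) if s >= threshold]
        tp = sum(predicted_positive)
        points.append((threshold, tp / len(predicted_positive), tp / positives))
    return points

# Hypothetical true labels (1 = positive) and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pr_points = precision_recall_points(y_true, scores)
no_skill_precision = sum(y_true) / len(y_true)  # baseline: share of positives

for threshold, precision, recall in pr_points:
    print(f"threshold {threshold:.2f}: precision = {precision:.2f}, recall = {recall:.2f}")
print(f"no-skill precision = {no_skill_precision:.2f}")
```

Lowering the threshold raises recall but tends to lower precision, which is the trade-off described above.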
How to read a PR Curve
F1 Score
• It is described as the harmonic mean of the precision and recall of a
classification model. The two metrics contribute equally to the score,
ensuring that the F1 metric correctly indicates the reliability of a
model.
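The harmonic mean can be written out in a few lines. The sketch below uses made-up counts of true positives, false positives and false negatives; the function name `f1_from_counts` is an assumption for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 35 true positives, 10 false positives, 5 false negatives.
f1 = f1_from_counts(tp=35, fp=10, fn=5)
print(f"F1 = {f1:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model cannot reach a high F1 score by excelling at precision alone or recall alone.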
• The F1 score integrates precision and recall into a single metric to gain
a better understanding of model performance.
• Accuracy is used when the True Positives and True negatives are more
important while F1-score is used when the False Negatives and False
Positives are crucial.
• Accuracy can be used when the class distribution is similar while F1-
score is a better metric when there are imbalanced classes.
• It ranges from 0 to 1, where 1 indicates perfect precision and recall,
and 0 means that precision or recall (or both) is zero.
• As a general rule of thumb, an F1 score of 0.7 or higher is often
considered good. In some applications, a higher F1 score may be
required, mainly if precision and recall are both essential and a high
cost is associated with false positives and false negatives.
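The difference between accuracy and F1 on imbalanced classes can be seen with a small example. The sketch below assumes a made-up dataset with 5 positives out of 100 and a classifier that always predicts the negative class; `accuracy_and_f1` is a hypothetical helper. It uses the count form F1 = 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1

# Imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # a classifier that never predicts the positive class

accuracy, f1 = accuracy_and_f1(y_true, always_negative)
print(f"accuracy = {accuracy:.2f}, F1 = {f1:.2f}")
```

The accuracy looks high even though not a single positive case is detected, while the F1 score exposes the problem.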
There are two main types of error in machine learning:
• Reducible errors: These errors can be reduced to improve the model accuracy. Such errors
can further be classified into bias and variance.
• Irreducible errors: These errors will always be present in the model regardless of which
algorithm has been used. They are caused by unknown variables whose effect cannot be
removed.
Bias and Variance are the components of generalization error.
What is Bias?
• Bias can be defined as the inability of a machine learning algorithm to capture the true relationship
between the data points.
• While making predictions, a difference occurs between the values predicted by the model
and the actual (expected) values. This difference is known as bias error, or error due to
bias.
• Every algorithm begins with some amount of bias, because bias arises from assumptions in the
model that make the target function simpler to learn.
• Generally, a linear algorithm has high bias, which makes it fast to learn, whereas a nonlinear
algorithm often has low bias.
There are four possible combinations of bias and variance:
Low-Bias, Low-Variance: The combination of low bias and low variance describes an ideal machine
learning model. However, it is not possible in practice.
Low-Bias, High-Variance: With low bias and high variance, model
predictions are inconsistent but accurate on average. This case occurs
when the model learns with a large number of parameters, and it
leads to overfitting.
High-Bias, Low-Variance: With high bias and low variance, predictions
are consistent but inaccurate on average. This case occurs when a model
does not learn well from the training dataset or uses too few
parameters. It leads to underfitting.
High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also
inaccurate on average.
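Bias and variance can be estimated empirically by training the same kind of model on many different training sets and looking at how its predictions behave. The sketch below is one way to do this under stated assumptions: the "true" function is taken to be sin(2πx), the noise level is 0.2, and a degree-1 and a degree-9 polynomial stand in for a simple and a complex model. None of these choices come from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 15)
x_grid = np.linspace(0.0, 1.0, 50)
true_grid = np.sin(2 * np.pi * x_grid)  # assumed "true" function

def bias_variance(degree, n_repeats=200, noise=0.2):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    predictions = np.empty((n_repeats, x_grid.size))
    for i in range(n_repeats):
        # Draw a fresh noisy training set from the same underlying function.
        y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        predictions[i] = model(x_grid)
    mean_prediction = predictions.mean(axis=0)
    bias_sq = float(np.mean((mean_prediction - true_grid) ** 2))  # average error of the mean prediction
    variance = float(np.mean(predictions.var(axis=0)))            # spread of predictions across training sets
    return bias_sq, variance

bias_simple, var_simple = bias_variance(degree=1)
bias_complex, var_complex = bias_variance(degree=9)
print(f"degree 1: bias^2 = {bias_simple:.4f}, variance = {var_simple:.4f}")
print(f"degree 9: bias^2 = {bias_complex:.4f}, variance = {var_complex:.4f}")
```

In this setup the straight line is consistent but systematically wrong (high bias, low variance), while the degree-9 polynomial is right on average but changes more from one training set to the next (low bias, higher variance).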
How to identify High variance or High Bias?
High variance can be identified if the model has low training error and high test error.
High bias can be identified if the model has high training error and the test error is close to the
training error.
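This diagnosis can be tried out by comparing training and test error for a simple and a complex model. The sketch below assumes toy data sampled from sin(2πx) with Gaussian noise and uses polynomial degree as a stand-in for model complexity; the data, the degrees and the helper names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed toy problem: noisy samples of sin(2*pi*x) on [0, 1].
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(10)
x_test, y_test = make_data(200)

errors = {}
for name, degree in [("simple (degree 1)", 1), ("complex (degree 9)", 9)]:
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    errors[name] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for name, (train_err, test_err) in errors.items():
    print(f"{name}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

The degree-9 polynomial passes through all ten training points, so its training error is close to zero while its test error is not, which is the high-variance pattern. The straight line cannot follow the sine shape even on the training data, which is the high-bias pattern.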
Bias-Variance Trade-Off
While building a machine learning model, it is important to take care of bias and variance in order
to avoid overfitting and underfitting. If the model is very simple, with few parameters, it
may have low variance and high bias. If the model has a large number of parameters, it will tend to have
high variance and low bias. So a balance must be struck between bias error and variance error, and this
balance is known as the Bias-Variance trade-off.
For accurate predictions, an algorithm needs both low variance and low bias. But this is not
possible, because bias and variance are related to each other: reducing one tends to increase the other.
Ideally, we need a model that accurately captures the regularities in the training data and simultaneously
generalizes well to unseen data.
Unfortunately, it is not possible to do both at the same time: a high-variance algorithm may perform
well on the training data, but it may overfit noisy data.
In contrast, a high-bias algorithm generates a much simpler model that may not even capture important
regularities in the data.
So, we need to find a sweet spot between bias and variance to make an optimal model.
Three commonly used methods for finding the sweet spot between simple and complicated models are:
1. Regularization
2. Bagging
3. Boosting
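As an illustration of the first method, the sketch below applies ridge (L2) regularization to a degree-9 polynomial model using the closed-form solution. The toy data, the choice of ridge regression, and the alpha values are assumptions for illustration; the course material does not specify a particular regularizer. For simplicity this version also penalizes the intercept term.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # assumed toy data

# Degree-9 polynomial features: columns are 1, x, x^2, ..., x^9.
features = np.vander(x, 10, increasing=True)

def ridge_fit(features, targets, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha * I) w = X^T y."""
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ targets)

weight_norms = []
for alpha in [1e-3, 1e-1, 10.0]:
    weights = ridge_fit(features, y, alpha)
    weight_norms.append(float(np.linalg.norm(weights)))
    print(f"alpha = {alpha:g}: ||w|| = {weight_norms[-1]:.3f}")
```

A larger alpha shrinks the weights and makes the fitted curve smoother, trading some extra bias for lower variance. In practice alpha would be chosen by validation on held-out data.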
What is Entropy in Machine Learning
Entropy is a metric used in machine learning to measure the unpredictability or impurity in a system.
Every piece of information processed by a system carries some value and can be
used to draw conclusions. If it is easy to draw a valuable conclusion from a piece of information, then
its entropy is low; if the entropy is high, then it is difficult to draw any conclusion from
that piece of information.
Entropy measures the disorder or impurity in the information processed in machine learning. It determines
how a decision tree chooses to split data.
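One common way a decision tree uses entropy is information gain: the entropy of the parent node minus the weighted entropy of its children. The sketch below is a minimal illustration with made-up "yes"/"no" labels; the function names `entropy` and `information_gain` are assumptions for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" labels, and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

parent_entropy = entropy(parent)
gain = information_gain(parent, [left, right])
print(f"parent entropy = {parent_entropy:.3f} bits")
print(f"information gain of the split = {gain:.3f} bits")
```

Among the candidate splits, the tree would prefer the one with the largest gain, since it leaves the purest child nodes.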
We can understand entropy with a simple example: flipping a coin. When we flip a coin, there are
two possible outcomes. However, it is difficult to predict the exact outcome, because there
is no direct relation between flipping a coin and its outcome. Both outcomes have a 50% probability, and in
such scenarios entropy is high. This is the essence of entropy in machine learning.
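The coin example can be made concrete with the binary entropy formula H(p) = -p log2(p) - (1 - p) log2(1 - p), where p is the probability of heads. The sketch below is a minimal illustration; the function name `binary_entropy` and the biased-coin probabilities are assumptions for this example.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a two-outcome event with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for p in [0.5, 0.9, 1.0]:
    print(f"P(heads) = {p}: entropy = {binary_entropy(p):.3f} bits")
```

A fair coin gives the maximum of 1 bit, a biased coin gives less, and a coin that always lands the same way gives zero, matching the intuition that entropy is highest when the outcome is hardest to predict.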