Module 01
Machine Learning (MU - Sem 6 - ECS & AIDS) (Introduction to Machine Learning)
1.1 MACHINE LEARNING
UQ. What is Machine learning? How is it different from data mining? (Ref. MU (Comp.) - May 17, 5 Marks, May 19, 5 Marks)
UQ. Define Machine learning and explain with example the importance of Machine Learning. (Ref. MU (Comp.) - Dec. 19, 5 Marks)
A machine that is as intellectually capable as humans has always attracted writers and early computer scientists, who were excited about artificial intelligence and machine learning.
The first machine learning system was developed in the 1950s. In 1952, Samuel developed a program to play checkers. The program was able to observe positions in the game and learn a model that gives better moves for the machine player.
In 1957, Frank Rosenblatt designed the Perceptron, which is a simple classifier, but when it is combined in large numbers in a network it becomes a powerful tool.
Minsky, in 1969, pointed out the limitations of the perceptron. He showed that the X-OR problem could not be represented by a perceptron and that such linearly inseparable data distributions cannot be handled. Following Minsky's work, neural network research went dormant until the 1980s.
Machine learning became very famous in the 1990s, due to the introduction of statistics. The combination of computer science and statistics led to probabilistic approaches in Artificial Intelligence. The area then shifted further towards data-driven techniques. As huge amounts of data became available, scientists started to design intelligent systems that are able to analyze and learn from data.
Machine learning is a category of Artificial Intelligence. In machine learning, computers have the ability to learn by themselves; explicit programming is not required.
Machine learning focuses on the study and development of algorithms that can learn from data and also make predictions on data.
Machine learning is defined by Tom Mitchell as "A program learns from experience 'E' with respect to some class of tasks 'T' and performance measure 'P', if its performance on tasks in 'T' as measured by 'P' improves with 'E'." Here 'E' represents the past experienced data and 'T' represents the tasks such as prediction, classification, etc. As an example of 'P', we might want to increase accuracy in prediction.
Machine learning mainly focuses on the design and development of computer programs that can teach themselves to grow and change when exposed to new data.
Using machine learning we can collect information from a dataset by asking the computer to make some sense of the data. Machine learning is turning data into information.
Fig. 1.1.1 : Machine Learning (Data -> Computer Program -> Output)
Fig. 1.1.1 is the schematic representation of the ML system. The ML system takes the training data and background knowledge as the input. Background knowledge and data help the Learner program to provide a solution for a particular task or problem. The performance corresponding to the solution can also be measured. The ML system comprises mainly two components, a Learner and a Reasoner. The Learner uses the training data and background knowledge to build the model, and this model can be used by the Reasoner to provide the solution for a task.
Machine learning can be applied to many applications, from politics to geosciences. It is a tool that can be applied to many problems. Any application which needs to extract some information from data and also take some action on data can benefit from machine learning methods.
(M6-131) Tech-Neo Publications...A SACHIN SHAH Venture
Example : First we will see some terminologies that are frequently used in machine learning methods. Let's take an example in which we want to design a classification system that will classify instances as either Acceptable or Unacceptable. This kind of system is a fascinating topic often related to machine learning, called expert systems.
Four features of the various cars are stored in Car.
In a classification task the target variable takes a discrete value, and in a regression task its value can be continuous.
In a training dataset we have the value of the target variable. The relationship that exists between the features and the target variable is used by the machine for learning. The target variable is the evaluation of the
Fig. 1.3.6 : Unsupervised learning
Unsupervised learning is a set of algorithms where the only information supplied is the inputs.
The machine itself is then responsible for grouping the data together and creating suitable outputs based on the structure it discovers. Often, unsupervised learning algorithms have certain goals, but they are not supervised in any manner.
Clustering is a method of grouping objects into clusters such that the objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.
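As an illustration of the clustering idea described above, the following is a minimal sketch in plain Python. It assumes k-means as the clustering method (the text does not name a specific algorithm), and the sample points and starting centroids are made up for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Group 2-D points around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids


# Hypothetical unlabeled data: only inputs are given, no target values.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

No labels are passed to the function; the grouping comes only from the similarity (here, Euclidean distance) between the inputs.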
The main differences between Supervised and Unsupervised learning are given below :
Table 1.3.1
Supervised Learning | Unsupervised Learning
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be categorized into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering and the Apriori algorithm.
1.3.3 Reinforcement Learning
GQ. What is Reinforcement Learning? Explain with an example.
Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
In Reinforcement Learning, the agent learns automatically using feedback without any labeled data, unlike supervised learning.
Since there is no labeled data, the agent is bound to learn from its experience only.
RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game-playing, robotics, etc.
The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive rewards.
The agent learns through the process of hit and trial, and based on the experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within that." How a robotic dog learns the movement of its limbs is an example of Reinforcement learning.
It is a core part of Artificial Intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
Example : Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.
The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
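The maze example (take an action, change state, get feedback) can be sketched with tabular Q-learning. This is a minimal illustration under stated assumptions, not the method prescribed by the text: the maze is reduced to a hypothetical one-dimensional corridor with the diamond in the last cell, and the reward values, learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are all chosen for the example.

```python
import random


def train_q_table(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor; the diamond sits in the last cell."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = move left, action 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # The environment returns the next state and the feedback.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else -0.1  # assumed rewards
            # Move the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told that "right" is correct; it only sees a small penalty for each step and a positive reward on reaching the diamond, and the learned table ends up preferring the moves that lead there.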
Fig. 1.3.7 (Agent and Environment: the agent sends Actions to the environment and receives State and Reward)
For machine learning, the environment is typically represented by an "MDP" or Markov Decision Process.
These algorithms do not necessarily assume knowledge of an exact model of the environment; instead they are used when exact models are infeasible. In other words, they are not as precise or exact, but they still serve as a strong method in various applications throughout different technology systems.
The key features of Reinforcement Learning are mentioned below.
- In RL, the agent is not instructed about the environment and what actions need to be taken.
- It is based on the hit and trial process.
- The agent takes the next action and changes states according to the feedback of the previous action.
- The agent may get a delayed reward.
- The environment is stochastic, and the agent needs to explore it to get the maximum positive rewards.
1.3.3(A) Approaches to Implement Reinforcement Learning
There are mainly three ways to implement reinforcement learning in ML, which are :
1. Value-based : The value-based approach tries to find the optimal value function, which is the maximum value at a state under any policy. The agent therefore expects the long-term return at any state (s) under policy π.
2. Policy-based : The policy-based approach tries to find the optimal policy for the maximum future rewards without using the value function. In this approach, the agent tries to apply a policy such that the action performed in each step helps to maximize the future reward. The policy-based approach has mainly two types of policy:
Deterministic : For a given state, the same action is always produced by the policy (π).
Stochastic : In this policy, probability determines the produced action.
3. Model-based : In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation is different for each environment.
Here are important characteristics of reinforcement learning :
- There is no supervisor, only a real number or reward signal
- Sequential decision making
- Time plays a crucial role in Reinforcement problems
- Feedback is always delayed, not instantaneous
- Agent's actions determine the subsequent data it receives
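The difference between the deterministic and stochastic policy types described for the policy-based approach can be sketched as follows. The states, actions and probabilities here are hypothetical and exist only to show the two behaviours.

```python
import random


def deterministic_policy(state):
    """Hypothetical rule: a given state always maps to the same action."""
    return "right" if state < 3 else "left"


def stochastic_policy(state, rng):
    """Hypothetical rule: the action is sampled from per-state probabilities."""
    if state < 3:
        probabilities = {"left": 0.2, "right": 0.8}
    else:
        probabilities = {"left": 0.8, "right": 0.2}
    actions = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(actions, weights=weights)[0]


rng = random.Random(0)
print([deterministic_policy(1) for _ in range(5)])    # identical every time
print([stochastic_policy(1, rng) for _ in range(5)])  # may vary between calls
```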
(2) Parameters may affect the speed of learning.
(3) Realistic environments can have partial observability.
(4) Too much Reinforcement may lead to an overload of states which can diminish the results.
(5) Realistic environments can be non-stationary.
1.3.3(C) Applications of Reinforcement Learning
Here are applications of Reinforcement Learning :
(1) Robotics for industrial automation.
(2) Business strategy planning
(3) Machine learning and data processing
(4) Aircraft control and robot motion control
(5) It helps you to create training systems that provide custom instruction and materials according to the requirement of students.
1.3.3(D) Reinforcement Learning Vs. Supervised Learning
GQ. What is the difference between Reinforcement Learning and Supervised Learning?
Parameter | Reinforcement Learning | Supervised Learning
... | ... dependent. Therefore, you should give labels to all the dependent decisions. | ... are independent of each other, so labels are given for every decision.
Best suited | Supports and works better in AI, where human interaction is prevalent. | It is mostly operated with an interactive software system or applications.
Example | Chess game | Object recognition
1.4 ISSUES IN MACHINE LEARNING
UQ. What are the issues in Machine learning? (Ref. MU (Comp.) May 15, 5 Marks)
1. Which algorithm do we have to select to learn general target functions from a specific training dataset? What should be the settings for particular algorithms, so as to converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
2. How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
3. When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
4. What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
5. To reduce the task of learning to one or more function approximation problems, what will be the best approach? What specific functions should the system attempt to learn? Can this process itself be automated?
6. To improve the knowledge representation and to learn the target function, how can the learner automatically alter its representation?
1.5 HOW TO CHOOSE THE RIGHT ALGORITHM ?
(b) If you have chosen unsupervised learning, then next you need to focus on what your aim is.
- If you want to fit your data into some discrete groups, then use Clustering.
- If you want to find a numerical estimate of how strong the fit into each group is, then use a density estimation algorithm.
2. Data : Are the features continuous or nominal? Are there missing values in the features? If yes, what is the reason for the missing values? Are there outliers in the data? All of these properties of your data can help you to narrow the algorithm selection process.
Table 1.5.1 : Selection of Algorithm
 | Supervised Learning | Unsupervised Learning
Discrete | Classification | Clustering
Continuous | Regression | Density Estimation
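Table 1.5.1 can be read as a simple lookup from (learning type, kind of output) to a family of algorithms. A minimal sketch of that lookup is given below; the function name `choose_family` and the string keys are illustrative choices, not part of the text.

```python
# Lookup mirroring Table 1.5.1 (keys are lowercase for easy matching).
ALGORITHM_FAMILY = {
    ("supervised", "discrete"): "Classification",
    ("supervised", "continuous"): "Regression",
    ("unsupervised", "discrete"): "Clustering",
    ("unsupervised", "continuous"): "Density Estimation",
}


def choose_family(learning_type, output_kind):
    """Return the algorithm family for the given learning type and output kind."""
    key = (learning_type.strip().lower(), output_kind.strip().lower())
    if key not in ALGORITHM_FAMILY:
        raise ValueError(f"no entry in the table for {key}")
    return ALGORITHM_FAMILY[key]


print(choose_family("Supervised", "Continuous"))
print(choose_family("Unsupervised", "Discrete"))
```

The table only narrows the choice to a family; the questions about the data (nominal or continuous features, missing values, outliers) are what narrow it further within that family.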
Good clean data from the first two steps is given as input to the algorithm. The algorithm extracts information or knowledge. This knowledge is mostly stored in a format that is readily usable by the machine for the next 2 steps.
In case of unsupervised learning, the training step is not there because the target value is not present. The complete data is used in the next step.
6. Test the algorithm
In this step the information learned in the previous step is used. When you are checking an algorithm, you test it to find out whether it works properly or not. In the supervised case, you have some known values that can be used to evaluate the algorithm.
In the unsupervised case, you may have to use some other metrics to evaluate the success. In either case, if you are not satisfied, you can go back to step 4, change some things and test again.
Mostly the problem occurs in the collection or preparation of data, and you will have to go back to step 1.
Fig. 1.6.1 : Typical example of Machine Learning Application
1.7 APPLICATIONS OF MACHINE LEARNING
UQ. Write short note on: Machine learning applications. (Ref. MU (Comp.) - May 16, May 17, 10 Marks)
1. Learning Associations
For a supermarket chain, one example of a retail application of machine learning is basket analysis, which is finding associations between products bought by customers:
If people who buy P typically also buy Q, and if there is a customer who buys P and does not buy Q, he or she is a potential Q customer. Once we identify such customers, we can target them for cross-selling.
In finding an association rule, we are interested in learning a conditional probability of the form P(Q | P), where Q is the product we would like to condition on P, which is the product or products which we know that the customer has already purchased.
P(Milk | Bread) = 0.7
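A conditional probability such as P(Milk | Bread) can be estimated from transaction data as the fraction of baskets containing bread that also contain milk. The sketch below shows this on a small made-up list of baskets; the data and the function name are illustrative assumptions, so the resulting value differs from the 0.7 quoted above.

```python
def conditional_probability(baskets, given, target):
    """Estimate P(target | given) from a list of baskets (sets of products)."""
    with_given = [basket for basket in baskets if given in basket]
    if not with_given:
        raise ValueError(f"no basket contains {given!r}")
    both = sum(1 for basket in with_given if target in basket)
    return both / len(with_given)


# Hypothetical transactions, one set of products per customer visit.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
    {"bread", "milk", "butter"},
]
p_milk_given_bread = conditional_probability(baskets, given="bread", target="milk")
print(p_milk_given_bread)
```

A customer whose basket contains bread but not milk would then be a candidate for cross-selling milk.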
The Machine Learning system fits a model to the past data to be able to calculate the risk for a new application and then decides to accept or refuse it accordingly.
IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
Other classification examples are optical character recognition, face recognition, medical diagnosis, speech recognition and biometrics.
Fig. 1.7.1 : Classification for credit scoring (Savings vs. Income, with Low-Risk and High-Risk regions)
3. Regression
Suppose we want to design a system that can predict the price of a flat. Let's take the inputs as the area of the flat, location, purchase year and other information that affects the rate of the flat. The output is the price of the flat. The applications where the output is numeric are regression problems.
Fig. 1.7.2 : Regression for prediction of price of flat
4. Unsupervised Learning
One of the important unsupervised learning problems is clustering. In clustering, a dataset is partitioned into meaningful sub-classes known as clusters. For example, suppose you want to decorate your home using given items.
You will group them using unsupervised learning (no prior knowledge), and this grouping can be on the basis of the color of the items, the shape of the items, the material used for the items, the type of the items or whatever way you would like.
5. Reinforcement Learning
There are some applications where the output of the system is a sequence of actions. In such applications the sequence of correct actions, instead of a single action, is important in order to reach the goal. An action is said to be good if it is part of a good policy. The machine learning program generates a policy by learning from previous good action sequences. Such methods are called reinforcement methods.
A good example of reinforcement learning is chess playing. In artificial intelligence and machine learning, one of the most important research area is game
1.8.4 How to Balance the Validation and Test Datasets
Preserve imbalanced classes
If you are working on a classification problem with imbalanced classes, such as a dataset where one class is 99% of the dataset and the other class is 1% of the dataset, then you might consider improving the training process by oversampling the smaller class. But for your validation and test datasets, you want to measure your model's performance against the same class balance that your model would encounter in the real world.
Validation and test datasets should have "newer" samples
If you are training a model on time series data, typically your goal is to predict something about the future using data from the past or present.
In order to properly evaluate a time series model, your training/validation/test split must obey the "arrow of time":
- All of the data samples in your validation dataset should be newer than your training dataset.
- All of the data samples in your test dataset should be newer than your validation dataset.
If your training dataset contains data samples that are newer than your validation dataset, then your model's validation accuracy will be misleadingly high. Your model is effectively traveling backward in time if it
However, when a machine learning model is deployed to the "real world" and is making predictions, typically the model will not perform any augmentation or regularization on its input. To mirror the real world, a model should not perform augmentation or regularization on the validation or test dataset.
There are a few exceptions to the rule:
- If the validation and/or test datasets are too small for a model to reliably evaluate, then it might make sense to use data augmentation to add data samples.
- If the entire training dataset is computer generated, like a dataset of images generated from a video game, then it may be reasonable for the validation and test datasets to also be entirely computer-generated.
Cross Validation
In machine learning, we cannot simply fit the model on the training data and assume that it will work accurately on real data. We must make sure that our model has picked up the correct patterns from the data and is not picking up too much noise. For this purpose, we use the cross-validation technique.
Cross-validation is a technique in which we train our model using a subset of the data set and then evaluate it using the complementary subset of the data set.
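The "arrow of time" rule for time series data can be sketched as a split that first sorts the samples by timestamp and then cuts them in order, so that validation samples are newer than training samples and test samples are newer than validation samples. The function name, the `(timestamp, value)` sample layout and the split sizes below are assumptions made for the illustration.

```python
def chronological_split(samples, n_val, n_test):
    """Split (timestamp, value) samples into train/validation/test by time.

    The oldest samples go to training, the next n_val to validation,
    and the newest n_test to the test set.
    """
    ordered = sorted(samples, key=lambda sample: sample[0])
    n_train = len(ordered) - n_val - n_test
    if n_train <= 0:
        raise ValueError("not enough samples for the requested split")
    train = ordered[:n_train]
    validation = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, validation, test


# Hypothetical readings given out of order: (day number, measured value).
samples = [(7, 2.1), (1, 1.0), (10, 2.9), (4, 1.6), (2, 1.2),
           (9, 2.6), (3, 1.4), (8, 2.4), (5, 1.8), (6, 2.0)]
train, validation, test = chronological_split(samples, n_val=2, n_test=2)
print([t for t, _ in train], [t for t, _ in validation], [t for t, _ in test])
```

Unlike a random shuffle, this split never lets the model train on samples that are newer than the ones it is evaluated on.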
set but leaves only one data-point of the available data set and then iterates for each data-point. It has some advantages as well as disadvantages.
An advantage of using this method is that we make use of all data points and hence it has low bias.
The major drawback of this method is that it leads to higher variation in testing the model, as we are testing against one data point.
If the data point is an outlier it can lead to higher variation. Another drawback is that it takes a lot of execution time, as it iterates as many times as there are data points.
K-Fold Cross Validation
In this method, we split the data-set into k subsets (known as folds), then we perform training on k-1 of the subsets and leave one subset for the evaluation of the trained model. In this method, we iterate k times with a different subset reserved for testing each time.
It is commonly suggested that the value of k should be 10, as a lower value of k moves towards a simple validation split and a higher value of k leads towards the LOOCV method.
Example
Fig. 1.8.1 shows an example of the training subsets and evaluation subsets generated in k-fold cross validation. Here, we have a total of 25 instances.
In the first iteration we use the first 20 percent of the data for evaluation, and the remaining 80 percent for training ([1-5] testing and [5-25] training), while in the second iteration we use the second subset of 20 percent for evaluation, and the remaining four subsets of the data for training ([5-10] testing and [1-5 and 10-25] training), and so on.
Fig. 1.8.1 : Cross Validation
1.8.5 Advantages of train/test split
(1) This runs K times faster than K-fold cross-validation, because K-fold cross-validation repeats the train/test split K times.
(2) Simpler to examine the detailed results of the testing process.
1.8.6 Advantages of cross-validation
(1) More accurate estimate of out-of-sample accuracy.
(2) More "efficient" use of data as every observation is used for both training and testing.
1.8.7 Training Error
In machine learning, training a predictive model means finding a function which maps a set of values x to a value y. If we apply the model to the data it was trained on, we are calculating the training error.
If we calculate the error on data which was unknown in the training phase, we are calculating the test error.
Training error is calculated as follows:
$$E_{train} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{error}(f_p(X_i), Y_i)$$
In the above equation, n represents the number of training examples, f_p(X_i) represents the predicted value and Y_i represents the true or actual value. error(f_p(X_i), Y_i) expresses whether these two values are the same and, if not, by how much they differ.
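The k-fold procedure and the error formula above can be sketched together in plain Python. The sketch uses 0-based indices, 25 instances and k = 5 as in the example; the squared difference used as the `error` function and the toy model are assumptions, since the text leaves the error function generic.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def mean_error(predict, xs, ys, error=lambda predicted, actual: (predicted - actual) ** 2):
    """Average of error(f(X_i), Y_i) over the given examples."""
    return sum(error(predict(x), y) for x, y in zip(xs, ys)) / len(xs)


folds = list(k_fold_indices(25, 5))
for train, test in folds:
    print("test:", test[0], "-", test[-1], "| training size:", len(train))

# Toy model f(x) = 2x evaluated on three made-up examples.
print(mean_error(lambda x: 2 * x, [1, 2, 3], [2, 4, 7]))
```

Calling `mean_error` on the examples the model was fitted to gives the training error, and calling it on the held-out fold gives the test error for that fold.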
A diligent student will strive to practice well and test their abilities using exams from previous years. Nonetheless, doing well on past exams is no guarantee that the student will excel when it matters.
For instance, the student might try to prepare by rote learning the answers to the exam questions. This requires the student to memorize many things. The student might even remember the answers for past exams perfectly.
Many of the techniques in deep learning are heuristics and tricks aimed at guarding against over fitting.
When we have simple models and abundant data, we expect the generalization error to resemble the training error. When we work with more complex models and fewer examples, we expect the training error to go down but the generalization gap to grow.
Bias - Assumptions made by a model to make a function easier to learn. (The algorithm's error rate on the training set is the algorithm's bias.)
Variance - If you train your model on training data and obtain a very low error, but upon changing the data and training the same model you experience a high error, this is variance. (How much worse the algorithm does on the test set than on the training set is known as the algorithm's variance.)
Under fitting
A statistical model or a machine learning algorithm is said to have under fitting when it cannot capture the underlying trend of the data.
Under fitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or the algorithm does not fit the data well enough.
model does not categorize the data correctly, because of too many details and noise.
The causes of over fitting are often non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model based on the dataset and therefore they can build unrealistic models.
A solution to avoid over fitting is using a linear algorithm if we have linear data, or using parameters like the maximal depth if we are using decision trees.
In a nutshell, Overfitting - High variance and low bias
Techniques to reduce overfitting:
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the loss over the training period; as soon as the loss begins to increase, stop training).
4. Ridge Regularization and Lasso Regularization
5. Use dropout for neural networks to tackle over fitting.
Ideally, the case when the model makes the predictions with 0 error is said to have a good fit on the data. This situation is achievable at a spot between over fitting and under fitting.
In order to understand it we will have to look at the performance of our model with the passage of time, while it is learning from the training dataset.
With the passage of time, our model will keep on learning and thus the error for the model on the training and testing data will keep on decreasing.
If it learns for too long, the model will become more prone to overfitting due to the presence of noise and less useful details. Hence the performance of our model will decrease.
In order to get a good fit, we will stop at a point just before where the error starts increasing. At this point the model is said to have good skill on the training dataset as well as on our unseen testing dataset.
Bias-variance trade-off
So what is the right measure? Depending on the model at hand, a performance that lies between over fitting and under fitting is more desirable. This trade-off is the most integral aspect of Machine Learning model training. As we discussed, Machine Learning models fulfil their purpose when they generalize well. Generalization is bound by the two undesirable outcomes, high bias and high variance. Detecting whether the model suffers from either one is the sole responsibility of the model developer.
Figure : Under-fitting (too simple to explain the variance), Appropriate-fitting, Over-fitting (force-fitting, too good to be true)
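The idea of stopping "just before where the error starts increasing" can be sketched as a small early-stopping rule applied to the validation loss recorded after each epoch. The `patience` parameter (how many non-improving epochs to tolerate) and the loss values are assumptions for the illustration.

```python
def early_stopping_epoch(validation_losses, patience=1):
    """Return the index of the best epoch seen before training is stopped.

    Training stops once the validation loss has failed to improve on the
    best value for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    waited = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical validation losses: falling at first, then rising (overfitting).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.60, 0.75]
print(early_stopping_epoch(losses))
```

A larger `patience` makes the rule less sensitive to small fluctuations in the loss, at the cost of training for a few more epochs.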
classification model :
To understand different metrics, we must understand the Confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
 | Predicted 0 | Predicted 1
Actual 0 | TN | FP
Actual 1 | FN | TP
TN - True negatives (actual 0, predicted 0) and TP - True positives (actual 1, predicted 1)
FP - False positives (actual 0, predicted 1) and FN - False negatives (actual 1, predicted 0)
Consider the following values for the confusion matrix :
True negatives (TN) = 300
True positives (TP) = 500
False negatives (FN) = 150
False positives (FP) = 50
1. Accuracy
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. It lies in [0,1]. In general, higher accuracy means a better model (TP and TN must be high).
However, accuracy is not a useful metric in case of an imbalanced dataset (a dataset with an uneven distribution of classes). Say we have data of 1000 patients out of which 50 have cancer and 950 do not. A dumb model which always predicts "no cancer" will have an accuracy of 95%, but it is of no practical use since in this case we want the number of False Negatives to be a minimum. Thus, we have different metrics like recall, precision, F1-score etc.
Thus, Accuracy using the above values will be (500+300)/(500+50+150+300) = 800/1000 = 80%
2. Precision and Recall
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
Recall is useful when we want to reduce the number of False Negatives, since we don't want our model to mark a patient suffering from cancer as safe. On the other hand, predicting a healthy patient as cancerous is not a big issue since, in further diagnosis, it will become clear that the patient does not have cancer. Recall is also known as Sensitivity.
Thus, Recall using the above values will be 500/(500+150) = 500/650 = 76.92%
Precision is useful when we want to reduce the number of False Positives. Consider a system that predicts whether the e-mail received is spam or not. Taking spam as the positive class, we do not want our system to predict non-spam e-mails (important e-mails) as spam, i.e., the aim is to reduce the number of False Positives.
Thus, Precision using the above values will be 500/(500+50) = 500/550 = 90.91%
3. Specificity
Specificity is defined as the ratio of True negatives to (True negatives + False positives). We want the value of specificity to be high. Its value lies in [0,1].
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
Thus, Specificity using the above values will be 300/(300+50) = 300/350 = 85.71%
4. F1-score
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
F1-score is a metric that combines both Precision and Recall and is equal to the harmonic mean of precision and recall. Its value lies in [0,1] (the higher the value, the better the F1-score).
Using values of precision = 0.9090 and recall = 0.7692, F1-score = 0.8333 = 83.33%
5. AUC-ROC
The AUC (Area Under the Curve) - ROC (Receiver Operating Characteristics) curve is one of the most important evaluation metrics for checking any classification model's performance.
It is plotted between FPR (X-axis) and TPR (Y-axis). If the AUC value is less than 0.5, then the model is even worse than a random guessing model.
$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
Fig. 1.10.1 : Comparing ROC Curves (True positive rate vs. False positive rate; curves labelled Worthless, Good and Excellent)
agreement occurring by chance. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories.
Cohen's kappa coefficient is defined and given by the following function :
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
Where:
p_o = relative observed agreement among raters.
p_e = the hypothetical probability of chance agreement.
p_o and p_e are computed using the observed data to calculate the probabilities of each observer randomly saying each category. If the raters are in complete agreement then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by p_e), κ = 0.
Example
Ex. 1.10.1 : Suppose that you were analyzing data related to a group of 50 people applying for a grant. Each grant proposal was read by two readers and each reader either said "Yes" or "No" to the proposal. Suppose the count data were as follows, where A and B are readers, the data on the main diagonal (top left to bottom right) shows the count of agreements and the data off the main diagonal shows the disagreements :
 | B : Yes | B : No
A : Yes | 20 | 5
A : No | 10 | 15
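The confusion-matrix metrics and Cohen's kappa defined above can be checked with a short sketch. It uses the values TP = 500, TN = 300, FP = 50, FN = 150 from the text and the 2 x 2 table of Ex. 1.10.1; the function names are illustrative, and the table layout (rows for reader A, columns for reader B) is an assumption about the example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined in this section from the four counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def cohens_kappa(table):
    """table[i][j] = items rater A put in category i and rater B in category j."""
    total = sum(sum(row) for row in table)
    # Observed agreement: share of items on the main diagonal.
    p_o = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement: product of the two raters' marginal rates per category.
    p_e = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(len(table))
    )
    return (p_o - p_e) / (1 - p_e)


metrics = classification_metrics(tp=500, tn=300, fp=50, fn=150)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

kappa = cohens_kappa([[20, 5], [10, 15]])
print(f"kappa: {kappa:.2f}")
```

For the grant table, the observed agreement is 35/50 = 0.7 and the chance agreement is 0.5 x 0.6 + 0.5 x 0.4 = 0.5, which gives a kappa of 0.4.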