Classification Algorithm: Supervised Learning Technique
[Diagram: labeled data is fed to the classification algorithm during training; the learned model is then applied to labeled data during testing.]
True Positives (TP): The classifier correctly predicts the subject as positive for the given class. For example, in Cat vs. Dog classification, a cat is predicted as Cat.
True Negatives (TN): The classifier correctly predicts the subject as negative for the given class. For example, in Cat vs. Dog classification, a dog is predicted as Not Cat.
False Positives (FP): The classifier incorrectly predicts the subject as positive for the given class. For example, in Cat vs. Dog classification, a dog is predicted as Cat. This is also known as a "Type I" error.
False Negatives (FN): The classifier incorrectly predicts the subject as negative for the given class. For example, in Cat vs. Dog classification, a cat is predicted as Not Cat. This is also known as a "Type II" error.
Evaluating a Classification model (Cont)
Confusion Matrix:
Example
We have built a new spam filter and want to evaluate how good it is. Given below is the confusion matrix for the spam filter over 100 e-mails.

                          Actual
                     spam    Not spam
Predicted  spam       10         2
       Not spam       15        73
a) What does the number 15 represent here? Put it in plain English.
b) Calculate the precision of the spam filter. What is the interpretation of this value of precision, i.e., how would you explain it to someone who doesn't know how precision is calculated but still uses e-mail and receives spam?
c) Calculate the recall of the spam filter. What is the interpretation of this value of recall, i.e., how would you explain it to someone who doesn't know how recall is calculated but still uses e-mail and receives spam?
d) You can see that the precision of this spam filter is very good but the recall is not so good. What does it mean to have high precision and low recall? (Hint: think about how you interpret precision and recall and apply that to the context of a spam filter.) What might be the possible reason for these results?
e) What does it mean to have high recall and low precision for a spam filter? Which of the two do you think is better: high precision with low recall, or high recall with low precision?
f) What is the overall accuracy of the spam filter? What do you mean when you say the spam filter has this value of accuracy?
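The quantities asked for in parts (b), (c), and (f) can be sketched in a few lines of Python. This is a minimal sketch, assuming the matrix above is read with "spam" as the positive class: 10 spam e-mails correctly flagged (TP), 2 legitimate e-mails wrongly flagged (FP), 15 spam e-mails that slipped through (FN), and 73 legitimate e-mails correctly delivered (TN) — the reading consistent with part (d)'s premise of high precision and lower recall.

```python
# Counts read from the confusion matrix above, "spam" = positive class.
tp = 10  # spam correctly flagged as spam
fp = 2   # legitimate mail wrongly flagged as spam
fn = 15  # spam that slipped through to the inbox
tn = 73  # legitimate mail correctly delivered

precision = tp / (tp + fp)                   # of everything flagged, how much was really spam
recall    = tp / (tp + fn)                   # of all real spam, how much was caught
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # fraction of all e-mails handled correctly

print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"accuracy  = {accuracy:.3f}")
```

Changing which cell counts as FP vs. FN (i.e., transposing the matrix) swaps the precision and recall values, which is why part (a) asks you to pin down in plain English what a single cell means before computing anything.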
Evaluating a Classification model (Cont)
Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
• For a good binary classification model, the log loss should be near 0.
• The log loss increases as the predicted probability deviates from the actual label.
• A lower log loss indicates a higher accuracy of the model.
For binary classification, the cross-entropy can be calculated as:

$$\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$

where $y_i$ is the actual label (0 or 1) and $p_i$ is the predicted probability of the positive class.
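The bullets above can be checked numerically. This is a small sketch of binary cross-entropy (the function name and the example probabilities are illustrative, not from the slides); note how a confident wrong prediction drives the loss up sharply, while confident correct predictions keep it near 0.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy averaged over the samples.
    Predictions are clipped away from 0 and 1 to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions -> loss near 0.
print(log_loss([1, 0, 1], [0.95, 0.05, 0.90]))
# One confident wrong prediction (true 1 predicted 0.1) -> loss jumps.
print(log_loss([1, 0, 1], [0.95, 0.05, 0.10]))
```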
Evaluating a Classification model (Cont)
AUC-ROC curve:
• ROC stands for Receiver Operating Characteristic curve, and AUC stands for Area Under the Curve.
• It is a graph that shows the performance of a classification model at different thresholds.
• The AUC-ROC curve can also be used to visualize the performance of a multi-class classification model.
• The ROC curve plots TPR against FPR, with the TPR (True Positive Rate) on the Y-axis and the FPR (False Positive Rate) on the X-axis.
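A point on the ROC curve is just the (FPR, TPR) pair obtained at one decision threshold. A minimal sketch, with illustrative scores and labels (not from the slides):

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs for a list of decision thresholds.
    scores are predicted probabilities; labels are 0/1."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # FPR on the x-axis, TPR on the y-axis
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
for fpr, tpr in roc_points(scores, labels, [0.2, 0.5, 0.7]):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Sweeping the threshold from 1 down to 0 traces the curve from (0, 0) to (1, 1); the area under that trace is the AUC.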
Biological Neuron
Artificial Neuron
A neuron is a mathematical function modeled on the working of biological neurons.
• It is an elementary unit in an artificial neural network.
• One or more inputs are separately weighted.
• The inputs are summed and passed through a nonlinear function to produce the output.
• Every neuron holds an internal state called the activation signal.
• Each connection link carries information about the input signal.
• Every neuron is connected to other neurons via connection links.
A typical activation function works as follows:

$$X = \sum_{i=1}^{n} w_i x_i, \qquad Y = \begin{cases} +1 & \text{for } X > t \\ 0 & \text{for } X \le t \end{cases}$$

Each node $i$ has a weight $w_i$ associated with it. The input to node $i$ is $x_i$, and $t$ is the threshold.
So if the weighted sum of the inputs to the neuron is above the threshold, then the neuron fires.
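The weighted-sum-and-threshold rule above is a few lines of code. A minimal sketch (the weights and threshold are illustrative values chosen so the neuron computes logical AND, not values from the slides):

```python
def neuron(inputs, weights, threshold):
    """Weighted sum followed by a hard-threshold activation:
    fires (returns 1) only when sum(w_i * x_i) exceeds the threshold t."""
    x = sum(w * xi for w, xi in zip(weights, inputs))
    return 1 if x > threshold else 0

# With weights (0.5, 0.5) and threshold 0.7, the neuron fires
# only when both binary inputs are 1 -- i.e., it computes AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", neuron([a, b], [0.5, 0.5], 0.7))
```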
Activation Functions
Perceptrons
A perceptron is a single neuron that classifies a set of inputs into one of
two categories (usually 1 or -1).
If the inputs are in the form of a grid, a perceptron can be used to
recognize visual images of shapes.
The perceptron usually uses a step function, which returns 1 if the
weighted sum of inputs exceeds a threshold, and 0 otherwise.
How do we train?
[Diagram: a fully connected network with inputs 𝒙, a hidden layer 𝒉 of 4 neurons with activation functions, and outputs 𝒚.]
For a network with 3 inputs, a hidden layer of 4 neurons, and 2 outputs:
• 4 + 2 = 6 neurons (not counting inputs)
• [3 x 4] + [4 x 2] = 20 weights
• 4 + 2 = 6 biases
• 26 learnable parameters
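The parameter count above generalizes to any stack of fully connected layers: each layer after the input contributes fan_in × fan_out weights plus fan_out biases. A small sketch (the function name is illustrative):

```python
def count_parameters(layer_sizes):
    """Learnable parameters of a fully connected network given layer
    widths [inputs, hidden..., outputs]: each consecutive pair (a, b)
    contributes a*b weights, and every non-input layer adds one bias
    per neuron."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

# 3 inputs, one hidden layer of 4 neurons, 2 outputs:
# 3*4 + 4*2 = 20 weights, 4 + 2 = 6 biases.
print(count_parameters([3, 4, 2]))
```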
The Chain Rule
The Chain Rule is a technique for differentiating composite functions.
Composite functions are made up of layers of functions inside of functions.
Steps to apply the chain rule:
1. Identify the inner and outer functions.
2. Differentiate the outer function, leaving the inner function alone.
3. Differentiate the inner function.
Chain Rule: One Independent Variable
Theorem 13.7 Chain Rule: Two Independent Variables
Chain Rule Example
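As a worked illustration of the three steps above (the slide's own example is not reproduced here, so the function $\sin(x^2)$ is an assumed example), take the composite $f(x) = \sin(x^2)$ with outer function $\sin(u)$ and inner function $u = x^2$:

```latex
% Step 1: outer function \sin(u), inner function u = x^2.
% Step 2: differentiate the outer, leaving the inner alone: \cos(x^2).
% Step 3: differentiate the inner: 2x.
\frac{d}{dx}\sin(x^2) = \cos(x^2)\cdot 2x = 2x\cos(x^2)
```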
Training
1. Sample labeled data (batch)
2. Forward it through the network, get predictions
3. Back-propagate the errors
4. Update the network weights
$$z_j = f(z\_in_j)$$

and sends this signal to all units in the layer above (the output units).
Training Algorithm (Cont)
Step 5: Each output unit ($Y_k$, $k = 1, \dots, m$) sums its weighted input signals,

$$y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$

and applies its activation function to compute its output signal,

$$y_k = f(y\_in_k).$$
Backpropagation of error:
Step 6: Each output unit ($Y_k$, $k = 1, \dots, m$) receives a target pattern corresponding to the input training pattern and computes its error information term,

$$\delta_k = (t_k - y_k)\, f'(y\_in_k),$$

calculates its weight correction term (used to update $w_{jk}$ later),

$$\Delta w_{jk} = \alpha\, \delta_k\, z_j,$$

calculates its bias correction term (used to update $w_{0k}$ later),

$$\Delta w_{0k} = \alpha\, \delta_k,$$

and sends $\delta_k$ to units in the layer below.
Training Algorithm (Cont)
Step 7: Each hidden unit ($Z_j$, $j = 1, \dots, p$) sums its delta inputs from the units in the layer above,

$$\delta\_in_j = \sum_{k=1}^{m} \delta_k w_{jk}$$
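Steps 5 through 7 can be traced numerically for one output unit. This is a minimal sketch, assuming a logistic activation $f(x) = 1/(1+e^{-x})$ (so $f'(x) = f(x)(1-f(x))$); the hidden activations, weights, target, and learning rate are illustrative values, not from the slides.

```python
import math

f = lambda x: 1.0 / (1.0 + math.exp(-x))  # logistic activation

z = [0.6, 0.9]          # hidden activations z_j (p = 2 hidden units)
w = [0.3, -0.2]         # weights w_jk from hidden unit j to the single output
w0 = 0.1                # output bias w_0k
t, alpha = 1.0, 0.5     # target value and learning rate

# Step 5: the output unit sums its weighted inputs and applies f.
y_in = w0 + sum(zj * wj for zj, wj in zip(z, w))
y = f(y_in)

# Step 6: error term and correction terms.
delta = (t - y) * y * (1 - y)           # delta_k = (t_k - y_k) f'(y_in_k)
dw = [alpha * delta * zj for zj in z]   # Delta w_jk = alpha * delta_k * z_j
dw0 = alpha * delta                     # Delta w_0k = alpha * delta_k

# Step 7: each hidden unit sums its delta input from the layer above.
delta_in = [delta * wj for wj in w]     # delta_in_j = sum_k delta_k w_jk

print("y =", y)
print("weight corrections:", dw, "bias correction:", dw0)
print("hidden delta inputs:", delta_in)
```

Because the target (1.0) exceeds the output, delta is positive and both weight corrections push the output upward on the next pass.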