
Classification Algorithm: Supervised Learning Technique Training Data

The Classification algorithm is a supervised learning technique that identifies the category of new observations based on training data. It learns from labeled examples to predict the class of unlabeled examples. Some key points:
- Classification outputs categorical labels, not numerical values. Examples include spam/not spam and cat/dog.
- Methods include binary classifiers with two classes and multi-class classifiers with more than two classes.
- Models are trained on labeled data and tested on held-out data to evaluate performance using metrics like confusion matrices, precision, recall, and accuracy.
- Neural networks like perceptrons can perform classification tasks and are trained using methods like backpropagation to adjust weights.

Uploaded by

mirahaem5

Classification Algorithm

The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data.
Typical categories are Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes are also called targets, labels, or categories.
Unlike regression, the output variable of Classification is a category, not a numerical value, such as "Green or Blue" or "fruit or animal".
y = f(x), where y is the categorical output.

[Figure: during training, labeled data is fed to a classification algorithm; during testing, labeled data is passed to the learned model to produce a classification.]

Classification methods learn from data and are then tested on data.


Classification Algorithm
Binary Classifier: If the classification problem has only two possible outcomes, the classifier is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes, the classifier is called a Multi-class Classifier.
Examples: classification of types of crops, classification of types of music.
Learners in Classification Problems
Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most closely related data stored in the training dataset. It takes less time in training but more time for predictions.
Examples: K-NN algorithm, case-based reasoning.
Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. In contrast to lazy learners, eager learners take more time in training and less time in prediction.
Examples: Decision Trees, Naïve Bayes, ANN.
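The lazy-learner behaviour can be sketched with a tiny k-NN classifier in plain Python. This is only an illustration: the class name `LazyKNN`, the toy points, and the use of Euclidean distance with majority voting are assumptions, not something taken from the slides.

```python
import math
from collections import Counter


class LazyKNN:
    """Hypothetical minimal k-NN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # stored (features, label) pairs

    def fit(self, X, y):
        # "Training" only stores the data; no model is built (lazy learning).
        self.samples = list(zip(X, y))
        return self

    def predict(self, x):
        # All the work happens at prediction time: x is compared with
        # every stored sample, and the k nearest ones vote on the label.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (made up): two small clusters labeled "cat" and "dog".
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

knn = LazyKNN(k=3).fit(X_train, y_train)
print(knn.predict((1.1, 1.0)))
print(knn.predict((4.0, 4.0)))
```

Here `fit` is almost free while `predict` scans the whole training set, which is the trade-off described for lazy learners. An eager learner would do the expensive work inside `fit` instead.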
Evaluating a Classification model
Confusion Matrix:
The confusion matrix is a table that describes the performance of the model.
It is also known as the error matrix.
The matrix summarizes the prediction results as the total numbers of correct and incorrect predictions. For a binary classifier it has four cells:

True Positives (TP): The classifier correctly predicts the subject as positive for the given class. E.g., for Cat vs. Dog classification, a Cat is predicted as Cat.
True Negatives (TN): The classifier correctly predicts the subject as negative for the given class. E.g., for Cat vs. Dog classification, a Dog is predicted as Not Cat.
False Positives (FP): The classifier incorrectly predicts the subject as positive for the given class. E.g., for Cat vs. Dog classification, a Dog is predicted as Cat. This is also known as a "Type I" error.
False Negatives (FN): The classifier incorrectly predicts the subject as negative for the given class. E.g., for Cat vs. Dog classification, a Cat is predicted as Not Cat. This is also known as a "Type II" error.
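The four cells can be counted directly from a list of actual labels and a list of predicted labels. The sketch below assumes "cat" is the positive class; the function name `confusion_counts` and the six example labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for one positive class (illustrative helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # positive predicted, and it really is positive
        elif p == positive:
            fp += 1  # positive predicted, but it is negative (Type I error)
        elif a == positive:
            fn += 1  # negative predicted, but it is positive (Type II error)
        else:
            tn += 1  # negative predicted, and it really is negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for a Cat vs. Dog classifier.
actual = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]

counts = confusion_counts(actual, predicted, positive="cat")
print(counts)
```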
Evaluating a Classification model (Cont)
Confusion Matrix:
Example
We have built a new spam filter and want to evaluate how good it is. Given below is the confusion matrix for the spam filter on 100 e-mails.

                     Predicted spam    Predicted not spam
Actual spam                10                  2
Actual not spam            15                 73
a) What is the number 15 here? Put it in plain English.
b) Calculate the precision for the spam filter. What is the interpretation of this value, i.e., how would you explain it to someone who doesn't know how precision is calculated but still uses e-mail and gets spam?
c) Calculate the recall for the spam filter. What is the interpretation of this value, i.e., how would you explain it to someone who doesn't know how recall is calculated but still uses e-mail and gets spam?
d) You can see that the recall is good for this spam filter but the precision is not so good. What does it mean to have high recall and low precision (hint: think about how you interpret precision and recall and apply it to the context of a spam filter)? What might be the possible reason you are seeing these results?
e) What does it mean to have high precision and low recall for a spam filter? Which of the two do you think is better, i.e., high precision and low recall or high recall and low precision?
f) What is the overall accuracy of the spam filter? What do you mean when you say this spam filter has this value of accuracy?
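The metrics asked for above can be checked numerically. The sketch below assumes the table is read with rows as actual classes and columns as predicted classes, and that "spam" is the positive class, so TP = 10, FN = 2, FP = 15, TN = 73. The variable names are illustrative.

```python
# Cells of the spam-filter confusion matrix (spam = positive class).
tp, fn = 10, 2   # actual spam:     predicted spam / predicted not spam
fp, tn = 15, 73  # actual not spam: predicted spam / predicted not spam

# Precision: of the e-mails flagged as spam, what fraction really is spam?
precision = tp / (tp + fp)

# Recall: of the e-mails that really are spam, what fraction was flagged?
recall = tp / (tp + fn)

# Accuracy: fraction of all e-mails that were classified correctly.
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"accuracy  = {accuracy:.3f}")
```

Under this reading, the 15 false positives are legitimate e-mails that the filter sent to the spam folder, which is what pulls precision down even though accuracy looks reasonable.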
Evaluating a Classification model (Cont)
Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
• For a good binary classification model, the value of log loss should be near 0.
• The value of log loss increases as the predicted probability deviates from the actual label.
• A lower log loss indicates predicted probabilities that are closer to the actual labels, and hence a better model.
For binary classification, cross-entropy can be calculated as:
$$-\bigl(y \log(p) + (1 - y)\log(1 - p)\bigr)$$
where $y$ is the actual label (0 or 1) and $p$ is the predicted probability of class 1.
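As a rough illustration, binary log loss is usually computed as the average of -(y log(p) + (1 - y) log(1 - p)) over all examples, where y is the actual label and p the predicted probability of class 1. The function name `binary_log_loss`, the clipping constant `eps`, and the example probabilities below are assumptions for this sketch.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy; p_pred holds probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so that log() is always defined.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]

# Confident and correct predictions give a loss near 0.
good = binary_log_loss(y_true, [0.9, 0.1, 0.8, 0.2])

# Predictions leaning the wrong way give a larger loss.
bad = binary_log_loss(y_true, [0.4, 0.6, 0.3, 0.7])

print(good, bad)
```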
Evaluating a Classification model (Cont)
AUC-ROC curve:
• ROC stands for Receiver Operating Characteristic and AUC stands for Area Under the Curve.
• The ROC curve is a graph that shows the performance of the classification model at different thresholds.
• The curve is defined for a binary classifier; to visualize the performance of a multi-class classification model, one curve can be drawn per class (one class versus the rest).
• The ROC curve is plotted with TPR (True Positive Rate) on the Y-axis and FPR (False Positive Rate) on the X-axis.
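A minimal sketch of how the curve's points could be computed, using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The helper names `roc_points` and `auc`, the scores, and the trapezoidal area estimate are assumptions for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= t is predicted positive.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = positive) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

An AUC of 1.0 would mean every positive example scores above every negative one, while 0.5 corresponds to random ranking.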
[Figure: AUC-ROC curve]
Biological Neuron

The human brain is made up of billions of simple processing units called neurons.
• Inputs are received on dendrites, and if the input levels are over a threshold, the neuron fires, passing a signal through the axon to the synapse, which then connects to another neuron.
Artificial Neural Networks (ANN)
An ANN is an information processing paradigm inspired by biological nervous systems.
An ANN is composed of a system of neurons connected by synapses.
ANNs learn by example, adjusting the synaptic connections between neurons.
History
1943: McCulloch and Pitts model neural networks based on their
understanding of neurology.
Neurons embed simple logic functions:
• a or b
• a and b
Perceptron (Rosenblatt 1958)
• Association units A1, A2, … extract features from user input
• Output is weighted and associated
• Function fires if weighted sum of input exceeds a threshold.
Back-propagation learning method (Werbos 1974)
• Three layers of neurons
• Input, Output, Hidden
• Better learning rule for generic three layer networks
Biological Neuron vs. Artificial Neuron

Biological Neuron        Artificial Neuron
Cell Nucleus (Soma)      Node
Dendrites                Input
Synapse                  Weights or interconnections
Axon                     Output

Artificial Neuron
A neuron is a mathematical function modeled on the working of biological neurons.
• It is an elementary unit in an artificial neural network
• One or more inputs are separately weighted
• Inputs are summed and passed through a nonlinear function to produce output
• Every neuron holds an internal state called the activation signal
• Each connection link carries information about the input signal
• Every neuron is connected to other neurons via connection links
A typical activation function works as follows:
$$X = \sum_{i=1}^{n} w_i x_i, \qquad Y = \begin{cases} +1 & \text{for } X > t \\ 0 & \text{for } X \le t \end{cases}$$
Each node $i$ has a weight $w_i$ associated with it. The input to node $i$ is $x_i$.
$t$ is the threshold.
So if the weighted sum of the inputs to the neuron is above the threshold, then the neuron fires.
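The threshold rule above can be written as a few lines of Python. The function name `neuron_output` and the particular weights and thresholds are assumptions chosen for illustration; they happen to make the neuron behave like the simple logic functions "a and b" and "a or b".

```python
def neuron_output(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# With weights 0.5 and 0.5, a threshold of 0.7 needs both inputs on (AND),
# while a threshold of 0.3 needs only one input on (OR).
for a in (0, 1):
    for b in (0, 1):
        and_out = neuron_output([a, b], [0.5, 0.5], threshold=0.7)
        or_out = neuron_output([a, b], [0.5, 0.5], threshold=0.3)
        print(a, b, "AND:", and_out, "OR:", or_out)
```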
Activation Functions
Perceptrons
A perceptron is a single neuron that classifies a set of inputs into one of two categories (usually 1 or 0; some formulations use 1 or -1).
If the inputs are in the form of a grid, a perceptron can be used to recognize visual images of shapes.
The perceptron usually uses a step function, which returns 1 if the weighted sum of inputs exceeds a threshold, and 0 otherwise.

There are two types of perceptrons: single-layer and multilayer.

Single-layer perceptrons can learn only linearly separable patterns.

Multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
Training Perceptrons
Learning involves choosing values for the weights.
The perceptron is trained as follows:
First, inputs are given random weights (usually between -0.5 and 0.5).
An item of training data is presented. If the perceptron misclassifies it, the weights are modified according to the following:
$$w_i \leftarrow w_i + \alpha \, x_i \, (t - o)$$
where $t$ is the target output for the training example, $o$ is the output generated by the perceptron, and $\alpha$ is the learning rate, between 0 and 1 (usually small, such as 0.1).

Cycle through the training examples until all of them are classified correctly.

Each cycle is known as an epoch.
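The training procedure can be sketched as follows. A few things here are assumptions rather than part of the slides: the threshold is replaced by a bias weight on a constant input of 1, the function names are hypothetical, `alpha` stands for the learning rate, and the logical AND function is used as a small linearly separable dataset.

```python
import random


def step(weighted_sum):
    """Step function: 1 if the weighted sum exceeds 0, else 0."""
    return 1 if weighted_sum > 0 else 0


def predict(weights, x):
    # weights[0] is a bias weight on a constant input of 1 (assumption:
    # it plays the role of the threshold).
    xs = [1.0] + list(x)
    return step(sum(w * xi for w, xi in zip(weights, xs)))


def train_perceptron(data, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Random initial weights between -0.5 and 0.5.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in data:
            o = predict(weights, x)
            if o != t:
                # Misclassified: w_i <- w_i + alpha * x_i * (t - o)
                mistakes += 1
                xs = [1.0] + list(x)
                weights = [w + alpha * xi * (t - o) for w, xi in zip(weights, xs)]
        if mistakes == 0:
            return weights, epoch  # every example classified correctly
    return weights, max_epochs


# Logical AND: linearly separable, so a single perceptron can learn it.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, epochs = train_perceptron(AND_DATA)
print("epochs used:", epochs)
print([predict(weights, x) for x, _ in AND_DATA])
```

On data that is not linearly separable, such as XOR, this loop would never reach zero mistakes and would stop only at `max_epochs`.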
Multilayer Neural Networks
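Multilayer neural networks can classify a range of functions, including non-linearly separable ones.
Each input layer neuron connects to all neurons in the hidden layer.
The neurons in the hidden layer connect to all neurons in the output layer.
$$\boldsymbol{h} = \sigma(\mathbf{W}_1 \boldsymbol{x} + \boldsymbol{b}_1)$$
$$\boldsymbol{y} = \sigma(\mathbf{W}_2 \boldsymbol{h} + \boldsymbol{b}_2)$$
Here $\mathbf{W}_1$ and $\mathbf{W}_2$ are the weights and $\sigma$ denotes the activation functions.
[Figure: network with input $\boldsymbol{x}$ (3 units), hidden layer $\boldsymbol{h}$ (4 neurons), and output $\boldsymbol{y}$ (2 neurons)]
4 + 2 = 6 neurons (not counting inputs)
[3 x 4] + [4 x 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
How do we train?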
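The forward pass and the parameter count for this 3-4-2 network can be sketched in plain Python. The sigmoid is assumed for the activation function, and the random weights, zero biases, input values, and the helper name `dense` are all illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, v):
    """One fully connected layer: sigmoid(W v + b), with one row of W per neuron."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + bi) for row, bi in zip(W, b)]


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2

# Randomly initialized weights and zero biases (illustrative values only).
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

x = [0.1, 0.2, 0.7]   # arbitrary input with 3 components
h = dense(W1, b1, x)  # hidden layer: h = sigma(W1 x + b1)
y = dense(W2, b2, h)  # output layer: y = sigma(W2 h + b2)

# Count the learnable parameters.
n_weights = n_in * n_hidden + n_hidden * n_out
n_biases = n_hidden + n_out
n_params = n_weights + n_biases

print("h =", h)
print("y =", y)
print(n_weights, "weights +", n_biases, "biases =", n_params, "parameters")
```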
The Chain Rule
The Chain Rule is a technique for differentiating composite functions.
Composite functions are made up of layers of functions inside of functions.
Steps to apply the chain rule:
1. Identify the inner and outer functions.
2. Differentiate the outer function, leaving the inner function alone.
3. Differentiate the inner function.
4. Multiply the two derivatives.
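As a small worked example, take the composite function sin(x²): the outer function is sin(u) and the inner function is u = x². Differentiating the outer function gives cos(u), differentiating the inner gives 2x, and multiplying the two gives cos(x²) · 2x. The sketch below checks this against a finite-difference estimate; the example function and the helper names are assumptions chosen for illustration.

```python
import math


def inner(x):
    return x ** 2          # inner function u = x^2


def inner_derivative(x):
    return 2 * x           # du/dx


def outer_derivative(u):
    return math.cos(u)     # d/du sin(u), evaluated with the inner function left alone


def chain_rule_derivative(x):
    # Derivative of sin(x^2): outer derivative at the inner value, times inner derivative.
    return outer_derivative(inner(x)) * inner_derivative(x)


def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only as an independent check.
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.3
analytic = chain_rule_derivative(x0)
numeric = numeric_derivative(lambda x: math.sin(inner(x)), x0)
print(analytic, numeric)
```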
[Theorem slides: Chain Rule: One Independent Variable; Theorem 13.7 Chain Rule: Two Independent Variables]
Chain Rule Example
Training
1. Sample labeled data (batch).
2. Forward it through the network, get predictions.
3. Back-propagate the errors.
4. Update the network weights.

Optimize (min. or max.) objective/cost function $J(\theta)$.
Generate an error signal that measures the difference between predictions and target values.
Use the error signal to change the weights and get more accurate predictions.
Subtracting a fraction of the gradient moves you towards the (local) minimum of the cost function.
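The effect of repeatedly subtracting a fraction of the gradient can be seen on a one-parameter cost function. The cost J(θ) = (θ - 3)², the learning rate of 0.1, and the function names below are assumptions chosen for this sketch.

```python
def J(theta):
    """Toy cost function with its minimum at theta = 3."""
    return (theta - 3.0) ** 2


def grad_J(theta):
    """Derivative of J with respect to theta."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, learning_rate=0.1, steps=50):
    history = [J(theta)]
    for _ in range(steps):
        # Subtract a fraction (the learning rate) of the gradient.
        theta = theta - learning_rate * grad_J(theta)
        history.append(J(theta))
    return theta, history


theta_final, history = gradient_descent(theta=0.0)
print("theta after training:", theta_final)
print("cost before and after:", history[0], history[-1])
```

With a learning rate that is too large (here, above 1.0) the same loop would overshoot and diverge, which is why the fraction is usually kept small.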
Training Algorithm
Step 0: Initialize weights
(Set to random values with zero mean and variance one)
Step 1: While stopping condition is false, do Steps 2-9.
Step 2: For each training pair, do Steps 3-8.
Feed forward
Step 3: Each input unit ($X_i$, $i = 1, \dots, n$) receives input signal $x_i$ and broadcasts this signal to all units in the layer above (the hidden units).
Step 4: Each hidden unit ($Z_j$, $j = 1, \dots, p$) sums its weighted input signals,
$$z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i v_{ij},$$
applies its activation function to compute its output signal,
$$z_j = f(z\_in_j),$$
and sends this signal to all units in the layer above (the output units).
Training Algorithm (Cont)
Step 5: Each output unit ($Y_k$, $k = 1, \dots, m$) sums its weighted input signals,
$$y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j w_{jk},$$
and applies its activation function to compute its output signal,
$$y_k = f(y\_in_k).$$
Backpropagation of error:
Step 6: Each output unit ($Y_k$, $k = 1, \dots, m$) receives a target pattern corresponding to the input training pattern and computes its error information term,
$$\delta_k = (t_k - y_k)\, f'(y\_in_k),$$
calculates its weight correction term (used to update $w_{jk}$ later),
$$\Delta w_{jk} = \alpha\, \delta_k\, z_j,$$
calculates its bias correction term (used to update $w_{0k}$ later),
$$\Delta w_{0k} = \alpha\, \delta_k,$$
and sends $\delta_k$ to units in the layer below.
Training Algorithm (Cont)
Step 7: Each hidden unit ($Z_j$, $j = 1, \dots, p$) sums its delta inputs (from units in the layer above),
$$\delta\_in_j = \sum_{k=1}^{m} \delta_k w_{jk},$$
multiplies by the derivative of its activation function to calculate its error information term,
$$\delta_j = \delta\_in_j\, f'(z\_in_j),$$
calculates its weight correction term (used to update $v_{ij}$ later),
$$\Delta v_{ij} = \alpha\, \delta_j\, x_i,$$
and calculates its bias correction term (used to update $v_{0j}$ later),
$$\Delta v_{0j} = \alpha\, \delta_j.$$
Training Algorithm (Cont)
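Update weights and bias
Step 8: Each output unit ($Y_k$, $k = 1, \dots, m$) updates its bias and weights ($j = 0, \dots, p$):
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$$
Each hidden unit ($Z_j$, $j = 1, \dots, p$) updates its bias and weights ($i = 0, \dots, n$):
$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$$
Step 9: Test stopping condition.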
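Steps 0 to 9 can be put together in a short plain-Python sketch. Several choices here are assumptions, not part of the slides: the activation f is taken to be the sigmoid, the stopping condition is a fixed number of epochs, the error reported is half the sum of squared differences, and the XOR data, the 4 hidden units, and the function name `train` are illustrative.

```python
import math
import random


def f(x):
    """Sigmoid activation (an assumption; the algorithm leaves f unspecified)."""
    return 1.0 / (1.0 + math.exp(-x))


def f_prime(x):
    s = f(x)
    return s * (1.0 - s)


def train(pairs, n, p, m, alpha=0.5, epochs=2000, seed=1):
    rng = random.Random(seed)
    # Step 0: random weights, zero mean and unit variance.
    # v[i][j] links input i to hidden j; w[j][k] links hidden j to output k.
    # Row 0 of each table holds the biases v0j and w0k.
    v = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n + 1)]
    w = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(p + 1)]
    errors = []
    for _ in range(epochs):          # Steps 1 and 9: stop after a fixed number of epochs
        total = 0.0
        for x, t in pairs:           # Step 2: loop over the training pairs
            # Steps 3-4: feed forward to the hidden layer.
            z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n)) for j in range(p)]
            z = [f(a) for a in z_in]
            # Step 5: feed forward to the output layer.
            y_in = [w[0][k] + sum(z[j] * w[j + 1][k] for j in range(p)) for k in range(m)]
            y = [f(a) for a in y_in]
            total += 0.5 * sum((t[k] - y[k]) ** 2 for k in range(m))
            # Step 6: error information terms of the output units.
            delta_k = [(t[k] - y[k]) * f_prime(y_in[k]) for k in range(m)]
            # Step 7: error information terms of the hidden units (uses the old w).
            delta_in = [sum(delta_k[k] * w[j + 1][k] for k in range(m)) for j in range(p)]
            delta_j = [delta_in[j] * f_prime(z_in[j]) for j in range(p)]
            # Step 8: update biases and weights of both layers.
            for k in range(m):
                w[0][k] += alpha * delta_k[k]
                for j in range(p):
                    w[j + 1][k] += alpha * delta_k[k] * z[j]
            for j in range(p):
                v[0][j] += alpha * delta_j[j]
                for i in range(n):
                    v[i + 1][j] += alpha * delta_j[j] * x[i]
        errors.append(total)
    return v, w, errors


# XOR: not linearly separable, so it needs the hidden layer.
XOR = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
v, w, errors = train(XOR, n=2, p=4, m=1)
print("error in first epoch:", errors[0])
print("error in last epoch: ", errors[-1])
```

The hidden-layer deltas are computed before any weight is changed, so Step 7 uses the same weights that produced the forward pass.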
Example
Given the following multilayer neural network (deep learning) with training data point i = [i1 = 0.1, i2 = 0.2, i3 = 0.7], target value l = [l1 = 1.0, l2 = 0.0, l3 = 0.0], learning rate = 0.8, and bias value at each layer b = [1.0, 1.0, 1.0]:
• Initialize the weights randomly.
• Forward pass the inputs and calculate the cost.
• Apply backpropagation and adjust the weights accordingly for 2 epochs.
