Slides
Classification
Presented by:
Dr Noureddin Sadawi
Remember!
● Transforming Attributes
– Numerical to Categorical (Binning or Discretization)
– Categorical to Numerical (Encoding or Continuization)
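A minimal sketch of both transformations in Java (the temperature cut points and outlook categories below are assumed for illustration, not taken from the course data):

// Numerical -> categorical (binning) and categorical -> numerical (one-hot encoding).
public class AttributeTransform {
    // Binning / discretization with assumed cut points.
    static String binTemperature(double t) {
        if (t < 10) return "cold";
        if (t < 25) return "mild";
        return "hot";
    }

    // Encoding / continuization: one-hot vector over an assumed category set.
    static double[] oneHotOutlook(String outlook) {
        String[] categories = {"sunny", "overcast", "rainy"};
        double[] encoded = new double[categories.length];
        for (int i = 0; i < categories.length; i++)
            if (categories[i].equals(outlook)) encoded[i] = 1.0;
        return encoded;
    }

    public static void main(String[] args) {
        System.out.println(binTemperature(18.0));                              // mild
        System.out.println(java.util.Arrays.toString(oneHotOutlook("rainy"))); // [0.0, 0.0, 1.0]
    }
}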
Linear/Non-linear Separability
● Naive Bayesian
● Decision Tree
– Covariance Matrix
● Linear Discriminant Analysis
● Logistic Regression
– Similarity Functions
● K Nearest Neighbours
– Others
● Artificial Neural Network
● Support Vector Machine
The ZeroR Classifier
P(Yes) = 9 / 14
P(No) = 5 / 14
Example
● Let's assume we have a day with:
Outlook = Rainy
Temp = Mild
Humidity = Normal
Windy = True
Likelihood of Yes = P(Outlook=Rainy|Yes) * P(Temp=Mild|Yes) * P(Humidity=Normal|Yes) * P(Windy=True|Yes) * P(Yes)
= 2/9 * 4/9 * 6/9 * 3/9 * 9/14 = 0.014109347
Likelihood of No = P(Outlook=Rainy|No) * P(Temp=Mild|No) * P(Humidity=Normal|No) * P(Windy=True|No) * P(No)
= 3/5 * 2/5 * 1/5 * 3/5 * 5/14 = 0.010285714
● Now we normalize:
P(Yes) = 0.014109347/(0.014109347+0.010285714) = 0.578368999
P(No) = 0.010285714/(0.014109347+0.010285714) = 0.421631001
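The same calculation as a minimal Java sketch (the conditional probabilities are hard-coded straight from the example above rather than estimated from data):

// Naive Bayes posterior for the example day (Rainy, Mild, Normal, Windy=True).
public class NaiveBayesExample {
    public static void main(String[] args) {
        double likelihoodYes = (2.0/9) * (4.0/9) * (6.0/9) * (3.0/9) * (9.0/14); // ≈ 0.014109347
        double likelihoodNo  = (3.0/5) * (2.0/5) * (1.0/5) * (3.0/5) * (5.0/14); // ≈ 0.010285714

        // Normalize so the two posteriors sum to 1.
        double evidence = likelihoodYes + likelihoodNo;
        System.out.println("P(Yes|day) = " + likelihoodYes / evidence); // ≈ 0.578
        System.out.println("P(No|day)  = " + likelihoodNo  / evidence); // ≈ 0.422
    }
}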
The zero-frequency problem
● Finally, taking the natural log of both sides, we can write the equation in terms of log-odds (logit), which is a linear function of the predictors (written out below)
● The coefficient (b1) is the amount the logit (log-odds) changes with a
one unit change in x
● As mentioned before, logistic regression can handle any
number of numerical and/or categorical variables
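Written out for a single predictor x (b_0 denotes the intercept, which the slide text does not name explicitly):

\[
p = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\qquad\Longleftrightarrow\qquad
\mathrm{logit}(p) = \ln\!\frac{p}{1-p} = b_0 + b_1 x
\]

so increasing x by one unit adds b_1 to the log-odds, i.e. multiplies the odds p/(1-p) by e^{b_1}.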
Linear Regression & Logistic Regression
● There are several analogies between linear regression and logistic regression
● Just as ordinary least squares (OLS) regression is the method used to estimate the coefficients of the best-fit line in linear regression, logistic regression uses maximum likelihood estimation (MLE) to obtain the model coefficients that relate predictors to the target
● After this initial function is estimated, the process is repeated until LL (Log Likelihood) does not change significantly (see the sketch below)
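A minimal sketch of that loop in Java, assuming plain gradient ascent on the log-likelihood (the slides only specify the LL-based stopping rule, not the optimizer) and a tiny made-up one-predictor dataset:

// Logistic regression coefficients by MLE: climb the log-likelihood until it stops changing.
public class LogisticMLE {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};   // assumed toy predictor
        int[]    y = {0, 0, 1, 0, 1, 1};   // assumed toy labels (not perfectly separable)
        double b0 = 0, b1 = 0, rate = 0.01;

        double previousLL = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < 100000; iter++) {
            double g0 = 0, g1 = 0, ll = 0;
            for (int i = 0; i < x.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(-(b0 + b1 * x[i])));
                g0 += (y[i] - p);              // dLL/db0
                g1 += (y[i] - p) * x[i];       // dLL/db1
                ll += y[i] * Math.log(p) + (1 - y[i]) * Math.log(1 - p);
            }
            b0 += rate * g0;
            b1 += rate * g1;
            if (Math.abs(ll - previousLL) < 1e-9) break;  // LL no longer changes significantly
            previousLL = ll;
        }
        System.out.println("b0 = " + b0 + ", b1 = " + b1);
    }
}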
Pseudo R-squared
● With K=3, two of the three closest neighbors are Default=Y and one is Default=N, so the prediction for the unknown case is again Default=Y
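That vote as a minimal Java sketch (the three neighbour labels are taken from the example; finding the neighbours themselves is assumed to have been done already):

import java.util.HashMap;
import java.util.Map;

// Majority vote among the k nearest neighbours (k = 3, labels from the example).
public class KnnVote {
    static String vote(String[] neighbourLabels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : neighbourLabels)
            counts.merge(label, 1, Integer::sum);        // count each class label
        return counts.entrySet().stream()
                     .max(Map.Entry.comparingByValue())  // pick the most frequent label
                     .get().getKey();
    }

    public static void main(String[] args) {
        System.out.println(vote(new String[]{"Default=Y", "Default=Y", "Default=N"})); // Default=Y
    }
}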
Standardized Distance
Diagram
Distance measures for continuous variables
p1=(w1,x1,y1,z1), p2=(w2,x2,y2,z2)
Euc. Dist. (p1,p2) = sqrt((w1-w2)^2+(x1-x2)^2+(y1-y2)^2+(z1-z2)^2)
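The same formula for points of any dimension, as a minimal Java sketch:

// Euclidean distance between two points of equal (arbitrary) dimension.
public class EuclideanDistance {
    static double distance(double[] p1, double[] p2) {
        double sum = 0;
        for (int i = 0; i < p1.length; i++) {
            double diff = p1[i] - p2[i];
            sum += diff * diff;            // (w1-w2)^2 + (x1-x2)^2 + ...
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] p1 = {1, 2, 3, 4};   // assumed sample point (w1, x1, y1, z1)
        double[] p2 = {2, 4, 6, 8};   // assumed sample point (w2, x2, y2, z2)
        System.out.println(distance(p1, p2));  // sqrt(1 + 4 + 9 + 16) ≈ 5.477
    }
}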
Example Dataset
Biological Background
● An artificial neural network (ANN) is a system that is based on the biological neural network, such as the brain
● The brain has approximately 100 billion neurons, which communicate through electrochemical signals (the neurons are connected through junctions called synapses)
● Each neuron receives thousands of connections from other neurons and constantly receives incoming signals that reach the cell body
● If the resulting sum of the signals surpasses a certain threshold, a response is sent through the axon
● An ANN attempts to recreate a computational mirror of the biological neural network, although the two are not comparable, since the number and complexity of the neurons in a biological neural network are many times greater than in an artificial neural network
Perceptron (an artificial neuron)
Source: http://www.mathsisfun.com/equation_of_line.html
Transfer (Activation) Functions
Java Implementation
● We will have three variables x, y and z (features)
● Each instance will belong to either class 1 or 0
● We will have 100 randomly generated instances (50 of class 0 and 50 of class 1); alternatively, you can read instances from an input file if you wish
● We start with random weights and bias
● We loop through instances and update weights and bias (the
process involves computing local & global error)
● We continue until the stopping condition is satisfied (a solution is found OR the max # of iterations is reached); see the sketch below
● I have modified Richard Knop's C code which can be found here:
https://github.com/RichardKnop/ansi-c-perceptron
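A minimal Java sketch of the loop described above (this follows the slide's outline, not the modified C code; the learning rate, random-data scheme and iteration cap are assumptions):

import java.util.Arrays;
import java.util.Random;

// Perceptron on 3 features (x, y, z) and classes 0/1: random init, per-instance updates,
// stop when the global error is zero or the iteration cap is reached.
public class PerceptronDemo {
    public static void main(String[] args) {
        int n = 100, maxIterations = 1000;
        double learningRate = 0.1;
        double[][] data = new double[n][3];
        int[] label = new int[n];
        Random rnd = new Random(42);

        // 50 instances of class 0 and 50 of class 1, separated along the first feature.
        for (int i = 0; i < n; i++) {
            label[i] = (i < 50) ? 0 : 1;
            data[i][0] = rnd.nextDouble() + (label[i] == 1 ? 1.5 : 0.0);
            data[i][1] = rnd.nextDouble();
            data[i][2] = rnd.nextDouble();
        }

        // Start with random weights and bias.
        double[] w = {rnd.nextDouble(), rnd.nextDouble(), rnd.nextDouble()};
        double bias = rnd.nextDouble();

        for (int iter = 0; iter < maxIterations; iter++) {
            int globalError = 0;
            for (int i = 0; i < n; i++) {
                double sum = w[0]*data[i][0] + w[1]*data[i][1] + w[2]*data[i][2] + bias;
                int predicted = (sum >= 0) ? 1 : 0;
                int localError = label[i] - predicted;     // -1, 0 or +1
                for (int j = 0; j < 3; j++)
                    w[j] += learningRate * localError * data[i][j];
                bias += learningRate * localError;
                if (localError != 0) globalError++;
            }
            if (globalError == 0) {                        // a solution is found
                System.out.println("Converged after " + (iter + 1) + " iterations");
                break;
            }
        }
        System.out.println("w = " + Arrays.toString(w) + ", bias = " + bias);
    }
}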
Transfer (Activation)
Functions
In Artificial Neural Networks
Source: http://www.heatonresearch.com
Artificial Neural Networks
The Multi-Layer Perceptron
MLP
● Choose the centers randomly from the training set, compute the spread for the RBF function using the normalization method, and find the weights using the pseudo-inverse method (sketched in formulas below)
● Use clustering to find the centers, normalization to choose the spreads, and the LMS algorithm to find the weights (Hybrid Learning Process)
● Apply the gradient descent method to find the centers, spreads and weights by minimizing the (instantaneous) squared error
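A sketch of the first (fixed-centers) approach in formulas, under common assumptions (M randomly chosen centers c_j, d_max the maximum distance between them; the slides do not spell out the exact expressions):

\[
\sigma = \frac{d_{\max}}{\sqrt{2M}}, \qquad
\Phi_{ij} = \exp\!\left(-\frac{\lVert x_i - c_j\rVert^2}{2\sigma^2}\right), \qquad
w = \Phi^{+} d
\]

where \Phi^{+} is the pseudo-inverse of the design matrix \Phi and d is the vector of target outputs.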
Support Vector Machine
(SVM) Classification
Part 3
multiplying by y just changes the sign for the two cases of being on either
side of the decision surface
● Since this point lies on the decision boundary, it satisfies:
● Therefore:
● We solve for r and get:
● If we compute r for a support vector, then the margin width is
2*r
● The margin is invariant to scaling of the parameters because it is normalized by the length of w
– Therefore, we could use unit vectors by requiring that ||w|| = 1
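A hedged reconstruction of the equations omitted above (the symbol x' for the point's projection onto the decision boundary, and w, b for the hyperplane parameters, are assumptions; only the steps named in the bullets come from the slides). Writing x = x' + y r w/||w||:

\[
w^{\top}x' + b = 0
\;\;\Rightarrow\;\;
w^{\top}\!\left(x - y\,r\,\frac{w}{\lVert w\rVert}\right) + b = 0
\;\;\Rightarrow\;\;
r = y\,\frac{w^{\top}x + b}{\lVert w\rVert}
\]

and evaluating r at a support vector gives the margin width 2*r.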
● Map data into new space, then take the inner product of
the new vectors
● The image of the inner product of the data is the inner
product of the images of the data
● Two kernel functions are shown below
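As representative examples (the specific kernels intended here are an assumption), two commonly used kernel functions are the polynomial kernel and the Gaussian (RBF) kernel:

\[
K(x_i, x_j) = \left(x_i^{\top} x_j + 1\right)^{d},
\qquad
K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right)
\]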