COMP-377 Week 7 v1.1

AI for Software Developers

Artificial Neural
Networks – Part 1
Lecture 6 Review
❑ Logistic Regression
➢ Linear regression is not a good solution for predicting binary-valued labels ($y^{(i)} \in \{0,1\}$).
➢ Logistic regression can be used for predicting binary-valued labels – it acts as a classification algorithm.
❑ The standard logistic function (also known as the sigmoid function), $f(x) = \frac{1}{1+e^{-x}}$, where $e$ is the base of the natural logarithm, is the function that can be used to model the binary response.
❑ If we solve $P(y = 1 \mid x) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x)}}$ for $\beta_0 + \beta_1 x + \epsilon$ on the right side, we obtain the logistic regression model as a linear model for the log odds:
$$\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x + \epsilon$$
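❑ As a quick illustration (a minimal sketch, not from the original slides), the sigmoid squashes any real input into (0, 1), which is why it can model a probability:

import numpy as np

def sigmoid(x):
    # standard logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0))     # 0.5
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0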



Lecture 6 Review
❑ Logistic Regression
➢ In logistic regression, the cost function is basically a measure of how often you predicted 1 when the true answer was 0, or vice versa.
➢ Softmax regression allows us to handle multiclass classification.
❑ Support Vector Machines
➢ Find the best linear classifier by maximizing the margin (the distance to the nearest point on either side of the line).
➢ The algorithm minimizes $\frac{1}{2}\|w\|^2$ subject to the constraints:
$w'x - b \ge 1$ when $y^{(i)} = +1$
$w'x - b \le -1$ when $y^{(i)} = -1$
➢ For nonlinear datasets – create the classifier by increasing the number of dimensions so that we can draw a hyperplane to separate them (see the sketch below).
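❑ A minimal sketch (using scikit-learn, which is not part of the original slides; the dataset is made up): an RBF kernel implicitly lifts the points into a higher-dimensional space where a separating hyperplane exists:

import numpy as np
from sklearn.svm import SVC

# Toy nonlinear dataset (assumed): class 1 inside a disk, class 0 outside
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)

# A linear classifier cannot separate these classes in the plane,
# but the RBF kernel lets the SVM draw a hyperplane in a lifted space.
clf = SVC(kernel='rbf').fit(X, y)
print(clf.score(X, y))  # training accuracy, close to 1.0 on this toy data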



Lesson 7 Objectives

❑ Explain the artificial neuron model
❑ Explain the perceptron model
❑ Define the loss functions
❑ Apply the perceptron model for solving simple classification tasks
❑ Derive the Adaptive Linear Neurons (ADALINE) algorithm and the delta rule
❑ Use ADALINE for solving simple classification tasks



Artificial Neuron Model

❑ The first mathematical model for representing the learning function of the human brain can be attributed to McCulloch and Pitts, in their 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity".



Neuron Model

❑ In biological terms, a neuron is a cell that can transmit and process chemical or electrical signals.
❑ The neuron is connected with other neurons to create a network:
➢ Every neuron has an input, called the dendrite, a cell body, and an output called the axon.
➢ Outputs connect to inputs of other neurons, and so the network develops.
➢ The cell body determines the weight of the input signal.
Machine Learning Hands-on for developers and technical Professionals, Wiley
Artificial Neuron Model

❑ Artificial neurons are designed to mimic aspects of their biological counterparts.
❑ The artificial neuron receives one or more inputs at its dendrites and sums them to produce an output representing the neuron's action potential, which is transmitted along its axon.



https://en.wikipedia.org/wiki/Artificial_neuron
Artificial Neuron Model

❑ An artificial neuron:
➢ takes a number of inputs
➢ calculates a simple weighted sum
➢ adds a bias
➢ decides whether it should be "fired" or not
❑ A neuron maintains a set (a vector) of weights.
❑ Each input to the neuron is multiplied by its corresponding weight, which determines how important a specific input signal is to this neuron.
https://en.wikipedia.org/wiki/Artificial_neuron
Artificial Neuron Model

❑ Each neuron also has a bias that's added to the sum of the weighted inputs before making the decision to "fire" or not.
➢ The bias can be seen as a modifier to the threshold of the neuron's activation.
❑ We use an activation function to decide whether the neuron should activate ("fire") or not.
❑ Rosenblatt proposed a simple rule to compute the output.
❑ The neuron's output, 0 or 1, is determined by whether the weighted sum is less than or greater than some threshold value:
$$output = \begin{cases} 0 & \text{if } \sum_i w_i x_i \le \text{threshold} \\ 1 & \text{if } \sum_i w_i x_i > \text{threshold} \end{cases}$$



Artificial Neuron Model

❑ Let's express the activation function in vector notation:
➢ Let $w$ be the weights vector: $w = (w_1, w_2, w_3)$
➢ Let $x$ be the input vector of features: $x = (x_1, x_2, x_3)^T$
❑ If $b$ is the bias, $y$ the neuron's output, and $f$ the activation function, then:
$$y = f(w_1 x_1 + w_2 x_2 + w_3 x_3 + b) = f(w \cdot x + b)$$
❑ The activation function takes the weighted sum of the inputs plus the bias as input and performs the necessary computation to decide whether the neuron should activate ("fire") or not. A small numeric sketch follows.
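❑ For illustration (a minimal sketch with made-up weights, inputs, and bias), the full computation of a single neuron with a step activation:

import numpy as np

def step(z):
    # step activation: fire (1) only when the input is positive
    return 1 if z > 0 else 0

w = np.array([0.5, -0.6, 0.2])   # assumed weights
x = np.array([1.0, 0.5, 2.0])    # assumed inputs
b = 0.1                          # assumed bias
y = step(np.dot(w, x) + b)       # f(w . x + b)
print(y)  # 0.5 - 0.3 + 0.4 + 0.1 = 0.7 > 0, so prints 1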



Perceptron Model

❑ The original model for an artificial neuron was called the perceptron (Rosenblatt, 1958), and its activation function was a step function:
$$output = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$$
❑ Here the bias $b = -\text{threshold}$.
❑ You can think of the bias as a measure of how easy it is to get the perceptron to output 1 (or fire).
❑ The $w$ parameters are unknown – this is what we have to learn.



Perceptron model

❑ The perceptron is a model of a single neuron that can be used for two-class classification problems and provides the foundation for much larger networks.



Perceptron Algorithm

1. Initialize all weights w to 0.
2. Iterate through the training data (set of instances):
➢ For each training instance, classify the instance:

# calculate the weighted sum
weighted_sum = np.dot(features, self.weights) + self.bias
# use the step function as the activation function
if weighted_sum > 0:
    return 1
else:
    return 0



Perceptron Algorithm

a) If the prediction (the output of the classifier) was correct, don't do anything.
b) If the prediction was wrong, modify the weights by using the update rule:
$$w_j \mathrel{+}= \bigl(y^{(i)} - f(x^{(i)})\bigr)\, x_j^{(i)}$$

# Make a prediction based on the current weights.
prediction = self.predict(features)
# Update the weights if the prediction is wrong.
if prediction != label:
    gradient = label - prediction  # how far off are we?
    for i in range(len(self.weights)):
        self.weights[i] += gradient * features[i]  # the perceptron update rule
    self.bias += gradient
return self  # return weights and bias



Perceptron Algorithm

3. Repeat step 2 until the perceptron correctly classifies every instance or the maximum number of iterations has been reached.
❑ This model is good for representing logic gates such as NAND, which produces an output that's false only if all its inputs are true – see the sketch below.
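❑ A quick check (a minimal sketch with hand-picked weights, not learned ones) that a single perceptron can represent NAND:

import numpy as np

# Assumed weights w = [-2, -2] and bias b = 3 implement NAND:
# the output is 0 only when both inputs are 1.
w, b = np.array([-2.0, -2.0]), 3.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    output = 1 if np.dot(w, x) + b > 0 else 0
    print(x, output)  # (1, 1) -> 0, all other inputs -> 1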



Linear Classifier Example

import numpy as np

class PerceptronModel:
    def __init__(self, no_of_features):
        self.weights = np.zeros(no_of_features)
        self.bias = 0.

    def predict(self, features):
        # calculate the weighted sum
        weighted_sum = np.dot(features, self.weights) + self.bias
        # use the step function as the activation function
        if weighted_sum > 0:
            return 1
        else:
            return 0



Linear Classifier Example

    def train(self, features, label):
        # Make a prediction based on the current weights.
        prediction = self.predict(features)
        # Update the weights if the prediction is wrong.
        if prediction != label:
            gradient = label - prediction  # how far off are we?
            for i in range(len(self.weights)):
                self.weights[i] += gradient * features[i]  # the perceptron update rule
            self.bias += gradient
        return self  # return weights and bias



Linear Classifier Example

# Create the model
p = PerceptronModel(2)
# Train the model on inputs with AND results.
for i in range(0, 100):
    p.train(np.array([1, 1]), 1)
    p.train(np.array([1, 0]), 0)
    p.train(np.array([0, 1]), 0)
    p.train(np.array([0, 0]), 0)
# Predict
print(p.predict(np.array([0, 0])))  # 0



Linear Classifier Example

# Friendly or not friendly example using teeth and size features
# X1 data - number of teeth - scaled to [0, 1]
X1 = np.array([0.27, 0.09, 0.00, 0.23, 0., 1.00, 0.32])
# X2 data - size scaled to [0, 1]
X2 = np.array([0.50, 0.48, 0.12, 0.00, 1.00, 0.73, 0.33])
# labels data, or the output
labels = np.array([1, 1, 1, 0, 1, 0, 0])
# Create the perceptron model with two features
p = PerceptronModel(2)
# Train the model using 100 iterations
for i in range(0, 100):
    for j in range(len(X1)):
        p.train(np.array([X1[j], X2[j]]), labels[j])



Linear Classifier Example

print("weights and bias: ")


print(p.bias)
print(p.weights)
# y = (-w1/w2)x + (-bias/w2) = a*x + b
# line coefficients
a = -weights[0] / weights[1]
b = -bias / weights[1]
print('a and b: ')
print(a)
print(b)
# Now we can use it to categorize samples it's never seen.
# For example: something with 29 teeth and a size of 23 cm, likely to be nice ?
predictionResult = p.predict(np.array([0.76,0.07])) # teeth number, size of the animal
print(predictionResult)



Linear Classifier Example

import matplotlib.pyplot as pl
%matplotlib inline

x = np.linspace(0, 1)
# function to compute the line y = a*x + b
f = lambda x: a * x + b

fig = pl.figure()
figa = pl.gca()

pl.plot(X1, X2, 'bo')  # plot the points
pl.plot(x, f(x), 'r')  # plot the line



Linear Classifier Example

# Linearly separate the points by the line
points = np.zeros([len(X1), 1])
for i in range(len(X1)):
    if f(X1[i]) > X2[i]:
        # Point is below the line
        points[i] = 1
        pl.plot(X1[i], X2[i], 'go')
    else:
        # Point is above the line
        points[i] = -1
pl.legend(['Above', 'Separator', 'Below'], loc=0)
pl.title('Selected points with their separating line.')
figa.axes.get_xaxis().set_visible(False)
figa.axes.get_yaxis().set_visible(False)



Adaline and the Delta Rule

❑ Adaptive Linear Neurons (ADALINE) is an improvement over the perceptron because it updates the weights based on a linear activation function rather than a unit step function.
❑ It is also referred to as the Widrow-Hoff rule or delta rule.
➢ This linear activation function is just the identity function of the net input: $g(w^T x) = w^T x$.
❑ One of the biggest advantages of the linear activation function over the unit step function is that it is differentiable.
❑ The fact that the activation function is differentiable allows us to define a cost function $J(w)$ that we can minimize in order to update our weights.



Perceptron versus Adaline



https://sebastianraschka.com/faq/docs/diff-perceptron-adaline-neuralnet.html
Derivation of ADALINE Rule

❑ For each instance in the training set, we classify that instance (that is, prediction mode) and compare the predicted output to the desired output.
❑ A good metric to use for comparing desired versus actual results is the mean squared error (MSE):
➢ It is the sum, over all the data points, of the square of the difference between the predicted output, $f(x^{(i)})$, and the desired output, $y^{(i)}$, divided by the number of data points, $n$:
$$E = \frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)^2$$
➢ This is the cost function we already know from the linear regression model! A small numeric sketch follows.
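❑ For example (a minimal sketch with made-up predictions and labels):

import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])     # assumed desired outputs
f_x = np.array([0.9, 0.2, 0.8, 0.4])   # assumed predicted outputs
mse = np.mean((y - f_x) ** 2)          # (1/n) * sum of squared differences
print(mse)  # (0.01 + 0.04 + 0.04 + 0.36) / 4 = 0.1125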



Derivation of ADALINE Rule

❑ The cost function is a function of the weights and biases of the model:
$$J(w) = \frac{1}{2}\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)^2$$
➢ Here $\frac{1}{2}$ is used to simplify the calculations.
❑ The goal in training the model is to minimize the cost function – ideally to zero.
❑ We can use the gradient descent method to find the minimum.
❑ Then we can update the weights using the formula seen before:
$$w = w + \Delta w, \quad \text{where } \Delta w = -\eta \nabla J(w) \text{ and } \eta \text{ is the step size or learning rate.}$$



Derivation of ADALINE Rule

❑ The partial derivative of the cost function with respect to a particular weight $w_j$ can be calculated as follows:
$$\frac{\partial J}{\partial w_j} = \frac{\partial}{\partial w_j}\,\frac{1}{2}\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)^2 = \frac{1}{2}\sum_{i=1}^{n} 2\bigl(y^{(i)} - f(x^{(i)})\bigr)\,\frac{\partial}{\partial w_j}\bigl(y^{(i)} - f(x^{(i)})\bigr)$$
$$= \sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)\,\frac{\partial}{\partial w_j}\Bigl(y^{(i)} - \sum_{j=1}^{m} w_j x_j^{(i)} - b\Bigr) = \sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)\bigl(-x_j^{(i)}\bigr)$$
❑ We can now calculate:
$$\Delta w_j = -\eta\,\frac{\partial J}{\partial w_j} = -\eta\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)\bigl(-x_j^{(i)}\bigr) = \eta\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)\, x_j^{(i)}$$
so the weights are updated as
$$w_j \mathrel{+}= \eta\sum_{i=1}^{n}\bigl(y^{(i)} - f(x^{(i)})\bigr)\, x_j^{(i)}$$
❑ This formula is known as the ADALINE (Adaptive Linear Neurons) rule.
ADALINE Algorithm

1. Initialize the weights with zeros or small random numbers.
2. Select an input vector x and present it to the network.
3. Compute the output of the network for the input vector specified and the current values of the weights.
4. Compute the cost function based on the mean squared error, comparing the model output y with the correct output o:
$$E = (o - y)^2$$
This seems very much like a regression problem!
5. Adjust the weights with the following gradient descent recursion:
$$w = w + \eta\,(o - y)\,x$$
6. Return to step 2.


ADALINE rule

❑ Two main differences from the perceptron rule:
1. Here, the output f is a real number and not a class label as in the perceptron learning rule.
2. The weight update is calculated based on all samples in the training set (instead of updating the weights incrementally after each sample), which is why this approach is also called "batch" gradient descent.



ADALINE example

❑ Adaline Test – https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
❑ See the entire code on eCentennial.

import numpy as np

class AdalineGD(object):

    def __init__(self, eta=0.01, epochs=50):
        self.eta = eta
        self.epochs = epochs


ADALINE example

    def train(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []

        for i in range(self.epochs):
            output = self.net_input(X)
            errors = (y - output)
            # batch update: apply the ADALINE rule using all samples at once
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self



ADALINE example

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        # linear activation: the identity of the net input
        return self.net_input(X)

    def predict(self, X):
        return np.where(self.activation(X) >= 0.0, 1, -1)
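❑ A usage sketch (assuming the scaled "friendly" data X1, X2, and labels from the perceptron example, with the labels remapped to {-1, 1} because predict() returns -1 or 1):

X = np.column_stack((X1, X2))
y = np.where(labels == 1, 1, -1)
ada = AdalineGD(eta=0.01, epochs=50).train(X, y)
print(ada.cost_[-1])                         # final sum-of-squared-errors cost
print(ada.predict(np.array([0.76, 0.07])))   # classify a new, scaled sample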





Limitations of early models

❑ The book Perceptrons by Minsky and Papert (1969) argued that:
➢ Perceptrons could only work on linearly separable problems.
➢ The perceptron model cannot, for example, represent the XOR function, and the book implied that larger networks have similar limitations!
❑ In the XOR problem, neither class (cross nor point) is linearly separable; that is, we cannot separate the two classes with any linear function on the plane – see the sketch below.
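❑ A quick demonstration (a sketch reusing the PerceptronModel class from the earlier slides): training on XOR never converges, because no line separates the two classes:

# XOR truth table: the output is 1 only when the inputs differ
p = PerceptronModel(2)
for i in range(1000):
    p.train(np.array([0, 0]), 0)
    p.train(np.array([0, 1]), 1)
    p.train(np.array([1, 0]), 1)
    p.train(np.array([1, 1]), 0)
# No matter how long we train, at least one point is still misclassified.
for x, target in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
    print(x, p.predict(np.array(x)), 'expected', target)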
❑ It's often believed that the interpretations of this book caused a decline in neural network research in the 1970s and early 1980s.



References

❑ Textbook: Sebastian Raschka and Vahid Mirjalili, Python Machine Learning, Third Edition, Packt Publishing, 2019
❑ Andriy Burkov, The Hundred-Page Machine Learning Book
❑ Sebastian Raschka, Single-Layer Neural Networks and Gradient Descent
❑ https://www.learnpython.org/

