
Intro to Neural Networks

Explained for beginners

Sajjad Murtaza
Kashif
Mustafa
AI Sciences Instructor

@AISciencesLearn
Problem
[Scatter plot: Marks in Test (x-axis, 2.5 to 7.5) vs. academic Marks (y-axis, 0 to 10) showing candidates accepted and rejected under the hiring criteria; a new candidate with Test score about 7 and academic score about 6 must be classified, and a separating line is drawn through the data.]
Linear Equation
[Scatter plot: Marks in Test vs. academic Marks with a separating line]
• 3x + y - b = 0
• 3*test + academic - b = 0
• If score is > 0 (positive): Accept
• If score is < 0 (negative): Reject
Linear Equation: Vectorized Form
[Scatter plot: Marks in Test vs. academic Marks with a separating line]
w1x1 + w2x2 + b = 0
If:
• W = (w1, w2)
• x = (x1, x2)
• y = label: 0 if False, 1 if True
Then:
• Wx + b = 0
• ŷ = 1 if Wx + b > 0
• ŷ = 0 if Wx + b < 0
Higher Dimensional Space
w1X + w2Y + w3Z + b = 0
w1x1 + w2x2 + w3x3 + b = 0
Wx + b = 0
• ŷ = 1 if Wx + b > 0
• ŷ = 0 if Wx + b < 0
N Dimensional Space
Data table (n features per employee):
         Test   Academia   ...   N
Emp 1     5        2       ...   8
Emp 2     8        7        6    9
...      ...      ...      ...  ...
Emp n     6        5       ...   7

W1x1 + W2x2 + ... + Wnxn + b = 0
W = (w1, w2, ..., wn)
x = (x1, x2, ..., xn)
Wx + b = 0
• ŷ = 1 if Wx + b > 0
• ŷ = 0 if Wx + b < 0
Perceptron
[Diagram: inputs x1 and x2 with weights w1 and w2, plus a bias b drawn as a constant input of 1, feed a node that computes Wx + b and outputs 1 if Wx + b > 0, 0 if Wx + b < 0]
Perceptron
[Diagram: the general case with inputs x1, ..., xn, weights w1, ..., wn, and bias b; the node outputs Yes/No, i.e. 1 if Wx + b > 0 and 0 if Wx + b < 0]
Human Brain
[The same perceptron diagram, drawn as an analogy to a neuron in the brain: inputs x1, ..., xn with weights w1, ..., wn and bias b, firing 1 if Wx + b > 0 and 0 otherwise]
Using Logical Gates for Perceptron: AND Gate
P1   P2   P1 AND P2
1    1    1
1    0    0
0    1    0
0    0    0
[Diagram: Perceptron 1 (outputs 1/0) and Perceptron 2 (outputs 1/0) feed into an AND perceptron]
Using Logical Gates for Perceptron: OR Gate
P1   P2   P1 OR P2
1    1    1
1    0    1
0    1    1
0    0    0
[Diagram: Perceptron 1 (outputs 1/0) and Perceptron 2 (outputs 1/0) feed into an OR perceptron]
Perceptron's Training
When a negative point is positively labeled, we subtract.
• Line equation: 2x1 + 3x2 - 7 = 0 (positive region: 2x1 + 3x2 - 7 > 0; negative region: 2x1 + 3x2 - 7 < 0)
• Wrong point: (3, 4)

    2    3   -7
  - 3    4    1    (subtract)
   -1   -1   -8

Such a rapid change may misclassify other points.
Perceptron's Training
When a negative point is positively labeled, we subtract, scaled by the learning rate.
• Line equation: 2x1 + 3x2 - 7 = 0
• Wrong point: (3, 4)
• Learning rate: 0.1 (a value between 0 and 1)

    2         3        -7
  - 3(0.1)    4(0.1)    1(0.1)    (subtract)
    1.7       2.6      -7.1

New line: 1.7x1 + 2.6x2 - 7.1 = 0
Perceptron's Training
When a positive point is negatively labeled, we add.
• Line equation: 2x1 + 3x2 - 7 = 0
• Wrong point: (4, 1)
• Learning rate: 0.1 (a value between 0 and 1)

    2         3        -7
  + 4(0.1)    1(0.1)    1(0.1)    (add)
    2.4       3.1      -6.9

New line: 2.4x1 + 3.1x2 - 6.9 = 0
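To make the update rule above concrete, here is a minimal Python sketch (our own function name, NumPy assumed) that reproduces the worked example: the misclassified point (3, 4) with learning rate 0.1 moves the line 2x1 + 3x2 - 7 = 0 to 1.7x1 + 2.6x2 - 7.1 = 0.

```python
import numpy as np

def perceptron_step(X, y, W, b, learn_rate=0.1):
    """One pass over all points, nudging the line toward misclassified points."""
    for x_i, y_i in zip(X, y):
        prediction = 1 if np.dot(W, x_i) + b > 0 else 0
        if prediction == y_i:
            continue                      # classified correctly: move on
        if prediction == 0:               # positive point labeled negative: add
            W = W + learn_rate * x_i
            b = b + learn_rate
        else:                             # negative point labeled positive: subtract
            W = W - learn_rate * x_i
            b = b - learn_rate
    return W, b

# Example from the slides: line 2x1 + 3x2 - 7 = 0, wrong point (3, 4) with label 0
W, b = np.array([2.0, 3.0]), -7.0
W, b = perceptron_step(np.array([[3.0, 4.0]]), [0], W, b)
print(W, b)   # approximately [1.7 2.6] and -7.1
```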
Perceptron Algorithm
• Start with random weights
• Loop over all points
  • If a point is classified correctly: move on
  • If a point is misclassified:
    • If the prediction is 0: change wi to wi + α·xi
    • If the prediction is 1: change wi to wi - α·xi
Problem With "Linear Solutions"
[Scatter plot: Marks in Test vs. academic Marks with a single separating line]
We don't want to hire an employee with such a bad "academic record".
Problem With "Linear Solutions"
[The same scatter plot: the single straight line misclassifies some candidates]
Possible Solution to the Problem
[Scatter plot: Marks in Test vs. academic Marks with a boundary that is not a single straight line separating the classes]
Error Function
Error: Height
Error Function
Error = 2 (the number of misclassified points)
Discrete vs Continuous
Error: Height
Log Loss Error Function
Error = 0.1 + 0.1 + 0.1 + 5 + 0.1 + 0.1 + 0.1 + 5
(correctly classified points contribute small penalties; misclassified points contribute large ones)
Activation Function
Step Function
[Perceptron diagram: inputs x1, ..., xn, weights w1, ..., wn, bias b; a step activation outputs 1 if Wx + b > 0 and 0 if Wx + b < 0]
Step Function
[The same diagram, but with continuous outputs such as 0.6, 0.3, 0.1 instead of a hard 0/1, motivating a continuous activation]
Multi-Class Classification

Chances of Rain:
60%(0.6) YES
40%(0.4) NO
Multi-Class Classification
P(Car) = 0.67    P(Bike) = 0.24    P(Bi-Cycle) = 0.09
Score = 2        Score = 1         Score = 0

Naive normalization:
P(Car) = 2 / (2 + 1 + 0)    P(Bike) = 1 / (2 + 1 + 0)    P(Bi-Cycle) = 0 / (2 + 1 + 0)

Problem: scores can be negative numbers.

Softmax Function
P(Car) = 0.67    P(Bike) = 0.24    P(Bi-Cycle) = 0.09
Score = 2        Score = 1         Score = 0

P(Car) = e^2 / (e^2 + e^1 + e^0)
P(Bike) = e^1 / (e^2 + e^1 + e^0)
P(Bi-Cycle) = e^0 / (e^2 + e^1 + e^0)
Quiz
• Write a function in Python that receives a list of numbers and returns a list containing the softmax value of each number.
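One possible answer, as a sketch in plain Python (only the standard math module assumed; the function name is ours):

```python
import math

def softmax(scores):
    """Return the softmax of a list of numbers as a list of probabilities."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2, 1, 0]))   # roughly [0.67, 0.24, 0.09], matching the slide
```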
One Hot Encoding
Before encoding, the categorical column holds a single numeric value per row (e.g. Vehicle = 2).
After one-hot encoding:
Vehicle      Car   Bike   Bi-Cycle
(row 1)       1     0       0
(row 2)       0     1       0
(row 3)       0     0       1
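A small sketch of the idea in Python (the category list and helper name are ours, for illustration):

```python
categories = ["Car", "Bike", "Bi-Cycle"]

def one_hot(value, categories):
    """Encode a single categorical value as a one-hot vector."""
    return [1 if value == c else 0 for c in categories]

print(one_hot("Bike", categories))   # [0, 1, 0]
```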
Maximum Likelihood
[Two models classify the same four points; each model assigns each point a probability of having its true color, e.g. P(green) = 0.9 and P(red) = 0.1 for one point under Model A]

Model A
Probability of the observed labels: 0.9 * 0.4 * 0.3 * 0.8 = 0.0864

Model B
Probability of the observed labels: 0.6 * 0.8 * 0.7 * 0.9 = 0.3024
Error vs Probability

Error Probability
Maximizing Probabilities
[The same per-point probabilities as before, combined with a product]
Model A: 0.9 * 0.4 * 0.3 * 0.8 = 0.0864
Model B: 0.6 * 0.8 * 0.7 * 0.9 = 0.3024
Products are bad; sums are good.
Quiz
• Which function can be used to replace products with sums?
  A. Sin
  B. Cos
  C. Exp
  D. Log

• Log(ab) = Log(a) + Log(b)
Cross-Entropy
Goal: minimize the cross-entropy.
Model A
0.9 * 0.4 * 0.3 * 0.8 = 0.0864
Log(0.9) + Log(0.4) + Log(0.3) + Log(0.8) = -1.06348625752
-Log(0.9) - Log(0.4) - Log(0.3) - Log(0.8) = 1.06348625752

Model B
0.6 * 0.8 * 0.7 * 0.9 = 0.3024
Log(0.6) + Log(0.8) + Log(0.7) + Log(0.9) = -0.51941821317
-Log(0.6) - Log(0.8) - Log(0.7) - Log(0.9) = 0.51941821317
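A quick numerical check of these sums, sketched in Python. The slide's figures come out with base-10 logarithms; the natural log is more common in practice, but the ranking of the two models is the same either way:

```python
import math

def cross_entropy(probs):
    """Negative sum of log-probabilities of the observed events (base 10 to match the slide)."""
    return -sum(math.log10(p) for p in probs)

print(cross_entropy([0.9, 0.4, 0.3, 0.8]))   # about 1.063 (Model A)
print(cross_entropy([0.6, 0.8, 0.7, 0.9]))   # about 0.519 (Model B)
```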


Events vs Probability
Events      Probability
Cross-Entropy
Cross-Entropy Formulation
Point 1 (selected): p1 = 0.8, y1 = 1, so its contribution is -ln(p1) = -ln(0.8)
Point 2 (NOT selected): p2 = 0.1, y2 = 0, so its contribution is -ln(1 - p2) = -ln(0.9)
Cross-Entropy = -ln(0.8) - ln(0.9)
• y = 1 if selected
• y = 0 if NOT selected
Cross-Entropy for Multi-class
Probability of each vehicle in each of three cases G1, G2, G3 (each column sums to 1):
Vehicle      G1     G2     G3
(row 1)      0.8    0.3    0.2    (p11, p12, p13)
(row 2)      0.1    0.1    0.6    (p21, p22, p23)
(row 3)      0.1    0.6    0.2    (p31, p32, p33)
Quiz
• What is the relation between cross-entropy and probability?

  • Directly proportional
  • Inversely proportional
Minimizing the Error Function

Gradient Descent
Gradient Descent

Error: Height

Convex Function
Convex Functions
• The curve is like a bowl
• Derivatives are possible
Derivatives
• Derivatives are also called slopes.
• f(a) = 2a
  When a = 1:     f(a) = 2
  When a = 5:     f(a) = 10
  When a = 5.001: f(a) = 10.002
• Slope = height / width
[Plot: the line f(a) = 2a, whose slope is 2 everywhere]
How Gradient Descent Works
Gradient Step
• Wi' = Wi - learningRate · (∂E/∂Wi)
• Since ∂E/∂Wi = -(y - yhat)·xi:
• Wi' = Wi + learningRate · (y - yhat)·xi
• b' = b + learningRate · (y - yhat)
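A minimal sketch of this gradient step in Python; the function name and the use of NumPy are assumptions for illustration:

```python
import numpy as np

def gradient_step(x, y, W, b, learn_rate=0.1):
    """Nudge the weights and bias in the direction that reduces the error for one point."""
    y_hat = 1 / (1 + np.exp(-(np.dot(W, x) + b)))    # sigmoid prediction
    W = W + learn_rate * (y - y_hat) * x              # Wi' = Wi + alpha (y - yhat) xi
    b = b + learn_rate * (y - y_hat)                  # b'  = b  + alpha (y - yhat)
    return W, b
```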
Logistic Regression Algorithm
• Start with random weights
  • w1, w2, ..., wn, b
• For every point (x1, x2, ..., xn)
  • Update W'
  • Update b'
• Repeat until the error is small

Perceptron Algorithm?
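Putting the gradient step into the training loop described above, one possible sketch in Python (the function names, NumPy usage, and fixed epoch count standing in for "repeat until the error is small" are our assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_logistic_regression(X, y, learn_rate=0.1, epochs=100):
    """Start with random weights, then repeatedly apply the gradient step to every point."""
    n_features = X.shape[1]
    W = np.random.normal(size=n_features)    # random initial weights w1, ..., wn
    b = 0.0                                   # initial bias
    for _ in range(epochs):                   # repeat until the error is small
        for x_i, y_i in zip(X, y):
            y_hat = sigmoid(np.dot(W, x_i) + b)
            W += learn_rate * (y_i - y_hat) * x_i    # update W'
            b += learn_rate * (y_i - y_hat)          # update b'
    return W, b
```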
To Do
• Sigmoid activation function
  σ(x) = 1 / (1 + e^(-x))
• Output (prediction) formula
  yhat = σ(w1·x1 + w2·x2 + b)
• Error function
  Error(y, yhat) = -y·log(yhat) - (1 - y)·log(1 - yhat)
• The function that updates the weights
  wi ⟶ wi + α(y - yhat)·xi
  b ⟶ b + α(y - yhat)
Perceptron vs Gradient Descent

Gradient Descent
• Start with random weights: w1, w2, ..., wn, b
• For every point (x1, x2, ..., xn):
  • Update W'
  • Update b'
• Repeat until the error is small

Perceptron Algorithm
• Start with random weights
• Loop over all points
  • If a point is classified correctly: move on
  • If a point is misclassified:
    • If the prediction is 0: change wi to wi + α·xi
    • If the prediction is 1: change wi to wi - α·xi
Problem With "Linear Solutions"
[Scatter plot: Marks in Test vs. academic Marks; the single separating line accepts a candidate with a very poor academic record]
We don't want to hire an employee with such a bad "academic record".
Problem With "Linear Solutions"
[The same scatter plot, highlighting points the single straight line misclassifies]
Possible Solution to the Problem
[Scatter plot: a boundary built from more than one line separates the classes]
Possible Solution to the Problem
[The same scatter plot with the combined, non-linear boundary]
Non-Linear Boundaries
[Two linear models are combined: one outputs 0.7, the other 0.8]
0.7 + 0.8 = 1.5
Sigmoid(1.5) = 0.82
Weighted Sums
[The same two models combined with weights 6 and 4 and bias -3]
6 * 0.7 + 4 * 0.8 - 3 = 4.2 + 3.2 - 3 = 4.4
Sigmoid(4.4) = 0.98
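A tiny Python check of the arithmetic above; the helper name is ours:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

p1, p2 = 0.7, 0.8                        # outputs of the two linear models
print(sigmoid(p1 + p2))                   # plain sum: sigmoid(1.5) is about 0.82
print(sigmoid(6 * p1 + 4 * p2 - 3))       # weighted sum with bias: sigmoid(4.4) is about 0.988
```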
Neural Networks
[Diagram: two linear models, 5x1 - 2x2 + 8 and 7x1 - 3x2 - 1, drawn as small networks]
Neural Networks
[Diagram: the outputs of the two linear models are combined by a second-layer node with weights 7 and 5 and bias -6]
Neural Networks
[Diagram: the full network drawn as one graph: inputs x1 and x2 connect to two hidden nodes (weights 5, -2 with bias 8, and weights 7, -3 with bias -1), whose outputs connect to the output node with weights 7 and 5 and bias -6]
Adding Bias
[Diagram: the bias terms are drawn as edges from a constant input of 1 into each node: 8 and -1 into the hidden nodes, -6 into the output node]
Architecture
[The same diagram labeled by layer: the INPUT layer (x1, x2, and the bias unit 1), the hidden layer (two nodes plus a bias unit), and the output layer (one node)]
DEEP Neural Network
[Diagram: the same network with additional hidden layers stacked between input and output, each with its own bias unit]
Multi-Class Classification
[Diagram: the same deep network, but with multiple output nodes, one per class]
Feed Forward
[Diagram: inputs x1 and x2 connect to two hidden nodes via weights w11, w12, w21, w22; the hidden nodes connect to the output via weights w31 and w32]
Feed Forward
yhat = σ( W(2) · σ( W(1) · x ) )
[Diagram: first-layer weights W(1)11, W(1)12, W(1)21, W(1)22, W(1)31, W(1)32 (including the bias unit) and second-layer weights W(2)11, W(2)21, W(2)31]
For a deeper network:
yhat = σ ∘ W(4) ∘ σ ∘ W(3) ∘ σ ∘ W(2) ∘ σ ∘ W(1) (x)
DNN Feed Forward
[Diagram: inputs x1 and x2 pass through weight matrices W1, W2, W3, W4 and sigmoids, layer by layer, to produce the output]
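A sketch of the feedforward computation in Python with NumPy, for a network shaped like the one in the earlier slides (2 inputs, one hidden layer of 2 nodes, 1 output); the weight values are the example numbers from those slides, and the function names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feed_forward(x, W1, b1, W2, b2):
    """yhat = sigmoid(W2 . sigmoid(W1 . x + b1) + b2)"""
    hidden = sigmoid(W1 @ x + b1)        # first layer
    return sigmoid(W2 @ hidden + b2)     # output layer

# Example network: 5x1 - 2x2 + 8 and 7x1 - 3x2 - 1,
# combined with weights 7 and 5 and bias -6
W1 = np.array([[5.0, -2.0], [7.0, -3.0]])
b1 = np.array([8.0, -1.0])
W2 = np.array([7.0, 5.0])
b2 = -6.0

print(feed_forward(np.array([1.0, 1.0]), W1, b1, W2, b2))
```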
Deep Learning Algorithm
• Do a feedforward operation.
• Compare the output of the model with the desired output.
• Calculate the error.
• Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
• Use this to update the weights and get a better model.
• Continue until we have a good model.
Back Propagation
[Diagram: the feed-forward network with weights w11, w12, w21, w22, w31, w32]
Back Propagation: Prediction
[Diagram: a single-layer model with inputs x1, ..., xn and weights W1, ..., Wn]
Error Function
E(W) = [error formula shown as an image on the slide]
Gradient of Error Function
[gradient formula shown as an image on the slide]
yhat = σ ∘ W(4) ∘ σ ∘ W(3) ∘ σ ∘ W(2) ∘ σ ∘ W(1) (x)
Back Propagation in a Deep Net
[Diagram: the deep network with weight matrices W1, W2, W3, W4; the error E(W) and its gradient are shown as images on the slide]
Chain Rule
[Diagram: the deep network again, viewed as a composition of functions]
Chain Rule
x --f--> A --g--> B
A = f(x)      B = g ∘ f(x)
Back Propagation
[Diagram: hidden-node outputs h1 and h2 feed the output node via second-layer weights; the error E(W) is shown as an image on the slide]
h = W(2)11 · σ(h1) + W(2)21 · σ(h2) + W(2)31
[Diagram: a single node with inputs x0, ..., x5 and weights W0, ..., W5]
Optimizations
Underfitting & Overfitting
[Two side-by-side scatter plots of Marks in Test vs. academic Marks: one boundary is too simple (underfitting), the other bends around individual points (overfitting)]
Early Stopping
[Four scatter plots of the same data after Epoch 1, Epoch 10, Epoch 100, and Epoch 1000: the boundary goes from underfitting, to a good fit, to overfitting]
[Chart: error plotted over training epochs; the testing error falls and then rises again, so training should stop at the "elbow" between underfitting and overfitting]
Quiz
[Scatter plot with two points, (1, 1) and (-1, -1), on either side of the boundary]
Two candidate models:
• x1 + x2
• 10x1 + 10x2
Regularization
ŷ = σ(w1x1 + w2x2)
[Scatter plot: points (1, 1) and (-1, -1) with the boundary x1 + x2 = 0]
Model x1 + x2:
• σ(1 + 1) = 0.88
• σ(-1 - 1) = 0.12
Model 10x1 + 10x2:
• σ(10 + 10) = 0.9999999979
• σ(-10 - 10) = 0.0000000021
Regularization
Problem: large coefficients lead to overfitting (compare x1 + x2 with 10x1 + 10x2).
Regularization: Solution
Large coefficients lead to overfitting, so penalize large weights (w1, w2, ..., wn):
• L1: Error Function = (error) + λ(|w1| + ... + |wn|)
• L2: Error Function = (error) + λ(w1² + ... + wn²)
Usages
L1 Regularization: tends to produce sparse weights, e.g. (1, 0, 1, 1, 0, 0); good for feature selection.
L2 Regularization: tends to produce small, non-sparse weights, e.g. (0.3, 0.9, 0.5, -0.1, 0.2, 0.2); good for training models.
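A sketch of how the two penalties might be added to an error in Python; the lambda value and names are illustrative assumptions:

```python
import numpy as np

def l1_penalty(weights, lam=0.01):
    """L1 regularization: lambda times the sum of absolute weights."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam=0.01):
    """L2 regularization: lambda times the sum of squared weights."""
    return lam * np.sum(weights ** 2)

w = np.array([0.3, 0.9, 0.5, -0.1, 0.2, 0.2])
total_error = 0.42 + l2_penalty(w)   # hypothetical base error plus the L2 penalty
```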


Dropout (e.g. Dropout = 0.2)
[Diagram: the deep network with weight matrices W1, ..., W4; during each training pass a fraction of the nodes, here 20%, is randomly switched off]
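As a rough illustration of the idea (not from the slides), a dropout mask applied to a layer's activations in NumPy; the rescaling by 1/(1 - drop_prob) is the common "inverted dropout" convention:

```python
import numpy as np

def apply_dropout(activations, drop_prob=0.2):
    """Randomly zero out a fraction of the activations during training."""
    mask = (np.random.rand(*activations.shape) > drop_prob).astype(float)
    return activations * mask / (1.0 - drop_prob)   # rescale so the expected sum is unchanged

h = np.array([0.7, 0.8, 0.3, 0.9])
print(apply_dropout(h, drop_prob=0.2))
```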
Local Minima Problem
[Error surface with two local minima and one global minimum; gradient descent can get stuck in a local minimum]
Local Minima Solution: Random Restart
[The same surface: starting gradient descent from several random points increases the chance of reaching the global minimum]
Vanishing Gradient Problem
The product of small numbers is a very small number.
[Network diagram: the gradient reaching the first-layer weights is a product of many small sigmoid derivatives, so it becomes tiny]
Vanishing Gradient Problem
[Error surface with local minima and a global minimum: with a vanishing gradient, the descent steps become too small to make progress]
Vanishing Gradient: Solution
Activation Function: Tanh
Vanishing Gradient: Solution
Activation Function: ReLU
Summary: Activation Functions
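For reference, a small sketch of the activation functions mentioned here, written with NumPy (the helper names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # output in (0, 1); gradient vanishes for large |x|

def tanh(x):
    return np.tanh(x)                  # output in (-1, 1); steeper gradient than sigmoid

def relu(x):
    return np.maximum(0.0, x)          # 0 for negative inputs, identity otherwise

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(x), tanh(x), relu(x))
```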
Final Project
Predicting Species of Iris
Architecture
[Diagram: four inputs x1, ..., x4 (plus a bias unit) connect through weight matrix W1 to a hidden layer (plus a bias unit), then through W2 and W3 to three output nodes giving P(A), P(B), P(C)]
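One plausible way the architecture could be sketched in Python with NumPy: 4 inputs, one hidden layer, and 3 softmax outputs. The hidden-layer size, the random weights, and all names are assumptions for illustration, not the project's actual solution:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # 4 iris features -> 8 hidden nodes (assumed size)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # 8 hidden nodes -> 3 species probabilities

def predict(x):
    """Feed one flower's measurements forward to P(A), P(B), P(C)."""
    hidden = sigmoid(W1 @ x + b1)
    return softmax(W2 @ hidden + b2)

print(predict(np.array([5.1, 3.5, 1.4, 0.2])))   # three probabilities summing to 1
```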
Intro to Instructor
• I am Sajjad M.
• I have over 5 years of teaching experience at university.
• I have also worked on many industrial projects.
• My areas of interest are Data Science and Deep Learning.
• I have been working with Python for over 7 years now.
• Here is my email address: [email protected]
Intro to Course
• Introducing the Problem
• Initial Solution
• N-Dimensional Space
• Perceptron vs Human Brain
• Perceptron Training
• Linear Solutions
• Non-Linear Solutions
• Error Functions
• Sigmoid
• Logistic Regression
• Multiclass Classification
• Softmax
• One Hot Encoding
• Cross Entropy
• Gradient Descent
• Deep Neural Networks
• Feed Forward
• Back Propagation
• Optimizations
• Underfitting/Overfitting
• Early Stopping
• Regularization
• Dropout
• Vanishing Gradient
• Final Project
Website : www.aisciences.io
