Neural Networks (Representation)

Neural networks are motivated by the brain's ability to learn complex nonlinear patterns. They represent information with interconnected groups of neurons that process input with weights and activation functions. A neural network has an input layer, hidden layers, and an output layer, with connections between layers represented by weight matrices. During forward propagation, activations are calculated layer by layer. Neural networks can learn their own internal representations of the input and perform complex tasks like classification and regression by adjusting weights during training.

1. Motivations
I would like to give full credit to the respective authors, as these are my personal Python notebooks
taken from deep learning courses by Andrew Ng, Data School, and Udemy :) This is a simple
Python notebook hosted generously through GitHub Pages from my main personal notes
repository at https://github.com/ritchieng/ritchieng.github.io. They are meant for my personal
review, but I have open-sourced my repository of personal notes because a lot of people found it useful.
1a. Non-linear Hypothesis
 You can add more features (e.g. all the quadratic terms)
o But with millions of features this becomes slow to process
 If you have an image with 50 x 50 pixels (greyscale, not RGB)
o n = 50 x 50 = 2500 pixel-intensity features
o Number of quadratic features ≈ n^2 / 2 = (2500 x 2500) / 2 ≈ 3.1 million (counted in the sketch below)
 Neural networks are much better for learning a complex nonlinear hypothesis
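
To make that blow-up concrete, here is a minimal Python sketch (my own illustration, not from the course) counting the quadratic terms for a 50 x 50 greyscale image:

    # Count the quadratic features (all products x_i * x_j with i <= j)
    # for a 50 x 50 greyscale image.
    n = 50 * 50                         # 2500 raw pixel intensities
    quadratic_terms = n * (n + 1) // 2  # unordered pairs, with repetition

    print("raw features:      ", n)                # 2500
    print("quadratic features:", quadratic_terms)  # 3,126,250 (~ n^2 / 2)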
1b. Neurons and the Brain
 Origins
o Algorithms that try to mimic the brain
 Was very widely used in the '80s and early '90s
o Popularity diminished in the late '90s
 Recent resurgence
o State-of-the-art techniques for many applications
 The “one learning algorithm” hypothesis
o Auditory cortex handles hearing
 Re-wire to learn to see
o Somatosensory cortex handles feeling
 Re-wire to learn to see
o Plug in data and the brain will learn accordingly
 Examples of learning

2. Neural Networks
2a. Model Representation I
 Neuron in the brain
o Many neurons in our brain
o Dendrite: receive input
o Axon: produce output
 When a neuron sends a message through its axon to another neuron, the
message arrives at the other neuron's dendrites
 Neuron model: logistic unit
o Yellow circle: body of neuron
o Input wires: dendrites
o Output wire: axon
 Neural Network
o 3 Layers
 Layer 1: input layer
 Layer 2: hidden layer
 Unable to observe its values
 Any layer other than the input or output layer is a hidden layer
 Layer 3: output layer
 We calculate each of the layer-2 activations from the input values together
with the bias term x_0 (which is equal to 1)
 i.e. x_0 to x_3
 We then calculate the final hypothesis (i.e. the single node in layer 3)
using exactly the same logic, except the input is not the x values but the
activation values from the preceding layer
 The activation value of each hidden unit (e.g. a_1^(2)) is the sigmoid
function applied to a linear combination of the inputs, e.g.
a_1^(2) = g(Θ_10^(1) x_0 + Θ_11^(1) x_1 + Θ_12^(1) x_2 + Θ_13^(1) x_3)
 Three input units
 Θ^(1) is the matrix of parameters governing the mapping of the input
units to the hidden units
 Θ^(1) here is a [3 x 4] dimensional matrix
 Three hidden units
 Θ^(2) is the matrix of parameters governing the mapping of the
hidden units to the output layer
 Θ^(2) here is a [1 x 4] dimensional matrix (i.e. a row vector); both
shapes appear in the sketch at the end of this section
 Every input/activation goes to every node in the following layer
 Which means each "layer transition" uses a matrix of parameters
Θ^(l), where each entry Θ^(l)_ji has the following significance:
 j (first of the two subscript numbers) ranges from 1 to the number
of units in layer l+1
 i (second of the two subscript numbers) ranges from 0 to the
number of units in layer l
 l is the layer you're moving FROM
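
As a concrete sketch of these shapes and of one forward step, here is a minimal Python/NumPy example (the weight and input values are made up for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Theta1 maps the input layer to the hidden layer: [3 x 4]
    # (3 hidden units, 4 inputs x_0..x_3 including the bias x_0 = 1)
    Theta1 = np.random.randn(3, 4)
    # Theta2 maps the hidden layer to the output layer: [1 x 4]
    Theta2 = np.random.randn(1, 4)

    x  = np.array([1.0, 0.5, -1.2, 0.3])  # x_0 = 1 is the bias term
    a2 = sigmoid(Theta1 @ x)              # layer-2 activations, shape (3,)
    a2 = np.concatenate(([1.0], a2))      # prepend the bias unit a_0^(2) = 1
    h  = sigmoid(Theta2 @ a2)             # final hypothesis, shape (1,)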

2b. Model Representation II


 Here we’ll look at how to carry out the computation efficiently through a vectorized
implementation. We’ll also consider why neural networks are effective and how we can use
them to learn complex non-linear hypotheses
 Forward propagation: vectorized implementation
o z^(2) = Θ^(1) x and a^(2) = g(z^(2)), where g applies the sigmoid function element-wise to z
o This process of calculating h(x) is called forward propagation
 Worked out from the first layer
 Starts off with the activations of the input units
 Propagates forward and calculates the activations of each layer
sequentially (see the sketch at the end of this section)

 Similar to logistic regression if you leave out the first layer


o Cover up the first layer and only the second and third layers remain
o The third layer then resembles a logistic regression node
o The features in layer 2 are calculated/learned, not the original features
o A neural network learns its own features
 The features a are learned from the x's
 It learns its own features to feed into logistic regression
 This gives a better hypothesis than if we were constrained to just x_1, x_2, x_3
 We can have whatever features we want to feed to the final logistic regression
function
 Implementation in Octave for a2
 a2 = sigmoid(Theta1 * x);  % x here includes the bias unit x0 = 1

 Other network architectures
o e.g. a deeper network in which layers 2 and 3 are both hidden layers
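
Putting the pipeline together, here is a minimal vectorized forward-propagation sketch in Python/NumPy (my own generalization of the course equations, with arbitrary random weights; the layer sizes match the deeper architecture above):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_propagate(x, thetas):
        # Vectorized forward propagation: a^(l+1) = g(Theta^(l) * a^(l)),
        # prepending the bias unit a_0 = 1 before every layer transition.
        a = x
        for theta in thetas:
            a = np.concatenate(([1.0], a))  # add the bias unit
            a = sigmoid(theta @ a)          # g applied element-wise to z
        return a                            # final activations = h(x)

    # 3 inputs -> two hidden layers of 5 units -> 1 output,
    # like the "other network architectures" example above
    thetas = [np.random.randn(5, 4),   # layer 1 -> layer 2
              np.random.randn(5, 6),   # layer 2 -> layer 3
              np.random.randn(1, 6)]   # layer 3 -> output
    print(forward_propagate(np.array([0.2, -0.7, 1.0]), thetas))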

3. Neural Network Application


3a. Examples and Intuitions I
 XOR/XNOR
o XOR: exclusive or (1 when exactly one of x1, x2 is 1)
o XNOR: NOT (x1 XOR x2) (1 when x1 and x2 are both 1 or both 0)

 AND function
o Outputs 1 only if x1 and x2 are both 1
o Draw a truth table to determine whether a unit computes OR or AND
 NAND function
o NOT AND
 OR function
o Outputs 1 if at least one of x1, x2 is 1

3b. Examples and Intuitions II


 NOT function
o Outputs 1 when the input is 0, and 0 when the input is 1
 XNOR function
o NOT XOR, i.e. NOT an exclusive or: outputs 1 when x1 and x2 are both 1 or both 0
o Hence we would want two hidden units:
 AND (fires when both inputs are 1)
 "Neither", i.e. (NOT x1) AND (NOT x2) (fires when both inputs are 0)
o and an OR unit in the output layer combining them (see the sketch below)
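
A small Python sketch of these gates as single sigmoid units, using the weight values from the course slides (AND: [-30, 20, 20], OR: [-10, 20, 20], "neither": [10, -20, -20]); wiring them together gives XNOR:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def unit(theta, x1, x2):
        # A single sigmoid unit with a bias input of 1.
        return sigmoid(np.dot(theta, [1.0, x1, x2]))

    AND     = [-30.0,  20.0,  20.0]  # ~1 only when x1 = x2 = 1
    OR      = [-10.0,  20.0,  20.0]  # ~1 when at least one input is 1
    NEITHER = [ 10.0, -20.0, -20.0]  # (NOT x1) AND (NOT x2)

    def xnor(x1, x2):
        a1 = unit(AND, x1, x2)       # hidden unit: both inputs are 1
        a2 = unit(NEITHER, x1, x2)   # hidden unit: both inputs are 0
        return unit(OR, a1, a2)      # output layer: either hidden unit fires

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, round(xnor(x1, x2)))
    # prints 1 for (0,0) and (1,1), 0 otherwise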

3c. Multi-class Classification


 Example: identify 4 classes
o You would want a 4 x 1 vector for h_Θ(x)
o 4 logistic regression classifiers in the output layer
o There will be 4 output units
o y would be a 4 x 1 one-hot vector (e.g. [0 0 1 0]^T for class 3) instead of an
integer label (see the sketch below)
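
For instance, a minimal sketch of that one-hot target encoding in Python (the helper name is mine, not from the course):

    import numpy as np

    def one_hot(label, num_classes=4):
        # Encode a 0-based integer class label as a num_classes x 1 target vector y.
        y = np.zeros(num_classes)
        y[label] = 1.0
        return y

    print(one_hot(2))   # class 3 of 4 -> [0. 0. 1. 0.]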
