
Chapter 11

Neural Networks
Basic idea
• Combine input information in a complex & flexible neural
net “model”
• Model “coefficients” are continually tweaked in an
iterative process
• The network’s interim performance in classification and
prediction informs successive tweaks
Basic idea

[Diagram: inputs X1, X2, X3 feed a "black box" that produces output Y]

X1  X2  X3  Y
 1   0   0  0
 1   0   1  1
 1   1   0  1
 1   1   1  1
 0   0   1  0
 0   1   0  0
 0   1   1  1
 0   0   0  0

Output Y is 1 if at least two of the three inputs are equal to 1.


Basic idea

[Diagram: input nodes X1, X2, X3, each connected by a link of weight 0.3 to an output node with threshold t = 0.4, which produces Y]

X1  X2  X3  Y
 1   0   0  0
 1   0   1  1
 1   1   0  1
 1   1   1  1
 0   0   1  0
 0   1   0  0
 0   1   1  1
 0   0   0  0

Y = I(0.3 X1 + 0.3 X2 + 0.3 X3 - 0.4 > 0)

where I(z) = 1 if z is true, 0 otherwise
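A minimal sketch of this threshold unit in Python/NumPy (the weights 0.3 and the threshold 0.4 are the ones on the slide; everything else is illustrative):

import numpy as np

# Truth table from the slide
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
              [0, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]])
Y = np.array([0, 1, 1, 1, 0, 0, 1, 0])

weights = np.array([0.3, 0.3, 0.3])   # one weight per input link
t = 0.4                               # threshold of the output node

# Indicator function I(z): 1 if the weighted sum exceeds the threshold, else 0
y_hat = (X @ weights - t > 0).astype(int)

print(y_hat)                # [0 1 1 1 0 0 1 0]
print((y_hat == Y).all())   # True -- the unit reproduces Y exactly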
Basic idea

[Diagram: input nodes X1, X2, X3 connected by weighted links w1, w2, w3 to a "black box" output node with threshold t, producing Y]

• Model is an assembly of inter-connected nodes and weighted links
• Output node sums up each of its input values according to the weights of its links
• Compare output node against some threshold t
General structure
• Multiple layers
- Input layer (raw observations)
- Hidden layers
- Output layer
• Nodes
• Weights (like coefficients, subject to iterative
adjustment)
• Bias values θ (also like coefficients, and likewise adjusted iteratively during training)
General structure

[Diagram: multi-layer perceptron with an input layer (x1 ... x5), a hidden layer, and an output layer producing y. Neuron i receives inputs I1, I2, I3 over weighted links wi1, wi2, wi3, forms the weighted sum Si, and passes it through an activation function g(Si) to produce its output Oi, which is compared against a threshold t.]

Training an ANN means learning the weights of the neurons.
Multi-layer perceptron
Learning algorithm
• Initialize the weights (w0, w1, …, wk)
• Adjust the weights in such a way that the output of ANN
is consistent with class labels of training examples
- Objective function:

  E = Σ_i [ Y_i - f(w_i, X_i) ]²

- Find the weights w_i's that minimize the above objective function
  e.g., via the backpropagation algorithm
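A quick sketch of this objective in Python/NumPy (the variable names and values are illustrative, not from the slides):

import numpy as np

def squared_error(Y, predictions):
    """E = sum over records i of (Y_i - f(w, X_i))^2."""
    return np.sum((Y - predictions) ** 2)

# Example: network predictions vs. true class labels
Y = np.array([0, 1, 1, 0])
preds = np.array([0.1, 0.8, 0.6, 0.3])
print(squared_error(Y, preds))   # ≈ 0.3  (= 0.01 + 0.04 + 0.16 + 0.09)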
Example: Fat & Salt Content to Predict
Consumer Acceptance of Cheese
Example: Data
[Slide shows a small data table: each record has a fat score, a salt score, and a 0/1 consumer-acceptance label; record #1 has fat = 0.2 and salt = 0.9]
Moving through the Network

Input layer
• For the input layer, input = output
  - e.g., for record #1:
    Fat input = output = 0.2
    Salt input = output = 0.9
• Output of the input layer = input into the hidden layer
Hidden layer
• In this example, it has 3 nodes
• Each node receives as input the output of all input nodes
• Output of each hidden node is a function of the weighted sum of its inputs:

  output_j = g( θ_j + Σ_{i=1..p} w_ij x_i )
Weights
• The weights θ (theta) and w are typically
initialized to random values in the range
-0.05 to +0.05
• Equivalent to a model with random
prediction (in other words, no predictive
value)
• These initial weights are used in the first
round of training
Initial pass of the network

Output of node 3, if g is a logistic function:

  output_j = g( θ_j + Σ_{i=1..p} w_ij x_i )

  output_3 = 1 / (1 + e^-(θ_3 + w_13 x_1 + w_23 x_2))
Output layer
• The output of the last hidden layer becomes input for the output layer
• Uses the same function as above, i.e., a function g of the weighted sum of its inputs

  output_6 = 1 / (1 + e^-(θ_6 + w_36 output_3 + w_46 output_4 + w_56 output_5))
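A sketch of this forward pass in Python/NumPy; the network shape (2 inputs, 3 hidden nodes, 1 output node) follows the example, but the weight values are illustrative random draws, not the ones in the slides:

import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

# Record #1: fat = 0.2, salt = 0.9 (for the input layer, input = output)
x = np.array([0.2, 0.9])

# Initialize biases and weights to small random values in [-0.05, +0.05]
rng = np.random.default_rng(0)
theta_hidden = rng.uniform(-0.05, 0.05, size=3)     # biases of hidden nodes 3, 4, 5
w_hidden = rng.uniform(-0.05, 0.05, size=(2, 3))    # weights: inputs -> hidden nodes
theta_out = rng.uniform(-0.05, 0.05)                # bias of output node 6
w_out = rng.uniform(-0.05, 0.05, size=3)            # weights: hidden nodes -> output node

# Hidden layer: output_j = g(theta_j + sum_i w_ij * x_i)
hidden_out = logistic(theta_hidden + x @ w_hidden)

# Output layer: the same function g applied to the hidden-layer outputs
output_6 = logistic(theta_out + hidden_out @ w_out)

print(hidden_out, output_6)   # all values near 0.5, since the weights are near zero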
Output node
• If the cutoff for classifying a record as "1" is 0.5, and the output-node value exceeds 0.5, then classify the record as "1"
Relation to linear regression
• A net with a single output node and no hidden layers,
where g is the identity function, takes the same form as
a linear regression model

  ŷ = θ + Σ_{i=1..p} w_i x_i
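A quick numerical check of this equivalence (made-up numbers; with g the identity, the node's output is exactly the linear-regression form):

import numpy as np

x = np.array([0.2, 0.9])        # predictors
w = np.array([1.5, -0.8])       # illustrative weights
theta = 0.3                     # illustrative bias

identity = lambda z: z          # g is the identity function

y_hat_net = identity(theta + x @ w)           # single output node, no hidden layers
y_hat_lin = theta + w[0]*x[0] + w[1]*x[1]     # linear regression form

print(y_hat_net, y_hat_lin)     # both ≈ -0.12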
Training the model
Preprocessing Steps
• Scale variables to 0-1
• Categorical variables
- If equidistant categories, map to equidistant
interval points in 0-1 range
- Otherwise, create dummy variables
• Transform (e.g., log) skewed variables
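A hedged preprocessing sketch in Python/pandas (the column names and values are made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fat": [0.1, 1.4, 2.5, 0.8],                     # numeric predictor
    "income": [20_000, 55_000, 310_000, 47_000],     # skewed numeric predictor
    "region": ["north", "south", "east", "south"],   # nominal categorical predictor
})

# Scale numeric variables to the 0-1 range: (x - min) / (max - min)
df["fat_scaled"] = (df["fat"] - df["fat"].min()) / (df["fat"].max() - df["fat"].min())

# Log-transform a skewed variable before scaling
df["log_income"] = np.log(df["income"])

# Nominal (non-equidistant) categories: create dummy variables
df = pd.get_dummies(df, columns=["region"])

print(df.head())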
Initial Pass Through Network
• Goal: Find weights that yield best
predictions
• The process we described below is
repeated for all records
- At each record compare prediction to actual
- Difference is the error for the output node
- Error is propagated back and distributed to all
the hidden nodes and used to update their
weights
Back Propagation ("back-prop")

• Output from output node k: ŷ_k
• Error associated with that node:

  err_k = ŷ_k (1 - ŷ_k)(y_k - ŷ_k)
Error is Used to Update Weights

  θ_j^new = θ_j^old + l (err_j)
  w_ij^new = w_ij^old + l (err_j)

l = a constant between 0 and 1 that reflects the "learning rate" or "weight decay parameter"
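A minimal numeric sketch of these update rules for one output node (all values are made up for illustration):

learning_rate = 0.5        # l, a constant between 0 and 1

y_true = 1.0               # actual label of the record
y_hat = 0.506              # network's current prediction at the output node

# Error at the output node: err = y_hat * (1 - y_hat) * (y - y_hat)
err = y_hat * (1 - y_hat) * (y_true - y_hat)

theta_old, w_old = 0.015, -0.03            # illustrative current bias and weight
theta_new = theta_old + learning_rate * err
w_new = w_old + learning_rate * err

print(round(err, 4), round(theta_new, 4), round(w_new, 4))   # 0.1235 0.0767 0.0317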
Case Updating
• Weights are updated after each record
is run through the network
• Completion of all records through the
network is one epoch (also called
sweep or iteration)
• After one epoch is completed, return to
first record and repeat the process
Batch Updating
• All records in the training set are fed to
the network before updating takes
place
• In this case, the error used for updating
is the sum of all errors from all records
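A small sketch contrasting the two schemes in Python/NumPy, using one predictor and the simplified update rule above (illustrative values throughout):

import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

X = np.array([0.2, 0.7, 0.9])   # one predictor, three records
Y = np.array([0, 1, 1])
l = 0.5                          # learning rate

# Case updating: adjust the weights after every record; one pass = one epoch
theta, w = 0.0, 0.0
for x, y in zip(X, Y):
    y_hat = logistic(theta + w * x)
    err = y_hat * (1 - y_hat) * (y - y_hat)
    theta += l * err
    w += l * err

# Batch updating: sum the errors over all records, then update once per epoch
theta_b, w_b = 0.0, 0.0
y_hat_all = logistic(theta_b + w_b * X)
total_err = np.sum(y_hat_all * (1 - y_hat_all) * (Y - y_hat_all))
theta_b += l * total_err
w_b += l * total_err

print(theta, w)      # weights after one epoch of case updating
print(theta_b, w_b)  # weights after one epoch of batch updating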
Common Criteria to Stop Updating
• When weights change very little from
one iteration to the next
• When the misclassification rate reaches
a required threshold
• When a limit on runs is reached
XLMiner Output: Final Weights

Note: XLMiner uses two output nodes (P[1] and P[0]); diagrams show just one output
node (P[1])
Fat/Salt Example: Final Weights
XLMiner: Final Classifications
Avoiding Overfitting
• With sufficient hidden nodes and training
iterations, neural net can easily overfit
the data

• To avoid overfitting:
- Track error in validation data
- Limit iterations
- Limit complexity of network
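One practical way to apply these safeguards (a sketch using scikit-learn's MLPClassifier rather than the XLMiner tool shown in the slides; the data here are simulated):

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Simulated data: two predictors and a 0/1 outcome
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) + rng.normal(0, 0.2, 200) > 1).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=1)

net = MLPClassifier(
    hidden_layer_sizes=(3,),   # limit network complexity: one small hidden layer
    activation="logistic",
    max_iter=500,              # limit the number of training iterations
    early_stopping=True,       # hold out part of the training data ...
    validation_fraction=0.2,   # ... and stop when validation error stops improving
    random_state=1,
)
net.fit(X_train, y_train)
print(net.score(X_valid, y_valid))   # accuracy on held-out validation data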
User Inputs
Specify Network Architecture
• Number of hidden layers
- Most popular – one hidden layer
• Number of nodes in hidden layer(s)
- More nodes capture complexity, but increase
chances of overfit
• Number of output nodes
- For classification, one node per class (in binary
case can also use one)
- For numerical prediction use one
Network Architecture, cont.
• Learning Rate l
- Low values “downweight” the new
information from errors at each iteration
- This slows learning, but reduces tendency
to overfit to local structure
Advantages
• Good predictive ability
• Can capture complex relationships
• No need to specify a model
Disadvantages
• Considered a “black box” prediction machine,
with no insight into relationships between
predictors and outcome
• No variable-selection mechanism, so you have
to exercise care in selecting variables
• Heavy computational requirements if there are
many variables (additional variables
dramatically increase the number of weights to
calculate)
Summary
• Neural networks can be used for classification and prediction
• Can capture a very flexible/complicated relationship between the outcome and a set of predictors
• The network "learns" and updates its model iteratively as more data are fed into it
• Major danger: overfitting
• Requires large amounts of data
• Good predictive performance, yet "black box" in nature
