Fitting A Neural Network Model



Neural Network Structure


The most widely used type of neural network in data analysis is the multilayer perceptron (MLP).
Multilayer perceptron models were originally inspired by neurophysiology and the interconnections
between neurons, and they are often represented by a network diagram instead of an equation.
The basic building blocks of multilayer perceptrons are called hidden units. Hidden units are modeled
after the neuron. Each hidden unit receives a linear combination of input variables. The coefficients are
called the (synaptic) weights. An activation function transforms the linear combinations and then outputs
them to another unit that can then use them as inputs.
The network diagram is arranged in layers. The first layer, called the input layer, contains any number of
inputs. The input layer connects to a hidden layer, which consists of hidden units. The hidden layer
connects to a final layer called the target, or output, layer. A multilayer perceptron can contain additional
hidden layers with any number of hidden units.
Because of a neural network's biological roots, its components receive different names from
corresponding components of a regression model. Instead of an intercept term, a neural network has a
bias term. Instead of parameter estimates, a neural network has weight estimates.
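The structure described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name `mlp_predict`, the tanh activation, the single output unit, and all weight values are assumptions chosen for the example.

```python
import math

def mlp_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a single-hidden-layer perceptron for one case."""
    hidden_outputs = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Linear combination of the inputs plus the bias (the "intercept").
        combination = bias + sum(w * x for w, x in zip(weights, inputs))
        # The activation function transforms the linear combination.
        hidden_outputs.append(math.tanh(combination))
    # The output unit combines the hidden-unit outputs in the same way.
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden_outputs))

# Hypothetical weights for two inputs and three hidden units.
prediction = mlp_predict(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.1, 0.2], [-0.3, 0.4], [0.0, 0.0]],
    hidden_biases=[0.0, 0.1, 0.0],
    output_weights=[1.0, -1.0, 0.5],
    output_bias=0.2,
)
print(prediction)
```

If every hidden weight is zero, each hidden unit outputs tanh(0) = 0 and the prediction reduces to the output bias, just as a regression with all slopes at zero reduces to its intercept.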
What makes neural networks interesting is their ability to approximate virtually any continuous
association between the inputs and the target. You simply specify the correct number of hidden units and
find reasonable values for the weights. Specifying the correct number of hidden units involves some trial
and error. Finding reasonable values for the weights is done by least-squares estimation (for interval-
valued targets). Neural networks are especially useful for prediction problems where one or more of the
following are true:

No mathematical formula is known that relates inputs to outputs.

Prediction is more important than explanation.

There is a lot of training data.


Beyond the Prediction Formula
In some ways, neural networks are similar to regressions.

The most prevalent problem for neural networks is missing values. Like regressions, neural
networks require a complete record for estimation and scoring. Neural networks resolve this
complication in the same way that regression does: by imputation.

Extreme or unusual values also present a problem for neural networks. The problem is mitigated
somewhat by the activation functions in the hidden units. These functions compress extreme input
values to between -1 and +1.
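A quick way to see the compression is to pass a few values through a hyperbolic tangent, assuming that is the hidden-unit activation function (the text gives only the -1 to +1 range, which tanh matches). The input values are arbitrary.

```python
import math

# Arbitrary inputs, including two extreme values.
raw_values = [-1000.0, -2.0, 0.0, 2.0, 1000.0]

# tanh maps every real number into the interval from -1 to +1.
squashed = [math.tanh(value) for value in raw_values]

for raw, out in zip(raw_values, squashed):
    print(raw, out)
```

The extreme inputs end up essentially at the bounds, so an outlier cannot push a hidden unit's output arbitrarily far, although it can still saturate the unit.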
In other ways, they are different from regressions.

Nonnumeric inputs pose less of a complication to a properly tuned neural network than they do to
regressions. This is mainly due to the complexity optimization process.

Unlike standard regression models, neural networks easily accommodate nonlinear and non-additive
associations between inputs and target. In fact, the main challenge is over-accommodation, that is,
falsely discovering nonlinearities and interactions.
Using the Neural Network Tool
By default, the Neural Network tool uses a multilayer perceptron architecture. Each input unit is
connected to each hidden unit, and each hidden unit is connected to each output unit. Sometimes it can
improve the fit of your model to add direct connections between each input unit and each output unit.
When you develop a neural network, you can make many choices: the number of inputs to use, which
basic network architecture to use, the number of hidden layers to use, the number of units per hidden
layer, the activation and combination functions to use, and so on. If you have considerable prior
information about the function to be learned, you might be able to make some of these choices based on
theoretical considerations. More often, it takes trial and error to find a good architecture.
Next you add a Neural Network node to a process flow diagram.
1. Add a Neural Network node to the diagram.
2. Connect the new node to the Impute node and the Model Comparison node as shown below.

3. Select the Neural Network node.


4. To see additional options, select View > Property Sheet > Advanced.
5. Examine the general properties of the node.

In the Continue Training field you specify whether you want to use the current estimates as the
starting values for training. When you set this property to Yes, the estimates from the previous run of the
node are used as the initial values for training. To use this property, an estimates data set must have been
created by the node before you set this property to Yes. The default value of the property is No.
You can specify one of the following criteria for selecting the best model:
Profit/Loss chooses the model that maximizes the profit or minimizes the loss for the cases in the
validation data set.
Misclassification Rate chooses the model that has the smallest misclassification rate for the validation
data set.
Average Error chooses the model that has the smallest average error for the validation data set.

6. Select the ellipsis button in the Network row of the Properties panel. The window that opens
allows you to specify the parameters for the neural network as follows.
The Architecture field enables you to specify a wide variety of neural networks including
Generalized linear model (GLIM)
Multilayer perceptron (MLP, which is the default)
Ordinary radial basis function with equal widths (ORBFEQ)
Ordinary radial basis function with unequal widths (ORBFUN)
Normalized radial basis function with equal heights (NRBFEH)
Normalized radial basis function with equal volumes (NRBFEV)
Normalized radial basis function with equal widths (NRBFEW)
Normalized radial basis function with equal widths and heights (NRBFEQ)
Normalized radial basis function with unequal widths and heights (NRBFUN).
The User option in the field enables the user to define a network with a single hidden layer.

Discussion of these architectures is beyond the scope of this course.

By default, the network does not include direct connections. In this case, each input unit is connected to
each hidden unit, and each hidden unit is connected to each output unit. If you set the Direct connections
value to Yes, each input unit is also connected to each output unit. Direct connections define linear layers,
whereas hidden neurons define nonlinear layers. Do not change the default setting for direct connections
for this example.
The Number of Hidden Units property enables you to specify the number of hidden units that you want to
use in each hidden layer. Permissible values are integers between 1 and 64. The default value is 3.

7. Select the ellipsis button in the Optimization row of the Properties panel. This allows you to
examine the training options for the neural network.

The training options include


the maximum number of iterations allowed during the neural network training. The permissible values
are integers from 1 to 1000. The default value is 50.
the maximum amount of CPU time that you want to use during training. Permissible values are 5
minutes, 10 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, or 7 hours. The default setting for the
Maximum Time property is 4 hours.
the training technique, which is the methodology used to iterate from the starting values to a solution.
Examine the Preliminary Training options in the same window.

Preliminary training performs preliminary runs to determine parameter starting values. The default is Yes,
with the following available options
the maximum iterations
the maximum time
the number of runs.
Disable Preliminary Training for this example.
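The idea behind preliminary training can be sketched as several short runs from random starting values, keeping the weights from the best run as the starting point for full training. Everything below is an assumption made for illustration: a two-weight model `a * tanh(b * x)` stands in for the network, plain gradient descent stands in for the tool's training technique, and the data are generated inside the example.

```python
import math
import random

def average_squared_error(weights, data):
    a, b = weights
    return sum((a * math.tanh(b * x) - y) ** 2 for x, y in data) / len(data)

def short_run(weights, data, iterations, step=0.1):
    """A few gradient-descent iterations from the given starting weights."""
    a, b = weights
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for x, y in data:
            hidden = math.tanh(b * x)
            residual = a * hidden - y
            grad_a += 2 * residual * hidden
            grad_b += 2 * residual * a * (1 - hidden ** 2) * x
        a -= step * grad_a / len(data)
        b -= step * grad_b / len(data)
    return [a, b]

def preliminary_training(data, runs, max_iterations, seed=0):
    """Return the weights from the best of several short random-start runs."""
    rng = random.Random(seed)
    best_error, best_weights = None, None
    for _ in range(runs):
        start = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        candidate = short_run(start, data, max_iterations)
        error = average_squared_error(candidate, data)
        if best_error is None or error < best_error:
            best_error, best_weights = error, candidate
    return best_weights

# Toy data generated from the model itself with a = 1 and b = 1.
data = [(x / 2, math.tanh(x / 2)) for x in range(-4, 5)]
starting_weights = preliminary_training(data, runs=5, max_iterations=10)
print(starting_weights, average_squared_error(starting_weights, data))
```

The three arguments mirror the options listed above: the number of runs, and the maximum iterations per run (a time limit is omitted from the sketch).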
Examine the Model
1. Run the flow from the Neural Network node and view the results.
The Fit Statistics window shows that the average squared error and the misclassification rate are similar
to the values for the regression models that you created earlier.

The Output window shows initial values for the neural network weights.

The Output window also shows the model optimization results (the maximum likelihood estimates of
the model weights). In this case, the model fitting process has converged.
When analyzing a different data set, you might see a statement such as "LEVMAR needs more than x
iterations," where x is some integer value, as seen in the figure below. This statement means that
the model fitting process did not converge. The output in the example below indicates that 20
iterations occurred, so more than 20 iterations are needed for the neural network training process
to converge in this case. Convergence of the model depends only on minimizing average squared error
in the training data. The iteration plot in such a case shows that a unique minimum has not been
reached, even after 20 iterations.

Here's how you increase the maximum number of iterations for the Neural Network node.

In the properties panel for the Neural Network node, click the Ellipsis button for the Optimization
property.

In the Value column for the Maximum Iterations property, type 100, and then click OK.
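The effect of an iteration limit can be illustrated with a small optimizer. The sketch below does not reproduce the LEVMAR (Levenberg-Marquardt) technique named in the message. It swaps in plain gradient descent on a one-weight least-squares problem, and the function name, step size, tolerance, and data are all assumptions.

```python
def fit_slope(data, max_iterations, step=0.01, tolerance=1e-6):
    """Minimize the average squared error of y = weight * x by gradient descent.

    Returns (weight, iterations_used, converged).
    """
    weight = 0.0
    for iteration in range(1, max_iterations + 1):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        if abs(gradient) < tolerance:
            return weight, iteration, True   # gradient is flat: converged
        weight -= step * gradient
    return weight, max_iterations, False     # limit reached before convergence

# Toy data that lie exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

print(fit_slope(data, max_iterations=20))    # stops at the limit, not converged
print(fit_slope(data, max_iterations=1000))  # converges well before the limit
```

With the limit at 20, the optimizer stops while the error is still falling, which is the situation the message reports. Raising the limit lets the same procedure run until the gradient is effectively zero.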
Coming back to our exercise, the iteration plot will also show a divergence in average squared error for
the training data and average squared error for the validation data near iteration 1 (indicated by the
vertical blue line). The divergence implies that too many inputs are being considered, leading to
overfitting. Reducing the number of inputs reduces the number of weights and possibly improves the
model's performance.

The Iteration Plot window shows that the final neural network chosen is from the 23rd iteration. Recall
that the model selection criterion was Profit/Loss. If you change the iteration plot to view the average
profit chart, the reason for the selection of this iteration becomes more obvious.
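Choosing the final network by a validation criterion amounts to scanning the per-iteration validation statistics and keeping the best one. The sketch below illustrates that logic only. The function name `select_iteration`, the criterion labels, and the numbers are made up and are not taken from the node's output.

```python
def select_iteration(history, criterion):
    """Return the 1-based iteration whose validation statistic is best.

    history   -- validation statistics, one per training iteration
    criterion -- "profit" is maximized; "misclassification" and
                 "average_error" are minimized
    """
    if criterion == "profit":
        best_index = max(range(len(history)), key=lambda i: history[i])
    elif criterion in ("misclassification", "average_error"):
        best_index = min(range(len(history)), key=lambda i: history[i])
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return best_index + 1

# Made-up validation statistics for five iterations (illustration only).
validation_profit = [0.10, 0.14, 0.17, 0.16, 0.15]
validation_error = [0.30, 0.26, 0.25, 0.27, 0.29]

print(select_iteration(validation_profit, "profit"))
print(select_iteration(validation_error, "average_error"))
```

Because the statistic comes from the validation data, the selected iteration can be earlier than the last one trained, which is why the chosen network need not be the one with the lowest training error.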

You can examine the Output window to view the final parameter estimates, or weights. If you prefer, you
can view a graphical representation of those weights.
2. In the neural network Results window, select View > Model > Weights - Final.

H11, H12, and H13 are the three hidden units in the single hidden layer. The colors in the plot represent
the relative size of the weights. If you point at a particular rectangle, it shows the variable being
connected as well as the actual value of the weight.

In this case there are 26 input variables for the model. It might be desirable to do some variable
selection prior to the neural network model. One option would be to use a decision tree to do
variable selection and then use a neural network to build the model.
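To get a feel for how the number of inputs drives model size, the weights in a single-hidden-layer network can be counted directly. The sketch assumes one output unit and that each input enters as a single numeric column. Class inputs would add columns, so the count reported by the tool may differ. The function name `count_weights` is hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1, direct_connections=False):
    """Count weights and biases in a single-hidden-layer perceptron."""
    # Each hidden unit has one weight per input plus a bias.
    hidden_layer = (n_inputs + 1) * n_hidden
    # Each output unit has one weight per hidden unit plus a bias.
    output_layer = (n_hidden + 1) * n_outputs
    # Direct connections add one weight per input-output pair.
    direct = n_inputs * n_outputs if direct_connections else 0
    return hidden_layer + output_layer + direct

print(count_weights(26, 3))  # 85 with 26 inputs and 3 hidden units
print(count_weights(10, 3))  # 37 after reducing to 10 inputs
```

Under these assumptions, cutting the inputs from 26 to 10 removes more than half of the weights that must be estimated.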
3. Run the diagram from the Model Comparison node to make a final comparison of all models built.
4. When the run has completed, view the results.
5. Maximize the Score Rankings Overlay plot.

The plot still shows that the Regression model appears to have better lift than any of the other models for
the validation data set.
