How To Reduce Overfitting Using Weight Constraints in Keras
Weight constraints provide an approach to reduce the overfitting of a deep learning neural network model on
the training data and improve the performance of the model on new data, such as the holdout test set.
There are multiple types of weight constraints, such as maximum and unit vector norms, and some require a
hyperparameter that must be configured.
In this tutorial, you will discover the Keras API for adding weight constraints to deep learning neural network
models to reduce overfitting.
Kick-start your project with my new book Better Deep Learning, including step-by-step tutorials and
the Python source code files for all examples.
Updated Mar/2019: fixed typo using equality instead of assignment in some usage examples.
Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.
How to Reduce Overfitting in Deep Neural Networks With Weight Constraints in Keras
Photo by Ian Sane, some rights reserved.
Tutorial Overview
This tutorial is divided into three parts: the weight constraints API in Keras, how to add weight constraints to layers, and a worked case study of using a weight constraint to reduce overfitting.
The constraints are specified per-layer, but applied and enforced per-node within the layer.
Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights
and the bias_constraint for the bias weights.
A suite of different vector norms can be used as constraints, provided as classes in the keras.constraints
module. They are:
Maximum norm (max_norm), to force weights to have a magnitude at or below a given limit.
Non-negative norm (non_neg), to force weights to have a positive magnitude.
Unit norm (unit_norm), to force weights to have a magnitude of 1.0.
Min-Max norm (min_max_norm), to force weights to have a magnitude between a range.
# import norm
from keras.constraints import max_norm
# instantiate norm
norm = max_norm(3.0)
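The instantiated norm is then passed to a layer via the kernel_constraint and, optionally, bias_constraint arguments. A minimal sketch follows; the layer size and the limit of 3.0 are illustrative choices, not values from the original example.

# sketch: a Dense layer with max norm constraints on its input and bias weights
from keras.models import Sequential
from keras.layers import Dense
from keras.constraints import max_norm

model = Sequential()
model.add(Dense(32, input_dim=2, activation='relu',
                kernel_constraint=max_norm(3.0), bias_constraint=max_norm(3.0)))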
The constraint for the recurrent weights is set via the recurrent_constraint argument to the layer.
The example below sets a maximum norm weight constraint on an LSTM layer.
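A minimal sketch of such a layer, assuming a limit of 3.0 on the input, recurrent, and bias weights; the layer size and input shape are illustrative.

# sketch: an LSTM layer with max norm constraints on input, recurrent, and bias weights
from keras.models import Sequential
from keras.layers import LSTM
from keras.constraints import max_norm

model = Sequential()
model.add(LSTM(32, input_shape=(10, 1),
               kernel_constraint=max_norm(3.0),
               recurrent_constraint=max_norm(3.0),
               bias_constraint=max_norm(3.0)))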
Now that we know how to use the weight constraint API, let’s look at a worked example.
This example provides a template for applying weight constraints to your own neural network for
classification and regression problems.
Each observation has two input variables with the same scale and a class output value of either 0 or 1. This
dataset is called the “moons” dataset because of the shape of the observations in each class when plotted.
We can use the make_moons() function to generate observations from this problem. We will add noise to the
data and seed the random number generator so that the same samples are generated each time the code is
run.
We can plot the dataset where the two variables are taken as x and y coordinates on a graph and the class
value is taken as the color of the observation.
The complete example of generating the dataset and plotting it is listed below.
# generate the two moons dataset and scatter plot it
from sklearn.datasets import make_moons
from pandas import DataFrame
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()
Running the example creates a scatter plot showing the semi-circle or moon shape of the observations in each class. We can see the noise in the dispersal of the points, making the moons less obvious.
Scatter Plot of Moons Dataset With Color Showing the Class Value of Each Sample
This is a good test problem because the classes cannot be separated by a line, i.e., they are not linearly separable, and a nonlinear method such as a neural network is required to address it.
We have only generated 100 samples, which is small for a neural network, providing the opportunity to overfit
the training dataset and have higher error on the test dataset: a good case for using regularization. Further,
the samples have noise, giving the model an opportunity to learn aspects of the samples that don’t
generalize.
The model will have one hidden layer with more nodes than may be required to solve this problem, providing
an opportunity to overfit. We will also train the model for longer than is required to ensure the model overfits.
Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the
model and 70 to evaluate the fit model’s performance.
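A sketch of this split, assuming X and y were generated by make_moons() as above:

# split into train (30 examples) and test (70 examples) sets
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]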
The hidden layer uses 500 nodes and the rectified linear activation function. A sigmoid activation function is used in the output layer in order to predict class values of 0 or 1.
The model is optimized using the binary cross entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
The defined model is then fit on the training data for 4,000 epochs and the default batch size of 32.
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
We can evaluate the performance of the model on the test dataset and report the result.
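A sketch of the evaluation step, assuming the fitted model and the train/test arrays defined above:

# evaluate the model on the train and test sets and report accuracy
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))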
Finally, we will plot the performance of the model on both the train and test set each epoch.
If the model does indeed overfit the training dataset, we would expect the line plot of accuracy on the training set to continue to increase, and accuracy on the test set to rise and then fall again as the model learns statistical noise in the training dataset.
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
We can tie all of these pieces together; the complete example is listed below.
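A sketch of how the pieces above fit together into a single script, following the snippets shown earlier:

# mlp that overfits the moons dataset
from sklearn.datasets import make_moons
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test sets
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()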
Running the example reports the model performance on the train and test datasets.
We can see that the model has better performance on the training dataset than the test dataset, one possible
sign of overfitting.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the average
outcome.
Because the model is overfit, we generally would not expect much, if any, variance in the accuracy across
repeated runs of the model on the same dataset.
A figure is created showing line plots of the model accuracy on the train and test sets.
We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.
Line Plots of Accuracy on Train and Test Datasets While Training Showing an Overfit
There are a few different weight constraints to choose from. A good simple constraint for this model is to normalize the weights so that the norm is equal to 1.0.
This constraint has the effect of forcing all incoming weights to be small.
We can do this by using the unit_norm in Keras. This constraint can be added to the first hidden layer as
follows:
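A sketch of the constrained hidden layer, assuming the same 500-node layer and Sequential model from the script above:

# add a unit norm constraint to the hidden layer weights
# (assumes the Sequential model and Dense import from the script above)
from keras.constraints import unit_norm
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))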
We can also achieve the same result by using the min_max_norm and setting the minimum and maximum to 1.0, for example:
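A sketch, using the same hidden layer definition as above:

# equivalent constraint via min_max_norm with both bounds set to 1.0
from keras.constraints import min_max_norm
model.add(Dense(500, input_dim=2, activation='relu',
                kernel_constraint=min_max_norm(min_value=1.0, max_value=1.0)))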
We cannot achieve the same result with the maximum norm constraint as it will allow norms at or below the
specified limit; for example:
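A sketch, again using the same hidden layer definition:

# max_norm(1.0) only bounds the norm from above, so it is not equivalent to unit_norm
from keras.constraints import max_norm
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=max_norm(1.0)))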
The complete updated example with the unit norm constraint is listed below:
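Only the hidden layer definition changes relative to the earlier overfit script; a sketch of the updated model definition is below, with the data preparation, fitting, evaluation, and plotting unchanged:

# updated model definition with a unit norm constraint on the hidden layer
from keras.models import Sequential
from keras.layers import Dense
from keras.constraints import unit_norm

model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])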
Running the example reports the model performance on the train and test datasets.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or
differences in numerical precision. Consider running the example a few times and compare the average
outcome.
We can see that indeed the strict constraint on the size of the weights has improved the performance of the
model on the holdout set without impacting performance on the training set.
Reviewing the line plot of train and test accuracy, we can see that it no longer appears that the model has
overfit the training dataset.
Model accuracy on both the train and test sets continues to increase to a plateau.
Line Plots of Accuracy on Train and Test Datasets While Training With Weight Constraints
Extensions
This section lists some ideas for extending the tutorial that you may wish to explore.
Report Weight Norm. Update the example to calculate the magnitude of the network weights and
demonstrate that the constraint indeed made the magnitude smaller (see the sketch after this list).
Constrain Output Layer. Update the example to add a constraint to the output layer of the model and
compare the results.
Constrain Bias. Update the example to add a constraint to the bias weight and compare the results.
Repeated Evaluation. Update the example to fit and evaluate the model multiple times and report the
mean and standard deviation of model performance.
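For the first extension, a minimal sketch of reporting the per-node incoming weight norms of the hidden layer, assuming a fitted Keras model named model from the examples above:

# report the incoming weight vector norm for each node in the first hidden layer
from numpy.linalg import norm
kernel = model.layers[0].get_weights()[0]  # kernel matrix, shape (n_inputs, n_nodes)
node_norms = norm(kernel, axis=0)          # one norm per node (per column)
print('Mean norm: %.3f, Max norm: %.3f' % (node_norms.mean(), node_norms.max()))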
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Posts
Gentle Introduction to Vector Norms in Machine Learning
API
Keras Constraints API
Keras constraints.py
Keras Core Layers API
Keras Convolutional Layers API
Keras Recurrent Layers API
sklearn.datasets.make_moons API
Summary
In this tutorial, you discovered the Keras API for adding weight constraints to deep learning neural network
models.