Chapter 3
We’re creating a layer that will only accept as input 2D tensors where the first
dimension is 784 (axis 0, the batch dimension, is unspecified, and thus any
value would be accepted).
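A minimal sketch of such a layer in Keras (the layer size of 32 is an arbitrary illustrative choice):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Declare the expected input shape: 2D tensors of shape (batch_size, 784).
# Axis 0, the batch dimension, is left as None, so any batch size is accepted.
inputs = keras.Input(shape=(784,))
outputs = layers.Dense(32, activation="relu")(inputs)  # 32 units is arbitrary
model = keras.Model(inputs, outputs)
```

Feeding this model a tensor whose last dimension isn't 784 raises an error at call time.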
We look for a good set of values for the weight tensors involved in these tensor
operations.
Picking the right network architecture is more an art than a science, and
although there are some best practices and principles you can rely on, only
practice can help you become a proper neural-network architect.
Once the network architecture is defined, you still have to choose two more things:
Loss function (objective function): The quantity that will be minimized during
training. It represents a measure of success for the task at hand.
Optimizer: Determines how the network will be updated based on the loss function.
It implements a specific variant of stochastic gradient descent (SGD).
A neural network that has multiple outputs may have multiple loss functions
(one per output).
But the gradient-descent process must be based on a single scalar loss value.
So, for multi-loss networks, all losses are combined (via averaging) into a single
scalar quantity.
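As a sketch of this idea, here is a hypothetical two-output Keras model where each output gets its own loss; Keras combines the per-output losses (optionally weighted) into the single scalar quantity that gradient descent minimizes. The layer sizes and output names are illustrative, not from the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64,))
x = layers.Dense(32, activation="relu")(inputs)
# Two outputs, each paired with its own loss function below.
class_output = layers.Dense(10, activation="softmax", name="label")(x)
value_output = layers.Dense(1, name="score")(x)

model = keras.Model(inputs, [class_output, value_output])
model.compile(
    optimizer="rmsprop",
    loss={"label": "categorical_crossentropy", "score": "mse"},
    # The per-output losses are combined into one scalar for gradient descent;
    # loss_weights controls each loss's contribution to that scalar.
    loss_weights={"label": 1.0, "score": 0.5},
)
```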
Choosing the right objective function for the right problem is extremely
important:
If the objective doesn’t fully correlate with success for the task at hand, your
network will end up doing things you may not have wanted.
Imagine a stupid, omnipotent AI trained via SGD, with this poorly chosen objective
function: “maximizing the average well-being of all humans alive.”
To make its job easier, this AI might choose to kill all humans except a few and
focus on the well-being of the remaining ones, because average well-being isn't
affected by how many humans are left.
When it comes to problems such as classification, regression, and sequence
prediction, there are simple guidelines to choose the correct loss.
For instance, you’ll use binary crossentropy for a two-class classification
problem, categorical crossentropy for a many-class classification problem,
mean squared error for a regression problem, connectionist temporal
classification (CTC) for a sequence-learning problem, and so on.
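These guidelines translate directly into compile() calls. The tiny models below are placeholders just to show the loss choices (CTC is omitted, since Keras has no one-line string alias for it):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two-class classification: binary crossentropy with a sigmoid output.
binary_model = keras.Sequential(
    [keras.Input(shape=(16,)), layers.Dense(1, activation="sigmoid")]
)
binary_model.compile(optimizer="rmsprop", loss="binary_crossentropy")

# Many-class classification: categorical crossentropy with a softmax output.
multiclass_model = keras.Sequential(
    [keras.Input(shape=(16,)), layers.Dense(10, activation="softmax")]
)
multiclass_model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

# Regression: mean squared error with a linear output.
regression_model = keras.Sequential([keras.Input(shape=(16,)), layers.Dense(1)])
regression_model.compile(optimizer="rmsprop", loss="mse")
```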
Keras is distributed under the permissive MIT license, which means it can be
freely used in commercial projects.
Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of
startups working on a wide range of problems.
TensorFlow, CNTK, and Theano are some of the primary platforms for deep
learning today.
Theano (http://deeplearning.net/software/theano) is developed by the MILA
lab at Université de Montréal.
TensorFlow (www.tensorflow.org) is developed by Google.
CNTK (https://github.com/Microsoft/CNTK) is developed by Microsoft.
Any piece of code that you write with Keras can be run with any of these
backends without having to change anything in the code.
You can seamlessly switch among the three during development, which often
proves useful: for instance, one of these backends may prove to be faster for a
specific task.
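As a sketch of how this switch works with standalone (multi-backend) Keras: the backend can be selected via the KERAS_BACKEND environment variable before Keras is imported, with the model code itself unchanged (Keras otherwise falls back to the `backend` field in `~/.keras/keras.json`):

```python
import os

# Must be set before `import keras`; valid values for multi-backend Keras
# include "tensorflow", "theano", and "cntk".
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras  # Keras now runs on the selected backend

print(keras.backend.backend())
```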
Via TensorFlow (or Theano, or CNTK), Keras is able to run seamlessly on both
CPUs and GPUs.
And here’s the same model defined using the functional API:
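The model itself isn't reproduced above, so here is a reconstruction sketched from the surrounding context: a small two-layer classifier on the 784-feature input discussed earlier, first as a Sequential model, then the equivalent functional-API definition (layer sizes are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential version.
seq_model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# The same model with the functional API: layers are applied to tensors
# as if they were functions.
input_tensor = keras.Input(shape=(784,))
x = layers.Dense(32, activation="relu")(input_tensor)
output_tensor = layers.Dense(10, activation="softmax")(x)
func_model = keras.Model(inputs=input_tensor, outputs=output_tensor)
```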
With the functional API, you’re manipulating the data tensors that the model
processes and applying layers to these tensors as if they were functions.
Once your model architecture is defined, it doesn’t matter whether you used
a Sequential model or the functional API. All of the following steps are the
same.
The learning process is configured in the compilation step, where you specify
the optimizer and loss function(s) that the model should use, as well as the
metrics you want to monitor during training.
Finally, the learning process consists of passing Numpy arrays of input data
(and the corresponding target data) to the model via the fit() method.
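A sketch of these two steps end to end, using random NumPy arrays as stand-in data (the model shape, loss, and hyperparameters are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compilation step: optimizer, loss function, and metrics to monitor.
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stand-in training data: 128 samples of 784 features, 10 one-hot classes.
x_train = np.random.random((128, 784)).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(10, size=(128,)), 10)

# fit() runs the training loop over the Numpy input and target arrays.
history = model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
```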
It’s highly recommended, although not strictly necessary, that you run deep
learning code on a modern NVIDIA GPU.
And even for applications that can realistically be run on a CPU, you’ll generally
see a speed increase by a factor of 5 or 10 by using a modern GPU.
Whether you’re running locally or in the cloud, it’s better to be using a Unix
workstation.
Although it’s technically possible to use Keras on Windows (all three Keras
backends support Windows), we don’t recommend it.
If you’re a Windows user, the simplest solution to get everything running is to
set up an Ubuntu dual boot on your machine.
Note that in order to use Keras, you need to install TensorFlow or CNTK or
Theano (or all of them, if you want to be able to switch back and forth among
the three backends).