Deep L-Layer Neural Network
By now, you've seen forward propagation and back propagation in the context
of a neural network with a single hidden layer, as well as logistic regression, and
you've learned about vectorization and when it's important to initialize the weights randomly.
If you've done the past couple of weeks' homework, you've also implemented and
seen some of these ideas work for yourself.
So by now, you've actually seen most of the ideas you need to implement a deep neural
network. What we're going to do this week is take those ideas and put them together so
that you'll be able to implement your own deep neural network.
Because this week's programming exercise is longer and involves more work, I'm going to keep
the videos for this week shorter, so you can get through the videos a little more quickly and
then have more time to do a significant programming exercise at the end, which I hope
will leave you having built a deep neural network that you feel proud of.
So what is a deep neural network? You've seen this picture for logistic regression, and
you've also seen neural networks with a single hidden layer.
So here's an example of a neural network with two hidden layers and
a neural network with five hidden layers.
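To make this concrete, here is a minimal numpy sketch, not from the lecture itself, of what the forward pass of such an L-layer network can look like; the layer sizes, the 0.01 initialization scale, and the ReLU/sigmoid activation choices are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an L-layer network's forward pass; the layer sizes,
# initialization scale, and activation choices here are illustrative.
layer_dims = [3, 4, 4, 1]          # input size, then n^[1], n^[2], n^[3]
L = len(layer_dims) - 1            # number of layers, capital L

rng = np.random.default_rng(0)
params = {}
for l in range(1, L + 1):
    # Small random weights (recall the note on random initialization).
    params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

def forward(X, params, L):
    A = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))  # ReLU, then sigmoid
    return A

X = rng.standard_normal((3, 5))     # 5 examples with 3 features each
print(forward(X, params, L).shape)  # (1, 5): one prediction per example
```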
Parameters vs Hyperparameters
Being effective in developing your deep
neural nets requires that you not only
organize your parameters well but also
your hyperparameters. So what are
hyperparameters? Let's take a look. The
parameters of your model are W and b, and
there are other things you need to tell
your learning algorithm, such as the
learning rate alpha, because we need
to set alpha, and that in turn will
determine how your parameters evolve; or
maybe the number of iterations of
gradient descent you carry out. Your
learning algorithm also has other
numbers that you need to set, such as the
number of hidden layers, which we call
capital L, or the number of hidden units
in each layer, such as n^[1], n^[2], and
so on. Then you also have the choice
so on. Then you also have the choice
of activation function. do you want to
use a RELU, or tangent or a sigmoid
function especially in the
hidden layers. So all of these things
are things that you need to tell your
learning algorithm and so these are
parameters that control the ultimate
parameters W and B and so we call all of
these things below hyper parameters.
Because these things like alpha, the
learning rate, the number of iterations,
number of hidden layers, and so on, these
are all parameters that control W and B.
So we call these things hyper parameters,
because it is the hyper parameters that
somehow determine the final
value of the parameters W and B that you
end up with. In fact, deep learning has a
lot of different hyper parameters.
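To make the distinction concrete, here is a minimal sketch, with purely illustrative values, of the kind of settings you collect before training; the `train` call at the end is a hypothetical stand-in for the training loop you build in the exercise, not a function from the course.

```python
# Hyperparameters: knobs you set *before* training (values are illustrative).
hyperparams = {
    "learning_rate": 0.01,        # alpha
    "num_iterations": 2500,       # gradient-descent steps
    "layer_dims": [3, 4, 4, 1],   # input size, n^[1], n^[2], output size
    "hidden_activation": "relu",  # vs. "tanh" or "sigmoid"
}

# Parameters are what training *produces*; a hypothetical training call:
# params = train(X, Y, **hyperparams)   # returns the learned W^[l], b^[l]
```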
In a later course, we'll see other
hyperparameters as well, such as the
momentum term, the mini-batch size,
various forms of regularization
parameters, and so on. If none of
these terms at the bottom make sense yet,
don't worry about it! We'll talk about
them in the second course. Because deep
learning has so many hyperparameters, in
contrast to earlier eras of machine
learning, I'm going to try to be very
consistent in calling the learning rate
alpha a hyperparameter rather than
calling it a parameter. I think in earlier
eras of machine learning, when we didn't
have so many hyperparameters, most of us
used to be a bit sloppy and just
call alpha a parameter. Technically,
alpha is a parameter, but it is a parameter
that determines the real parameters. I'll
try to be consistent in calling
things like alpha, the number of
iterations, and so on hyperparameters. So
when you're training a deep net for your
own application, you'll find that there may
be a lot of possible settings for the
hyperparameters that you need to just
try out. So applying deep learning today is
a very empirical process, where often you
might have an idea. For example, you might
have an idea for the best value of the
learning rate. You might say, well, maybe
alpha equals 0.01, I want to try that.
Then you implement it, try it out, and
see how that works. Based on
that outcome, you might say, you know what?
I want to change it and increase
the learning rate to 0.05. So, if
you're not sure what the best value
for the learning rate is, you might
try one value of the learning rate alpha
and see the cost function J go down
like this. Then you might try a larger
value for the learning rate alpha and
see the cost function blow up and
diverge. Then you might try another
version and see it go down really fast
but converge to a higher value. You might
try yet another version and
see the cost function J do something
else. After trying a few different values, you might
say, okay, it looks like this value of
alpha gives me pretty fast learning
and allows me to converge to a lower
cost function J, so I'm going to use
this value of alpha.
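As a concrete illustration of this try-it-and-see loop, here is a self-contained sketch that compares a few arbitrary learning rates on a toy logistic-regression problem standing in for your real model; the data, candidate alphas, and iteration count are all made up for illustration.

```python
import numpy as np

# Toy data: 3 features, 200 examples, labels from a simple rule.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 200))
Y = (X.sum(axis=0, keepdims=True) > 0).astype(float)

def final_cost(alpha, iters=500):
    """Run gradient descent with learning rate alpha; return the final cost J."""
    w, b = np.zeros((3, 1)), 0.0
    m = X.shape[1]
    for _ in range(iters):
        A = 1 / (1 + np.exp(-(w.T @ X + b)))   # forward pass (sigmoid)
        dZ = A - Y                             # backprop for the logistic loss
        w -= alpha * (X @ dZ.T) / m
        b -= alpha * dZ.mean()
    A = 1 / (1 + np.exp(-(w.T @ X + b)))
    eps = 1e-8                                 # avoid log(0)
    return float(-(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps)).mean())

for alpha in [0.0001, 0.01, 0.5]:              # too small, moderate, large
    print(alpha, final_cost(alpha))            # compare the final costs J
```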
You saw on a previous slide that there are a lot of
different hyperparameters. It turns
out that when you're starting on a new
application, you may find it very
difficult to know in advance exactly
what the best values of the
hyperparameters are. So what often happens is that you
just have to try out many different
values and go around this cycle: you
try out some values, say five hidden
layers with a certain number of hidden
units, implement that, see if it works, and
then iterate. So the title of this slide
is that applying deep learning is a very
empirical process, and "empirical process"
is maybe a fancy way of saying you just
have to try a lot of things and see what
works. Another effect I've seen is that
deep learning today is applied to so
many problems, ranging from computer
vision, to speech recognition, to natural
language processing, to a lot of
structured data applications such as
online advertising, web search,
or product recommendations, and so on.
What I've seen is that
researchers from one of these disciplines
sometimes try to move to a different one.
And sometimes the intuitions about
hyperparameters carry over and sometimes they
don't, so I often advise people,
especially when starting on a new
problem, to just try out a range of
values and see what works. In the next
course, we'll
see some systematic ways of trying out
a range of values. Second,
even if you're working on one
application for a long time, say
maybe you're working on online
advertising, as you make progress on the
problem it is quite possible that the best
values for the learning rate, the number of
hidden units, and so on might change. So
even if you tune your system to the best
hyperparameter values today, it's
possible you'll find that the best values
change a year from now, maybe
because the computing infrastructure,
be it the CPUs, or the type of GPUs
you're running on, has changed.
So maybe one rule of thumb is:
every now and then, maybe every few
months if you're working on a problem
for an extended period of many
years, just try a few values for the
hyperparameters and double-check whether
there's a better setting. As you
do so, you'll slowly
gain intuition about the
hyperparameters that work best for your
problem.
I know that this might seem like an
unsatisfying part of deep learning that
you just have to try out all these values
for the hyperparameters, but maybe
this is one area where deep learning
research is still advancing, and maybe
over time we'll be able to give better
guidance for the best hyperparameters
to use. It's also possible that,
because CPUs and GPUs and networks and
datasets are all changing, the
guidance won't
converge for some time. You just need
to keep trying out different values,
evaluate them on a held-out
cross-validation set or something like that, and
pick the value that works for your
problem.
So that was a brief discussion of hyperparameters. In the second course,
we'll also give some suggestions for how
to systematically explore the space of
hyperparameters, but by now you actually
have pretty much all the tools you need
to do the programming exercise. Before
you do that, let me just share with you one
more set of ideas, a question I'm often
asked: what does deep learning have to do with the
human brain?