Optimization for Machine Learning
7-Day Mini-Course
Jason Brownlee
Disclaimer
The information contained within this eBook is strictly for educational purposes. If you
wish to apply ideas contained in this eBook, you are taking full responsibility for your
actions.
The author has made every effort to ensure that the information within this
book was correct at the time of publication. The author does not assume, and
hereby disclaims, any liability to any party for any loss, damage, or disruption
caused by errors or omissions, whether such errors or omissions result from
accident, negligence, or any other cause.
No part of this eBook may be reproduced or transmitted in any form or by any means,
electronic or mechanical, recording or by any information storage and retrieval
system, without written permission from the author.
Mini-course overview
This crash course is broken down into seven lessons. You could complete one
lesson per day (recommended) or complete all of the lessons in one day
(hardcore). It really depends on the time you have available and your level of
enthusiasm. Below is a list of the seven lessons that will get you started and
productive with optimization for machine learning in Python:
▷ Lesson 01: Why optimize?
▷ Lesson 02: Grid search.
▷ Lesson 03: Optimization algorithms in SciPy.
▷ Lesson 04: BFGS algorithm.
▷ Lesson 05: Hill-climbing algorithm.
▷ Lesson 06: Simulated annealing.
▷ Lesson 07: Gradient descent.
Each lesson could take you 60 seconds or up to 30 minutes. Take your time
and complete the lessons at your own pace. Ask questions and even share your
results online. The lessons expect you to go off and find out how to do things. I
will give you hints, but part of the point of each lesson is to force you to learn
where to look for help on the statistical methods, the NumPy API, and the
best-of-breed tools in Python. (Hint: I have all of the answers directly on this
blog; use the search box.) Share your results online; I'll cheer you on!
Lesson 01: Why optimize?

The function here does not mean you need to explicitly define a function in the
programming language. A conceptual one is sufficient. What we want to do next is
to manipulate the input and check the output until the best output is achieved.
In the case of machine learning, the best can mean:
▷ Highest accuracy, or precision, or recall
▷ Largest AUC of ROC
▷ Greatest F1 score in classification or R2 score in regression
▷ Least error, or log-loss
or something else along these lines. We can manipulate the input by random methods
such as sampling or random perturbation. We can also assume the function has
certain properties and try out a sequence of inputs to exploit these properties.
Of course, we can also check every possible input; once we have exhausted the
possibilities, we will know the best answer.
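As a small illustration of the idea of manipulating the input and checking the output, here is a minimal random-search sketch (the stand-in objective function and the bounds are my own choices, not from the course):

```python
import random

# conceptual objective: a smaller output is better
def objective(x):
    return x ** 2.0

# random search: sample inputs within a bound, keep the best output seen
random.seed(1)
best_x, best_y = None, float("inf")
for _ in range(100):
    x = random.uniform(-5.0, 5.0)   # manipulate the input at random
    y = objective(x)                # check the output
    if y < best_y:                  # remember the best found so far
        best_x, best_y = x, y
print("best input %.5f gives output %.5f" % (best_x, best_y))
```

With enough samples, the best value found drifts toward the true minimum at x = 0, although random search gives no guarantee of reaching it exactly.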
These are the basics of why we want to do optimization, what it is about,
and how we can do it. You may not notice it, but training a machine learning
model is doing optimization. You may also explicitly perform optimization to
select features or fine-tune hyperparameters. As you can see, optimization is
useful in machine learning.
Your Task
For this lesson, you must find a machine learning model and list three examples
where optimization might be used or might help in training and using the model.
These may be related to some of the reasons above, or they may be your own
personal motivations.
Next
In the next lesson, you will discover how to perform grid search on an arbitrary
function.
Lesson 02: Grid search

In this lesson, you will discover a gentle introduction to grid search for
optimization. Let's start with this function:
# objective function
def objective(x, y):
return x**2.0 + y**2.0
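The grid-search listing itself is not reproduced above; a minimal sketch that exhaustively evaluates the function over a grid (the range and step size here are my assumptions) could look like this:

```python
import numpy as np

# objective function
def objective(x, y):
    return x ** 2.0 + y ** 2.0

# define the range and resolution of the grid
r_min, r_max = -5.0, 5.0
step = 0.1
xaxis = np.arange(r_min, r_max, step)
yaxis = np.arange(r_min, r_max, step)

# evaluate the objective at every grid point and keep the best
best_eval = float("inf")
best_x = best_y = None
for x in xaxis:
    for y in yaxis:
        value = objective(x, y)
        if value < best_eval:
            best_x, best_y, best_eval = x, y, value
print("f(%.5f, %.5f) = %.5f" % (best_x, best_y, best_eval))
```

Because every grid point is checked, the best answer on the grid is guaranteed, but the cost grows quickly with the number of dimensions.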
Your Task
For this lesson, you should look up how to use the numpy.meshgrid() function and
rewrite the example code. Then you can try to replace the objective function
with f(x, y, z) = (x − y + 1)² + z², which is a function with 3D input.
Next
In the next lesson, you will learn how to use scipy to optimize a function.
Lesson 03: Optimization algorithms in SciPy
In this lesson, you will discover how you can make use of SciPy to optimize
your function. There are a lot of optimization algorithms in the literature. Each
has its strengths and weaknesses, and each is good for a different kind of
situation. Reusing the same function we introduced in the previous lesson,
f (x, y) = x2 + y2
we can make use of some predefined algorithms in SciPy to find its minimum.
Probably the easiest is the Nelder-Mead algorithm. This algorithm is based on
a series of rules to determine how to explore the surface of the function.
Without going into the detail, we can simply call SciPy and apply the
Nelder-Mead algorithm to find a function's minimum:
from scipy.optimize import minimize
from numpy.random import rand
# objective function
def objective(x):
return x[0]**2.0 + x[1]**2.0
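The call to minimize() itself is not shown above; putting the pieces together, a minimal runnable sketch (the search range for the starting point is my assumption) is:

```python
from scipy.optimize import minimize
from numpy.random import rand

# objective function, taking a single vector argument
def objective(x):
    return x[0]**2.0 + x[1]**2.0

# define the range for the input and pick a random starting point
r_min, r_max = -5.0, 5.0
pt = r_min + rand(2) * (r_max - r_min)

# perform the search with the Nelder-Mead algorithm
result = minimize(objective, pt, method='nelder-mead')
print('Status: %s' % result['message'])
print('Total evaluations: %d' % result['nfev'])
print('Solution: f(%s) = %.5f' % (result['x'], result['fun']))
```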
In the code above, we need to write our function with a single vector
argument. Hence virtually the function becomes
f(x[0], x[1]) = (x[0])² + (x[1])²
Your Task
For this lesson, you should replace the objective function in the example code
above with the following:
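The listing is not reproduced here; the standard two-dimensional Ackley function (this implementation is my reconstruction, not the course's exact code) can be written as:

```python
from numpy import exp, sqrt, cos, pi, e

# ackley multimodal function; the global minimum is at v = [0, 0]
def objective(v):
    x, y = v
    return (-20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2)))
            - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y)))
            + e + 20)
```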
This defines the Ackley function. The global minimum is at v = [0, 0]. However,
Nelder-Mead most likely cannot find it because this function has many local
minima. Try repeating your code a few times and observe the output. You should
get a different output each time you run the program.
Next
In the next lesson, you will learn how to use the same SciPy function to apply
a different optimization algorithm.
Lesson 04: BFGS algorithm
In this lesson, you will discover how you can make use of SciPy to apply the
BFGS algorithm to optimize your function. As we have seen in the previous lesson,
we can make use of the minimize() function from scipy.optimize to optimize a
function using the Nelder-Mead algorithm. That is a simple "pattern search"
algorithm that does not need to know the derivatives of a function.
First-order derivative means differentiating the objective function once.
Similarly, the second-order derivative differentiates the first-order derivative
one more time. If we have the second-order derivative of the objective function,
we can apply Newton's method to find its optimum.
There is another class of optimization algorithms that can approximate the
second-order derivative from the first-order derivative and use the
approximation to optimize the objective function. They are called quasi-Newton
methods. BFGS is the most famous member of this class.
Revisiting the same objective function that we used in previous lessons,
f (x, y) = x2 + y2
we can tell that the first-order derivative is:

∇f = [2x, 2y]ᵀ

This is a vector of two components, because the function f(x, y) receives a
vector value of two components (x, y) and returns a scalar value.
If we create a new function for the first-order derivative, we can call SciPy
and apply the BFGS algorithm:
# objective function
def objective(x):
return x[0]**2.0 + x[1]**2.0
def derivative(x):
return [x[0] * 2, x[1] * 2]
...
result = minimize(objective, pt, method='L-BFGS-B', jac=derivative)
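The starting point pt is elided in the snippet above; pieced together, a runnable sketch (the search range for the starting point is my assumption, and BFGS is used here to match the lesson) is:

```python
from scipy.optimize import minimize
from numpy.random import rand

# objective function
def objective(x):
    return x[0]**2.0 + x[1]**2.0

# first-order derivative of the objective function
def derivative(x):
    return [x[0] * 2, x[1] * 2]

# random starting point within assumed bounds
r_min, r_max = -5.0, 5.0
pt = r_min + rand(2) * (r_max - r_min)

# perform the search with BFGS, supplying the gradient via jac
result = minimize(objective, pt, method='BFGS', jac=derivative)
print('Solution: f(%s) = %.5f' % (result['x'], result['fun']))
```

Swapping method='BFGS' for method='L-BFGS-B' runs the memory-efficient variant, which is the comparison the task below asks you to make.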
Your Task
For this lesson, you should create a function with many more parameters (i.e.,
the vector argument to the function has many more than two components) and
observe the performance of BFGS and L-BFGS-B. Do you notice the difference
in speed? How different are the results from these two methods? What happens
if your function is not convex but has many local optima?
Next
In the next lesson, you will learn how to implement hill-climbing method.
Lesson 05: Hill-climbing algorithm

In this lesson, you will discover how to implement the hill-climbing algorithm
and use it to optimize your function.
The hillclimbing function will randomly pick an initial point within the bounds,
then test the objective function in iterations. Whenever it finds the objective
function yielding a lower value, the solution is remembered, and the next point
to test is generated from its neighborhood.
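The hillclimbing() listing itself is not shown above; a minimal reconstruction following that description (the Gaussian step generation and the names are my assumptions) is:

```python
import numpy as np

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size):
    # randomly pick an initial point within the bounds
    solution = bounds[:, 0] + np.random.rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    solution_eval = objective(solution)
    for i in range(n_iterations):
        # generate a candidate from the neighborhood of the current solution
        candidate = solution + np.random.randn(len(bounds)) * step_size
        candidate_eval = objective(candidate)
        # remember the candidate if it yields a lower value
        if candidate_eval <= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

# example: minimize f(x, y) = x^2 + y^2
def objective(x):
    return x[0]**2.0 + x[1]**2.0

np.random.seed(1)
bounds = np.asarray([[-5.0, 5.0], [-5.0, 5.0]])
best, score = hillclimbing(objective, bounds, n_iterations=1000, step_size=0.1)
print('f(%s) = %.5f' % (best, score))
```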
Your Task
For this lesson, you should provide your own objective function (for example,
copy over the one from the previous lesson), set up n_iterations and step_size,
and apply the hillclimbing function to find the minimum. Observe how the
algorithm finds a solution. Try different values of step_size and compare
the number of iterations needed to reach the proximity of the final solution.
Next
In the next lesson, you will learn how to implement simulated annealing.
Lesson 06: Simulated annealing

In this lesson, you will discover how simulated annealing works and how to
use it. For non-convex functions, the algorithms you learned in previous
lessons may easily be trapped at local optima and fail to find the global
optima. The reason is the greedy nature of the algorithms: whenever a better
solution is found, they will not let go. Hence if an even better solution
exists but not in the proximity, the algorithm will fail to find it.
Simulated annealing tries to improve on this behavior by striking a balance
between exploration and exploitation. At the beginning, when the algorithm
does not know much about the function to optimize, it prefers to explore other
solutions rather than stay with the best solution found. At a later stage, as
more solutions have been explored and the chance of finding even better
solutions diminishes, the algorithm will prefer to remain in the neighborhood
of the best solution it has found.
The following is the implementation of simulated annealing as a Python
function:
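The listing is omitted above; a common implementation following that description (the cooling schedule, the Metropolis acceptance rule, and the names are my assumptions) is:

```python
import numpy as np

# simulated annealing algorithm
def simulated_annealing(objective, bounds, n_iterations, step_size, temp):
    # random starting point within the bounds
    best = bounds[:, 0] + np.random.rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    best_eval = objective(best)
    curr, curr_eval = best, best_eval
    for i in range(n_iterations):
        # take a step from the current point
        candidate = curr + np.random.randn(len(bounds)) * step_size
        candidate_eval = objective(candidate)
        # keep track of the best solution found so far
        if candidate_eval < best_eval:
            best, best_eval = candidate, candidate_eval
        # cool the temperature over time; accept worse moves probabilistically
        diff = candidate_eval - curr_eval
        t = temp / float(i + 1)
        if diff < 0 or np.random.rand() < np.exp(-diff / t):
            curr, curr_eval = candidate, candidate_eval
    return best, best_eval

# example: minimize the convex function f(x, y) = x^2 + y^2
def objective(x):
    return x[0]**2.0 + x[1]**2.0

np.random.seed(1)
bounds = np.asarray([[-5.0, 5.0], [-5.0, 5.0]])
best, score = simulated_annealing(objective, bounds, n_iterations=1000,
                                  step_size=0.1, temp=10)
print('f(%s) = %.5f' % (best, score))
```

Early on, the temperature t is high, so even worse candidates are often accepted (exploration); as t decays, the algorithm settles near the best region found (exploitation).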
Your Task
For this lesson, you should repeat the exercise you did in the previous lesson
with the simulated annealing code above. Try the objective function
f(x, y) = x² + y², which is a convex one. Does simulated annealing or hill
climbing take fewer iterations? Then replace the objective function with the
Ackley function introduced in Lesson 03. Is the minimum found by simulated
annealing or by hill climbing smaller?
Next
In the next lesson, you will learn how to implement
gradient descent.
Lesson 07: Gradient descent
In this lesson, you will discover how you can implement the gradient descent
algorithm. Gradient descent is the algorithm used to train neural networks.
Although there are many variants, all of them are based on the gradient,
or the first-order derivative, of the function. The idea lies in the physical
meaning of a gradient of a function. If the function takes a vector and returns
a scalar value, the gradient of the function at any point tells you the
direction in which the function increases the fastest. Hence if we aim to
find the minimum of the function, the direction we should explore is the
exact opposite of the gradient.
In mathematical notation, if we are looking for the minimum of f(x), where
x is a vector, and the gradient of f(x) is denoted by ∇f(x) (which is also a
vector), then we know

x_new = x − α × ∇f(x)
will be closer to the minimum than x. Now let’s try to implement this in
Python. Reusing the sample objective function and its derivative we learned in
Lesson 04, this is the gradient descent algorithm and its use to find the
minimum of the objective function:
# objective function
def objective(x):
    return x[0]**2.0 + x[1]**2.0

...
# take a step
solution = solution - step_size * gradient
# evaluate candidate point
solution_eval = objective(solution)
# report progress
print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
return [solution, solution_eval]
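The full gradient_descent() function is abbreviated above; a runnable reconstruction (the bounds and hyperparameter values are my choices for illustration) is:

```python
import numpy as np

# objective function
def objective(x):
    return x[0]**2.0 + x[1]**2.0

# first-order derivative of the objective function
def derivative(x):
    return np.asarray([x[0] * 2, x[1] * 2])

# gradient descent algorithm
def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    # random starting point within the bounds
    solution = bounds[:, 0] + np.random.rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    for i in range(n_iter):
        # compute the gradient and take a step against it
        gradient = derivative(solution)
        solution = solution - step_size * gradient
        # evaluate candidate point and report progress
        solution_eval = objective(solution)
        print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

bounds = np.asarray([[-1.0, 1.0], [-1.0, 1.0]])
best, score = gradient_descent(objective, derivative, bounds,
                               n_iter=30, step_size=0.1)
print('Done: f(%s) = %.5f' % (best, score))
```

On this convex function each step multiplies the solution by (1 − 2 × step_size), so with step_size=0.1 the progress shrinks steadily toward the minimum at the origin.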
This algorithm depends not only on the objective function but also on its
derivative. Hence it may not be suitable for all kinds of problems. This
algorithm is also sensitive to the step size: a step size that is too large
with respect to the objective function may cause the gradient descent
algorithm to fail to converge. If this happens, we will see that the progress
is not moving toward lower values.
There are several variations that make the gradient descent algorithm more
robust, for example:
▷ Add momentum into the process, so that the move not only follows the
gradient but also partially follows the average of the gradients from previous
iterations.
▷ Make the step sizes different for each component of the vector x.
▷ Make the step size adaptive to the progress.
Your Task
For this lesson, you should run the example program above with different
step_size and n_iter values and observe the difference in the progress of the
algorithm. At what step_size do you see the above program fail to converge?
Then try to add a new parameter β to the gradient_descent() function as the
momentum weight, so that the update rule becomes

x_new = x − α × ∇f(x) − β × g

where g is the average of ∇f(x) over, for example, the five previous iterations.
Do you see any improvement to this optimization? Is this a suitable example
for using momentum?
Final Word Before You Go...
You made it. Well done! Take a moment and look back at how far you have come.
You discovered:
▷ The importance of optimization in applied machine learning.
▷ How to do grid search to optimize by exhausting all possible solutions.
▷ How to use SciPy to optimize your own function.
▷ How to implement hill-climbing algorithm for optimization.
▷ How to use simulated annealing algorithm for optimization.
▷ What gradient descent is, how to use it, and some variations of this
algorithm.
This is just the beginning of your journey with optimization for machine
learning. Keep practicing and developing your skills. Take the next step and
check out my book on Optimization for Machine Learning.