
An Introduction to Machine Learning and Deep Learning


A quick overview of the ideas, language, and techniques we will be using
throughout the course

PSV Nataraj
References
Deep Learning: From Basics to Practice, Volumes 1 and 2, Andrew Glassner, The Imaginary Institute, Seattle, WA, 2018. http://www.imaginary-institute.com
Fundamentals of Machine Learning for Predictive Data Analytics, J. D. Kelleher, B. Mac Namee and A. D'Arcy, MIT Press, 2015.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, MIT Press, 2017. http://www.deeplearningbook.org/
Why this lecture?
• This lecture is to help us get familiar with the big ideas and basic
terminology of machine learning.
• The phrase machine learning describes a growing body of techniques
that all have one goal: discover meaningful information from data.
• Here, “data” refers to anything that can be recorded and measured.
“Data” refers to anything that can be recorded or measured.
• Data can be
• raw numbers (like stock prices on successive days, the mass of different
planets, the heights of people visiting a county fair).
• sounds (the words someone speaks into their cell phone),
• pictures (photographs of flowers or cats),
• words (the text of a newspaper article or a novel),
• or anything else that we want to investigate.
• “Meaningful information” is whatever we can extract from the data
that will be useful to us in some way.
• We decide what’s meaningful to us, and then we design an algorithm
to find as much of it as possible from our data.
What’s machine learning?
• The phrase “machine learning” describes a wide diversity of
algorithms and techniques.
• It’s used by so many people in so many different ways that it’s best to
consider it as:

A big, expanding collection of algorithms and principles that


analyze vast quantities of training data in order to extract
meaning from it.
What is deep learning?
• More recently, the phrase deep learning was coined to refer to

Approaches to machine learning that use specialized layers of


computation, stacked up one after the next.

• This makes a “deep” structure, like a stack of pancakes.


• Deep learning (DL) refers to the nature of the system we create, not
to any particular algorithm .
• DL really refers to this particular style or approach to machine
learning (ML).
Example applications that use machine
learning to extract meaning from data
Left: Getting a zip code from an envelope. Middle: Reading numbers & letters on a check.
Right: Recognizing faces from photos.
Extracting meaning from data.
Left: Turning a recording into sounds, then words, and ultimately a complete utterance.
Middle: Finding one unusual event in a particle output full of similar-looking trails.
Right: Predicting the whale population off Canada’s west coast
Common threads in ML applications

• Sheer volume of work involved, and its painstaking detail.


• We have millions of data items to examine, and we want to extract some meaning from every one of them.
• Why can't humans do it? Humans get tired, bored, and distracted.
• What about computers? Computers just plow on steadily and reliably.

ML systems can extract meaningful information quickly, so they are used in many fields.
Expert systems can also find meaning from data
• Expert systems were an early approach to finding the meaning that's hiding
inside of data.
• Idea:
• Study what human experts know and automate that.
• Make a computer mimic the human experts it was based on.
• Create a rule-based system: a large number of rules for the computer to
imitate human experts.
• Example: Recognize zip codes. 7’s have a horizontal line at the top, and a
diagonal line that starts at the right edge of the horizontal line and moves
left and down. Some people put a bar through the middle of their 7’s. So
now we add another rule for that special case.
But handcrafting rules is a tough job!
• This process of hand-crafting the rules to understand data is called
feature engineering

• The term is also used to describe when we use the computer to find these
features for us.
• It's easy to overlook one rule, or even lots of them. It's a tough job!
How does ML compare with Expert systems?
• Expert Systems: It is difficult to manually find the right set of rules and make
sure they work properly across a wide variety of data. These difficulties have
doomed expert systems.
• ML Systems: Their beauty is that they learn a dataset's characteristics
automatically.
• We don't have to tell an algorithm how to recognize a cat or a dog, because the
system figures that out for itself.
• Flip side of ML: To do its job well, an ML system often needs a lot of
data. Enormous amounts of data.
Why recent explosion in ML ?
• Why has machine learning exploded in popularity and
applications in the last few years?
• Couple of big reasons:
A. Flood of data: provided by the Internet has let these tools extract a
lot of meaning from a lot of data.
Example: Online companies make use of every interaction with every customer to
accumulate more data. Then they use it as input to ML algorithms, getting more
information about customers.
B. Increased Computing power - GPUs
Compare ML & DL - Fit the best line
• Find the best straight line through a bunch of data points; see Figure.
• Given a set of data points (in blue), we can imagine a straightforward algorithm that computes the best straight line (in red) through those points.
ML vs DL – Line fitting example
• ML: Represent a straight line with just a few parameters. Use some formula to compute the parameter values, given the input data points. This is a familiar kind of algorithm: it uses analysis to find the best way to solve a problem. This strategy is used by many ML algorithms.
• DL: We don't know how to directly calculate the right answer, so we build a system that can figure out how to do that itself. For DL, we create an algorithm that can work out its own answers, rather than implementing a known process that directly yields an answer. DL learns slowly: every time the program sees a new piece of data, it improves its own parameters, and it ultimately finds a set of good values. This approach is much more open-ended than the one that fits a straight line.
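A minimal sketch (not from the slides) contrasting the two styles on this line-fitting example: the "ML" column uses a known formula (ordinary least squares via np.polyfit), while the "DL" column starts with arbitrary parameters and repeatedly nudges them toward lower error; the data values are made up for illustration.

```python
import numpy as np

# Toy data: points roughly along a line (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# "ML" style: a known formula (least squares) computes slope and intercept directly.
slope, intercept = np.polyfit(x, y, deg=1)

# "DL" style: start with arbitrary parameters and improve them a little at a time.
w, b = 0.0, 0.0                 # the system's own parameters
learning_rate = 0.01
for _ in range(5000):           # each pass, nudge w and b to reduce the error
    error = (w * x + b) - y
    w -= learning_rate * (2 * error * x).mean()
    b -= learning_rate * (2 * error).mean()

print(slope, intercept)   # direct analytic answer
print(w, b)               # gradually learned answer, close to the same line
```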
Heart of DL success
What's a main reason for the enormous success of deep learning algorithms?
DL's success is due to programs that find their own answers (apart from the availability of high-performance computing power).
What’s a classifier ?
• A classifier assigns a label to each sample describing which category, or class, that
sample belongs to.
Example of Classifiers
• If the input is a song, classifier assigns the label as the genre (e.g., rock
or classical).
• If it’s a photo of an animal, the classifier assigns the label as the name
of the animal shown (e.g., a tiger or an elephant).
• In mountain weather for hiking, classifier may label the hiking
experience into 3 categories: Lousy, Good, and Great.
Example: Online companies (Amazon, Flipkart, etc.) make use of every interaction with every customer to accumulate more data. They use it as input to machine learning algorithms, getting more information about customers.

Simple example of ML Systems


Example of Samples, Features, Labels
• Weather measurements on a mountain for hiking
• Sample is weather at a given moment.
• Features are measurements: temperature, wind speed, humidity, etc.
• Hand over each sample (with a value for each feature) to a human expert.
• Expert examines features and provides a label for that sample.
• Expert’s opinion, using a score from 0 to 100, tells how the day’s weather would be for
good hiking.
• Labels can be “Lousy”, “Good”, “Excellent” (weather for hiking)
• The idea is shown in next Figure .
Example of Samples, Features, Labels (Contd.)
To label a dataset, we start
with a list of samples, or
data items.
Each sample is made up of
a list of features that
describe it.
We give the dataset to a
human expert, who
examines the features of
each sample one by one,
and assigns a label for that
sample.
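A small sketch of how such a labeled dataset might be represented in code (illustrative feature values and labels, not taken from the slides):

```python
# Each sample is a list of feature values; the human expert supplies the label.
feature_names = ["temperature_c", "wind_speed_kmh", "humidity_pct"]

samples = [
    [22.0, 10.0, 40.0],   # mild, light wind, dry
    [5.0, 45.0, 90.0],    # cold, windy, humid
    [28.0, 5.0, 30.0],    # warm, calm, dry
]

# Labels assigned by the expert, one per sample.
labels = ["Good", "Lousy", "Excellent"]

for features, label in zip(samples, labels):
    print(dict(zip(feature_names, features)), "->", label)
```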
A Computerized Learning Strategy
• First, collect as much data as possible.
• Call each piece of observed data (say, the weather at a given
moment) a sample.
• Call the measurements that make it up (the temperature, wind speed,
humidity, etc.) its features.
• Hand over each sample (with a value for each feature) to a human
expert.
• Expert examines features and provides a label for that sample.
• Example: if our sample is a photo, the label might be the name of
the person or the type of animal in the photo.
A computerized learning strategy
Figure shows the idea of a
learning strategy - one step of
training or learning.
Split the sample’s features and
its label. From the features,
algorithm predicts a label.
Compare prediction with truth
label.
If the predicted label matches the
truth label, we don't do anything.
Otherwise, we tell the algorithm to
update itself.
The process is basically trial
and error.
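A sketch of that single trial-and-error step, assuming a generic model object with predict() and update() methods (hypothetical names, used only to illustrate the idea):

```python
def training_step(model, features, true_label):
    """One step of learning: predict, compare, and update only on a miss."""
    predicted_label = model.predict(features)    # algorithm guesses a label
    if predicted_label == true_label:
        return                                   # correct: leave the model alone
    model.update(features, true_label)           # wrong: nudge internal parameters
```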
First, split Data for Training & Validation
• First, set aside some of these labeled samples for the time being (we will use them
later for validation).
• Give remaining labeled data to our computer, and ask it to find a way
to come up with the right label for each input.
• We do not tell it how to do this.
• Instead, we give labelled data to an algorithm with a large number of
parameters it can adjust (perhaps even millions of them).
• Different types of learning will use different algorithms.
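A minimal sketch of such a split, using scikit-learn's train_test_split as one common way to hold back a portion of the labeled data (toy feature values and labels):

```python
from sklearn.model_selection import train_test_split

# X: feature rows, y: expert-provided labels (illustrative values only)
X = [[22, 10, 40], [5, 45, 90], [28, 5, 30], [15, 20, 60], [30, 8, 25], [2, 50, 95]]
y = ["Good", "Lousy", "Excellent", "Good", "Excellent", "Lousy"]

# Hold back one third of the labeled samples; the rest goes to the learning algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
```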
Training step and Learning rate
• Each algorithm learns by changing the internal parameters it uses to create its
predictions.
• Big change: risk of changing them so much that it makes other predictions
worse.
• Small change: Cause learning to run slower.
• We have to find by trial and error for each type of algorithm the right trade-off
between these extremes.
• We call the amount of updating the learning rate,
• A small learning rate is cautious and slow,
• A large learning rate speeds things up but could backfire.
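A small sketch of this trade-off, using a single-parameter update of the form w ← w − learning_rate × gradient (a generic gradient-descent step on a toy problem, not an algorithm given in the slides):

```python
# Minimize (w - 3)^2; the gradient is 2 * (w - 3), so the best value is w = 3.
def run(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient   # the "amount of updating"
    return w

print(run(0.01))   # cautious and slow: still far from 3 after 20 steps
print(run(0.5))    # a larger rate reaches 3 quickly
print(run(1.1))    # too large: the updates overshoot and diverge
```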
Testing or Validation step
• We now return to the labeled data we set aside earlier.
• This is called the test data.
• We evaluate how well the system can generalize what it learned, by
showing it these samples that it has never seen before.
• This test set shows how the system performs on new data.
Testing – Procedure for evaluating a classifier
Split the test data (not training
data) into features and labels.
Algorithm predicts a label for each
set of features.
Compare predictions with the
truth labels to get a measurement
of accuracy.
If it’s good enough, deploy the
system.
If the results aren’t good enough,
go back and train some more.

In this evaluation process there is


no feedback and no learning.
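A sketch of that evaluation, reusing the hypothetical model and the held-out test split from the earlier sketches:

```python
def evaluate(model, X_test, y_test):
    """Compare predictions with the truth labels; no feedback, no learning."""
    predictions = [model.predict(features) for features in X_test]
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test)   # fraction of test samples labeled correctly

# accuracy = evaluate(model, X_test, y_test)
# if accuracy is good enough, deploy; otherwise go back and train some more
```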
How to Retrain, if testing results are not good
• Use the original training set data again. Note that these are the same samples.
• Shuffle this data first, but there is no new information in it.
• Show every sample again, letting the algorithm learn along the way again.
• The computer learns over and over again from the very same data.
• Now, show the test data set again and ask the algorithm to predict labels for the test set.
• If the performance isn't good enough, go back to the original training set again,
and then test again.
• Repeat this process, often hundreds of times, letting it learn just a little more each
time.
• Computer doesn’t get bored or cranky seeing the same data over and over.
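Put together, the train-shuffle-test loop might look like this sketch (reusing the hypothetical model, training_step, and evaluate from the earlier sketches; max_epochs and target are made-up stopping criteria):

```python
import random

def fit(model, X_train, y_train, X_test, y_test, max_epochs=200, target=0.95):
    data = list(zip(X_train, y_train))
    for epoch in range(max_epochs):
        random.shuffle(data)                     # same samples, new order
        for features, label in data:
            training_step(model, features, label)
        if evaluate(model, X_test, y_test) >= target:
            break                                # good enough: stop and deploy
    return model
```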
Learning – Good and Bad News
• Bad News:
• No guarantee that there’s a successful learning algorithm for every
set of data,
• No guarantee that if there is one, we’ll find it.
• May not have enough computational resources to find the
relationship between the samples and their labels.
• Good news:
• Even without a mathematical guarantee, in practice we can often
find solutions that generalize very well, sometimes doing even
better than human experts.
Parameters and hyperparameters
• The learning algorithm modifies its own parameter values over time.
• Learning algorithms are also controlled by values that we set (such as the learning rate we saw above). These are called hyperparameters.
• What's the difference between parameters and hyperparameters? The computer adjusts its own parameter values during the learning process, while we specify the hyperparameters when we write and run our program.
When do we deploy the system?
• When the algorithm has learned enough to perform well enough on the test set that we're satisfied, we're ready to deploy, or release, our algorithm to the world.
• Users submit data and our system returns the label it predicts.
• That's how pictures of faces are turned into names, sounds are turned into words, and weather measurements are turned into forecasts.
Machine Learning – major categories
Let's now get a big picture for the field of machine learning. See the major categories that make up the majority of today's ML tools.
Supervised Learning (SL)
• Supervised learning is done for samples with pre-assigned labels.
• The supervision comes from the labels.
• The labels guide the comparison step.
• There are two general types of supervised learning, called classification and regression.
Two Types of SL
• Classification: look through a given collection of categories to find the one that best describes a particular input.
• Regression: take a set of measurements and predict some other value.
SL – Classification
• Start training by providing a list of all the labels (or classes, or categories) that we
want it to learn.
• Make the list so that it has all the labels for all the samples in the training set,
with the duplicates removed.
• Train the system with lots of photos and their labels, until it does a good job of
predicting the correct label for each photo.
• Now, turn the system loose on new photos it hasn't seen before.
• For those objects it saw during training, it should properly label the images.
• Caution: For objects it did not see during training, the system will still try to
pick the best category from those it knows about.
• Next Figure shows the idea.
SL – Classification Example
• Example: Sort and label photos of everyday objects.
• We want to sort them : an apple peeler, a salamander, a piano, and so
on.
• We want to classify or categorize these photos.
• The process is called classification or categorization.
SL – Classification Example
In Figure, we used a trained
classifier to identify four
images never seen before.
The system had not been
trained on metal spoons or
headphones,
In both cases it found the best
match it could.
To correctly identify those
objects, the system needs to
see multiple examples of
them during training.
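As a sketch of this behavior, a simple scikit-learn classifier trained on a few labeled feature vectors (toy numeric features standing in for photos, since real images need more machinery) will always answer with one of the classes it was trained on, even for an unfamiliar input:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy feature vectors standing in for photos of known objects.
X_train = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
y_train = ["apple peeler", "apple peeler", "piano", "piano"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# A "metal spoon" was never among the training classes, so the classifier is
# forced to pick whichever known class is the best match.
print(clf.predict([[0.6, 0.6]]))
```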
SL - Regression (Example: Music band attendance)
Data: Attendance at a series of concerts at a local arena.
Problem: We have an incomplete collection of attendance measurements; unfortunately, we lost count for one evening's performance. Estimate the missing attendance value.
We also want to know what tomorrow's attendance is likely to be.
SL - Regression
Regression is the process of filling in or predicting data.
"Regression" uses statistical properties of the data to estimate missing or future values.
The most famous kind of regression is linear regression.
Left: Linear regression fits a
straight line (red) to the data
(blue). The line is not a very
good match to the data, but it has
the benefit of being simple.
Right: Nonlinear regression fits
a curve to same data. This is a
better match to the data, but has
more complicated form and
requires more work (and thus
more time)
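A minimal sketch of both fits on made-up attendance numbers, using polynomial fitting as one simple way to do linear and nonlinear regression:

```python
import numpy as np

nights = np.array([1, 2, 3, 4, 6, 7])           # night 5 is the lost count
attendance = np.array([180, 210, 260, 300, 390, 430])

linear = np.poly1d(np.polyfit(nights, attendance, deg=1))   # straight line
cubic = np.poly1d(np.polyfit(nights, attendance, deg=3))    # more flexible curve

print(linear(5), cubic(5))   # estimates for the missing night
print(linear(8), cubic(8))   # predictions for tomorrow's attendance
```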
Unsupervised Learning (USL) – a form of ML
• What's USL? When the input data does not have labels, any algorithm that learns from that data belongs to USL.
• We are not "supervising" the learning process by offering labels.
• The system has to figure everything out on its own, with no help from us.
• USL is used for clustering, noise reduction, and dimension reduction.
• Let's look at these in turn.
USL for Clustering - Via Pottery Example
Using a clustering algorithm to organize marks on clay pots.
Suppose we're digging out the foundation for a new house.
Surprise! We find the ground is filled with old clay pots and vases.
We call an archaeologist, who says it's a jumbled collection of ancient pottery, from many different places and different times.
The archaeologist doesn't recognize any of the markings and decorations, so she can't declare for sure where each one came from.
Some marks look like variations on the same theme, while others look like different symbols.
USL for Clustering - Via Pottery Example (Contd.)
• She takes rubbings of the markings, and then tries to sort them into groups.
• But there are far too many of them for her to manage.
• She turns to a machine learning algorithm.
• Why ML? To automatically group the markings together in a sensible way.
• On the right of the previous figure, we show her captured marks, and the groupings that could be found automatically by an algorithm.
• This is a clustering problem, and the ML algorithm is a clustering algorithm.
• There are many clustering algorithms to
choose from.
• Because our inputs are unlabeled, this
archaeologist is performing clustering,
using an unsupervised learning algorithm.
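A small sketch of the same idea on toy data, using k-means as one of the many clustering algorithms to choose from (the 2-D points are made-up numeric descriptions standing in for the rubbings; no labels are provided anywhere):

```python
from sklearn.cluster import KMeans

# Toy 2-D descriptions of markings; note there are no labels.
marks = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
         [0.90, 0.80], [0.85, 0.90], [0.95, 0.85]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(marks)
print(kmeans.labels_)   # group index assigned to each marking, e.g. [0 0 0 1 1 1]
```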

USL for Clustering


USL for Noise Reduction – Noisy Image Example
Figure shows a noisy image, and how a de-noising algorithm cleans it up.
Why is de-noising a form of unsupervised learning (USL)? Because we don't have labels for our data (for example, in a noisy photo we just have pixels).
The USL algorithm estimates what part of each sample is noise and removes it.
By removing weird and missing values from the input, the learning process happens more quickly and smoothly.
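As a small sketch of this idea (not the algorithm behind the figure), PCA can act as an unsupervised de-noiser: it learns the dominant structure in unlabeled samples and keeps only that, treating what is left over as noise. The data below is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy "images": each row is a flattened patch of a smooth pattern plus noise.
phases = np.linspace(0, 2, 100)
clean = np.array([np.sin(np.linspace(0, 3, 64) + p) for p in phases])
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

# No labels anywhere: PCA learns the dominant structure and discards the rest.
pca = PCA(n_components=4).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))

print(np.abs(noisy - clean).mean())     # error of the noisy samples
print(np.abs(denoised - clean).mean())  # smaller error after de-noising
```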
USL for Dimensionality Reduction
Problem: Sometimes our samples have more features than they need.
So, simplify the data:
- Remove uninformative features, or
- Combine redundant features.
For these tasks, there are USL algorithms that can do the job.
USL finds a way to reduce the number of features of our data - this is called dimension reduction.
USL for Dimensionality Reduction Example #1: Weather
• Data: Weather samples in the desert at the height of summer.
• Record daily wind speed, wind direction, and rainfall.
• Given the season and locale, the rainfall value will be 0 in every sample.
• If we use these samples in a machine learning system, the computer will need to process and interpret this useless, constant piece of information with every sample.
• At best this would slow down the analysis.
• At worst it could affect the system's accuracy, because the computer would devote some of its finite resources of time and memory to trying to learn from this unchanging feature.
USL for Dimensionality Reduction Example #2: Health Clinic
• Sometimes features contain redundant data.
• A health clinic might take everyone's weight in kilograms when they walk in the door. Then, when a nurse takes them to an examination room, she measures their weight again, but this time in pounds.
• The same information is repeated twice, but it might be hard to recognize that because the values are different.
• Like the useless rainfall measurements, this redundancy will not work to our benefit.
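A tiny sketch mirroring both examples: a constant rainfall column and two weight columns carrying the same information in different units. Here the cleanup is done by hand-written rules, just to show what dimension-reduction algorithms aim to discover automatically; the values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "wind_speed": [12.0, 30.0, 7.0],
    "rainfall":   [0.0, 0.0, 0.0],          # constant: uninformative
    "weight_kg":  [70.0, 55.0, 90.0],
    "weight_lb":  [154.3, 121.3, 198.4],    # redundant: kg converted to pounds
})

df = df.loc[:, df.nunique() > 1]            # drop features that never change
df = df.drop(columns=["weight_lb"])         # drop one of the redundant pair
print(df.columns.tolist())                  # ['wind_speed', 'weight_kg']
```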
Semi-Supervised Learning (Generators) Example - Persian Carpets in Movie
• Shooting a movie inside a Persian carpet warehouse.
• Problem: We want hundreds of carpets, all over the warehouse: carpets on the floors, carpets on the walls, and carpets in great racks in the middle of the space.
• Each carpet should look real, but be different from all the others.
• Our budget is nowhere near big enough to buy, or even borrow, hundreds of unique carpets.
• So instead, we buy just a few carpets, and then we give them to our props department to make many fake carpets.
Semi-Supervised Learning (Generators) Example - Persian Carpets in Movie
Figure shows a Persian carpet that we'd like to generalize.
Semi-Supervised Learning (Generators) Example - Persian Carpets in Movie (Contd.)
Figure shows some new fake carpets based on the starting image of the previous figure, made by ML algorithms.
Semi-Supervised Learning (Generators)
• This process of data generation is implemented by ML algorithms called generators.
• We train generators with large numbers of examples, so that they can produce new versions with lots of variation.
• We don't need labels to train generators, so they use unsupervised learning techniques.
• But we do give generators some feedback as they're learning, so they know whether they're making good enough fakes for us or not.
• A generator is in the middle ground: it doesn't have labels, but it is getting some feedback from us. We call this middle ground semi-supervised learning.
Reinforcement Learning - Example#1
• Suppose you are taking care of a friend's three-year-old daughter.
• You have no idea what the young girl likes to eat.
• First dinner: make pasta with butter. She likes it!
• Repeat this dinner for a week. She gets bored.
• Week 2: Add some cheese, and she likes it. Repeat this dinner for week 2. She gets bored.
• Week 3: Try pesto sauce. But the girl refuses to take a bite. So pasta + marinara sauce, and she rejects that too. Frustrated, you make a baked potato with cream. She likes it!
• Weeks 3 & 4: Try one recipe and one variation after another, trying to develop a menu that the child will enjoy.
• Only feedback: the little girl eats the meal, or she doesn't.
• This approach to learning is Reinforcement Learning!
• Agent: Autonomous car
• Environment: Traffic/people on the
street.
RL Example #2: • Actions: Driving
• Feedback: Driving okay if following
traffic rules and keep everybody safe.
• Agent: DJ at a dance club
• Environment : Dancers
RL Example #3:
• Feedback: Like or dislike the music.
Reinforcement Learning - formally
• The agent makes decisions and takes actions (the chef).
• The environment is everything else in the universe (the child).
• The environment gives feedback, or a reward signal, to the agent after every action.
• The feedback tells how good or bad that action is.
• The reward signal is often just a single number, where larger positive numbers mean the action was considered better, while more negative numbers can be seen as punishments.
• The reward signal is not a label, nor a pointer to a specific kind of "correct answer."
• The next figure shows the idea of RL.
Reinforcement Learning (Contd.)
In reinforcement learning there is an agent (who acts) and an environment (everything except the agent).
The agent acts, and the environment responds by sending feedback in the form of a reward signal.
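The dinner story can be sketched as a tiny trial-and-error learner: the agent tries dishes, the environment returns a reward of 1 (eaten) or 0 (refused), and the agent gradually prefers dishes with higher average reward. This is an illustrative epsilon-greedy sketch, not an algorithm given in the slides, and the eating probabilities are made up.

```python
import random

dishes = ["pasta+butter", "pasta+pesto", "baked potato"]
eat_probability = {"pasta+butter": 0.6, "pasta+pesto": 0.1, "baked potato": 0.8}

totals = {d: 0.0 for d in dishes}   # cumulative reward per dish
counts = {d: 0 for d in dishes}     # how often each dish was tried

for night in range(200):
    if random.random() < 0.1:       # occasionally explore a new idea
        dish = random.choice(dishes)
    else:                           # otherwise serve the best dish so far
        dish = max(dishes, key=lambda d: totals[d] / counts[d] if counts[d] else 0.0)
    reward = 1 if random.random() < eat_probability[dish] else 0   # eaten or refused
    totals[dish] += reward
    counts[dish] += 1

print(max(dishes, key=lambda d: totals[d] / max(counts[d], 1)))  # likely "baked potato"
```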
How's RL different from SL?
• The general plan of learning from mistakes is the same, but the mechanism is different.
• Supervised learning: the system produces a result (typically a category or a predicted value), and then we compare it to the correct result, which we provide.
• Reinforcement learning: There is no correct result. The data has no label.
• There's just feedback that tells us how well we're doing.
• The feedback tells us whether our action was "good" or "bad."
• In contrast to supervised learning algorithms, the reward signal is not a label and gives no pointer to a "correct answer."
AI, ML, DL, RL
Simple definitions and explanations
Artificial Intelligence (AI)
• “Artificial” refers to something made by humans, or a non-natural thing.
• “Intelligence” means the ability to understand or think.
• AI is intelligence exhibited by machines or software.

• It is also the name of the scientific field which studies how to create computers and
computer software that are capable of intelligent behavior.

• AI definition: “It is the study of how to train computers so that computers can do
things which, at present, humans can do better.”

• There is a misconception that Artificial Intelligence is a system, but it is not a system.

• AI is implemented in the system.


High‐profile examples of AI
• Autonomous vehicles (such as drones and self-driving cars),
• Medical diagnosis,
• Creating art (such as poetry),
• Proving mathematical theorems,
• Playing games (such as Chess or Go),
• Search engines (such as Google search),
• Online assistants (such as Siri or Google Assistant),
• Image recognition in photographs,
• Spam filtering,
• Predicting flight delays,
• Targeting online advertisements.
High‐profile Sectors of AI
• Healthcare – IBM Watson
• Automotive – driverless car Tesla
• Finance and economics – fraud prevention in debit cards, stocks
• Government
• Video games
• Military – drones
• Advertising – digital footprints for customer targeting
• Art – visual art
Approaches to AI
• Cybernetics
• Symbolic ‐ Possibility that human intelligence could be reduced to
symbol manipulation.
• Sub‐symbolic (see next slide)
• Statistical Learning
Sub‐symbolic AI approaches
• Soft computing
• Machine learning
• Neural networks
• Support Vector Machines (SVM)
• Fuzzy systems
• Evolutionary computation
• Evolutionary algorithms
• Genetic algorithm
• Differential evolution
• Metaheuristic and Swarm Intelligence
• Ant colony optimization
• Particle swarm optimization
Statistical Learning approach to AI
• Hidden Markov models
• Information theory
• Bayesian Decision theory
Machine learning (ML)

• ML is the learning in which a machine can automatically learn and improve from experience on its own, without being explicitly programmed.

• It is a subset of AI
• ML builds algorithms that are guided by data (rather than relying on human
programmers to provide explicit instructions)
• ML uses training sets to infer models that are more accurate than humans
could build on their own.
Neural Networks
• Within the field of ML, neural networks are a subset of algorithms
built around a model of artificial neurons spread across three or more
layers
• There are many other ML techniques that don’t rely on neural
networks.
Deep Learning and Reinforcement learning
• Deep learning and reinforcement learning are machine learning
approaches, which in turn are a part of AI tools.

• What makes DL and RL interesting is that they enable a computer to


develop rules on its own to solve problems.
What is Deep learning?

• Deep learning is part of ML methods, based on artificial neural


networks.
• A deep neural network (DNN) is an ANN with multiple layers between
the input and output layers.
• Deep learning uses multiple layers to progressively extract higher
level features from raw input.
• For example, in image processing, lower layers may identify edges,
while higher layers may identify human-meaningful items such as
digits, letters, or faces.
• Learning can be supervised, semi‐supervised, or unsupervised.
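A minimal sketch of the "stack of layers" idea: a forward pass through a small deep network, with each layer transforming the previous layer's output. The weights here are random and purely illustrative; a real DNN would learn them from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    """One layer: a linear transform followed by a simple nonlinearity (ReLU)."""
    return np.maximum(0.0, inputs @ weights + biases)

x = rng.normal(size=(1, 64))             # raw input features (e.g. pixel values)
sizes = [64, 32, 16, 10]                 # several layers stacked "like pancakes"

activation = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    activation = layer(activation, W, b)  # deeper layers see higher-level features

print(activation.shape)                   # (1, 10): one score per output class
```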
What is reinforcement learning?
• Reinforcement learning comes under machine learning
• RL is an autonomous, self‐teaching system that essentially learns by
trial and error.
• It performs actions with the aim of maximizing rewards, or in other
words, it is learning by doing in order to achieve the best outcomes.
• Example: This is similar to how we learn things like riding a bike where
in the beginning we fall off a lot and make too heavy and often erratic
moves, but over time we use the feedback of what worked and what
didn’t to fine‐tune our actions and learn how to ride a bike.
How is RL used ?
• Computers use RL: they try different actions, learn from the feedback whether an action delivered a better result, and then reinforce the actions that worked, reworking and modifying their algorithms autonomously over many iterations until they make decisions that deliver the best result.
• Example: A robot learning how to walk. The robot first tries a large step forward and falls. The outcome of a fall with that big step is a data point for the reinforcement learning system. Since the feedback was negative (a fall), the system adjusts the action to try a smaller step. Now the robot is able to move forward. This is an example of reinforcement learning in action.
Differences between AI, ML, DL, RL, Optimization, Data mining, Stats
Differences between AI and ML
• AI stands for Artificial Intelligence, where intelligence is defined as the ability to acquire and apply knowledge. ML stands for Machine Learning, which is defined as the acquisition of knowledge or skill.
• AI: The aim is to increase the chance of success, not accuracy. ML: The aim is to increase accuracy, but it does not care about success.
• AI works as a computer program that does smart work. ML is a simple concept: the machine takes data and learns from the data.
• AI: The goal is to simulate natural intelligence to solve complex problems. ML: The goal is to learn from data on a certain task to maximize the machine's performance on that task.
Differences between AI and ML (contd.)
• AI is decision making. ML allows a system to learn new things from data.
• AI leads to developing a system that mimics humans, responding and behaving in given circumstances. ML involves creating self-learning algorithms.
• AI will go for finding the optimal solution. ML will go for a solution, whether it is optimal or not.
• AI leads to intelligence or wisdom. ML leads to knowledge.
ML and Data mining
• Machine learning and data mining often employ the same methods and
overlap significantly.
• Machine learning focuses on prediction, based on known properties
learned from the training data,
• In machine learning, performance is usually evaluated with respect to the
ability to reproduce known knowledge
• Machine learning employs data mining methods as "unsupervised
learning"

• In data mining, the key task is the discovery of previously unknown


knowledge.
• Data mining focuses on the discovery of (previously) unknown properties in
the data (this is the analysis step of knowledge discovery in databases).
• Data mining uses many machine learning methods, but with different goals
ML and Optimization
• Machine learning also has intimate ties to optimization
• Many learning problems are formulated as minimization of some loss
function on a training set of examples.
• The difference between the two fields arises from the goal of
generalization:
• Optimization algorithms minimize the loss on a training set,
• Machine learning minimizes the loss on unseen samples.
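A small sketch of this distinction: a model tuned purely to minimize loss on the training points can look perfect there yet do badly on unseen points, and that gap is what machine learning cares about. Toy data and polynomial fitting, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(scale=0.2, size=x.shape)        # noisy linear relationship
x_new = np.linspace(0, 1, 50)                           # unseen samples
y_new = 2 * x_new + rng.normal(scale=0.2, size=x_new.shape)

def mse(model, xs, ys):
    return float(np.mean((np.poly1d(model)(xs) - ys) ** 2))

simple = np.polyfit(x, y, deg=1)      # generalizes: low loss on both sets
flexible = np.polyfit(x, y, deg=9)    # drives the training loss to ~0, but overfits

print(mse(simple, x, y), mse(simple, x_new, y_new))
print(mse(flexible, x, y), mse(flexible, x_new, y_new))   # tiny train loss, larger unseen loss
```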
ML and Statistics
• Machine learning and statistics are closely related fields.
• Ideas of machine learning have had a long pre‐history in statistics.
• Some statisticians have adopted methods from machine learning,
leading to a combined field that they call statistical learning.
DL and RL
• Deep learning and reinforcement learning are both systems that learn
autonomously.
• The difference between them is
• Deep learning is learning from a training set and then applying that
learning to a new data set,
• Reinforcement learning is dynamically learning by adjusting actions
based on continuous feedback to maximize a reward.
• Deep learning and reinforcement learning aren’t mutually exclusive.
• In fact, you might use deep learning in a reinforcement learning
system, which is referred to as deep reinforcement learning
