ML Unit-5
Bayesian Belief Network
A Bayesian belief network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies using a directed graph. The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
• Each node corresponds to a random variable, which can be continuous or discrete.
• Arcs (directed arrows) represent the causal relationships or conditional probabilities between random variables; these directed links connect pairs of nodes in the graph.
• A directed link means that one node directly influences the other; if there is no directed link, the nodes are independent of each other.
• As an example, consider a network whose nodes represent the random variables A, B, C, and D.
• If node B is connected to node A by a directed arrow from A to B, then node A is called the parent of node B.
• Node C is independent of node A.
• The Bayesian network graph does not contain any cycles; hence it is known as a directed acyclic graph, or DAG.
A Bayesian network is based on the joint probability distribution and conditional probability.
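For a network over variables X1, ..., Xn, the joint distribution factorizes over the graph, with each variable conditioned only on its parents:

P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P\left(x_i \mid \mathrm{Parents}(X_i)\right)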
Probability Basics
Joint Probability
• Joint probability is the likelihood of more than one event occurring at the same time, P(A and B).
• It is the probability of event A and event B occurring together, i.e., the probability of the intersection of two or more events, written as P(A ∩ B).
• Example: the probability that a card is a four and red is P(four and red) = 2/52 = 1/26 (there are two red fours in a deck of 52: the 4 of hearts and the 4 of diamonds).
Bayesian Belief Network
Example: Harry installed a new burglar alarm at his home to detect burglary.
• The alarm reliably responds to a burglary, but it also responds to minor earthquakes.
• Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm.
• David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too.
• Sophia, on the other hand, likes to listen to loud music, so she sometimes misses the alarm.
• Here we would like to compute the probability of the burglary alarm: for example, the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
The network for this problem has five Boolean variables: Burglary (B), Earthquake (E), Alarm (A), David calls (D), and Sophia calls (S). Burglary and Earthquake are the parents of Alarm, and Alarm is the parent of both David calls and Sophia calls.

We can write the events of the problem statement in the form of probability, P[D, S, A, B, E], and rewrite this probability statement using the joint probability distribution. Since each variable depends only on its parents in the graph, the joint distribution factorizes as:

P(D, S, A, B, E) = P(D | A) · P(S | A) · P(A | B, E) · P(B) · P(E)

For the query above (the alarm has sounded, there was neither a burglary nor an earthquake, and both David and Sophia called), the required entries of the network's conditional probability tables are P(¬B) = 0.998, P(¬E) = 0.999, P(A | ¬B, ¬E) = 0.001, P(D | A) = 0.91, and P(S | A) = 0.75, so:

P(S, D, A, ¬B, ¬E) = P(S | A) · P(D | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
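The same calculation can be scripted directly from the network's conditional probability tables. Below is a minimal Python sketch; the CPT entries not fixed by the computation above (for example, P(A | B, E) = 0.94) are the values commonly quoted for this textbook example and should be treated as assumptions.

# Minimal sketch of the burglar-alarm network. CPT values marked
# "assumed" are the commonly quoted ones for this textbook example.
P_B = {True: 0.002, False: 0.998}   # Burglary prior
P_E = {True: 0.001, False: 0.999}   # Earthquake prior

# P(Alarm = True | Burglary, Earthquake); first three entries assumed
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}

# P(David calls = True | Alarm) and P(Sophia calls = True | Alarm);
# the Alarm=False entries are assumed
P_D = {True: 0.91, False: 0.05}
P_S = {True: 0.75, False: 0.02}

def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) via the chain-rule factorization."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_d = P_D[a] if d else 1 - P_D[a]
    p_s = P_S[a] if s else 1 - P_S[a]
    return p_d * p_s * p_a * P_B[b] * P_E[e]

# P(S, D, A, ¬B, ¬E): both neighbors call, alarm on, no burglary/earthquake
print(joint(d=True, s=True, a=True, b=False, e=False))  # ≈ 0.00068045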
Active Learning
• Understanding passive/active learning
• Introduction to active learning
• Why active learning and its significance
• Active learning basic architecture/life cycle
• Active learning strategy and its working
• Use cases for active learning
Understanding Passive Learning
■ Passive learning, the standard framework in which a large quantity of labelled data is passed to the algorithm, requires significant effort to label the entire set of data.
Understanding Active Learning
■ By using active learning, we can selectively leverage a system such as crowd-sourcing to ask human experts to label some items in the data set, without having to label the entire set.
■ The algorithm iteratively selects the most informative examples based on some value metric and sends those unlabelled examples to a labelling oracle, which returns the true labels for the queried examples to the algorithm.
Introduction: Active Learning
■ The primary goal of machine learning is to derive general patterns from a limited amount of data.
■ For most supervised and unsupervised learning tasks, what we usually do is gather a significant quantity of data, randomly sampled from the underlying population distribution, and then induce a classifier or model.
■ But this process is somewhat passive!
■ Often the most time-consuming and costly task in the process is gathering the data.
■ Example: document classification.
■ It is easy to get a large pool of unlabeled documents, but it takes people a long time to hand-label thousands of training documents.
■ Now, instead of randomly picking documents to be manually labeled for our training set, we want to choose and query documents from the pool very carefully.
■ Based on this carefully chosen training data, we can improve the model's performance very quickly.
■ Active learning is the subset of machine learning in which a learning algorithm can query a user interactively to label data with the desired outputs. The algorithm proactively selects the subset of examples to be labeled next from the pool of unlabeled data.
■ A growing problem in machine learning is the large amount of unlabeled data, since data is continuously getting cheaper to collect and store.
■ The fundamental idea behind active learning is that an ML algorithm could reach a higher level of accuracy with a smaller number of training labels if it were allowed to choose the data it learns from.
■ Therefore, active learners are allowed to pose queries interactively during the training stage.
■ These queries usually take the form of unlabeled data instances, with a request that a human annotator label them.
■ This makes active learning part of the human-in-the-loop paradigm, of which it is one of the most powerful success stories.
Why Active Learning?
■ Most supervised machine learning models require large amounts of data to be trained with good results. Even if this statement sounds naive, most companies struggle to provide their data scientists with this data, in particular with labelled data. Labelled data is key to training any supervised model and can become the main bottleneck for any data team.
■ In most cases, data scientists are handed big, unlabelled data sets and are asked to train well-performing models with them. Generally, the amount of data is too large to label manually, and it becomes quite challenging for data teams to train good supervised models with that data.
Significance of Active Learning
■ Active Learning is a "human-in-the-loop" machine learning framework that uses a large dataset of which only a small portion (say 10%) is labeled for model training. Say there is a dataset of 1,000 samples, of which 100 are labeled. An Active Learning-based model will train on the 100 labeled samples and make predictions on the remaining 900. Suppose that for 10 of these 900 samples the prediction confidence is very low. The model will then ask a human user to provide the labels for these 10 samples. An Active Learning framework is therefore interactive, and that is how the name "Active" was coined.
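A minimal pool-based sketch of this loop, assuming scikit-learn and a synthetic dataset; the sizes (1,000 samples, 100 labeled, a query budget of 10) mirror the example above.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labeled = np.arange(100)                  # the 100 initially labeled samples
pool = np.arange(100, 1000)               # the 900-sample unlabeled pool

model = LogisticRegression(max_iter=1000)
model.fit(X[labeled], y[labeled])         # train on the labeled portion

# Confidence = highest predicted class probability for each pool sample
confidence = model.predict_proba(X[pool]).max(axis=1)

# Query the 10 samples the model is least confident about
query = pool[np.argsort(confidence)[:10]]

# A human annotator would label `query`; here we reveal the stored labels,
# move the queried samples into the labeled set, and retrain.
labeled = np.concatenate([labeled, query])
pool = np.setdiff1d(pool, query)
model.fit(X[labeled], y[labeled])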
Active Learning: Basic Architecture
[Figure: basic architecture of an active learning system]
Active Learning Cycle
[Figure: the active learning cycle]
How does Active Learning work?
■ Pool-based sampling
■ This is the best-known scenario for active learning.
■ In this sampling method, the algorithm evaluates the entire dataset before it selects the best query or set of queries.
■ The active learner algorithm is often initially trained on a fully labeled part of the data, which is then used to determine which instances would be most beneficial to add to the training set in the next active learning loop.
■ The downside of this method is the amount of memory it can require.
■ Membership query synthesis
■ This scenario is not applicable to all cases, because it involves the generation of synthetic data.
■ The active learner in this method is allowed to create its own examples for labeling.
■ This method is suitable for problems where it is easy to generate a data instance.
Active Learning Use Cases
■ Active learning has found a number of applications in areas such as text categorization,
document classification, and image recognition. It has also been used for cancer detection
and drug discovery.
■ Text Categorization
■ One of the most common applications of active learning is text categorization, which is the task
of assigning a category to a piece of text. In this application, the categories are usually a set of
predefined labels such as “news”, “sports”, “entertainment”, and “opinion”. The goal is to
automatically assign each piece of text to one of these categories.
■ Document Classification
■ Active learning can also be used for document classification, which is the task of automatically
assigning a class to a document. In this application, the classes are usually a set of predefined
labels such as “technical document”, “marketing document”, and “legal document”.
Active Learning Use Cases
■ Image Recognition
■ Image recognition is another area where active learning can be used.
In this example, we have an image and we’d like our annotators to
label only relevant regions in the image. In other words, we need to
make sure that each labeled region contributes maximum
information for classifying the image. To achieve this objective, active
learning will pick up the most interesting regions from unlabelled
data and let them be processed by annotators.
■ This way, annotators don't waste any time labeling redundant parts of an image that would have remained untagged if they were just blindly assigning labels to all regions in an image.
Reinforcement Learning
• Reinforcement learning is an area of machine learning.
• It is about taking suitable actions to maximize reward in a particular situation.
• It is employed by various software and machines to find the best possible behavior or path to take in a specific situation.
• Reinforcement learning differs from supervised learning: in supervised learning, the training data comes with the answer key, so the model is trained on the correct answers, whereas in reinforcement learning there is no answer key, and the reinforcement agent decides what to do to perform the given task.
• In the absence of a training dataset, it is bound to learn from its experience.
• Reinforcement Learning (RL) is the science of decision making.
• It is about learning the optimal behavior in an environment to obtain maximum reward.
• In RL, the data is accumulated from machine learning systems that use a trial-and-error method.
• Data is not part of the input that we would find in supervised or unsupervised machine learning.
○ Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
○ In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.
○ Since there is no labeled data, the agent is bound to learn from its experience.
• Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next.
• After each action, the algorithm receives feedback that helps it determine whether the choice it made was correct, neutral, or incorrect.
• It is a good technique for automated systems that have to make a lot of small decisions without human guidance.
• Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error.
• It performs actions with the aim of maximizing rewards; in other words, it learns by doing in order to achieve the best outcomes.
Reinforcement Learning
1. Policy
2. Reward function
3. Value function
4. Model of the environment
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
Reward function: A reward function defines the goal in a reinforcement learning problem. It provides a numerical score based on the state of the environment.
Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
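In the usual notation, this long-run value can be written as an expected discounted return (a standard formalization, not shown on the slides):

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s \right]

where γ ∈ [0, 1) is a discount factor that weights future rewards.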
Model of the environment: A model mimics the behavior of the environment and is used for planning.
• Example: We have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward. The following illustration explains the problem.
[Figure: a robot on a grid containing a diamond (the reward) and fire (a hurdle)]
• The image shows a robot, a diamond, and fire.
• The goal of the robot is to get the reward, the diamond, while avoiding the hurdle, the fire.
• The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles.
• Each right step gives the robot a reward, and each wrong step subtracts from the robot's reward.
• The total reward is calculated when it reaches the final reward, the diamond.
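This trial-and-error process can be sketched with tabular Q-learning. The sketch below is illustrative only: the grid layout, the reward values (+10 for the diamond, -10 for fire, -1 per step), and the hyperparameters are assumptions, not values from the slides.

import random

# Minimal tabular Q-learning for a robot/diamond/fire gridworld.
ROWS, COLS = 3, 4
FIRE, DIAMOND = (1, 1), (0, 3)                   # hurdle and reward cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(state, a):
    """Apply an action; return (next_state, reward, episode_done)."""
    dr, dc = ACTIONS[a]
    r, c = state[0] + dr, state[1] + dc
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    if nxt == DIAMOND:
        return nxt, 10.0, True                   # right goal: reward
    if nxt == FIRE:
        return nxt, -10.0, True                  # wrong step: penalty
    return nxt, -1.0, False                      # small cost per move

for episode in range(500):                       # learn by trial and error
    s, done = (2, 0), False
    while not done:
        a = (random.randrange(4) if random.random() < eps
             else max(range(4), key=lambda i: Q[(s, i)]))
        s2, reward, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, i)] for i in range(4))
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s2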
Main points in Reinforcement Learning
Applications of Reinforcement Learning
Advantages of Reinforcement Learning
History of GAs
■ As early as 1962, John Holland's work on adaptive systems laid the foundation for later developments.
■ By 1975, Holland and his students and colleagues had published the book Adaptation in Natural and Artificial Systems.
■ By the early to mid-1980s, genetic algorithms were being applied to a broad range of subjects.
■ In 1992, John Koza used genetic algorithms to evolve programs to perform certain tasks. He called his method "genetic programming" (GP).
What is a GA?
■ A genetic algorithm (GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems.
■ GAs are categorized as global search heuristics.
■ GAs are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover (also called recombination).
■ The evolution usually starts from a population of randomly generated individuals and proceeds in generations.
■ In each generation, the fitness of every individual is evaluated, and individuals are stochastically selected from the current population (based on their fitness) and modified (recombined and possibly mutated) to form a new population.
■ The new population is used in the next iteration of the algorithm.
■ The algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.
Vocabulary
■ Individual – any possible solution
■ Population – group of all individuals
■ Fitness – target function that we are optimizing (each individual has a fitness)
■ Trait – possible aspect (feature) of an individual
■ Genome – collection of all chromosomes (traits) for an individual
Basic Genetic Algorithm
■ Start with a large "population" of randomly generated "attempted solutions" to a problem.
■ Repeatedly do the following:
– Evaluate each of the attempted solutions.
– (Probabilistically) keep a subset of the best solutions.
– Use these solutions to generate a new population.
■ Quit when you have a satisfactory solution (or you run out of time).
Example: the MAXONE problem
■ Suppose we want to maximize the number of ones in a string of l binary digits.
■ Is this a trivial problem? It may seem so, because we know the answer in advance.
■ However, we can think of it as maximizing the number of correct answers, each encoded by 1, to l difficult yes/no questions.
Example (cont)
■ An individual is encoded (naturally) as a string of l binary digits.
■ The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code.
■ We start with a population of n random strings. Suppose that l = 10 and n = 6.
Example (initialization)
[Table: an initial population of n = 6 random 10-bit strings with their fitness values; the total population fitness is 34. The omitted slides also show Step 1: selecting strings for mating in proportion to their fitness.]
Step 2: crossover
■ Next we mate strings for crossover. For each couple, we first decide (using some pre-defined probability, for instance 0.6) whether to actually perform the crossover.
■ If we decide to perform the crossover, we randomly extract the crossover points, for instance 2 and 5.
[Figure: the offspring produced by crossover, followed by mutation of the new strings]
And now, iterate…
■ In one generation, the total population fitness went from 34 to 37, an improvement of ~9%.
■ At this point, we go through the same process all over again, until a stopping criterion is met.
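Putting the pieces together, here is a minimal sketch of the whole loop for the MAXONE problem. The crossover probability of 0.6 matches the slides; the one-point crossover, the 0.01 mutation rate, and the +1 selection weights are simplifying assumptions.

import random

L, N, P_CROSS, P_MUT = 10, 6, 0.6, 0.01

def fitness(s):
    return sum(s)                               # number of ones

population = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]

for generation in range(50):
    # Roulette-wheel selection; +1 keeps the total weight positive
    # even if a string is all zeros (an assumption, not from the slides)
    weights = [fitness(s) + 1 for s in population]
    parents = random.choices(population, weights=weights, k=N)
    nxt = []
    for i in range(0, N, 2):
        a, b = parents[i][:], parents[i + 1][:]
        if random.random() < P_CROSS:           # one-point crossover
            cut = random.randrange(1, L)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for child in (a, b):
            for j in range(L):                  # bit-flip mutation
                if random.random() < P_MUT:
                    child[j] ^= 1
        nxt += [a, b]
    population = nxt
    if any(fitness(s) == L for s in population):
        break                                   # an all-ones string found

print(max(population, key=fitness))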
GA Operators
■ Methods of representation
■ Methods of selection
■ Methods of reproduction
Common representation methods
■ Binary strings
■ Arrays of (usually bounded) integers
■ Arrays of letters
Methods of Selection
■ Roulette-wheel selection
■ Elitist selection
■ Fitness-proportionate selection
■ Scaling selection
■ Rank selection
75
Roulette wheel selection
■ Conceptually, this can be represented as a game of roulette: each individual gets a slice of the wheel, but more fit individuals get larger slices than less fit ones.
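In code, the wheel can be sketched with a cumulative sum of fitness values; the function below is illustrative, not from a library:

import random

def roulette_select(population, fitnesses):
    """Pick one individual with probability proportional to fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)     # where the "ball" lands
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f                 # this individual's slice of the wheel
        if spin <= cumulative:
            return individual
    return population[-1]               # guard against rounding error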
Methods of Reproduction
Methods of Reproduction: Crossover
[Figure: crossover recombines two parent strings into two offspring; mutation flips individual bits]
Benefits of Genetic Algorithms
■ The concept is easy to understand
■ Modular, separate from the application
■ Supports multi-objective optimization
■ Always gives an answer, and the answer gets better with time
■ Easy to exploit previous or alternate solutions
■ Flexible building blocks for hybrid applications
Introduction to Transfer Learning in ML
■ Humans are extremely skilled at transferring knowledge from one task to another.
■ This means that when we face a new problem or task, we immediately recognize
it and use the relevant knowledge we have gained from previous learning
experiences.
■ This makes it easy to complete our tasks quickly and efficiently.
■ A good example is a user who can ride a bicycle being asked to ride a motorbike: their experience with riding a bicycle will be helpful in such a situation, since they can already balance and steer.
■ This makes it easier than if they were a complete beginner. Such lessons are extremely useful in real life, because they make us better and allow us to gain more experience.
■ The same approach underlies transfer learning in machine learning: knowledge gained from a source task is used to solve a problem in the target task.
■ Although most machine learning algorithms are designed for a single task, there is
an ongoing interest in developing transfer learning algorithms.
Why Transfer Learning?
■ One curious feature that many deep neural networks trained on images share is that the early layers detect edges, colours, intensity variations, and other low-level features.
■ These features are not specific to any particular task or dataset.
■ It doesn't matter whether we are using the images to detect lions or cars; these low-level features must be detected in both cases.
■ They are present regardless of the exact image data or cost function.
■ Features learned in one task, such as detecting lions, can therefore also be used to detect humans. This is exactly what transfer learning is.
Block Diagram
[Figure: block diagram of transfer learning]
Frozen and Trainable Layers:
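A minimal sketch of this idea, assuming PyTorch and a torchvision pretrained ResNet-18 (an assumption; the slides do not name a framework): the pretrained feature-extraction layers are frozen, and only a new final layer is trained.

import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False         # freeze all pretrained layers

# Replace the head with a trainable layer for a new 2-class task
# (the class count is illustrative); new layers are trainable by default
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head's parameters are handed to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)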
4. Product Recommendation:
■ Machine learning is widely used by various e-commerce and entertainment companies, such as Amazon and Netflix, for product recommendation to the user. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.
■ Google understands the user's interests using various machine learning algorithms and suggests products matching the customer's interest.
■ Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc., and this is also done with the help of machine learning.
5. Self-driving Cars:
■ One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, the most popular car-manufacturing company, is working on self-driving cars, using unsupervised learning methods to train car models to detect people and objects while driving.
6. Virtual Personal Assistant: