
EASWARI ENGINEERING COLLEGE

(AUTONOMOUS)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE

191AIC601T – REINFORCEMENT LEARNING

Unit II - Notes

III YEAR - B.TECH

PREPARED BY: G.SIVASATHIYA, AP/AI&DS
APPROVED BY: HOD/AI&DS
UNIT – 2
MULTI-ARMED BANDITS AND MARKOV DECISION PROCESS

MULTI-ARMED BANDIT PROBLEM (MABP)

A bandit is someone who steals your money. A one-armed bandit is a simple slot machine: you insert a coin into the machine, pull a lever, and get an immediate reward. But why is it called a bandit? It turns out casinos configure these slot machines in such a way that, on average, gamblers end up losing money!

A multi-armed bandit is a more complicated slot machine: instead of one lever, there are several levers which a gambler can pull, with each lever giving a different return. The probability distribution of the reward corresponding to each lever is different and is unknown to the gambler.

The task is to identify which lever to pull in order to obtain the maximum total reward over a given number of trials. Each arm chosen is equivalent to an action, which then leads to an immediate reward.

Use Cases
Bandit algorithms are used in many research projects and in industry. Some of their use cases are listed in this section.

Clinical Trials

The well-being of patients during clinical trials is as important as the actual results of the study. Here, exploration is equivalent to identifying the best treatment, and exploitation is treating patients as effectively as possible during the trial.

Network Routing
Routing is the process of selecting a path for traffic in a network, such as a telephone network or a computer network (the internet). Allocating channels to the right users, such that the overall throughput is maximised, can be formulated as an MABP.
Online Advertising
The goal of an advertising campaign is to maximise revenue from
displaying ads. The advertiser makes revenue every time an offer is
clicked by a web user. Similar to MABP, there is a trade-off between
exploration, where the goal is to collect information on an ad’s
performance using click-through rates, and exploitation, where we stick
with the ad that has performed the best so far.

Game Design
Building a hit game is challenging. MABP can be used to test experimental changes in gameplay or interface and to exploit the changes that lead to positive player experiences.
EXPLORATION AND EXPLOITATION IN RL
Exploration
Exploration is oriented towards long-term benefit: it allows the agent to improve its knowledge about each action, which can lead to greater reward in the long run.

Exploitation
Exploitation uses the agent’s current value estimates and chooses the greedy action to get the most reward. However, because the agent is greedy with respect to the estimated values and not the actual values, it may well fail to get the most reward.

Let’s take an example to understand the exploration-exploitation trade-off properly.

Suppose you and your friend are digging in the hope of finding a diamond. Your friend gets lucky, finds a diamond before you, and walks off happily.

Seeing this, you get a bit greedy and think that you might also get lucky, so you start digging at the same spot as your friend. This action is called the greedy action, and the corresponding policy is called the greedy policy.

However, in this situation the greedy policy would fail, because a bigger diamond is buried at the spot where you were digging in the beginning. When your friend found the diamond, the only knowledge you gained was the depth at which that diamond was buried; you have no knowledge of what lies beyond that depth. In reality, the diamond may be where you were digging in the beginning, it may be where your friend was digging, or it may be in a completely different place.

With such partial knowledge about future states and future rewards, our reinforcement learning agent will be in a dilemma: should it exploit its partial knowledge to receive some reward, or should it explore unknown actions which could result in much larger rewards? On any single step, it cannot explore and exploit simultaneously.
ACTION-VALUE METHODS
We begin by looking more closely at some simple methods for estimating the values of actions and for using the estimates to make action selection decisions. In this chapter, we denote the true (actual) value of action a as q*(a), and its estimated value at the t-th play as Q_t(a).

Recall that the true value of an action is the mean reward received when that action is selected. One natural way to estimate this is by averaging the rewards actually received when the action was selected. In other words, if by the t-th play action a has been chosen K_a times prior to t, yielding rewards R_1, R_2, ..., R_{K_a}, then its value is estimated to be

Q_t(a) = (R_1 + R_2 + ... + R_{K_a}) / K_a        (2.1)

If K_a = 0, then we define Q_t(a) instead as some default value, such as Q_1(a) = 0. As K_a → ∞, by the law of large numbers Q_t(a) converges to q*(a).

We call this the sample-average method for estimating action values, because each estimate is a simple average of the sample of relevant rewards. Of course this is just one way to estimate action values, and not necessarily the best one.

Nevertheless, for now let us stay with this simple estimation method and turn to the question of how the estimates might be used to select actions.

The simplest action selection rule is to select the action (or one of the actions) with the highest estimated action value, that is, to select on play t one of the greedy actions a*, for which Q_t(a*) = max_a Q_t(a). This method always exploits current knowledge to maximize immediate reward; it spends no time at all sampling apparently inferior actions to see if they might really be better.

A simple alternative is to behave greedily most of the time, but every once in a while, say with small probability ε, instead select an action at random, uniformly, independently of the action-value estimates. We call methods using this near-greedy action selection rule ε-greedy methods.

An advantage of these methods is that, in the limit as the number of plays increases, every action will be sampled an infinite number of times, guaranteeing that K_a → ∞ for all a, and thus ensuring that all the Q_t(a) converge to q*(a). This of course implies that the probability of selecting the optimal action converges to greater than 1 − ε, that is, to near certainty. These are just asymptotic guarantees, however, and say little about the practical effectiveness of the methods.

To roughly assess the relative effectiveness of the greedy and ε-greedy methods, we compared them numerically on a suite of test problems. This is a set of 2000 randomly generated n-armed bandit tasks with n = 10. For each action a, the rewards were selected from a normal (Gaussian) probability distribution with mean q*(a) and variance 1.

The 2000 n-armed bandit tasks were generated by reselecting the q*(a) 2000 times, each according to a normal distribution with mean 0 and variance 1. Averaging over tasks, we can plot the performance and behavior of various methods as they improve with experience over 1000 plays, as in Figure 2.1. We call this suite of test tasks the 10-armed testbed.

Figure 2.1: Average performance of ε-greedy action-value methods on the 10-armed testbed. These data are averages over 2000 tasks. All methods used sample averages as their action-value estimates.
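The following is a minimal Python sketch (not from the original notes) of how such a 10-armed testbed might be generated and sampled; it assumes NumPy, and the names q_star, n_tasks, n_arms and pull are illustrative.

# Minimal sketch of the 10-armed testbed described above. Each task draws its
# true action values q*(a) from N(0, 1); pulling arm a in a task then returns
# a reward drawn from N(q*(a), 1).
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_arms = 2000, 10

# True action values for every task: shape (n_tasks, n_arms)
q_star = rng.normal(loc=0.0, scale=1.0, size=(n_tasks, n_arms))

def pull(task, arm):
    # Return a noisy reward for pulling `arm` in bandit task `task`.
    return rng.normal(loc=q_star[task, arm], scale=1.0)

# Example: one pull of arm 3 in task 0
print(pull(0, 3))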
Action Value Function

No Exploration (Greedy Approach)

A naïve approach could be to calculate q, the action-value function, for all arms at each timestep and, from that point onwards, select the action which gives the maximum q. The action value of each action is estimated at each timestep as the average of the rewards received so far from that action:

Q_t(a) = (sum of rewards received when a was chosen prior to t) / (number of times a was chosen prior to t)

At each timestep we then choose the action that maximises the above expression:

A_t = argmax_a Q_t(a)

However, evaluating this expression at each time t would require calculations over the whole history of rewards. We can avoid this by keeping a running sum, so that at each step the q-value of the chosen action can be updated incrementally from the latest reward:

Q_{n+1} = Q_n + (1/n) (R_n − Q_n)

The problem with this approach is that it only exploits: it always picks the same action, without exploring other actions that might return a better reward. Some exploration is necessary to actually find an optimal arm; otherwise we might end up pulling a suboptimal arm forever. A sketch of this purely greedy strategy is given below.
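The following minimal sketch (illustrative, not from the original notes) implements the purely greedy strategy with the incremental update above; it assumes the hypothetical pull() helper and n_arms from the earlier testbed sketch.

# Purely greedy strategy with incrementally updated sample averages.
import numpy as np

def run_greedy(task, n_plays=1000):
    Q = np.zeros(n_arms)   # current action-value estimates
    N = np.zeros(n_arms)   # number of times each arm has been pulled
    rewards = []
    for _ in range(n_plays):
        a = int(np.argmax(Q))          # always exploit the current best estimate
        r = pull(task, a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental update: Q_{n+1} = Q_n + (1/n)(R_n - Q_n)
        rewards.append(r)
    return np.mean(rewards)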

Epsilon Greedy Approach

One potential solution is to add a small amount of randomness to the action selection, so that we also explore new actions and do not miss out on a better choice of arm. With probability epsilon we choose a random action (exploration), and with probability 1 − epsilon we choose the action with the maximum Q_t(a) (exploitation).

With probability 1 − epsilon, we choose the action with the maximum estimated value (argmax_a Q_t(a)).

With probability epsilon, we randomly choose an action from the set of all actions A.

For example, if we have a problem with two actions, A and B, the epsilon-greedy algorithm works as sketched below.
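The following is a minimal illustrative Python sketch of epsilon-greedy action selection (assuming NumPy and the hypothetical pull()/n_arms helpers from the earlier testbed sketch; it is not code from the original notes).

# Epsilon-greedy action selection with incrementally updated sample averages.
import numpy as np

def run_epsilon_greedy(task, n_plays=1000, epsilon=0.1):
    Q = np.zeros(n_arms)
    N = np.zeros(n_arms)
    rewards = []
    for _ in range(n_plays):
        if np.random.random() < epsilon:
            a = np.random.randint(n_arms)   # explore: uniform random action
        else:
            a = int(np.argmax(Q))           # exploit: current greedy action
        r = pull(task, a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]
        rewards.append(r)
    return np.mean(rewards)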

This is much better than the greedy approach, as we now have an element of exploration. However, the exploration is blind: even if two actions have only a very minute difference between their q-values, the algorithm still exploits only the one with the marginally higher estimate and explores the remaining actions uniformly at random, without regard to how promising each of them appears.

MARKOV PROPERTY AND ITS PROCESSES

THE MARKOV PROPERTY

Transition: Moving from one state to another is called a transition.

Transition probability: The probability that the agent will move from one state to another is called the transition probability.

The Markov Property states that:
“The future is independent of the past, given the present.”
Mathematically, we can express this statement as:

P[S[t+1] | S[t]] = P[S[t+1] | S[1], S[2], ..., S[t]]

S[t] denotes the current state of the agent and S[t+1] denotes the next state. What this equation means is that the transition from state S[t] to S[t+1] is entirely independent of the past. The right-hand side of the equation equals the left-hand side whenever the system has the Markov Property; intuitively, the current state already captures all the relevant information from the past states.

State Transition Probability:

Now that we know about transition probabilities, we can define the state transition probability as follows. For a Markov state s and any successor state s', the state transition probability is given by:

P_ss' = P[S[t+1] = s' | S[t] = s]

We can collect the state transition probabilities into a state transition probability matrix P, whose entry in row s and column s' is P_ss'. Each row in the matrix represents the probabilities of moving from a given starting state to every possible successor state, and the sum of each row is equal to 1.
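For concreteness, here is a small illustrative sketch (not from the original notes) of such a matrix for the three-state Sleep/Run/Ice-cream chain used in the example below; only the Sleep row matches the 0.2/0.6/0.2 probabilities quoted there, the other rows are assumed values.

# Illustrative state transition probability matrix for a 3-state chain
# (states: Sleep, Run, Ice-cream).
import numpy as np

states = ["Sleep", "Run", "Ice-cream"]
P = np.array([
    [0.2, 0.6, 0.2],   # from Sleep:     0.2 Sleep, 0.6 Run, 0.2 Ice-cream
    [0.1, 0.6, 0.3],   # from Run:       assumed values
    [0.3, 0.3, 0.4],   # from Ice-cream: assumed values
])

# Every row of a valid transition matrix sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)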

Markov Process or Markov Chains

A Markov process is a memoryless random process, i.e. a sequence of random states S[1], S[2], ..., S[n] with the Markov Property. So it is basically a sequence of states with the Markov Property. It can be defined using a set of states S and a transition probability matrix P; the dynamics of the environment are fully defined by these two components.

But what does “random process” mean?

To answer this question, let’s look at an example: the three-state chain above, whose edges denote transition probabilities. Let’s take some samples from this chain. Suppose that we are sleeping; according to the probability distribution there is a 0.6 chance that we will Run, a 0.2 chance that we will Sleep more, and a 0.2 chance that we will eat Ice-cream. Similarly, we can think of other sequences that we can sample from this chain. Some samples from the chain:

• Sleep — Run — Ice-cream — Sleep

• Sleep — Ice-cream — Ice-cream — Run

In the above two sequences we get a different random sequence of states (e.g. Sleep, Ice-cream, Sleep) every time we run the chain. This should make it clear why a Markov process is called a random sequence of states.
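A minimal sketch of sampling trajectories from this chain (illustrative; it reuses the states list and matrix P from the sketch above):

# Sample a trajectory from the Markov chain defined by `states` and `P`.
import numpy as np

def sample_chain(start="Sleep", length=4, seed=None):
    rng = np.random.default_rng(seed)
    idx = states.index(start)
    trajectory = [states[idx]]
    for _ in range(length - 1):
        idx = rng.choice(len(states), p=P[idx])   # next state ~ current row of P
        trajectory.append(states[idx])
    return trajectory

print(" — ".join(sample_chain()))   # e.g. Sleep — Run — Ice-cream — Sleep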

Markov Decision Process:

A Markov Decision Process (MDP) is a Markov Reward Process with decisions. Everything is the same as in an MRP, but now we have an actual agent that makes decisions or takes actions.

An MDP is a tuple (S, A, P, R, 𝛾), where:

• S is the set of states,

• A is the set of actions the agent can choose to take,

• P is the transition probability matrix,

• R is the reward accumulated by the actions of the agent,

• 𝛾 is the discount factor.

A small representation of this tuple in code is sketched below.
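The following is a minimal illustrative container for the (S, A, P, R, 𝛾) tuple (the field names and array shapes are assumptions, not a standard API):

# Minimal container for an MDP specification (illustrative only).
from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    states: list          # S: the set of states
    actions: list         # A: the set of actions
    P: np.ndarray         # P[a, s, s']: probability of moving from s to s' under action a
    R: np.ndarray         # R[s, a]: expected immediate reward for taking a in s
    gamma: float          # discount factor in [0, 1)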

OPTIMAL VALUE FUNCTIONS

Solving a reinforcement learning task means, roughly, finding a policy that achieves a lot of reward over the long run. For finite MDPs, we can precisely define an optimal policy in the following way.

Value functions define a partial ordering over policies. A policy π is defined to be better than or equal to a policy π' if its expected return is greater than or equal to that of π' for all states. In other words, π ≥ π' if and only if v_π(s) ≥ v_π'(s) for all states s.

There is always at least one policy that is better than or equal to all other policies. This is an optimal policy. Although there may be more than one, we denote all the optimal policies by π*. They share the same state-value function, called the optimal state-value function, denoted v*, and defined as

v*(s) = max_π v_π(s)        for all states s.

Optimal policies also share the same optimal action-value function, denoted q*, and defined as

q*(s, a) = max_π q_π(s, a)        for all states s and actions a.

For the state-action pair (s, a), this function gives the expected return for taking action a in state s and thereafter following an optimal policy. Thus, we can write q* in terms of v* as follows:

q*(s, a) = E[ R_{t+1} + 𝛾 v*(S_{t+1}) | S_t = s, A_t = a ]
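As a concrete companion to these definitions, here is a minimal value-iteration sketch (illustrative, not from the original notes) that computes v* and q* for the hypothetical MDP container defined earlier:

# Value iteration: compute v*(s) and q*(s, a) for an MDP instance as above.
import numpy as np

def value_iteration(mdp, tol=1e-8):
    n_s, n_a = len(mdp.states), len(mdp.actions)
    v = np.zeros(n_s)
    while True:
        # q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * v(s')
        q = mdp.R + mdp.gamma * np.einsum("ast,t->sa", mdp.P, v)
        v_new = q.max(axis=1)          # v*(s) = max_a q*(s, a)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q
        v = v_new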

TRACKING NON-STATIONARY PROBLEMS WITH A CONSTANT STEP-SIZE APPROACH

The various types of RL-relevant problems you can encounter have been fairly rigorously characterised over the years. Here we focus on non-stationary problems: problems whose true underlying values change over time. This introduces an interesting dynamic between optimising the total reward we receive and identifying the best actions to take (since they change over time).

The sample-average approach converges to a single true value over time. For a non-stationary problem the true value varies over time, so intuitively an approach that converges to a single value is not going to work well.

Because the goal changes over time, the most useful data comes from the most recent rewards. Older rewards are much less useful, because the changes to the action values accumulate over time.

Starting with the incremental update formula for the sample-average algorithm, we fix the step-size to be a constant and then show that this achieves our goal, with Q_{n+1} depending on a weighted average of the rewards.

Step one: consider the incremental sample-average formula, which incrementally calculates an estimate for the value of an action:

Q_{n+1} = Q_n + (1/n) (R_n − Q_n)

Replace (1/n) with a constant step-size 𝛼:

Q_{n+1} = Q_n + 𝛼 (R_n − Q_n)

Step two: rearrange the formula by collecting together the Q_n terms:

Q_{n+1} = 𝛼 R_n + (1 − 𝛼) Q_n

Step three: take this equation and reformulate it in terms of n rather than n+1 (just subtract 1 from every n):

Q_n = 𝛼 R_{n−1} + (1 − 𝛼) Q_{n−1}

Step four: substitute this into the final equation from step two and separate it out into three terms:

Q_{n+1} = 𝛼 R_n + (1 − 𝛼) 𝛼 R_{n−1} + (1 − 𝛼)^2 Q_{n−1}

At the end of this equation there is a Q_{n−1} term. Once again we can get an expression for Q_{n−1} by subtracting 1 from n:

Q_{n−1} = 𝛼 R_{n−2} + (1 − 𝛼) Q_{n−2}

Substituting this in:

Q_{n+1} = 𝛼 R_n + (1 − 𝛼) 𝛼 R_{n−1} + (1 − 𝛼)^2 𝛼 R_{n−2} + (1 − 𝛼)^3 Q_{n−2}

This expansion could be continued indefinitely; we stop here because the pattern is becoming clear.

Step five: extend what we have discovered so far to the general case using summation notation:

Q_{n+1} = (1 − 𝛼)^n Q_1 + Σ_{i=1..n} 𝛼 (1 − 𝛼)^{n−i} R_i

What does this mean?

We wanted an algorithm that puts more emphasis on recent data. We can see that this is achieved here by the weighting of the rewards in the sum: the contribution of each reward is greatest when it is new and then exponentially decays away.

This approach is sometimes referred to as the exponential recency-weighted average (ERWA).
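A minimal sketch of the constant step-size update applied to a drifting (non-stationary) bandit (illustrative; assumes NumPy, and all names are hypothetical):

# Constant step-size (exponential recency-weighted average) update.
import numpy as np

def update_constant_step(Q, a, reward, alpha=0.1):
    # Q_{n+1} = Q_n + alpha * (R_n - Q_n), applied to arm `a` only.
    Q[a] += alpha * (reward - Q[a])
    return Q

# Example: the true values drift over time, but the estimates keep tracking
# them because old rewards are exponentially down-weighted.
rng = np.random.default_rng(1)
Q = np.zeros(3)
true_values = np.zeros(3)
for t in range(1000):
    true_values += rng.normal(0.0, 0.01, size=3)   # random-walk drift
    a = int(rng.integers(3))                       # pick an arm (uniformly, for the demo)
    r = rng.normal(true_values[a], 1.0)
    Q = update_constant_step(Q, a, r)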

UPPER-CONFIDENCE-BOUND ACTION SELECTION

Upper-Confidence-Bound (UCB) action selection uses uncertainty in the action-value estimates to balance exploration and exploitation. Since there is inherent uncertainty in the accuracy of the action-value estimates when we use a sampled set of rewards, UCB uses that uncertainty in the estimates to drive exploration.

Here Q_t(a) represents the current estimate for action a at time t. We select the action that has the highest estimated action value plus the upper-confidence-bound exploration term:

A_t = argmax_a [ Q_t(a) + c · sqrt( ln t / N_t(a) ) ]

where N_t(a) is the number of times action a has been selected prior to time t and c > 0 controls the degree of exploration.
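A minimal illustrative sketch of UCB action selection (assuming NumPy and the hypothetical pull()/n_arms helpers from the earlier testbed sketch):

# Upper-Confidence-Bound action selection with sample-average estimates.
import numpy as np

def run_ucb(task, n_plays=1000, c=2.0):
    Q = np.zeros(n_arms)
    N = np.zeros(n_arms)
    for t in range(1, n_plays + 1):
        if np.any(N == 0):
            a = int(np.argmin(N))      # play each arm once before using the bound
        else:
            ucb = Q + c * np.sqrt(np.log(t) / N)
            a = int(np.argmax(ucb))    # highest estimate plus exploration bonus
        r = pull(task, a)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # sample-average update
    return Q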
GRADIENT BANDITS
Another way to balance exploration and exploitation is the gradient bandit algorithm.

So far we have considered methods that estimate action values and then use those estimates to select actions. This is often a good approach, but it is not the only one possible. We can also consider learning a numerical preference H_t(a) for each action a. The larger the preference, the more often that action is taken. Note that the preference has no interpretation in terms of reward; only the relative preference of one action over another is important.

The action probabilities are determined according to a soft-max distribution (i.e. the probabilities of taking the actions all sum up to 1):

π_t(a) = e^{H_t(a)} / Σ_b e^{H_t(b)}

where the new notation π_t(a) denotes the probability of taking action a at time t.

The preferences are then updated by stochastic gradient ascent. We take the current preference of the selected action, H_t(A_t), and update it based on the reward received (R_t) minus the average reward so far (R̄_t), weighted by one minus the probability of taking that action, π_t(A_t); the preferences of all other actions are moved in the opposite direction:

H_{t+1}(A_t) = H_t(A_t) + 𝛼 (R_t − R̄_t) (1 − π_t(A_t))
H_{t+1}(a) = H_t(a) − 𝛼 (R_t − R̄_t) π_t(a)        for all a ≠ A_t
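A minimal illustrative sketch of the gradient bandit update (assuming NumPy and the hypothetical pull()/n_arms helpers from the earlier testbed sketch):

# Gradient bandit: soft-max over preferences H, updated by stochastic
# gradient ascent on the expected reward, with a running average baseline.
import numpy as np

def run_gradient_bandit(task, n_plays=1000, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    H = np.zeros(n_arms)                # action preferences
    baseline = 0.0                      # running average reward (R_bar)
    for t in range(1, n_plays + 1):
        pi = np.exp(H - H.max())
        pi /= pi.sum()                  # soft-max action probabilities
        a = int(rng.choice(n_arms, p=pi))
        r = pull(task, a)
        baseline += (r - baseline) / t  # incremental average reward
        one_hot = np.zeros(n_arms)
        one_hot[a] = 1.0
        # Selected action's preference moves up, all others move down.
        H += alpha * (r - baseline) * (one_hot - pi)
    return H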

ASSOCIATIVE SEARCH (CONTEXTUAL BANDITS)

The contextual bandit extends the bandit model by making the decision conditional on the state (context) of the environment. For example, you can use a contextual bandit to select which news article to show first on the main page of your website in order to optimise the click-through rate.

The context is information about the user: where they come from, previously visited pages of the site, device information, geolocation, etc. An action is a choice of which news article to display. An outcome is whether the user clicked on the link or not, and the reward is binary: 0 if there is no click, 1 if there is a click.
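A minimal illustrative sketch of a contextual epsilon-greedy bandit for this news-article example (the context buckets, article list and epsilon value are assumptions):

# Contextual epsilon-greedy bandit: one set of value estimates per context.
import numpy as np

contexts = ["mobile", "desktop"]           # assumed context buckets
articles = ["sports", "politics", "tech"]  # assumed actions

Q = {c: np.zeros(len(articles)) for c in contexts}
N = {c: np.zeros(len(articles)) for c in contexts}
rng = np.random.default_rng()

def choose_article(context, epsilon=0.1):
    if rng.random() < epsilon:
        return int(rng.integers(len(articles)))   # explore
    return int(np.argmax(Q[context]))             # exploit within this context

def update(context, article_idx, clicked):
    # `clicked` is the binary reward: 1 for a click, 0 otherwise.
    N[context][article_idx] += 1
    Q[context][article_idx] += (clicked - Q[context][article_idx]) / N[context][article_idx]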

REFERENCE LINK FOR LEARNING THE FORMULAE

https://lcalem.github.io/blog/2018/09/22/sutton-chap02-bandits
