Q-Learning Algorithm (1)

The document presents a group project on the Q-learning algorithm, a reinforcement learning technique that allows machines to learn from interactions with their environment. It covers key concepts such as the Q-function, Q-table, and the steps involved in the Q-learning algorithm, along with examples and applications. The presentation also discusses the advantages and disadvantages of Q-learning.

Uploaded by

anum.ashraf237

GOVT. RABIA BASRI GRADUATE COLLEGE (W)
WALTON ROAD, LAHORE

Presentation Topic: Q-learning Algorithm


Group No. 6:
Samia Anwar (116)
Fatima Liaqat (105)
Mahnoor (122)
Nimra Mehboob (123)

Contents:
• Reinforcement learning technique: Q-learning
• Some important terms in Q-learning
• Factors and algorithm of Q-learning
• Steps with examples
• Advantages and disadvantages
• Applications

What is reinforcement learning?


• Reinforcement Learning (RL) is a branch of machine learning.
• RL allows machines to learn by interacting with an environment and receiving feedback based on their actions. This feedback comes in the form of rewards or penalties.
Q-LEARNING:
• The "Q" in Q-learning stands for quality: how valuable a given action is in a given state.
• It is an off-policy, model-free, value-based reinforcement learning algorithm.
• The agent has to learn actively through the experience of interacting with the environment.
• Off-policy: the agent learns the value of the best (greedy) action in each state, regardless of which action it actually takes while exploring.
• Model-free: the agent learns the consequences of its actions through experience, without needing the environment's transition and reward functions.
• Value-based: the agent trains a value function to learn which states and actions are more valuable, and picks its actions from it.
• The agent uses trial and error to determine which actions result in rewards (good outcomes) and which in penalties (bad outcomes, i.e. negative rewards).
• The agent's decision making improves over time as the Q-table is updated.
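The trial-and-error behaviour described above is often implemented with an epsilon-greedy rule. That rule is an assumption here, since the slides do not name a specific exploration strategy, and the function name `choose_action` and the parameter `epsilon` are hypothetical. A minimal sketch:

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Pick an action index from one row of Q-values.

    With probability epsilon, explore (pick a random action);
    otherwise exploit (pick the action with the highest Q-value).
    epsilon-greedy is an ASSUMED strategy, not taken from the slides.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With `epsilon=0.0` the agent always takes the action it currently believes is best; with `epsilon=1.0` it acts purely at random. Values in between trade off exploration against exploitation.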

Some important terms in Q-learning:


Factors of Q-learning:
• Q-learning has two factors: the Q-function (Bellman equation) and the Q-table.
1. Q-function (Bellman equation):
• It is a recursive formula used to calculate the value of a given state-action pair and so determine the optimal action.

• Q(s,a) = R(s,a) + γ * max[Q(s',a')]
Where:
• Q(s,a) is the Q-value for a given state-action pair.
• R(s,a) is the immediate reward for taking action a in state s.
• γ (gamma) is the discount factor, representing the importance of future rewards.
• max[Q(s',a')] is the maximum Q-value over all possible actions a' in the next state s'.
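As a worked illustration of the equation above, the sketch below evaluates it for two cases, using the reward of 100 and the discount factor of 0.8 that the slides use in their rooms example. The function name `q_value` is hypothetical.

```python
def q_value(reward, gamma, next_q_values):
    """Q(s,a) = R(s,a) + gamma * max over a' of Q(s',a')."""
    return reward + gamma * max(next_q_values)

# A door with reward 100 into a state whose Q-values are all still 0:
print(q_value(100, 0.8, [0, 0, 0]))   # 100.0

# A door with reward 0 into a state whose best Q-value is already 100:
print(q_value(0, 0.8, [0, 100, 0]))   # 80.0
```

The second case shows how the discount factor passes value backwards: an action with no immediate reward still gets a Q-value of 80 because it leads to a state one step away from the reward.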
2. Q-table:
• A Q-table is a data structure that holds one Q-value for every combination of state and action; the Q-learning algorithm updates these values.
• Number of states = number of rows.
• Number of actions = number of columns.
• Initially, every entry of the Q-table is set to 0.
• The agent uses the Q-table to take the best possible action, based on the expected reward for each state in the environment.
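One simple way to represent such a table is a list of rows, one row per state and one column per action. This is a sketch rather than the presenters' implementation; the helper names `make_q_table` and `best_action` are hypothetical, and the 6 x 6 size is chosen only for illustration.

```python
def make_q_table(n_states, n_actions):
    """Return an n_states x n_actions table (list of rows) filled with 0.0."""
    return [[0.0] * n_actions for _ in range(n_states)]

def best_action(q_table, state):
    """Index of the column with the highest Q-value in the row for `state`."""
    row = q_table[state]
    return row.index(max(row))

q_table = make_q_table(n_states=6, n_actions=6)
q_table[1][5] = 100.0            # pretend a learned value was written here
print(best_action(q_table, 1))   # 5
```

When several actions tie (as they all do in a freshly initialized table), `best_action` returns the first one, so a real agent would normally combine this lookup with some exploration.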
Q-Learning algorithm:
Steps to follow in the Q-learning algorithm:
• Step 1: Create an initial Q-table with all values initialized to 0.
• Step 2: Choose an action and perform it.
• Step 3: Get the value of the reward, calculate the Q-value using the Bellman equation (Q-function), and update that value in the table.
• Step 4: Continue the same process until the table is filled or an episode ends.
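The four steps can be sketched on a tiny environment. Everything about the environment below is an assumption made for illustration: a hypothetical corridor of 4 states, two actions (0 = left, 1 = right), and a reward of 100 for reaching the last state. The update writes the Bellman value straight into the table, which amounts to a learning rate of 1 and is reasonable here only because this toy environment is deterministic.

```python
import random

# ASSUMED toy corridor: states 0..3, state 3 is the goal.
N_STATES, N_ACTIONS, GOAL, GAMMA = 4, 2, 3, 0.8

def step(state, action):
    """Move left (action 0) or right (action 1); reward 100 on reaching GOAL."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 100 if next_state == GOAL else 0
    return next_state, reward

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # Step 1: create a Q-table with all values initialized to 0.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(GOAL)            # start anywhere except the goal
        while state != GOAL:                   # Step 4: repeat until the episode ends
            action = rng.randrange(N_ACTIONS)  # Step 2: choose an action (here: at random)
            next_state, reward = step(state, action)   # ...and perform it
            # Step 3: compute the Q-value with the Bellman equation and store it.
            q[state][action] = reward + GAMMA * max(q[next_state])
            state = next_state
    return q

q = train()
for state, row in enumerate(q):
    print(state, row)
```

After training, the "right" column should hold the larger value in every non-goal state, so following the highest Q-value leads the agent to the goal.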
Example:
• Here, rooms are the states (s) and doors are the actions (a).

• Suppose that we have 5 rooms in a building. We number the rooms from 0 to 4, and the outside of the building can be thought of as one big room (5).

• We can represent each room as a node (state) and each door as a link (action).

• We have to get to room 5, so our goal state is room 5.

• Important points (goal room: 5):

• Doors that lead directly to room 5 have a reward of 100.

• Doors that are not directly connected to room 5 have a reward of 0.

• Where there is no link between two nodes (rooms), the reward is -1 (invalid link).

• Discount factor gamma: 0.8
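The reward rules above can be turned into a reward matrix, where row s and column a hold the reward for moving from room s to room a. The slides do not list which rooms are connected, so the door layout in `DOORS` below is purely hypothetical, as are the names `DOORS` and `build_rewards`. Only the three reward values (100, 0, -1) come from the example.

```python
GOAL = 5
N_ROOMS = 6   # rooms 0-4 plus the outside, numbered 5

# ASSUMPTION: this door layout is hypothetical; the slides do not list it.
DOORS = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]

def build_rewards(doors, n_rooms=N_ROOMS, goal=GOAL):
    """Reward matrix R[s][a]: -1 = no door, 0 = door, 100 = door into the goal."""
    rewards = [[-1] * n_rooms for _ in range(n_rooms)]
    for a, b in doors:                 # every door can be used in both directions
        rewards[a][b] = 100 if b == goal else 0
        rewards[b][a] = 100 if a == goal else 0
    return rewards

R = build_rewards(DOORS)
for row in R:
    print(row)
```

With this matrix in hand, the agent would only consider actions whose reward is not -1, and would apply the Bellman equation with gamma = 0.8 to fill in the Q-table.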


Application: