Lab 01 QRoutingv5

Research notes

Uploaded by Apoorv Sahni
Copyright © All Rights Reserved

Reinforcement Learning QRouting Laboratory

By John Cosmas

1 Objective

In this lab you will familiarise yourself with Reinforcement Learning techniques that
can be modelled with Python, and then use them to write Python software that calculates
the most efficient route through a network.

2 Introduction

Python is an object-oriented programming language with many packages developed
by the Python community, which includes an incredibly diverse and welcoming group
of people. This community develops packages that help people use Python for many
purposes: to make games, build web applications, solve business problems, and
develop internal tools at all kinds of interesting companies. Python is also used
heavily in scientific fields such as artificial intelligence, both in academic research
and in applied work.

2.1 Reinforcement Learning QRouting

An agent in an unknown environment obtains rewards by interacting with the
environment.
The agent ought to take actions so as to maximize its cumulative reward.
The goal of Reinforcement Learning (RL) is to learn a good strategy for the agent
from experimental trials and the relatively simple feedback it receives.
With the optimal strategy, the agent is able to actively adapt to the environment to
maximize future rewards.
The agent acts in an environment. How the environment reacts to certain actions
is defined by a model, which we may or may not know.
The agent can be in one of many states (s ∈ S) of the environment, and chooses to
take one of many actions (a ∈ A) to move from one state to another. Which state the
agent arrives in is decided by the transition probabilities between states (P). Once an
action is taken, the environment delivers a reward (r ∈ R) as feedback.
The model defines the reward function and the transition probabilities. Whether or not
we know how the model works distinguishes two cases:
Know the model: planning with perfect information; do model-based RL. When we
fully know the environment, we can find the optimal solution by Dynamic
Programming (i.e. optimisation algorithms).
Do not know the model: learning with incomplete information; do model-free RL, or
try to learn the model explicitly as part of the algorithm.
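To make the (S, A, P, R) vocabulary concrete, here is a minimal sketch of a hypothetical two-state model. The state names s0/s1, the actions stay/go, and every probability and reward below are invented for illustration only; they are not part of the lab's network.

```python
import random

# Hypothetical two-state model: P[s][a] maps each possible next state to its probability
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 0.9, "s0": 0.1}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}
# R[s][a]: reward delivered as feedback after taking action a in state s (invented values)
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}


def step(state, action, rng=random):
    """Sample the next state from P and return it together with the reward from R."""
    outcomes = P[state][action]
    next_states = list(outcomes)
    weights = [outcomes[s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return next_state, R[state][action]


# A model-based agent can read P and R directly; a model-free agent only
# observes the (next_state, reward) pairs that step() hands back.
state = "s0"
for _ in range(3):
    state, reward = step(state, "go")
    print(state, reward)
```

The Q-routing lab is the model-free case: the agent learns from the rewards it receives rather than being handed P.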

3 Laboratory
The laboratory tutorial is subdivided into six sections:
1. Creating and Drawing a Graph and adding Edges
2. Generating Available Actions and Quality Matrices
3. Working with Functions
4. Iterating through Learning loop
5. Charting most efficient route from initial_state to goal
6. Plotting the Reward Gained against iteration scores

3.1 Creating a New Anaconda Environment


Create and activate a virtual environment from the command line. Open Anaconda
Prompt and run this command:

conda create -n py36 python=3.6.13

Activate it with this command:

conda activate py36

Then install the packages required for today’s lab, namely numpy, networkx and
matplotlib:

conda install numpy
conda install networkx
conda install matplotlib==2.2.3

Today you will run the qrouting.py file in two ways: from the command line and from the
PyCharm editor.

1- Command line: in the same Anaconda Prompt window, run the following command,
replacing path with the folder that contains qrouting.py:

python path\qrouting.py

2- PyCharm editor: open Anaconda Navigator, set “Applications on” to py36
and launch PyCharm. Create a new project “RL1” in PyCharm. To use the environment
you created previously (py36), follow these steps: in the New Project window, choose
Previously configured interpreter, then click Add Interpreter and Add Local Interpreter.
In the new window (Add Python Interpreter), choose Existing and browse to the py36
interpreter at “C:\Users\yourlogin\.conda\envs\py36\python.exe”, replacing yourlogin
with your own login name.
Open qrouting.py in PyCharm and run it.

3.2 Creating and Drawing a Graph and adding Edges using networkx package
3.2.1 Creating a List of Tuples for defining edges of a Graph
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data, the
other 3 are Tuple, Set, and Dictionary, all with different qualities and usage.
Lists are created using square brackets:

Example
Create a List:
>>> thislist = ["apple", "banana", "cherry"]
>>> print(thislist)

Tuples are used to store multiple items in a single variable.
Tuple is one of 4 built-in data types in Python used to store collections of data, the
other 3 are List, Set, and Dictionary, all with different qualities and usage.
A tuple is a collection which is ordered and unchangeable.
Tuples are written with round brackets.
Example
Create a Tuple:
>>> thistuple = ("apple", "banana", "cherry")
>>> print(thistuple)
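The practical difference between the two is mutability, which is why an edge is naturally a tuple (a fixed pair of nodes) while the edge collection is a list. A minimal sketch reusing the fruit values from the examples above; the names fruits_list, fruits_tuple and edge are illustrative only.

```python
fruits_list = ["apple", "banana", "cherry"]
fruits_list[0] = "kiwi"            # lists are mutable: items can be replaced

fruits_tuple = ("apple", "banana", "cherry")
try:
    fruits_tuple[0] = "kiwi"       # tuples are unchangeable
except TypeError as err:
    print("TypeError:", err)

# One edge of a graph stored as an ordered pair (start node, end node)
edge = (0, 1)
start, end = edge                  # tuple unpacking
print(fruits_list, fruits_tuple, start, end)
```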

Task: Complete this List of Tuples to define the edges of the above Graph.

>>> edges = [(0, 1), (1, 5)]
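Assuming the graph for this task is the one whose edges are listed in section 3.3.1 (its first two edges match the two given here), the completed list of tuples would look like this sketch, with a quick check that every node 0 to 10 appears:

```python
# Edge list taken from section 3.3.1
edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3), (9, 10),
         (2, 4), (0, 6), (6, 7), (8, 9), (7, 8), (1, 7), (3, 9)]

# every node that appears at either end of an edge
nodes = sorted({node for edge in edges for node in edge})
print(len(edges), "edges over nodes", nodes)
```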

3.2.2 Creating an empty Graph


Creating a graph
Create an empty graph with no nodes and no edges.

Example
>>> import networkx as nx
>>> G=nx.Graph()
By definition, a Graph is a collection of nodes (vertices) along with identified pairs of
nodes (called edges, links, etc). In NetworkX, nodes can be any hashable object e.g. a
text string, an image, an XML object, another Graph, a customized node object, etc.
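Note: To install the networkx package, type at the terminal window prompt: conda
install networkx
If your editor asks you to choose an interpreter, press Ctrl+Shift+P, type python, choose
Select Interpreter and select the conda interpreter in which networkx is installed (for
example ‘base’, or py36 if you created it in 3.1).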

3.2.3 Adding Edges to Graph

networkx.Graph.add_edges_from
Graph.add_edges_from(ebunch_to_add, **attr)
Add all the edges in ebunch_to_add.

Examples
G.add_edges_from([(0, 1), (1, 5)]) # using a list of edge tuples
G.add_edges_from(edges)
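Putting 3.2.1 to 3.2.3 together, a minimal sketch that builds the lab graph from the edge list in section 3.3.1 and inspects it. nx.shortest_path is used here only as an independent reference for the route the Q-learning agent should eventually find; it is not part of the lab's method.

```python
import networkx as nx

edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3), (9, 10),
         (2, 4), (0, 6), (6, 7), (8, 9), (7, 8), (1, 7), (3, 9)]

G = nx.Graph()                # empty undirected graph: no nodes, no edges
G.add_edges_from(edges)       # nodes 0..10 are created from the edge tuples

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.neighbors(1)))             # nodes one hop away from node 1
print(nx.shortest_path(G, 0, 10))         # fewest-hop route from 0 to the goal
```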

3.2.4 Laying out a Graph

networkx.drawing.layout.spring_layout
spring_layout(G, k=None, pos=None, fixed=None, iterations=50, threshold=0.0001,
weight='weight', scale=1, center=None, dim=2, seed=None)
Position nodes using the Fruchterman-Reingold force-directed algorithm.

The algorithm simulates a force-directed representation of the network treating edges
as springs holding nodes close, while treating nodes as repelling objects, sometimes
called an anti-gravity force. Simulation continues until the positions are close to an
equilibrium.
There are some hard-coded values: minimal distance between nodes (0.01) and
“temperature” of 0.1 to ensure nodes don’t fly away. During the simulation, k helps
determine the distance between nodes, though scale and center determine the size and
place after rescaling occurs at the end of the simulation.

Examples
>>> pos = nx.spring_layout(G)

3.2.5 Drawing Nodes of a Graph


draw_networkx_nodes
draw_networkx_nodes(G, pos, nodelist=None, node_size=300, node_color='r',
node_shape='o', alpha=1.0, cmap=None, vmin=None, vmax=None, ax=None,
linewidths=None, label=None, **kwds)
Draw the nodes of the graph G.
This draws only the nodes of the graph G.
Parameters :
G : graph
A networkx graph

pos : dictionary
A dictionary with nodes as keys and positions as values. If not specified a
spring layout positioning will be computed. See networkx.layout for functions
that compute node positions.

ax : Matplotlib Axes object, optional
Draw the graph in the specified Matplotlib axes.

nodelist : list, optional
Draw only specified nodes (default G.nodes())

node_size : scalar or array
Size of nodes (default=300). If an array is specified it must be the same length
as nodelist.

node_color : color string, or array of floats
Node color. Can be a single color format string (default=’r’), or a sequence of
colors with the same length as nodelist. If numeric values are specified they
will be mapped to colors using the cmap and vmin,vmax parameters. See
matplotlib.scatter for more details.

node_shape : string
The shape of the node. Specification is as matplotlib.scatter marker, one of
‘so^>v<dph8’ (default=’o’).

alpha : float
The node transparency (default=1.0)

cmap : Matplotlib colormap
Colormap for mapping intensities of nodes (default=None)

vmin,vmax : floats
Minimum and maximum for node colormap scaling (default=None)

linewidths : [None | scalar | sequence]
Line width of symbol border (default =1.0)

label : [None| string]
Label for legend

Example
>>> G = nx.dodecahedral_graph()
>>> pos = nx.spring_layout(G)
>>> nodes = nx.draw_networkx_nodes(G, pos)

3.2.6 Drawing Edges of a Graph


draw_networkx_edges
draw_networkx_edges(G, pos, edgelist=None, width=1.0, edge_color='k',
style='solid', alpha=None, edge_cmap=None, edge_vmin=None, edge_vmax=None,
ax=None, arrows=True, label=None, **kwds)
Draw the edges of the graph G.
This draws only the edges of the graph G.
Parameters :
G : graph
A networkx graph

pos : dictionary
A dictionary with nodes as keys and positions as values. If not specified a
spring layout positioning will be computed. See networkx.layout for functions
that compute node positions.

edgelist : collection of edge tuples
Draw only specified edges (default=G.edges())

width : float
Line width of edges (default =1.0)

edge_color : color string, or array of floats
Edge color. Can be a single color format string (default=’k’), or a sequence of
colors with the same length as edgelist. If numeric values are specified they
will be mapped to colors using the edge_cmap and edge_vmin,edge_vmax
parameters.

style : string
Edge line style (default=’solid’) (solid|dashed|dotted|dashdot)

alpha : float
The edge transparency (default=1.0)

edge_cmap : Matplotlib colormap
Colormap for mapping intensities of edges (default=None)

edge_vmin,edge_vmax : floats
Minimum and maximum for edge colormap scaling (default=None)

ax : Matplotlib Axes object, optional
Draw the graph in the specified Matplotlib axes.

arrows : bool, optional (default=True)
For directed graphs, if True draw arrowheads.

label : [None| string]
Label for legend

For directed graphs, “arrows” (actually just thicker stubs) are drawn at the head end.
Arrows can be turned off with keyword arrows=False. Yes, it is ugly but drawing
proper arrows with Matplotlib this way is tricky.

Examples
>>> edges=nx.draw_networkx_edges(G,pos)

3.2.7 Drawing Labels of a Graph


draw_networkx_labels(G, pos, labels=None, font_size=12, font_color='k',
font_family='sans-serif', font_weight='normal', alpha=1.0, ax=None, **kwds)
Draw node labels on the graph G.

Parameters :
G : graph
A networkx graph

pos : dictionary, optional
A dictionary with nodes as keys and positions as values. If not specified a
spring layout positioning will be computed. See networkx.layout for functions
that compute node positions.

font_size : int
Font size for text labels (default=12)

font_color : string
Font color string (default=’k’ black)

font_family : string
Font family (default=’sans-serif’)

font_weight : string
Font weight (default=’normal’)

alpha : float
The text transparency (default=1.0)

ax : Matplotlib Axes object, optional
Draw the graph in the specified Matplotlib axes.

Examples
>>> labels=nx.draw_networkx_labels(G,pos)

3.2.8 Plotting Graph with pylab

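>>> import pylab
>>> pylab.show()
pylab.show() takes no data arguments; it displays the figure that the draw_networkx_*
calls above have drawn.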
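The calls from 3.2.4 to 3.2.8 combine into one short drawing script. This is a sketch, assuming networkx and matplotlib are installed as in 3.1; it selects the off-screen "Agg" backend and saves to a file (the name qrouting_graph.png is hypothetical) so it also runs without a display. In the lab itself you would call show() instead of savefig().

```python
import matplotlib
matplotlib.use("Agg")              # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3), (9, 10),
         (2, 4), (0, 6), (6, 7), (8, 9), (7, 8), (1, 7), (3, 9)]
G = nx.Graph()
G.add_edges_from(edges)

pos = nx.spring_layout(G, seed=1)        # dict: node -> (x, y); seed makes the layout repeatable
nx.draw_networkx_nodes(G, pos, node_size=300)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.axis("off")
plt.savefig("qrouting_graph.png")        # hypothetical output file name
```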

3.3 Generating Available Actions and Quality Matrices


3.3.1 Generating Available Actions Matrix
Create the available actions matrix M for initial_state = 0 and goal = 10, using all
the edges in the graph:
(0, 1)
(1, 5)
(5, 6)
(5, 4)
(1, 2)
(1, 3)
(9, 10)
(2, 4)
(0, 6)
(6, 7)
(8, 9)
(7, 8)
(1, 7)
(3, 9)
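Set each entry M[start, end] to:
-1 if there is no edge between the two nodes,
0 if there is an edge between the two nodes,
100 if the edge leads into the goal (its end node is the goal); the goal's own entry
M[10, 10] is also set to 100.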
M= [[ -1. 0. -1. -1. -1. -1. 0. -1. -1. -1. -1.]
[ 0. -1. 0. 0. -1. 0. -1. 0. -1. -1. -1.]
[ -1. 0. -1. -1. 0. -1. -1. -1. -1. -1. -1.]
[ -1. 0. -1. -1. -1. -1. -1. -1. -1. 0. -1.]
[ -1. -1. 0. -1. -1. 0. -1. -1. -1. -1. -1.]
[ -1. 0. -1. -1. 0. -1. 0. -1. -1. -1. -1.]
[ 0. -1. -1. -1. -1. 0. -1. 0. -1. -1. -1.]
[ -1. 0. -1. -1. -1. -1. 0. -1. 0. -1. -1.]
[ -1. -1. -1. -1. -1. -1. -1. 0. -1. 0. -1.]
[ -1. -1. -1. 0. -1. -1. -1. -1. 0. -1. 100.]
[ -1. -1. -1. -1. -1. -1. -1. -1. -1. 0. 100.]]
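The rules above can be applied mechanically to the edge list. A minimal sketch that rebuilds the M printed above; it uses a plain numpy ndarray, so if you want the lab's later np.where(...)[1] indexing to work unchanged, wrap the result as np.matrix(M) (an assumption about how the lab's own file stores M).

```python
import numpy as np

edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3), (9, 10),
         (2, 4), (0, 6), (6, 7), (8, 9), (7, 8), (1, 7), (3, 9)]
goal = 10
n_nodes = 11

M = np.full((n_nodes, n_nodes), -1.0)     # -1: no edge between the two nodes
for start, end in edges:
    # the graph is undirected, so fill both directions of every edge
    M[start, end] = 100.0 if end == goal else 0.0
    M[end, start] = 100.0 if start == goal else 0.0
M[goal, goal] = 100.0                     # staying at the goal keeps the reward
print(M)
```

Edge (9, 10) is the only edge touching the goal, so M[9, 10] is 100 while the reverse move M[10, 9] stays 0, exactly as in the printed matrix.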

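3.3.2 Generating Quality Matrix

Initialise the quality matrix Q with zeros, using the same 11 × 11 shape as M (one row
per state, one column per action):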
Q= [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

3.4 Working with Functions


3.4.1 Determine the available actions for a given state
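Create a function that takes the current state and returns the available actions, i.e. the
nodes that can be reached from it along one edge:
def available_actions(state):
    # columns >= 0 in M's row are reachable nodes; [1] = column indices (M is a numpy.matrix)
    available_action = np.where(M[state, ] >= 0)[1]
    return available_action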

using
numpy.where()
When called with only a condition (a boolean array), numpy.where returns a tuple of
index arrays, one per dimension, giving the positions where the condition is True
(the same as numpy.nonzero). For a 1-D array the tuple has one entry; for a 1 × N
numpy.matrix row it has two (row indices, column indices), so element [1] holds the
column indices. For example, for the condition matrix([[False, True, True]]) it
returns (array([0, 0]), array([1, 2])).
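A small check of numpy.where on row 0 of M (copied from section 3.3.1), showing why a 1-D ndarray row needs index [0] while the lab's numpy.matrix row needs [1]. The names row0 and row0_matrix are illustrative only.

```python
import numpy as np

# Row 0 of the available actions matrix M from section 3.3.1
row0 = np.array([-1., 0., -1., -1., -1., -1., 0., -1., -1., -1., -1.])

result = np.where(row0 >= 0)          # tuple with one index array per dimension
available = result[0]                 # 1-D input, so element [0] holds the indices
print(result, available)

# The same row as a 1 x N numpy.matrix: the tuple has (row, column) entries,
# which is why the lab code indexes [1]
row0_matrix = np.matrix(row0)
print(np.where(row0_matrix >= 0)[1])
```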
3.4.2 Choosing one of the actions
Create a function that chooses one of the available actions at random
def sample_next_action(available_actions_range):
    next_action = int(np.random.choice(available_actions_range))  # uniform random pick
    return next_action

using
numpy.random.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array

Parameters
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the
random sample is generated as if it were np.arange(a)

size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are
drawn. Default is None, in which case a single value is returned.

replace : boolean, optional
Whether the sample is with or without replacement. Default is True, meaning
that a value of a can be selected multiple times.

p : 1-D array-like, optional
The probabilities associated with each entry in a. If not given, the sample
assumes a uniform distribution over all entries in a.

Returns
samples : single item or ndarray
The generated random samples

Raises
ValueError
If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like
of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if
replace=False and the sample size is greater than the population size

Examples
Generate a uniform random sample from np.arange(5) of size 1:
np.random.choice(5, 1)
array([4]) # random
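The parameters listed above can be tried on the available actions of node 0 (nodes 1 and 6 in section 3.3.1). A minimal sketch; the seed value 42 and the probabilities [0.9, 0.1] are arbitrary illustration choices. Only the default call (size=None, uniform p) is what sample_next_action needs.

```python
import numpy as np

neighbours = np.array([1, 6])   # available actions of node 0
np.random.seed(42)              # fix the global seed so the run is repeatable

one = np.random.choice(neighbours)                              # size=None -> one value
five = np.random.choice(neighbours, size=5)                     # 5 draws, with replacement
both = np.random.choice(neighbours, size=2, replace=False)      # each value at most once
biased = np.random.choice(neighbours, size=1000, p=[0.9, 0.1])  # non-uniform draws

try:
    np.random.choice(neighbours, size=3, replace=False)  # 3 > population of 2
except ValueError as err:
    print("ValueError:", err)

print(one, five, sorted(both), (biased == 1).mean())
```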

3.4.3 Update Q matrix according to path chosen


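Create a function that updates the Q matrix according to the path chosen

def update(current_state, action, gamma):
    # find index of largest value in the row of Q defined by action
    # ([1] gives column indices, assuming Q is a numpy.matrix)
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        # more than one choice of largest values: pick one from the choices available
        max_index = int(np.random.choice(max_index))
    else:
        # pick the max index from the Q matrix
        max_index = int(max_index[0])
    # get the max index value from the Q matrix
    max_value = Q[action, max_index]
    # immediate reward plus discounted best value of the next state (node `action`)
    Q[current_state, action] = M[current_state, action] + gamma * max_value
    # score: sum of Q normalised so that its largest entry is 100
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0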
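The assignment inside update() is the Q-learning rule written with the lab's symbols: because an action is the next node to visit, the next state after taking action a is node a itself. A worked instance, assuming (hypothetically) that the first two updates from the all-zero Q are (s = 9, a = 10) and then (s = 3, a = 9), with gamma = 0.75:

```latex
Q(s, a) \leftarrow M(s, a) + \gamma \max_{a'} Q(a, a')

Q(9, 10) = M(9, 10) + 0.75 \cdot \max_{a'} Q(10, a') = 100 + 0.75 \cdot 0 = 100

Q(3, 9) = M(3, 9) + 0.75 \cdot \max_{a'} Q(9, a') = 0 + 0.75 \cdot 100 = 75
```

The rule has no separate learning rate, which is the same as using a learning rate of 1; the reward of 100 flows backwards from the goal, shrinking by a factor of gamma for each hop.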

using
numpy.where(), as in 3.4.1: called with only a condition, it returns a tuple of index
arrays, one per dimension, giving the positions where the condition is True.

Example
# find the column index (or indices) of the largest value in row `action` of Q;
# [1] selects column indices because Q is assumed to be a numpy.matrix
max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]

numpy.shape(a)
Return the shape of an array.
Parameters
a : array_like
Input array.

Returns
shape : tuple of ints
The elements of the shape tuple give the lengths of the corresponding array
dimensions.

Examples
np.shape(np.eye(3))
(3, 3)
np.shape([[1, 2]])
(1, 2)
np.shape([0])
(1,)
np.shape(0)
()

if max_index.shape[0] > 1:  # there is more than one choice from Q matrix
    max_index = int(np.random.choice(max_index))  # pick one from the choices available
else:
    max_index = int(max_index[0])  # pick the max index from the Q matrix
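The tie-breaking step can be tried in isolation. A minimal sketch on a hypothetical 1-D Q row with a two-way tie; because the row is a plain ndarray rather than a numpy.matrix, the column indices are element [0] of np.where's result instead of [1].

```python
import numpy as np

np.random.seed(0)
row = np.array([0.0, 75.0, 75.0, 0.0])    # hypothetical Q row with a two-way tie

max_index = np.where(row == np.max(row))[0]   # 1-D ndarray -> index [0]
print(np.shape(max_index))                    # two candidates
if max_index.shape[0] > 1:
    chosen = int(np.random.choice(max_index))  # pick one tied column at random
else:
    chosen = int(max_index[0])                 # a single largest value
print(chosen)
```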

numpy.max() function
Syntax
The syntax of the max() function is given below.
max_value = numpy.max(arr)
Pass the numpy array as the argument to numpy.max(), and the function returns its
maximum value.

Example: Get Maximum Value of Numpy Array
In this example, we take a numpy array of random integers and then find its maximum
using the numpy.max() function.

Python Program

import numpy as np
arr = np.random.randint(10, size=(4,5))
print(arr)
#find maximum value
max_value = np.max(arr)
print('Maximum value of the array is',max_value)

Output
[[3 2 2 2 2]
[5 7 0 4 5]
[8 1 4 8 4]
[2 0 7 2 1]]
Maximum value of the array is 8

numpy.sum(arr, axis, dtype, out) : This function returns the sum of array elements
over the specified axis.

Parameters :
arr :
input array.
axis :
axis along which we want to calculate the sum. If it is not given (None), arr is
flattened and all its elements are summed. axis = 0 sums down each column (one
result per column) and axis = 1 sums across each row (one result per row).
dtype :
[optional] type of the returned array and of the accumulator in which the elements
are summed.
out :
Different array in which we want to place the result. The array must have
same dimensions as expected output. Default is None.
initial :
[scalar, optional] Starting value of the sum.

Return :
Sum of the array elements (a scalar value if axis is None) or an array with the sums
along the specified axis.
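The score returned by update() is numpy.sum with no axis, applied to Q rescaled so its largest entry is 100. A minimal sketch on a hypothetical 2 × 2 array (Q_demo is illustrative only), contrasting the axis options described above:

```python
import numpy as np

Q_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])         # hypothetical small Q matrix

total = np.sum(Q_demo)                  # axis=None flattens: 1 + 2 + 3 + 4
down_columns = np.sum(Q_demo, axis=0)   # one sum per column
across_rows = np.sum(Q_demo, axis=1)    # one sum per row

# same formula as the score in update(): 25 + 50 + 75 + 100
score = np.sum(Q_demo / np.max(Q_demo) * 100)
print(total, down_columns, across_rows, score)
```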

3.5 Iterating through Learning loop

gamma = 0.75  # learning parameter (discount applied to the best Q value of the next state)
initial_state = 0

available_action = available_actions(initial_state)
action = sample_next_action(available_action)
update(initial_state, action, gamma)

scores = []
for i in range(1000):
    # each training iteration starts from a randomly chosen state
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)
print(Q)
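For experimenting outside qrouting.py, the pieces from 3.3 to 3.5 can be folded into one self-contained sketch. It is a re-implementation on plain ndarrays (so np.where(...)[0] instead of [1]) and swaps the global np.random calls for numpy's Generator (np.random.default_rng, numpy 1.17 or later) so a seed makes runs repeatable; the helper names build_m and train are hypothetical, not the lab's.

```python
import numpy as np

# Edge list and goal from section 3.3.1
EDGES = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2), (1, 3), (9, 10),
         (2, 4), (0, 6), (6, 7), (8, 9), (7, 8), (1, 7), (3, 9)]
GOAL = 10


def build_m(edges, goal, n):
    """-1: no edge, 0: edge, 100: move that ends at the goal (plus the goal self-loop)."""
    m = np.full((n, n), -1.0)
    for a, b in edges:
        m[a, b] = 100.0 if b == goal else 0.0
        m[b, a] = 100.0 if a == goal else 0.0
    m[goal, goal] = 100.0
    return m


def train(m, gamma=0.75, iterations=1000, seed=0):
    """Random-state training loop mirroring section 3.5."""
    rng = np.random.default_rng(seed)
    q = np.zeros_like(m)
    scores = []
    for _ in range(iterations):
        state = int(rng.integers(0, m.shape[0]))        # random starting state
        actions = np.where(m[state] >= 0)[0]            # available actions
        action = int(rng.choice(actions))               # random exploration
        best = np.where(q[action] == q[action].max())[0]
        # tied entries share the same value; the random pick mirrors the lab's tie-break
        max_value = q[action, int(rng.choice(best))]
        q[state, action] = m[state, action] + gamma * max_value
        scores.append(np.sum(q / q.max() * 100) if q.max() > 0 else 0.0)
    return q, scores


M_demo = build_m(EDGES, GOAL, 11)
Q_trained, scores_demo = train(M_demo)
print(np.round(Q_trained / Q_trained.max() * 100))
```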

3.6 Charting most efficient route from initial_state to goal

print("Trained Q matrix:")
print(Q / np.max(Q)*100)  # normalised so that the largest entry is 100

# Testing
goal = 10
current_state = 0
steps = [current_state]

while current_state != goal:
    # column index (or indices) of the largest Q value in the current row
    next_step_index = np.where(Q[current_state, ] == np.max(Q[current_state, ]))[1]
    if next_step_index.shape[0] > 1:
        # break ties at random
        next_step_index = int(np.random.choice(next_step_index))
    else:
        next_step_index = int(next_step_index[0])
    steps.append(next_step_index)
    current_state = next_step_index

print("Most efficient path:", steps)
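The route-following loop can also be written as a reusable function with a step limit, so an under-trained Q (for example one that bounces between two nodes) raises an error instead of looping forever. A minimal sketch on plain ndarrays; greedy_route, max_steps and the 4-node Q_chain values are hypothetical, chosen only so the greedy choice is unambiguous.

```python
import numpy as np


def greedy_route(q, start, goal, max_steps=50, seed=0):
    """Follow the largest Q value from start until goal (ties broken at random)."""
    rng = np.random.default_rng(seed)
    route = [start]
    current = start
    while current != goal:
        if len(route) > max_steps:              # guard against an under-trained Q
            raise RuntimeError("no route to goal found; train for more iterations")
        row = q[current]
        best = np.where(row == row.max())[0]
        current = int(rng.choice(best))
        route.append(current)
    return route


# Hypothetical trained Q for a 4-node chain 0 - 1 - 2 - 3 with goal 3
Q_chain = np.array([
    [0.0, 56.25, 0.0, 0.0],
    [42.19, 0.0, 75.0, 0.0],
    [0.0, 56.25, 0.0, 100.0],
    [0.0, 0.0, 75.0, 100.0],
])
print(greedy_route(Q_chain, start=0, goal=3))   # [0, 1, 2, 3]
```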

3.7 Plotting the Reward Gained against iteration scores

import pylab as pl

pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()
