Lab 01 QRoutingv5
By John Cosmas
1 Objective
In this lab you will familiarise yourself with Reinforcement Learning techniques that
can be modelled in Python, and then use them to write Python software that calculates
the most efficient route through a network.
2 Introduction
3 Laboratory
The laboratory tutorial is subdivided into six sections:
1. Creating and Drawing a Graph and adding Edges
2. Generating Available Actions and Quality Matrices
3. Working with Functions
4. Iterating through Learning loop
5. Charting the most efficient route from initial_state to goal
6. Plotting the Reward Gained against iteration scores
Download the required packages for today's lab, such as numpy, networkx and
matplotlib.
Today we run the qrouting.py file in two ways: from the command line and from the
PyCharm editor. From the command line:
python 'path'\qrouting.py
3.2 Creating and Drawing a Graph and adding Edges using networkx package
3.2.1 Creating a List of Tuples for defining edges of a Graph
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data, the
other 3 are Tuple, Set, and Dictionary, all with different qualities and usage.
Lists are created using square brackets:
Example
Create a List:
>>> thislist = ["apple", "banana", "cherry"]
>>> print(thislist)
Task: Complete this List of Tuples to define the edges of the above Graph.
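As a hedged starting point (the actual topology must be read off the lab sheet's graph figure, so the pairs below are hypothetical), an edge list of tuples has this shape:

```python
# Hypothetical edge list -- replace with the node pairs from the lab's graph figure.
# Each tuple (u, v) is one undirected edge between node u and node v.
edges = [(0, 1), (1, 2), (1, 5), (2, 3), (2, 7), (5, 6), (5, 4)]
```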
Example
>>> import networkx as nx
>>> G=nx.Graph()
By definition, a Graph is a collection of nodes (vertices) along with identified pairs of
nodes (called edges, links, etc). In NetworkX, nodes can be any hashable object e.g. a
text string, an image, an XML object, another Graph, a customized node object, etc.
Note: To install the networkx package, type at the terminal window prompt: conda
install networkx
Press Ctrl+Shift+P, then type 'Python: Select Interpreter' and select the 'base'
conda interpreter.
networkx.Graph.add_edges_from
Graph.add_edges_from(ebunch_to_add, **attr)
Add all the edges in ebunch_to_add.
Examples
G.add_edges_from([(0, 1), (1, 5)]) # using a list of edge tuples
G.add_edges_from(edges)
networkx.drawing.layout.spring_layout
spring_layout(G, k=None, pos=None, fixed=None, iterations=50, threshold=0.0001,
weight='weight', scale=1, center=None, dim=2, seed=None)[source]
Position nodes using Fruchterman-Reingold force-directed algorithm.
Examples
>>> pos = nx.spring_layout(G)
networkx.drawing.nx_pylab.draw_networkx_nodes
draw_networkx_nodes(G, pos, **kwargs)
Draw the nodes of the graph G.
pos : dictionary
A dictionary with nodes as keys and positions as values. If not specified a
spring layout positioning will be computed. See networkx.layout for functions
that compute node positions.
node_shape : string
The shape of the node. Specification is as matplotlib.scatter marker, one of
‘so^>v<dph8’ (default=’o’).
alpha : float
The node transparency (default=1.0)
vmin,vmax : floats
Minimum and maximum for node colormap scaling (default=None)
Example
>>> G=nx.dodecahedral_graph()
>>> nodes=nx.draw_networkx_nodes(G,pos)
networkx.drawing.nx_pylab.draw_networkx_edges
draw_networkx_edges(G, pos, **kwargs)
Draw the edges of the graph G.
pos : dictionary
A dictionary with nodes as keys and positions as values. If not specified a
spring layout positioning will be computed. See networkx.layout for functions
that compute node positions.
width : float
Line width of edges (default =1.0)
style : string
Edge line style (default=’solid’) (solid|dashed|dotted,dashdot)
alpha : float
The edge transparency (default=1.0)
edge_vmin,edge_vmax : floats
Minimum and maximum for edge colormap scaling (default=None)
For directed graphs, “arrows” (actually just thicker stubs) are drawn at the head end.
Arrows can be turned off with keyword arrows=False. Yes, it is ugly but drawing
proper arrows with Matplotlib this way is tricky.
Examples
>>> edges=nx.draw_networkx_edges(G,pos)
networkx.drawing.nx_pylab.draw_networkx_labels
draw_networkx_labels(G, pos, **kwargs)
Draw node labels on the graph G.
Parameters :
G : graph
A networkx graph
font_size : int
Font size for text labels (default=12)
font_color : string
Font color string (default=’k’ black)
font_family : string
Font family (default=’sans-serif’)
font_weight : string
Font weight (default=’normal’)
alpha : float
The text transparency (default=1.0)
Examples
>>> labels=nx.draw_networkx_labels(G,pos)
>>> pylab.show()
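Putting the drawing calls above together, a minimal runnable sketch (the edges are hypothetical, and matplotlib.pyplot is used in place of pylab with a non-interactive backend; in the lab you would simply call pylab.show()):

```python
import matplotlib
matplotlib.use("Agg")                       # non-interactive backend for saving to file
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3)])  # hypothetical edge list
pos = nx.spring_layout(G, seed=42)          # force-directed layout, reproducible seed
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.savefig("graph.png")                    # in the lab: pylab.show() instead
```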
Q= [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
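The all-zero Q (quality) matrix printed above can be created with numpy; a sketch, assuming 11 nodes as the printout suggests:

```python
import numpy as np

MATRIX_SIZE = 11                           # one row/column per node, as in the printout
Q = np.zeros((MATRIX_SIZE, MATRIX_SIZE))   # quality matrix, initially all zeros
print(Q)
```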
using
numpy.where()
When numpy.where(condition) is called with only a condition (a boolean
numpy-like array, e.g. array([[True, True, True]])), it returns a tuple of
index arrays giving the positions at which the condition is True; these
indices can then be used to select the matching elements.
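A sketch of how numpy.where can generate the available actions from a reward-matrix row (the 3-node reward matrix R here is hypothetical, following the usual convention that -1 marks a missing link and non-negative values mark usable links):

```python
import numpy as np

# Hypothetical 3-node reward matrix: -1 = no link, 0 = link, 100 = link into the goal
R = np.array([[ -1,   0,  -1],
              [  0,  -1, 100],
              [ -1,   0,  -1]])

def available_actions(state):
    # np.where(condition)[0] gives the indices where the condition is True,
    # i.e. the neighbours reachable from this state
    current_state_row = R[state, ]
    return np.where(current_state_row >= 0)[0]

print(available_actions(1))   # neighbours reachable from node 1
```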
3.4.2 Choosing one of the actions
Create a function that chooses one of the available actions at random
def sample_next_action(available_actions_range):
return next_action
using
numpy.random.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array
Parameters
a : 1-D array-like or int
If an ndarray, a random sample is generated from its elements. If an int, the
random sample is generated as if it were np.arange(a)
replace : boolean, optional
Whether the sample is with or without replacement. Default is True, meaning
that a value of a can be selected multiple times.
Returns
samples : single item or ndarray
The generated random samples
Raises
ValueError
If a is an int and less than zero, if a or p are not 1-dimensional, if a is an
array-like of size 0, if p is not a vector of probabilities, if a and p have
different lengths, or if replace=False and the sample size is greater than the
population size
Examples
Generate a uniform random sample from np.arange(5) of size 1:
np.random.choice(5, 1)
array([4]) # random
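Using numpy.random.choice, the sample_next_action function above can be completed; a sketch:

```python
import numpy as np

def sample_next_action(available_actions_range):
    # pick one of the available actions uniformly at random;
    # with no size argument, np.random.choice returns a single element
    next_action = np.random.choice(available_actions_range)
    return next_action

action = sample_next_action([1, 3, 4])
```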
using
numpy.where()
Example
# find the indices of the largest value in a row of the Q matrix
max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
# Note: element [1] holds the column indices when Q is a numpy matrix (whose
# rows stay 2-D); if Q is a plain ndarray, Q[action, ] is 1-D and the indices
# are in element [0]
numpy.shape(a)
Return the shape of an array.
Parameters
a : array_like
Input array.
Returns
shape : tuple of ints
The elements of the shape tuple give the lengths of the corresponding array
dimensions.
Examples
np.shape(np.eye(3))
(3, 3)
np.shape([[1, 2]])
(1, 2)
np.shape([0])
(1,)
np.shape(0)
()
numpy.max()
Syntax
The syntax of numpy.max() is given below.
max_value = numpy.max(arr)
Pass the numpy array as argument to numpy.max(), and this function shall return the
maximum value.
Python Program
import numpy as np
arr = np.random.randint(10, size=(4,5))
print(arr)
#find maximum value
max_value = np.max(arr)
print('Maximum value of the array is',max_value)
Output (your values will differ on each run, since the array is random)
[[3 2 2 2 2]
[5 7 0 4 5]
[8 1 4 8 4]
[2 0 7 2 1]]
Maximum value of the array is 8
numpy.sum(arr, axis, dtype, out) : This function returns the sum of array elements
over the specified axis.
Parameters :
arr :
input array.
axis :
axis along which to calculate the sum. If omitted, arr is flattened and the
sum runs over all elements. axis = 0 sums along the columns and axis = 1 sums
along the rows.
out :
Different array in which we want to place the result. The array must have
same dimensions as expected output. Default is None.
initial :
[scalar, optional] Starting value of the sum.
Return :
Sum of the array elements (a scalar value if axis is none) or array with sum values
along the specified axis.
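Combining numpy.where, numpy.max, numpy.random.choice and numpy.sum, the Q-learning update step can be sketched as follows. The 3-node reward matrix R is hypothetical; the update rule itself is the standard one, Q[state, action] = R[state, action] + gamma * (best future Q value):

```python
import numpy as np

gamma = 0.75   # discount factor

# Tiny hypothetical 3-node example; node 2 is the goal (reward 100)
R = np.array([[ -1.,   0., 100.],
              [  0.,  -1., 100.],
              [ -1.,   0.,  -1.]])
Q = np.zeros_like(R)

def update(current_state, action, gamma):
    # indices of the best next actions out of the chosen action's row
    candidates = np.where(Q[action, ] == np.max(Q[action, ]))[0]
    max_index = int(np.random.choice(candidates))   # break ties at random
    max_value = Q[action, max_index]
    # Q-learning update: immediate reward plus discounted best future value
    Q[current_state, action] = R[current_state, action] + gamma * max_value
    # normalised score, used later for the reward-gained plot
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    return 0

score = update(0, 2, gamma)
```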
gamma = 0.75  # learning parameter (discount factor)
initial_state = 0
available_action = available_actions(initial_state)
action = sample_next_action(available_action)
update(initial_state, action, gamma)
scores = []
# Training loop: update Q from 1000 random starting states
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)
print(Q)
print("Trained Q matrix:")
print(Q / np.max(Q) * 100)
# The lines above print the raw and the normalised trained Q matrix
# Testing
current_state = 0
steps = [current_state]
pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()
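The testing section above initialises steps = [current_state]; the route charting of Section 3.5 can then be sketched as a greedy walk over the trained Q matrix, always following the highest-Q action until the goal is reached. The trained Q matrix and the goal node below are hypothetical, for a 3-node chain 0-1-2:

```python
import numpy as np

# Hypothetical trained Q matrix for a 3-node chain 0-1-2 with goal node 2
Q = np.array([[  0.,  80.,   0.],
              [ 64.,   0., 100.],
              [  0.,  80., 100.]])
goal = 2                # assumed goal node
current_state = 0
steps = [current_state]

while current_state != goal:
    # follow the highest-Q action out of the current state
    candidates = np.where(Q[current_state, ] == np.max(Q[current_state, ]))[0]
    next_step = int(np.random.choice(candidates))   # break ties at random
    steps.append(next_step)
    current_state = next_step

print("Most efficient path:", steps)
```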