0% found this document useful (0 votes)
4 views29 pages

AI - Practical Part 2

The document introduces Artificial Intelligence (AI), defining it as the science of creating intelligent machines that can mimic human thought processes. It outlines the goals and applications of AI, including expert systems, natural language processing, and intelligent robots, as well as providing details on various search algorithms such as Depth-First Search, Breadth-First Search, and the A* algorithm. Additionally, it discusses the Towers of Hanoi problem as an example of recursion in programming.

Uploaded by

sehofe9690
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views29 pages

AI - Practical Part 2

The document introduces Artificial Intelligence (AI), defining it as the science of creating intelligent machines that can mimic human thought processes. It outlines the goals and applications of AI, including expert systems, natural language processing, and intelligent robots, as well as providing details on various search algorithms such as Depth-First Search, Breadth-First Search, and the A* algorithm. Additionally, it discusses the Towers of Hanoi problem as an example of recursion in programming.

Uploaded by

sehofe9690
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Practical No.

1
Aim: -Introduction of Artificial Intelligence and its application.
Artificial Intelligence: - “The science and engineering of making intelligent machines,
especially Intelligent computer programs”. Artificial Intelligence is an approach to make a
computer, a robot, or a product to think how smart human think.AI is a study of how human brain think,
learn, decide and work, when it tries to solve problems. And finally, this study outputs intelligent
software systems. The aim of AI is to improve computer functions which are related to human
knowledge, for example, reasoning, learning, and problem-solving. The intelligence is intangible. It is
composed of
• Reasoning
• Learning
• Problem Solving
• Perception
• Linguistic Intelligence
The objectives of AI research are reasoning, knowledge representation, planning, learning,
natural language processing, realization, and ability to move and manipulate objects. There are
long-term goals in the general intelligence sector.
Approaches include statistical methods, computational intelligence, and traditional coding AI.
During the AI research related to search and mathematical optimization, artificial neural
networks and methods based on statistics, probability, and economics, we use many tools.
Computer science attracts AI in the field of science, mathematics, psychology, linguistics,
philosophy and so on.

Goals of AI:
• To Create Expert Systems − The systems which exhibit intelligent behavior,
learn, demonstrate, explain, and advice its users.
• To Implement Human Intelligence in Machines − Creating systems that understand,
think, learn ,and behave like humans.

Applications of AI :

➢ Gaming − AI plays important role for machine to think of large number of possible
positions based on deep knowledge in strategic games. for example, chess,river
crossing, N-queens problems and etc.

B.Tech CSE 6th Semester CTIEMT Page No. 1


➢ Natural Language Processing − Interact with the computer that understands
natural language spoken by humans.
➢ Expert Systems − Machine or software provide explanation and advice to the users.

➢ Expert Systems − Machine or software provide explanation and advice to the users.

➢ Speech Recognition − There are some AI based speech recognition systems have
ability to hear and express as sentences and understand their meanings while a

person talks to it. For example Siri and Google assistant


➢ Handwriting Recognition − The handwriting recognition software reads the text
written on paper and recognize the shapes of the letters and convert it into editable
text
➢ Intelligent Robots − Robots are able to perform the instructions given by a human

B.Tech CSE 6th Semester CTIEMT Page No. 2


Practical No. 2
Aim: Implementation of Depth-First Search (Uninformed Search Strategy)

DFS is also an important type of uniform search. DFS visits all the vertices in the graph. This
type of algorithm always chooses to go deeper into the graph. After DFS visited all the
reachable vertices from a particular sources vertices it chooses one of the remaining
undiscovered vertices and continues the search. DFS reminds the space limitation of breath
first search by always generating next a child of the deepest unexpanded nodded. The data
structure stack or (LIFO) is used for DFS. One interesting property of DFS isthat, the discover
and finish time of each vertex from a parenthesis structure. If we use one open parenthesis
when a vertex is finishedthen the result is properly nested set of parenthesis.

Algorithm of Depth-First Search :


1. PUSH the starting node into the stack.
2. If the stack is empty then stop and return failure.
3. If the top node of the stack is the goal node, then stop and return success.
4. Else POP the top node from the stack and process it. Find all its neighbors that
are in ready state andPUSH them into the stack in any order.
5. Go to step 3.
6. Exit.
Advantages of Depth First Search:
• Memory requirement is only linear with respect to the search graph. This is in contrast
with breadth-first search which requires more space. The reason is that the algorithm
only needs to store a stack of nodes on the path from the root to the current node.
• The time complexity of a depth-first Search to depth d and branching factor b (the
number of children at each node, the outdegree) is O(bd) since it generates the same set
of nodes as breadth-first search, but simply in a different order. Thus practically depth-
first search is time-limited rather than space-limited.

Disadvantages of Depth First Search:

• Depth-First Search is not guaranteed to find the solution.


• And there is no guarantee to find a minimal solution, if more than one solution.

B.Tech CSE 6th Semester CTIEMT Page No. 3


CODE:

graph = {
'5' : ['3','7'],
'3' : ['2', '4'],
'7' : ['8'],
'2' : [],
'4' : ['8'],
'8' : []
}

visited = set() # Set to keep track of visited nodes of graph.

def dfs(visited, graph, node): #function for dfs


if node not in
visited: print
(node)
visited.add(node
)
for neighbour in graph[node]:
dfs(visited, graph,
neighbour)

# Driver Code
print("Following is the Depth-First
Search") dfs(visited, graph, '5')

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 4


Practical No. 3
Aim: Implementation of Breadth-First Search(Uninformed Search Strategy)
Breadth-first search (BFS) :- is an algorithm for traversing or searching tree or graph data
structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to
as a 'search key'[1]), and explores all of the neighbor nodes at the present depth prior to moving
on to the nodes at the next depth level.

Advantages of Breadth First Search:


• BFS will never get trapped exploring the useful path forever.
• If there is a solution, BFS will definitely find it.
• If there is more than one solution then BFS can find the minimal one that requires less
number of steps.
• Low storage requirement – linear with depth.
• Easily programmable.
Disadvantages of Breadth First Search:
The main drawback of BFS is its memory requirement. Since each level of the graph must
be saved in order to generate the next level and the amount of memory is proportional to
the number of nodes stored the space complexity of BFS is O(bd ), where b is the branching
factor(the number of children at each node, the outdegree) and d is the depth. As a result,
BFS is severely space-bound in practice so will exhaust the memory available on typical
computers in a matter of minutes.

Algorithm of Breadth-First Search :


Step 1: SET STATUS = 1 (ready state) for each node in G

Step 2: Enqueue the starting node A and set its STATUS = 2 (waiting state)

Step 3: Repeat Steps 4 and 5 until QUEUE is empty

Step 4: Dequeue a node N. Process it and set its STATUS = 3 (processed state).
Step 5: Enqueue all the neighbours of N that are in the ready state (whose STATUS = 1) and
Set their STATUS = 2
(waiting state )
[ END OF LOOP]

B.Tech CSE 6th Semester CTIEMT Page No. 5


CODE:

graph = {
'A' : ['B','C'],
'B' : ['D', 'E'],
'C' : ['F'],
'D' : [],
'E' : ['F'],
'F' : []
}
visited = [] # List to keep track of visited nodes.
queue = [] #Initialize a queue

def bfs(visited, graph, node):


visited.append(node)
queue.append(node)

while queue:
s = queue.pop(0)
print (s, end = " ")

for neighbour in graph[s]:


if neighbour not in visited:
visited.append(neighbour)
queue.append(neighbour)

# Driver Code
print("bfs traversal is as follow:")
bfs(visited, graph, 'A')

OUTPUT:

B.Tech CSE 6th Semester CTIEMT Page No. 6


Practical No. 04
Aim: Implementation of A* Algorithm (Informed Search Strategy).
A* Search algorithm is one of the best and popular technique used in path-finding and graph
traversals.
A * algorithm has 3 paramters:

• g(n): The actual cost of traversal from initial state to the current state.

• h(n): The estimated cost of traversal from the current state to the goal state.

• f(n): The actual cost of traversal from the initial state to the goal state.

f(n) = g(n) + h(n)

Advantages of A* Algorithm:

• It is optimal search algorithm in terms of heuristics.

• It is one of the best heuristic search techniques.

• It is used to solve complex search problems.

• There is no other optimal algorithm guaranteed to expand fewer nodes than A*.
Disadvantages of A* Algorithm:

• This algorithm is complete if the branching factor is finite and every action has fixed
cost.

• The performance of A* search is dependant on accuracy of heuristic algorithm used


to compute thefunction h(n).

Algorithm:
// A* Search Algorithm

1. Initialize the open list

2. Initialize the closed list put the starting node on the open list (you can leave its f at zero)

3. while the open list is not empty

a) find the node with the least f on

the open list, call it "q"

b) pop q off the open list

B.Tech CSE 6th Semester CTIEMT Page No. 7


c) generate q's 8 successors and set their

parents to q

d) for each successor

i) if successor is the goal, stop search

ii) else, compute both g and h for successor

successor.g = q.g + distance between successor and q

successor.h = distance from goal to successor (This can be done using many

ways, we will discuss three heuristics- Manhattan, Diagonal and Euclidean

Heuristics)

successor.f = successor.g + successor.h

iii) if a node with the same position as successor is in the OPEN list which has a

lower f than successor, skip this successor

iV) if a node with the same position as successor is in the CLOSED list which has

a lower f than successor, skip this successor otherwise, add the node to the open list

end (for loop)

e) push q on the closed list

end (while loop)

CODE:

from collections import dequeclass

Graph:

# example of adjacency list (or rather map)#

adjacency_list = {

# 'A': [('B', 1), ('C', 3), ('D', 7)],# 'B':

[('D', 5)],

# 'C': [('D', 12)]

B.Tech CSE 6th Semester CTIEMT Page No. 8


#}

def init (self, adjacency_list):

self.adjacency_list = adjacency_list

def get_neighbors(self, v): return

self.adjacency_list[v]

# heuristic function with equal values for all nodesdef

h(self, n):

H={

'A': 1,

'B': 1,

'C': 1,

'D': 1

return H[n]

def a_star_algorithm(self, start_node, stop_node):

# open_list is a list of nodes which have been visited, but who's neighbors# haven't all

been inspected, starts off with the start node

# closed_list is a list of nodes which have been visited# and

who's neighbors have been inspected

open_list = set([start_node])

closed_list = set([])

# g contains current distances from start_node to all other nodes# the

default value (if it's not found in the map) is +infinity

g = {}

g[start_node] = 0

# parents contains an adjacency map of all nodesparents

B.Tech CSE 6th Semester CTIEMT Page No. 9


= {}

parents[start_node] = start_nodewhile

len(open_list) > 0:

n = None
# find a node with the lowest value of f() - evaluation functionfor v in

open_list:

if n == None or g[v] + self.h(v) < g[n] + self.h(n):n = v;

if n == None:

print('Path does not exist!')

return None

# if the current node is the stop_node

# then we begin reconstructin the path from it to the start_nodeif n ==

stop_node:

reconst_path = [] while

parents[n] != n:

reconst_path.append(n)

n = parents[n]

reconst_path.append(start_node)

reconst_path.reverse()

print('Path found: {}'.format(reconst_path))return

reconst_path

# for all neighbors of the current node dofor (m,

weight) in self.get_neighbors(n):

# if the current node isn't in both open_list and closed_list# add it

to open_list and note n as it's parent

if m not in open_list and m not in closed_list:

open_list.add(m)
B.Tech CSE 6th Semester CTIEMT Page No. 10
parents[m] = n

g[m] = g[n] + weight

# otherwise, check if it's quicker to first visit n, then m# and if

it is, update parent data and g data

# and if the node was in the closed_list, move it to open_listelse:

if g[m] > g[n] + weight:g[m]

= g[n] + weight

parents[m] = n

if m in closed_list:

closed_list.remove(m)

open_list.add(m)

# remove n from the open_list, and add it to closed_list#

because all of his neighbors were inspected open_list.remove(n)

closed_list.add(n)

print('Path does not exist!')

return None

adjacency_list = {

'A': [('B', 1), ('C', 3), ('D', 7)],

'B': [('D', 5)],

'C': [('D', 12)]

graph1 = Graph(adjacency_list)

graph1.a_star_algorithm('A', 'D')

OUTPUT:

B.Tech CSE 6th Semester CTIEMT Page No. 11


Practical No. 5
Aim: Write a program to implement towers of Hanoi.
Tower of Hanoi :
The solution to the Towers of Hanoi puzzle is a classic example of recursion. The ancient
puzzle of the Towers of Hanoi consists of a number of wooden disks mounted on three poles,
which are in turn attached toa baseboard. The disks each have different diameters and a hole
in the middle large enough for the poles to pass through. In the beginning, all the disks are on
the left pole. The object of the puzzle is to move all the disks over to the right pole, one at a
time, so that they end up in the original order on that pole. You can use the middle pole as a
temporary resting place for disks, but at no time is a larger disk to be on top of a smallerone.
It's easyto solve the Towers of Hanoi with two or three disks, but the process becomes more
difficult with four or more disks.

Algorithm:
• Identify the base case: If there is only one disk, simply move it from the source rod to
the destination rod.
• Otherwise, recursively solve the problem for n-1 disks, moving them from the source
rod to the auxiliary rod.
• Move the largest disk from the source rod to the destination rod.
• Recursively solve the problem for n-1 disks on the auxiliary rod, moving them to the
destination rod.

CODE:
def hanoi(n, f, to, via):if n

== 1:

print("Move disk 1 from",f,"to",to);else:

hanoi(n-1, f, via, to)

print("Move disk",n,"from",f,"to",to);

hanoi(n-1, via, to, f)

n=3

B.Tech CSE 6th Semester CTIEMT Page No. 12


f = 'A'

to = 'B' via

= 'C'

hanoi(n, f, via, to)

OUTPUT:

B.Tech CSE 6th Semester CTIEMT Page No. 13


Practical No. 6
Aim: Implementation of Tic-Tac-Toe game
Rules of the Game :
1. The game is played on a grid that's 3 squares by 3 squares.
2. You are X , your friend (or the computer in this case) is O . Players take turns putting
their marks in empty squares.
3. The first player to get 3 of her marks in a row (up, down, across, or diagonally) is the
winner.
4. When all 9 squares are full, the game is over. If no player has 3 marks in a row, the
game ends in a tie.

CODE:
import os
import time
board = [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
player = 1
# Win Flags
Win = 1
Draw = -1
Running = 0
Stop = 1
Game = Running
Mark = 'X'
def DrawBoard():
print(" %c | %c | %c " % (board[1], board[2], board[3]))
print(" | | ")
print(" %c | %c | %c " % (board[4], board[5], board[6]))
print(" | | ")
print(" %c | %c | %c " % (board[7], board[8], board[9]))
print(" | | ")
def CheckPosition(x):
if board[x] == ' ':
return True

B.Tech CSE 6th Semester CTIEMT Page No. 14


else:
return False
def CheckWin():
global Game
if board[1] == board[2] and board[2] == board[3] and board[1] != ' ':
Game = Win
elif board[4] == board[5] and board[5] == board[6] and board[4] != ' ':
Game = Win
elif board[7] == board[8] and board[8] == board[9] and board[7] != ' ':
Game = Win
elif board[1] == board[4] and board[4] == board[7] and board[1] != ' ':
Game = Win
elif board[2] == board[5] and board[5] == board[8] and board[2] != ' ':
Game = Win
elif board[3] == board[6] and board[6] == board[9] and board[3] != ' ':
Game = Win
elif board[1] == board[5] and board[5] == board[9] and board[5] != ' ':
Game = Win
elif board[3] == board[5] and board[5] == board[7] and board[5] != ' ':
Game = Win
elif board[1] != ' ' and board[2] != ' ' and board[3] != ' ' and \
board[4] != ' ' and board[5] != ' ' and board[6] != ' ' and \
board[7] != ' ' and board[8] != ' ' and board[9] != ' ':
Game = Draw
else:
Game = Running
print("Tic-Tac-Toe Game Designed By Sourabh Somani")
print("Player 1 [X] --- Player 2 [O]\n")
print()
print()
print("Please Wait...")

B.Tech CSE 6th Semester CTIEMT Page No. 15


time.sleep(3)
while Game == Running:
os.system('cls')
DrawBoard()
if player % 2 != 0:
print("Player 1's chance")
Mark = 'X'
else:
print("Player 2's chance")
Mark = 'O'
choice = int(input("Enter the position between [1-9] where you want to mark: "))
if CheckPosition(choice):
board[choice] = Mark
player += 1
CheckWin()
os.system('cls')
DrawBoard()
if Game == Draw:
print("Game Draw")
elif Game == Win:
player -= 1
if player % 2 != 0:
print("Player 1 Won")
else:
print("Player 2 Won")

OUTPUT:

B.Tech CSE 6th Semester CTIEMT Page No. 16


Practical No. 7
Aim: Write a program to implement water jug problem.

Statement :We are given 2 jugs, a 4 liter one and a 3- liter one. Neither has any measuring
markers on it.There is a pump that can be used to fill the jugs with water. How can we get
exactly 2 liters of water in tothe 4-liter jugs?

Solution:-

1. Start with the initial state where both jugs are empty.

2. Create a queue. Next, add the initial state to it.

3. While the queue is not empty, opt for the following:

o Pop the front state from the queue.

o Apply all possible production rules to generate new states.

o Check if any of these new states match the goal state.

o If a goal state is found, the problem is solved.

o If not, add the new states to the queue for further exploration.

4. BFS ensures that you find the shortest path to the goal state, which is efficient for
solving the Water Jug Problem.

CODE:

def gcd(a, b):

if b == 0:

return a

else:

return gcd(b, a % b)

def can_measure_water(jug1_cap, jug2_cap, target):

if target > jug1_cap + jug2_cap:

return False

B.Tech CSE 6th Semester CTIEMT Page No. 17


if jug1_cap == 0 or jug2_cap == 0:

return target == 0 or target == jug1_cap + jug2_cap

return target % gcd(jug1_cap, jug2_cap) == 0

def measure_water(jug1_cap, jug2_cap, target):

if not can_measure_water(jug1_cap, jug2_cap, target):

return []

gcd_val = gcd(jug1_cap, jug2_cap)

a, b = jug1_cap // gcd_val, jug2_cap // gcd_val

if a > b:

a, b = b, a

jug1_cap, jug2_cap = jug2_cap, jug1_cap

q2 = target // gcd_val

while q2 > 0:

q1 = (target - b * q2) // a

yield ('fill', 1)

yield ('pour', 1, 2)

yield ('empty', 2)

yield ('pour', 1, 2)

yield ('fill', 1)

yield ('pour', 1, 2)

q2 -= 1

target -= a

jug1_cap = 4

jug2_cap = 3

target = 2

B.Tech CSE 6th Semester CTIEMT Page No. 18


if can_measure_water(jug1_cap, jug2_cap, target):

print(list(measure_water(jug1_cap, jug2_cap, target)))

else:

print("Cannot measure the desired amount of water.")

OUTPUT:

B.Tech CSE 6th Semester CTIEMT Page No. 19


Practical No. 8
Aim: Write a program to construct a Bayesian network from given data.

Theory:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional
dependency, and each node corresponds to a unique random variable. The Bayesian network
consists of two major parts: a directed acyclic graph and a set of conditional probability
distributions

1) The directed acyclic graph is a set of random variables represented by nodes.

2) A node's conditional probability distribution (random variable) is


defined for every possible outcome of the preceding causal node(s).
Code:
import numpy as np

import csv

import pandas as pd

from pgmpy.models import Bayesian Model

from pgmpy.estimators import Maximum Likelihood Estimator

from pgmpy.Inference import Variable Elimination

# Read Cleveland Heart Disease data

heartDisease = pd.read_csv('heart.csv')

heartDisease = heartDisease.replace('?',np.nan)

#display the data print('Few examples from the dataset are given below')
print(heartDisease.head())

#Model Bayesian Network

Model=BayesianModel([('age','trestbps'),('age','fbs'),
('sex','trestbps'),('exang','trestbps'),('trestbps','heartdise
ase'),('fbs','heartdisease'),('heartdisease','restecg'),
('heartdisease','thalach'),('heartdisease','chol')])

#Learning CPDs using Maximum Likelihood Estimators

print('\n Learning CPD using Maximum likelihood estimators')

B.Tech CSE 6th Semester CTIEMT Page No. 20


# Inferencing with Bayesian Network

print('\n Inferencing with Bayesian Network:')

HeartDisease_infer = VariableElimination(model)

# Computing the Probability of Heart Disease given Age

print('\n 1. Probability of HeartDisease given Age=30')


q=HeartDisease_infer.query(variables=['heartdisease'],evidence ={'age':28})
print(q['heartdisease'])

#computing the Probability of HeartDisease given cholesterol

print('\n 2. Probability of HeartDisease given cholesterol=100')


q=HeartDisease_infer.query(variables=['heartdisease'],evidence ={'chol':100})
print(q['heartdisease'])

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 21


Practical No. 9
Aim: Write a program to infer from Bayesian Network.

Theory- Inference over a Bayesian network can come in two forms. The first is simply
evaluating the joint probability of a particular assignment of values for each variable (or a
subset ) in the network .The second, more interesting inference task ,is to find P(x|e),or, to find
the probability of some assignment of a subset of the variables (x) given assignments of other
variables (our evidence ,e).

Code:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
import networkx as nx
import pylab as plt

# Defining Bayesian Structure


model = BayesianNetwork([('Guest', 'Host'), ('Price', 'Host')])

# Defining the CPDs:


cpd_guest = TabularCPD('Guest', 3, [[0.33], [0.33], [0.33]])
cpd_price = TabularCPD('Price', 3, [[0.33], [0.33], [0.33]])
cpd_host = TabularCPD('Host', 3, [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
[0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
[0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]],
evidence=['Guest', 'Price'], evidence_card=[3, 3])

# Associating the CPDs with the network structure.


model.add_cpds(cpd_guest, cpd_price, cpd_host)
# Infering the posterior probability
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior_p = infer.query(['Host'], evidence={'Guest': 2, 'Price': 2})
print(posterior_p)

B.Tech CSE 6th Semester CTIEMT Page No. 22


nx.draw(model, with_labels=True)
plt.savefig('model.png')
plt.close()

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 23


Practical No. 10
Aim: Write a program to run value and policy iteration in a grid world.
Theory:
1. Policy Iteration:
Policy iteration first starts with some (non-optimal) policy, such as a random policy, and
then calculates the value of each state of the MDP given that policy — this step is
called policy evaluation. It then updates the policy itself for every state by calculating
the expected reward of each action applicable from that state.

Code:

from tabular_policy import TabularPolicy


from tabular_value_function import TabularValueFunction
from qtable import QTable
class PolicyIteration:
def __init__(self, mdp, policy):
self.mdp = mdp
self.policy = policy
def policy_evaluation(self, policy, values, theta=0.001):
while True:
delta = 0.0
new_values = TabularValueFunction()
for state in self.mdp.get_states():
# Calculate the value of V(s)
actions = self.mdp.get_actions(state)
old_value = values.get_value(state)
new_value = values.get_q_value(
self.mdp, state, policy.select_action(state, actions)
)
values.update(state, new_value)
delta = max(delta, abs(old_value - new_value))
if delta < theta:
B.Tech CSE 6th Semester CTIEMT Page No. 24
break
return values
def policy_iteration(self, max_iterations=100, theta=0.001):
# create a value function to hold details
values = TabularValueFunction()
for i in range(1, max_iterations + 1):
policy_changed = False
values = self.policy_evaluation(self.policy, values, theta)
for state in self.mdp.get_states():
actions = self.mdp.get_actions(state)
old_action = self.policy.select_action(state, actions)
q_values = QTable()
for action in self.mdp.get_actions(state):
# Calculate the value of Q(s,a)
new_value = values.get_q_value(self.mdp, state, action)
q_values.update(state, action, new_value)
# V(s) = argmax_a Q(s,a)
(new_action, _) = q_values.get_max_q(state, self.mdp.get_actions(state))
self.policy.update(state, new_action)
policy_changed = (
True if new_action is not old_action else policy_changed
)

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 25


2. Value Iteration:
Value iteration is a method of computing an optimal MDP policy and its value. Value
iteration starts at the "end" and then works backward, refining an estimate of either Q* or
V*. There is really no end, so it uses an arbitrary end point.

Code:

from tabular_value_function import TabularValueFunction

from qtable import QTable

class ValueIteration:

def __init__(self, mdp, values):

self.mdp = mdp

self.values = values

def value_iteration(self, max_iterations=100, theta=0.001):

for i in range(max_iterations):

delta = 0.0

new_values = TabularValueFunction()

for state in self.mdp.get_states():

qtable = QTable()

for action in self.mdp.get_actions(state):

# Calculate the value of Q(s,a)

new_value = 0.0

for (new_state, probability) in self.mdp.get_transitions(

state, action

):

reward = self.mdp.get_reward(state, action, new_state)

new_value += probability * (

reward

+(

self.mdp.get_discount_factor()

* self.values.get_value(new_state)
B.Tech CSE 6th Semester CTIEMT Page No. 26
)

qtable.update(state, action, new_value)

# V(s) = max_a Q(sa)

(_, max_q) = qtable.get_max_q(state, self.mdp.get_actions(state))

delta = max(delta, abs(self.values.get_value(state) - max_q))

new_values.update(state, max_q)

self.values.merge(new_values)

# Terminate if the value function has converged

if delta < theta:

return i

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 27


Practical No. 11
Aim: Write a program to do reinforcement learning in the grid world.
Theory:
This is a toy environment called Gridworld that is often used as a toy model in the
Reinforcement Learning literature. In this particular case: State space: GridWorld has 10x10 =
100 distinct states. The start state is the top left cell. The gray cells are walls and cannot be
moved to.

Code:

# global variables

BOARD_ROWS = 3

BOARD_COLS = 4

WIN_STATE = (0, 3)

LOSE_STATE = (1, 3)

START = (2, 0)

DETERMINISTIC = True

def giveReward(self):

if self.state == WIN_STATE:

return 1

elif self.state == LOSE_STATE:

return -1

else:

return 0

def nxtPosition(self, action):

if self.determine:

if action == "up":

nxtState = (self.state[0] - 1, self.state[1])

elif action == "down":


B.Tech CSE 6th Semester CTIEMT Page No. 28
nxtState = (self.state[0] + 1, self.state[1])

elif action == "left":

nxtState = (self.state[0], self.state[1] - 1)

else:

nxtState = (self.state[0], self.state[1] + 1)

def play(self, rounds=10):

i=0

while i < rounds:

if self.State.isEnd:

# back propagate reward

reward = self.State.giveReward()

# explicitly assign end state to reward values

self.state_values[self.State.state] = reward # this is optional

print("Game End Reward", reward)

for s in reversed(self.states):

reward = self.state_values[s] + self.lr * (reward - self.state_values[s])

self.state_values[s] = round(reward, 3)

self.reset()

Output:

B.Tech CSE 6th Semester CTIEMT Page No. 29

You might also like