

Homework - 0

• MD AAMIR SOHAIL
• EE16BTECH11021
• AI5001: Introduction to Modern AI

CODE: Q.5 (ε-greedy method)


# MD AAMIR SOHAIL
# EE16BTECH11021
# AI5001: Assignment 1, Q.5: Plot (Average reward, Steps)
# N-armed bandit problem using the epsilon-greedy approach

import numpy as np
import random
import matplotlib.pyplot as plt

n = 10                          # number of arms
tasks = 2000                    # number of bandit tasks
epsilon = [0.10, 0.01, 0.00]    # epsilons for the epsilon-greedy method
steps = 1000                    # number of times to select an arm

mu, sigma = 0, 1
q = np.random.normal(mu, sigma, [n, tasks])   # (True) action values q*(a), one column per task

# Q[a][t]: action-value estimate of arm a for task t
# Q = 1 [ t1 t2 ..... t(#tasks) ]
#     2 [ t1 t2 ..... t(#tasks) ]
#     .
#     n [ t1 t2 ..... t(#tasks) ]
Q = np.zeros([n, tasks])

# N_t[a][t]: number of times arm a has been selected in task t
N_t = np.zeros([n, tasks])

for itr in range(len(epsilon)):

    print('For epsilon:', epsilon[itr])

    epsilon_avg = []
    epsilon_avg.append(0)

    Q[:] = 0
    N_t[:] = 1

    for i in range(1, steps + 1):       # steps

        R = []

        for task in range(tasks):

            # At each time step and for each task:
            #   with prob. epsilon:     choose an action uniformly at random,
            #                           independent of the action-value estimates
            #   with prob. 1 - epsilon: choose the action with the max action-value estimate

            if random.uniform(0, 1) < epsilon[itr]:
                index = np.random.randint(n)
            else:
                if i != 1:
                    index = np.argmax(Q[:, task])
                else:
                    # Randomly select an action at the initial step for the greedy approach
                    index = np.random.randint(n)

            # Reward of action a at time step t: R ~ N(q*(a), 1)
            reward = np.random.normal(mu, sigma) + q[index][task]
            R.append(reward)

            # Update the count of the selected action
            N_t[index][task] = N_t[index][task] + 1
            # Update the action-value estimate of the selected action (incremental sample average)
            Q[index][task] = Q[index][task] + (reward - Q[index][task]) / N_t[index][task]

        # Average over the 2000 tasks at this step
        avg_R = np.mean(R)
        epsilon_avg.append(avg_R)       # 'steps' number of elements (plus the initial 0)

    print('Done epsilon')
    plt.plot(epsilon_avg)

plt.rc('text', usetex=True)

plt.xlabel('Steps')
plt.ylabel('Average Reward')
plt.legend([r"$\epsilon=$" + str(epsilon[0]), r"$\epsilon=$" + str(epsilon[1]),
            r"$\epsilon=$" + str(epsilon[2])], loc='lower right', prop={'size': 16})
plt.title(r"$\epsilon$-greedy algorithm: 10-armed bandit testbed (Average over 2000 tasks)")
plt.show()
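For reference, the action-selection and update rules implemented in the inner loop above can be written in equation form (this is only a restatement of the code, not an addition to it):

\[
A_t =
\begin{cases}
\text{a uniformly random arm} & \text{with probability } \epsilon,\\[2pt]
\arg\max_a Q_t(a) & \text{with probability } 1-\epsilon,
\end{cases}
\qquad
R_t \sim \mathcal{N}\!\left(q(A_t),\, 1\right)
\]

\[
N(A_t) \leftarrow N(A_t) + 1,
\qquad
Q(A_t) \leftarrow Q(A_t) + \frac{R_t - Q(A_t)}{N(A_t)}
\]

Each point of a plotted curve is the reward at that step averaged over the 2000 parallel bandit tasks.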

Figure 1: Average Reward vs Steps (steps = 1000)

Figure 2: Average Reward vs Steps (steps = 3000)

OBSERVATIONS:
• The ε-greedy approach eventually performs better than the purely greedy approach.

• For the first ∼100 steps, the greedy method improved faster, but it got stuck with a sub-optimal action.

• The ε = 0.01 greedy approach improves more slowly, but after some experience it performs better than the ε = 0.10 approach (see Figure 2); a rough numerical check is sketched below.
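A minimal sketch of such a check, assuming the per-ε average-reward curves produced by the script are collected in a dict named curves (a hypothetical name; the script above plots each epsilon_avg list directly instead of storing it):

import numpy as np

# Hypothetical container: curves[eps] holds the epsilon_avg list
# produced for that epsilon by the script above.
def tail_mean(curve, frac=0.2):
    # Mean of the average reward over the last `frac` fraction of the run.
    k = max(1, int(len(curve) * frac))
    return float(np.mean(curve[-k:]))

# For a 3000-step run (Figure 2) one would expect:
#   tail_mean(curves[0.01]) > tail_mean(curves[0.10]) > tail_mean(curves[0.00])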
