The document outlines the instructions for Programming Assignment 2, including submission guidelines and the importance of originality. It introduces a Markov Decision Process with four states and two actions, focusing on a deterministic career path problem. Additionally, it includes code snippets for implementing the environment and testing policies within the framework of the assignment.

1 {

2 "cells": [
3 {
4 "cell_type": "markdown",
5 "id": "17c4e1c2-931b-4e68-9378-5d376b0df0ef",
6 "metadata": {
7 "id": "17c4e1c2-931b-4e68-9378-5d376b0df0ef"
8 },
9 "source": [
10 "# Programming Assignment 2"
11 ]
12 },
13 {
14 "cell_type": "markdown",
15 "id": "7d597afc-35f8-4b03-8084-de771032604c",
16 "metadata": {
17 "id": "7d597afc-35f8-4b03-8084-de771032604c"
18 },
19 "source": [
20 "**Name:** <br />\n",
21 "**Roll No:**\n",
22 "***\n",
23 "\n",
24 "## Instructions\n",
25 "\n",
26 "\n",
27 "- Kindly name your submission files as `RollNo_Name_PA2.ipynb`. <br />\n",
28 "- You are required to work out your answers and submit only the iPython Notebook.
The code should be well commented and easy to understand as there are marks for
this. This notebook can be used as a template for assignment submission. <br />\n",
29 "- Submissions are to be made through iPearl portal. Submissions made through mail
will not be graded.<br />\n",
30 "- Answers to the theory questions if any should be included in the notebook itself.
While using special symbols use the $\\LaTeX$ mode <br />\n",
31 "- Make sure your plots are clear and have title, legends and clear lines, etc. <br
/>\n",
32 "- Plagiarism of any form will not be tolerated. If your solutions are found to
match with other students or from other uncited sources, there will be heavy
penalties and the incident will be reported to the disciplinary authorities. <br
/>\n",
33 "- In case you have any doubts, feel free to reach out to TAs for help. <br />\n",
34 "\n",
35 "***"
36 ]
37 },
38 {
39 "cell_type": "markdown",
40 "id": "69751002-4656-47cf-8b2c-6016a434f4b6",
41 "metadata": {
42 "id": "69751002-4656-47cf-8b2c-6016a434f4b6"
43 },
44 "source": [
45 "## E1: A Deterministic Career Path\n",
46 "\n",
47 "Consider a simple Markov Decision Process below with four states and two actions
available at each state. In this simplistic setting actions have deterministic
effects, i.e., taking an action in a state always leads to one next state with
transition probability equal to one. There are two actions out of each state for the
agent to choose from: D for development and R for research. The
_ultimately-care-only-about-money_ reward scheme is given along with the states.\n",
48 "\n",
49 "<img src='assets/mdp-d.png' width=\"700\" align=\"left\"></img>"
50 ]
51 },
52 {
53 "cell_type": "code",
54 "execution_count": null,
55 "id": "b0f991f3-9630-4656-9caa-135f13847ed8",
56 "metadata": {
57 "id": "b0f991f3-9630-4656-9caa-135f13847ed8"
58 },
59 "outputs": [],
60 "source": [
61 "# import required libraries\n",
62 "import gymnasium as gym\n",
63 "import copy\n",
64 "import numpy as np\n",
65 "import matplotlib.pyplot as plt\n",
66 "import matplotlib.font_manager\n",
67 "import random"
68 ]
69 },
70 {
71 "cell_type": "markdown",
72 "id": "47afefbb-7b00-44e5-82dd-16953b59a7f3",
73 "metadata": {
74 "id": "47afefbb-7b00-44e5-82dd-16953b59a7f3"
75 },
76 "source": [
77 "### E1.1 Environment Implementation"
78 ]
79 },
80 {
81 "cell_type": "code",
82 "execution_count": null,
83 "id": "61c01aa4-94c3-48e5-ad3b-279e03685260",
84 "metadata": {
85 "id": "61c01aa4-94c3-48e5-ad3b-279e03685260"
86 },
87 "outputs": [],
88 "source": [
89 "'''\n",
90 "Represents a Career Path problem Gym Environment which provides a Fully
observable\n",
91 "MDP\n",
92 "'''\n",
93 "class CareerPathEnv(gym.Env):\n",
94 " '''\n",
95 " CareerPathEnv represents the Gym Environment for the Career Path problem
environment\n",
96 " States : [0:'Unemployed',1:'Industry',2:'Grad School',3:'Academia']\n",
97 " Actions : [0:'Research', 1:'Development']\n",
98 " '''\n",
99 " metadata = {'render.modes': ['human']}\n",
100 "\n",
101 " def __init__(self,initial_state=0,no_states=4,no_actions=2):\n",
102 " '''\n",
103 " Constructor for the CareerPath class\n",
104 "\n",
105 " Args:\n",
106 " initial_state : starting state of the agent\n",
107 " no_states : The no. of possible states which is 4\n",
108 " no_actions : The no. of possible actions which is 2\n",
109 "\n",
110 " '''\n",
111 " self.initial_state = initial_state\n",
112 " self.state = self.initial_state\n",
113 " self.nA = no_actions\n",
114 " self.nS = no_states\n",
115 " self.prob_dynamics = {\n",
116 " # s: {\n",
117 " # a: [(p(s,s'|a), s', r', terminal/not)]\n",
118 " # }\n",
119 "\n",
120 " 0: {\n",
121 " 0: [(1.0, 2, 0.0, False)],\n",
122 " 1: [(1.0, 1, 100.0, False)],\n",
123 " },\n",
124 " 1: {\n",
125 " 0: [(1.0, 0, -10.0, False)],\n",
126 " 1: [(1.0, 1, 100.0, False)],\n",
127 " },\n",
128 " 2: {\n",
129 " 0: [(1.0, 3, 10.0, False)],\n",
130 " 1: [(1.0, 1, 100.0, False)],\n",
131 " },\n",
132 " 3: {\n",
133 " 0: [(1.0, 3, 10.0, False)],\n",
134 " 1: [(1.0, 1, 100.0, False)],\n",
135 " },\n",
136 " }\n",
137 " self.reset()\n",
138 "\n",
139 " def reset(self):\n",
140 " '''\n",
141 " Resets the environment\n",
142 " Returns:\n",
143 " observations containing player's current state\n",
144 " '''\n",
145 " self.state = self.initial_state\n",
146 " return self.get_obs()\n",
147 "\n",
148 " def get_obs(self):\n",
149 " '''\n",
150 " Returns the player's state as the observation of the environment\n",
151 " '''\n",
152 " return (self.state)\n",
153 "\n",
154 " def render(self, mode='human'):\n",
155 " '''\n",
156 " Renders the environment\n",
157 " '''\n",
158 " print(\"Current state: {}\".format(self.state))\n",
159 "\n",
160 " def sample_action(self):\n",
161 " '''\n",
162 " Samples and returns a random action from the action space\n",
163 " '''\n",
164 " return random.randint(0, self.nA)\n",
165 " def P(self):\n",
166 " '''\n",
167 " Defines and returns the probabilty transition matrix which is in the form
of a nested dictionary\n",
168 " '''\n",
169 " self.prob_dynamics = {\n",
170 " 0: {\n",
171 " 0: [(1.0, 2, 0.0, False)],\n",
172 " 1: [(1.0, 1, 100.0, False)],\n",
173 " },\n",
174 " 1: {\n",
175 " 0: [(1.0, 0, -10.0, False)],\n",
176 " 1: [(1.0, 1, 100.0, False)],\n",
177 " },\n",
178 " 2: {\n",
179 " 0: [(1.0, 3, 10.0, False)],\n",
180 " 1: [(1.0, 1, 100.0, False)],\n",
181 " },\n",
182 " 3: {\n",
183 " 0: [(1.0, 3, 10.0, False)],\n",
184 " 1: [(1.0, 1, 100.0, False)],\n",
185 " },\n",
186 " }\n",
187 " return self.prob_dynamics\n",
188 "\n",
189 "\n",
190 " def step(self, action):\n",
191 " '''\n",
192 " Performs the given action\n",
193 " Args:\n",
194 " action : action from the action_space to be taking in the
environment\n",
195 " Returns:\n",
196 " observation - returns current state\n",
197 " reward - reward obtained after taking the given action\n",
198 " done - True if the episode is complete else False\n",
199 " '''\n",
200 " if action >= self.nA:\n",
201 " action = self.nA-1\n",
202 "\n",
203 " dynamics_tuple = self.prob_dynamics[self.state][action][0]\n",
204 " self.state = dynamics_tuple[1]\n",
205 "\n",
206 "\n",
207 " return self.state, dynamics_tuple[2], dynamics_tuple[3]"
208 ]
209 },
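For orientation, each entry of the dynamics dictionary returned by `P()` is a list of `(probability, next_state, reward, done)` tuples. A minimal sketch of reading one entry, assuming the `CareerPathEnv` cell above has been run:

```python
# Sketch: inspect the transition encoding used by CareerPathEnv.
env = CareerPathEnv()

# State 0 ('Unemployed'), action 1 ('Development') has a single deterministic outcome:
# probability 1.0, next state 1 ('Industry'), reward 100.0, non-terminal.
prob, next_state, reward, done = env.P()[0][1][0]
print(prob, next_state, reward, done)  # 1.0 1 100.0 False
```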
210 {
211 "cell_type": "markdown",
212 "id": "c9125c6e-8599-4dea-b388-b20596e33201",
213 "metadata": {
214 "id": "c9125c6e-8599-4dea-b388-b20596e33201"
215 },
216 "source": [
217 "### E1.2 Policies\n",
218 "\n",
219 "After implementing the environment let us see how to make decisions in the
environment. Let $\\pi_1(s) = R$ and $\\pi_2(s) = D$ for any state be two policies.
Let us see how these policies look like."
220 ]
221 },
222 {
223 "cell_type": "code",
224 "execution_count": null,
225 "id": "7d58aa48-25aa-4cb3-a70d-c4ac68f0cacc",
226 "metadata": {
227 "id": "7d58aa48-25aa-4cb3-a70d-c4ac68f0cacc",
228 "outputId": "8d2c9d13-71f0-4ed9-b0d8-28b4eac47e1e"
229 },
230 "outputs": [
231 {
232 "name": "stdout",
233 "output_type": "stream",
234 "text": [
235 "Research policy: \n",
236 " [[1. 0.]\n",
237 " [1. 0.]\n",
238 " [1. 0.]\n",
239 " [1. 0.]]\n",
240 "Development policy: \n",
241 " [[0. 1.]\n",
242 " [0. 1.]\n",
243 " [0. 1.]\n",
244 " [0. 1.]]\n",
245 "Random policy: \n",
246 " [[1 0]\n",
247 " [0 1]\n",
248 " [0 1]\n",
249 " [1 0]]\n",
250 "Uncertain policy: \n",
251 " [[0.5 0.5]\n",
252 " [0.5 0.5]\n",
253 " [0.5 0.5]\n",
254 " [0.5 0.5]]\n"
255 ]
256 }
257 ],
258 "source": [
259 "policy_R = np.concatenate((np.ones([4, 1]), np.zeros([4, 1])), axis=1)\n",
260 "policy_D = np.concatenate((np.zeros([4, 1]), np.ones([4, 1])), axis=1)\n",
261 "policy_random = np.array((np.random.permutation(2), np.random.permutation(2),
np.random.permutation(2), np.random.permutation(2)))\n",
262 "print(\"Research policy: \\n\",policy_R)\n",
263 "print(\"Development policy: \\n\", policy_D)\n",
264 "print(\"Random policy: \\n\",policy_random)\n",
265 "\n",
266 "policy_uncertain = np.concatenate((0.5*np.ones([4, 1]), 0.5*np.ones([4, 1])),
axis=1)\n",
267 "print(\"Uncertain policy: \\n\",policy_uncertain)"
268 ]
269 },
270 {
271 "cell_type": "markdown",
272 "id": "ed00cfd0",
273 "metadata": {
274 "id": "ed00cfd0"
275 },
276 "source": [
277 "### E1.3 Testing\n",
278 "\n",
279 "By usine one of the above policies, lets see how we navigate the environment. We
want to see how we make take and action based on a given policy, what state we
transition to and obtain the rewards from the transition."
280 ]
281 },
282 {
283 "cell_type": "code",
284 "execution_count": null,
285 "id": "3fd4869e",
286 "metadata": {
287 "id": "3fd4869e",
288 "outputId": "e0dc70c0-be1d-469f-f1db-d1157bd19c1c"
289 },
290 "outputs": [
291 {
292 "name": "stdout",
293 "output_type": "stream",
294 "text": [
295 "State\t Action\t New State\t Reward\t is_Terminal\n",
296 " 0 \t 1 \t 1 \t 100.0 \t False\n",
297 " 1 \t 1 \t 1 \t 100.0 \t False\n",
298 " 1 \t 0 \t 0 \t -10.0 \t False\n",
299 " 0 \t 0 \t 2 \t 0.0 \t False\n",
300 " 2 \t 0 \t 3 \t 10.0 \t False\n",
301 " 3 \t 1 \t 1 \t 100.0 \t False\n",
302 " 1 \t 1 \t 1 \t 100.0 \t False\n",
303 " 1 \t 1 \t 1 \t 100.0 \t False\n",
304 " 1 \t 0 \t 0 \t -10.0 \t False\n",
305 " 0 \t 0 \t 2 \t 0.0 \t False\n",
306 "Total Number of steps: 10\n",
307 "Final Reward: 490.0\n"
308 ]
309 }
310 ],
311 "source": [
312 "env = CareerPathEnv()\n",
313 "is_Terminal = False\n",
314 "start_state = env.reset()\n",
315 "steps = 0\n",
316 "total_reward = 0\n",
317 "\n",
318 "# you may change policy here\n",
319 "policy = policy_uncertain\n",
320 "# policy = policy_R\n",
321 "# policy = policy_D\n",
322 "# policy = policy_random\n",
323 "\n",
324 "print(\"State\\t\", \"Action\\t\" , \"New State\\t\" , \"Reward\\t\" ,
\"is_Terminal\")\n",
325 "steps = 0\n",
326 "max_steps = 5\n",
327 "\n",
328 "prev_state = start_state\n",
329 "\n",
330 "while steps < 10:\n",
331 " steps += 1\n",
332 "\n",
333 " action = np.random.choice(2,1,p=policy[prev_state])[0] #0 -> Research, 1 ->
Development\n",
334 " state, reward, is_Terminal = env.step(action)\n",
335 "\n",
336 " total_reward += reward\n",
337 "\n",
338 " print(\" \",prev_state, \"\\t \", action, \"\\t \", state, \"\\t\", reward,
\"\\t \", is_Terminal)\n",
339 " prev_state = state\n",
340 "\n",
341 "print(\"Total Number of steps:\", steps)\n",
342 "print(\"Final Reward:\", total_reward)"
343 ]
344 },
345 {
346 "cell_type": "markdown",
347 "id": "9121e977-35c4-4845-888d-2b662262347a",
348 "metadata": {
349 "id": "9121e977-35c4-4845-888d-2b662262347a"
350 },
351 "source": [
352 "### Iterative Policy Evaluation\n",
353 "Iterative Policy Evaluation is commonly used to calculate the state value function
$V_\\pi(s)$ for a given policy $\\pi$. Here we implement a function to compute the
state value function $V_\\pi(s)$ for a given policy\n",
354 "\n",
355 "<img src='assets/policy_eval.png' width=\"500\" align=\"left\"></img>"
356 ]
357 },
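The referenced `policy_eval.png` is not reproduced in this text. For reference, the update the cell below applies (in place, state by state) is the standard Bellman expectation backup, swept over all states until the largest change falls below $\theta$:

$$V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a)\left[\, r + \gamma V_k(s') \,\right]$$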
358 {
359 "cell_type": "code",
360 "execution_count": null,
361 "id": "c360b73b-e4d4-4538-b654-5837a123ee11",
362 "metadata": {
363 "id": "c360b73b-e4d4-4538-b654-5837a123ee11"
364 },
365 "outputs": [],
366 "source": [
367 "# Policy Evaluation\n",
368 "def EvaluatePolicy(env, policy, gamma=0.9, theta=1e-8, draw=False):\n",
369 " V = np.zeros(env.nS)\n",
370 " while True:\n",
371 " delta = 0\n",
372 " for s in range(env.nS):\n",
373 " Vs = 0\n",
374 " for a, action_prob in enumerate(policy[s]):\n",
375 " for prob, next_state, reward, done in env.P()[s][a]:\n",
376 " Vs += action_prob * prob * (reward + gamma * V[next_state])\n",
377 " delta = max(delta, np.abs(V[s]-Vs))\n",
378 " V[s] = Vs\n",
379 " if delta < theta:\n",
380 " break\n",
381 " return V"
382 ]
383 },
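A short usage sketch (illustrative, not an assignment cell): evaluating the two fixed policies from E1.2 on the deterministic environment, assuming the cells above have been run.

```python
# Sketch: state values of the all-Research and all-Development policies.
env = CareerPathEnv()
V_R = EvaluatePolicy(env, policy_R, gamma=0.9)
V_D = EvaluatePolicy(env, policy_D, gamma=0.9)
print("V under policy_R:", V_R)
# The all-D policy earns 100 per step, so with gamma = 0.9 each V(s) approaches 100 / (1 - 0.9) = 1000.
print("V under policy_D:", V_D)
```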
384 {
385 "cell_type": "markdown",
386 "id": "7b891c3a",
387 "metadata": {
388 "id": "7b891c3a"
389 },
390 "source": [
391 "### Policy improvement\n",
392 "\n",
393 "$\\pi'(s) = \\arg \\max_a \\sum_{s',r} p(s',r|s,a)\\left[ r + \\gamma v_\\pi(s')
\\right ]$\n"
394 ]
395 },
396 {
397 "cell_type": "code",
398 "execution_count": null,
399 "id": "930db58c",
400 "metadata": {
401 "id": "930db58c"
402 },
403 "outputs": [],
404 "source": [
405 "##Policy Improvement Function\n",
406 "def ImprovePolicy(env, v, gamma):\n",
407 " num_states = env.nS\n",
408 " num_actions = env.nA\n",
409 " prob_dynamics = env.P()\n",
410 "\n",
411 " q = np.zeros((num_states, num_actions))\n",
412 "\n",
413 " for state in prob_dynamics:\n",
414 " for action in prob_dynamics[state]:\n",
415 " #print(state, action)\n",
416 " for prob, new_state, reward, is_terminal in
prob_dynamics[state][action]:\n",
417 " #print(prob, new_state, reward, is_terminal)\n",
418 " q[state][action] += prob*(reward + gamma*v[new_state])\n",
419 "\n",
420 " new_pi = np.zeros((num_states, num_actions))\n",
421 "\n",
422 " for state in range(num_states):\n",
423 " opt_action = np.argmax(q[state])\n",
424 " new_pi[state][opt_action] = 1.0\n",
425 "\n",
426 " return new_pi"
427 ]
428 },
429 {
430 "cell_type": "markdown",
431 "id": "d8634e11",
432 "metadata": {
433 "id": "d8634e11"
434 },
435 "source": [
436 "### Policy Iteration\n",
437 "\n",
438 "<img src='assets/policy_iteration.png' width=\"500\" align=\"left\"></img>"
439 ]
440 },
441 {
442 "cell_type": "code",
443 "execution_count": null,
444 "id": "1860233b",
445 "metadata": {
446 "id": "1860233b"
447 },
448 "outputs": [],
449 "source": [
450 "def PolicyIteration(env, pi, gamma, tol = 1e-10):\n",
451 " num_states = env.nS\n",
452 " num_actions = env.nA\n",
453 " iterations = 0\n",
454 "\n",
455 " while True:\n",
456 " # print(pi)\n",
457 " iterations += 1\n",
458 " pi_old = pi\n",
459 " v = EvaluatePolicy(env, pi_old, gamma, tol)\n",
460 " pi = ImprovePolicy(env, v, gamma)\n",
461 "\n",
462 " is_equal = True\n",
463 " for s in range(num_states):\n",
464 " if np.argmax(pi_old[s]) == np.argmax(pi[s]):\n",
465 " continue\n",
466 " is_equal = False\n",
467 " if is_equal == True:\n",
468 " break\n",
469 " return pi, v, iterations\n",
470 "\n"
471 ]
472 },
473 {
474 "cell_type": "markdown",
475 "id": "5049ce61",
476 "metadata": {
477 "id": "5049ce61"
478 },
479 "source": [
480 "### Testing Policy Iteration"
481 ]
482 },
483 {
484 "cell_type": "code",
485 "execution_count": null,
486 "id": "9da9d38f",
487 "metadata": {
488 "id": "9da9d38f",
489 "outputId": "dfcaf066-c998-4339-c29c-24633da4e496"
490 },
491 "outputs": [
492 {
493 "name": "stdout",
494 "output_type": "stream",
495 "text": [
496 "Initial Policy: \n",
497 " [[1 0]\n",
498 " [0 1]\n",
499 " [0 1]\n",
500 " [1 0]]\n",
501 "Final Policy: \n",
502 " [[0. 1.]\n",
503 " [0. 1.]\n",
504 " [0. 1.]\n",
505 " [0. 1.]]\n",
506 "State Value Function: [1000. 1000. 1000. 1000.]\n",
507 "Number of iterations for Policy Iteration: 2\n",
508 "Iterations:\n",
509 "Min\t Max\t Average\n",
510 "1 \t 2 \t 1.91\n"
511 ]
512 }
513 ],
514 "source": [
515 "gamma = 0.9\n",
516 "env = CareerPathEnv()\n",
517 "\n",
518 "print(\"Initial Policy: \\n\",policy_random)\n",
519 "pi, v, iters = PolicyIteration(env, policy_random, gamma)\n",
520 "print(\"Final Policy: \\n\",pi)\n",
521 "print(\"State Value Function: \",v)\n",
522 "print(\"Number of iterations for Policy Iteration: \",iters)\n",
523 "\n",
524 "# average number of iterations required\n",
525 "avg_iters = 0\n",
526 "min_iters = 1000\n",
527 "max_iters = 0\n",
528 "for _ in range(100):\n",
529 " policy_random = np.array((np.random.permutation(2), np.random.permutation(2),
np.random.permutation(2), np.random.permutation(2)))\n",
530 " _, _, iters = PolicyIteration(env,policy_random, gamma)\n",
531 " avg_iters += iters\n",
532 " min_iters = min(min_iters, iters)\n",
533 " max_iters = max(max_iters, iters)\n",
534 "avg_iters /= 100\n",
535 "print(\"Iterations:\")\n",
536 "print(\"Min\\t\", \"Max\\t\" , \"Average\")\n",
537 "print(min_iters,\"\\t\", max_iters,\"\\t\", avg_iters)"
538 ]
539 },
540 {
541 "cell_type": "markdown",
542 "id": "b57f1d6c-dd20-4902-9581-ce7f8a0ec944",
543 "metadata": {
544 "id": "b57f1d6c-dd20-4902-9581-ce7f8a0ec944"
545 },
546 "source": [
547 "***"
548 ]
549 },
550 {
551 "cell_type": "markdown",
552 "id": "c086d4b9",
553 "metadata": {
554 "id": "c086d4b9"
555 },
556 "source": [
557 "### A1. Find an optimal policy to navigate the given environment using Value
Iteration (VI)\n",
558 "\n",
559 "<img src='assets/value_iteration.png' width=\"500\" align=\"left\"></img>"
560 ]
561 },
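The `value_iteration.png` pseudocode is not reproduced in this text. As a reference point only, a minimal sketch of standard Value Iteration written against the same `env.P()` / `env.nS` / `env.nA` interface used above might look like the following (the function name `ValueIteration` and its return format are illustrative choices, not part of the assignment template):

```python
# Sketch of standard Value Iteration; reuses ImprovePolicy from above to extract the greedy policy.
def ValueIteration(env, gamma=0.9, theta=1e-8):
    V = np.zeros(env.nS)
    iterations = 0
    while True:
        iterations += 1
        delta = 0
        for s in range(env.nS):
            # Back up each action's value and keep the best one.
            q = np.zeros(env.nA)
            for a in range(env.nA):
                for prob, next_state, reward, done in env.P()[s][a]:
                    q[a] += prob * (reward + gamma * V[next_state])
            delta = max(delta, abs(q.max() - V[s]))
            V[s] = q.max()
        if delta < theta:
            break
    # Greedy (deterministic) policy with respect to the converged values.
    pi = ImprovePolicy(env, V, gamma)
    return pi, V, iterations
```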
562 {
563 "cell_type": "code",
564 "execution_count": null,
565 "id": "82e66db5",
566 "metadata": {
567 "id": "82e66db5"
568 },
569 "outputs": [],
570 "source": [
571 "# write your code here\n"
572 ]
573 },
574 {
575 "cell_type": "markdown",
576 "id": "876d88b5",
577 "metadata": {
578 "id": "876d88b5"
579 },
580 "source": [
581 "### Testing Value Iterations"
582 ]
583 },
584 {
585 "cell_type": "code",
586 "execution_count": null,
587 "id": "90c71594",
588 "metadata": {
589 "id": "90c71594"
590 },
591 "outputs": [],
592 "source": [
593 "# write your code for testing value iteration here\n"
594 ]
595 },
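For reference, a short test mirroring the Policy Iteration test above; it assumes the illustrative `ValueIteration` sketch above is defined.

```python
# Sketch: run the illustrative ValueIteration on the deterministic environment and report the result.
gamma = 0.9
env = CareerPathEnv()
pi_vi, v_vi, iters_vi = ValueIteration(env, gamma)
print("Final Policy: \n", pi_vi)
print("State Value Function: ", v_vi)
print("Number of iterations for Value Iteration: ", iters_vi)
```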
596 {
597 "cell_type": "markdown",
598 "id": "83bf1175",
599 "metadata": {
600 "id": "83bf1175"
601 },
602 "source": [
603 "### A1.2 Compare PI and VI in terms of convergence (average number of iteration,
time required for each iteration). Is the policy obtained by both same?\n",
604 "\n",
605 "Write your answer here"
606 ]
607 },
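One way to collect the comparison data is sketched below; it assumes the earlier cells (including the illustrative `ValueIteration`) have been run, and `time.perf_counter` is used only as an example timer.

```python
# Sketch: compare wall-clock time and iteration counts of PI and VI on the deterministic environment.
import time

gamma = 0.9
env = CareerPathEnv()

start = time.perf_counter()
pi_pi, v_pi, iters_pi = PolicyIteration(env, policy_random, gamma)
pi_time = time.perf_counter() - start

start = time.perf_counter()
pi_vi, v_vi, iters_vi = ValueIteration(env, gamma)
vi_time = time.perf_counter() - start

print("PI:", iters_pi, "iterations,", pi_time, "s")
print("VI:", iters_vi, "iterations,", vi_time, "s")
print("Same greedy policy:", np.array_equal(np.argmax(pi_pi, axis=1), np.argmax(pi_vi, axis=1)))
```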
608 {
609 "cell_type": "markdown",
610 "id": "4cc712ac",
611 "metadata": {
612 "id": "4cc712ac"
613 },
614 "source": [
615 "***"
616 ]
617 },
618 {
619 "cell_type": "markdown",
620 "id": "a47f6340",
621 "metadata": {
622 "id": "a47f6340"
623 },
624 "source": [
625 "## Part B : A Stochastic Career Path\n",
626 "\n",
627 "Now consider a more realistic Markov Decision Process below with four states and
two actions available at each state. In this setting Actions have nondeterministic
effects, i.e., taking an action in a state always leads to one next state, but which
state is the one next state is determined by transition probabilities. These
transition probabilites are shown in the figure attached to the transition arrows
from states and actions to states. There are two actions out of each state for the
agent to choose from: D for development and R for research. The same
_ultimately-care-only-about-money_ reward scheme is given along with the states.\n",
628 "\n",
629 "<img src='assets/mdp-nd.png' width=\"700\" align=\"left\"></img>"
630 ]
631 },
632 {
633 "cell_type": "code",
634 "execution_count": null,
635 "id": "73a0ac3e-223e-42c9-896f-d3d74c060258",
636 "metadata": {
637 "id": "73a0ac3e-223e-42c9-896f-d3d74c060258"
638 },
639 "outputs": [],
640 "source": [
641 "'''\n",
642 "Represents a Career Path problem Gym Environment which provides a Fully
observable\n",
643 "MDP\n",
644 "'''\n",
645 "class StochasticCareerPathEnv(gym.Env):\n",
646 " '''\n",
647 " StocasticCareerPathEnv represents the Gym Environment for the Career Path
problem environment\n",
648 " States : [0:'Unemployed',1:'Industry',2:'Grad School',3:'Academia']\n",
649 " Actions : [0:'Research', 1:'Development']\n",
650 " '''\n",
651 " metadata = {'render.modes': ['human']}\n",
652 "\n",
653 " def __init__(self,initial_state=3,no_states=4,no_actions=2):\n",
654 " '''\n",
655 " Constructor for the CareerPath class\n",
656 "\n",
657 " Args:\n",
658 " initial_state : starting state of the agent\n",
659 " no_states : The no. of possible states which is 4\n",
660 " no_actions : The no. of possible actions which is 2\n",
661 "\n",
662 " '''\n",
663 " self.initial_state = initial_state\n",
664 " self.state = self.initial_state\n",
665 " self.nA = no_actions\n",
666 " self.nS = no_states\n",
667 " self.prob_dynamics = {\n",
668 " # s: {\n",
669 " # a: [(p(s,s'|a), s', r', terminal/not), (p(s,s''|a), s'', r'',
terminal/not)]\n",
670 " # }\n",
671 "\n",
672 " 0: {\n",
673 " 0: [(1.0, 2, 0.0, False)],\n",
674 " 1: [(1.0, 1, 100.0, False)],\n",
675 " },\n",
676 " 1: {\n",
677 " 0: [(0.9, 0, -10.0, False),(0.1, 1, 100, False)],\n",
678 " 1: [(1.0, 1, 100.0, False)],\n",
679 " },\n",
680 " 2: {\n",
681 " 0: [(0.9, 3, 10.0, False),(0.1, 2, 0, False)],\n",
682 " 1: [(0.9, 1, 100.0, False),(0.1, 1, 100, False)],\n",
683 " },\n",
684 " 3: {\n",
685 " 0: [(1.0, 3, 10.0, False)],\n",
686 " 1: [(0.9, 1, 100.0, False),(0.1, 3, 10, False)],\n",
687 " },\n",
688 " }\n",
689 " self.reset()\n",
690 "\n",
691 " def reset(self):\n",
692 " '''\n",
693 " Resets the environment\n",
694 " Returns:\n",
695 " observations containing player's current state\n",
696 " '''\n",
697 " self.state = self.initial_state\n",
698 " return self.get_obs()\n",
699 "\n",
700 " def get_obs(self):\n",
701 " '''\n",
702 " Returns the player's state as the observation of the environment\n",
703 " '''\n",
704 " return (self.state)\n",
705 "\n",
706 " def render(self, mode='human'):\n",
707 " '''\n",
708 " Renders the environment\n",
709 " '''\n",
710 " print(\"Current state: {}\".format(self.state))\n",
711 "\n",
712 " def sample_action(self):\n",
713 " '''\n",
714 " Samples and returns a random action from the action space\n",
715 " '''\n",
716 " return random.randint(0, self.nA)\n",
717 " def P(self):\n",
718 " '''\n",
719 " Defines and returns the probabilty transition matrix which is in the form
of a nested dictionary\n",
720 " '''\n",
721 " self.prob_dynamics = {\n",
722 " 0: {\n",
723 " 0: [(1.0, 2, 0.0, False)],\n",
724 " 1: [(1.0, 1, 100.0, False)],\n",
725 " },\n",
726 " 1: {\n",
727 " 0: [(0.9, 0, -10.0, False),(0.1, 1, 100, False)],\n",
728 " 1: [(1.0, 1, 100.0, False)],\n",
729 " },\n",
730 " 2: {\n",
731 " 0: [(0.9, 3, 10.0, False),(0.1, 2, 0, False)],\n",
732 " 1: [(0.9, 1, 100.0, False),(0.1, 1, 100, False)],\n",
733 " },\n",
734 " 3: {\n",
735 " 0: [(1.0, 3, 10.0, False)],\n",
736 " 1: [(0.9, 1, 100.0, False),(0.1, 3, 10, False)],\n",
737 " },\n",
738 " }\n",
739 " return self.prob_dynamics\n",
740 "\n",
741 "\n",
742 " def step(self, action):\n",
743 " '''\n",
744 " Performs the given action\n",
745 " Args:\n",
746 " action : action from the action_space to be taking in the
environment\n",
747 " Returns:\n",
748 " observation - returns current state\n",
749 " reward - reward obtained after taking the given action\n",
750 " done - True if the episode is complete else False\n",
751 " '''\n",
752 " if action >= self.nA:\n",
753 " action = self.nA-1\n",
754 "\n",
755 " if self.state == 0 or (self.state == 1 and action == 1) or (self.state == 3
and action == 0):\n",
756 " index = 0\n",
757 " else:\n",
758 " index = np.random.choice(2,1,p=[0.9,0.1])[0]\n",
759 "\n",
760 " dynamics_tuple = self.prob_dynamics[self.state][action][index]\n",
761 " self.state = dynamics_tuple[1]\n",
762 "\n",
763 "\n",
764 " return self.state, dynamics_tuple[2], dynamics_tuple[3]"
765 ]
766 },
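As a quick sanity check on the stochastic dynamics (illustrative only), the expected immediate reward of an action can be read straight off the listed tuples; for example, choosing Research in state 1 ('Industry') gives 0.9·(−10) + 0.1·100 = 1.

```python
# Sketch: expected immediate reward for (state=1 'Industry', action=0 'Research') in the stochastic MDP.
env = StochasticCareerPathEnv()
expected_r = sum(prob * reward for prob, next_state, reward, done in env.P()[1][0])
print(expected_r)  # 0.9*(-10) + 0.1*100 = 1.0
```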
767 {
768 "cell_type": "markdown",
769 "id": "4e6a5212",
770 "metadata": {
771 "id": "4e6a5212"
772 },
773 "source": [
774 "### Navigating in Stochastic Career Path"
775 ]
776 },
777 {
778 "cell_type": "code",
779 "execution_count": null,
780 "id": "d3e69064",
781 "metadata": {
782 "id": "d3e69064",
783 "outputId": "8bba3991-3a27-43c3-dcca-5919f2277c1a"
784 },
785 "outputs": [
786 {
787 "name": "stdout",
788 "output_type": "stream",
789 "text": [
790 "State\t Action\t New State\t Reward\t is_Terminal\n",
791 " 3 \t 1 \t 1 \t 100.0 \t False\n",
792 " 1 \t 1 \t 1 \t 100.0 \t False\n",
793 " 1 \t 1 \t 1 \t 100.0 \t False\n",
794 " 1 \t 1 \t 1 \t 100.0 \t False\n",
795 " 1 \t 1 \t 1 \t 100.0 \t False\n",
796 " 1 \t 1 \t 1 \t 100.0 \t False\n",
797 " 1 \t 1 \t 1 \t 100.0 \t False\n",
798 " 1 \t 1 \t 1 \t 100.0 \t False\n",
799 " 1 \t 1 \t 1 \t 100.0 \t False\n",
800 " 1 \t 1 \t 1 \t 100.0 \t False\n",
801 "Total Number of steps: 10\n",
802 "Final Reward: 1000.0\n"
803 ]
804 }
805 ],
806 "source": [
807 "env = StochasticCareerPathEnv()\n",
808 "is_Terminal = False\n",
809 "start_state = env.reset()\n",
810 "steps = 0\n",
811 "total_reward = 0\n",
812 "\n",
813 "# you may change policy here\n",
814 "policy = policy_random\n",
815 "# policy = policy_1\n",
816 "# policy = policy_2\n",
817 "\n",
818 "print(\"State\\t\", \"Action\\t\" , \"New State\\t\" , \"Reward\\t\" ,
\"is_Terminal\")\n",
819 "steps = 0\n",
820 "max_steps = 5\n",
821 "\n",
822 "prev_state = start_state\n",
823 "\n",
824 "while steps < 10:\n",
825 " steps += 1\n",
826 "\n",
827 " action = np.random.choice(2,1,p=policy[prev_state])[0] #0 -> Research, 1 ->
Development\n",
828 " state, reward, is_Terminal = env.step(action)\n",
829 "\n",
830 " total_reward += reward\n",
831 "\n",
832 " print(\" \",prev_state, \"\\t \", action, \"\\t \", state, \"\\t\", reward,
\"\\t \", is_Terminal)\n",
833 " prev_state = state\n",
834 "\n",
835 "print(\"Total Number of steps:\", steps)\n",
836 "print(\"Final Reward:\", total_reward)"
837 ]
838 },
839 {
840 "cell_type": "markdown",
841 "id": "c01a3612",
842 "metadata": {
843 "id": "c01a3612"
844 },
845 "source": [
846 "### B1.1 Find an optimal policy to navigate the given SCP environment using Policy
Iteration (PI)"
847 ]
848 },
849 {
850 "cell_type": "code",
851 "execution_count": null,
852 "id": "eb1faa8a",
853 "metadata": {
854 "id": "eb1faa8a"
855 },
856 "outputs": [],
857 "source": [
858 "# [Hint] What would change for the stochastic MDP in the Policy Iteration code from
Part A?\n",
859 "# write your code here"
860 ]
861 },
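Because `EvaluatePolicy` and `ImprovePolicy` already sum over every `(prob, next_state, reward, done)` tuple returned by `env.P()`, the Part A Policy Iteration can be reused here. A minimal usage sketch (illustrative, assuming the Part A cells have been run):

```python
# Sketch: Policy Iteration applied to the stochastic environment.
gamma = 0.9
env_scp = StochasticCareerPathEnv()
pi_scp, v_scp, iters_scp = PolicyIteration(env_scp, policy_random, gamma)
print("Final Policy: \n", pi_scp)
print("State Value Function: ", v_scp)
```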
862 {
863 "cell_type": "markdown",
864 "id": "3264b19c",
865 "metadata": {
866 "id": "3264b19c"
867 },
868 "source": [
869 "### B1.2 Find an optimal policy to navigate the given SCP environment using Value
Iteration (VI)"
870 ]
871 },
872 {
873 "cell_type": "code",
874 "execution_count": null,
875 "id": "77079cf6",
876 "metadata": {
877 "id": "77079cf6"
878 },
879 "outputs": [],
880 "source": [
881 "# [Hint] What would change for the stochastic MDP in the Value Iteration code from
Part A?\n",
882 "# write your code here"
883 ]
884 },
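Similarly, a short usage sketch applying the illustrative `ValueIteration` from Part A to the stochastic environment:

```python
# Sketch: the illustrative ValueIteration applied to StochasticCareerPathEnv.
pi_scp_vi, v_scp_vi, iters_scp_vi = ValueIteration(StochasticCareerPathEnv(), gamma=0.9)
print(pi_scp_vi, v_scp_vi, iters_scp_vi)
```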
885 {
886 "cell_type": "markdown",
887 "id": "dda774d8",
888 "metadata": {
889 "id": "dda774d8"
890 },
891 "source": [
892 "### B1.3 Compare PI and VI in terms of convergence (average number of iteration,
time required for each iteration). Is the policy obtained by both same for SCP
environment?\n"
893 ]
894 },
895 {
896 "cell_type": "code",
897 "execution_count": null,
898 "id": "5c6f9de8",
899 "metadata": {
900 "id": "5c6f9de8"
901 },
902 "outputs": [],
903 "source": [
904 "# write your code for comparison here"
905 ]
906 },
907 {
908 "cell_type": "markdown",
909 "id": "5a61e7a8",
910 "metadata": {
911 "id": "5a61e7a8"
912 },
913 "source": [
914 "Write your comments compairing convergence and policies here."
915 ]
916 }
917 ],
918 "metadata": {
919 "colab": {
920 "provenance": []
921 },
922 "kernelspec": {
923 "display_name": "Python 3 (ipykernel)",
924 "language": "python",
925 "name": "python3"
926 },
927 "language_info": {
928 "codemirror_mode": {
929 "name": "ipython",
930 "version": 3
931 },
932 "file_extension": ".py",
933 "mimetype": "text/x-python",
934 "name": "python",
935 "nbconvert_exporter": "python",
936 "pygments_lexer": "ipython3",
937 "version": "3.9.12"
938 },
939 "vscode": {
940 "interpreter": {
941 "hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
942 }
943 }
944 },
945 "nbformat": 4,
946 "nbformat_minor": 5
947 }
948
