Example Questions For The Exam
tion block)
1. Look at the Outlook - Summation slides for guidance.
This part of the exam will cover a wide range of subjects with smaller questions.
See which topics we covered and which of them appear below. The remaining topics will be
covered more in this half.
2 State Action Transition Graph
2. Draw the state-action transition graph defined by the table below. It has two states: {s1, s2}
and three actions: {a1, a2, a3}. Each row gives the probabilities of moving to the next state s'.
(s, a) \ s'    s1     s2
(s1, a1)       0.5    0.5
(s1, a2)       0.3    0.7
(s1, a3)       0.2    0.8
(s2, a1)       0      1
(s2, a2)       0.1    0.9
(s2, a3)       1      0
3. Starting in s1, what are the probabilities of ending up in s1 and in s2 if action a1 is taken twice?
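One way to check an answer to this kind of question is to push the starting distribution through the a1 rows of the table twice. The sketch below is only an illustration: the names `P_A1` and `step` are made up here, and the numbers are copied from the rows (s1, a1) and (s2, a1) above.

```python
# Transition probabilities P(s' | s, a1), copied from the a1 rows of the table.
P_A1 = {
    "s1": {"s1": 0.5, "s2": 0.5},
    "s2": {"s1": 0.0, "s2": 1.0},
}


def step(dist, transitions):
    """Apply the action once to a probability distribution over states."""
    nxt = {s: 0.0 for s in transitions}
    for s, p in dist.items():
        for s_next, q in transitions[s].items():
            nxt[s_next] += p * q
    return nxt


start = {"s1": 1.0, "s2": 0.0}          # we begin in s1 with certainty
after_two = step(step(start, P_A1), P_A1)
print(after_two)  # {'s1': 0.25, 's2': 0.75}
```

The same `step` helper works for any other action if its two rows are substituted for `P_A1`.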
3 Heuristics
4. Perform A* step by step. Fill in the values for c (cost to arrive here) and f (total estimated cost)
alongside the heap.
[Figure: weighted graph with nodes labelled Start and Target. Every node has blank fields c = and f = to fill in. Heuristic values h shown on the nodes: 120, 80, 130, 100, 40, 110, 160, 50. Edge weights shown: 20, 8, 15, 12, 50, 10, 40, 80, 90, 85, 30, 70, 25, 60. Heap: Start]
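As a reminder of what the step-by-step procedure does, here is a minimal A* sketch that tracks c (cost so far) and f = c + h for every heap entry. The graph, the heuristic values, and the node names A and B are hypothetical and are not the graph from the figure; they only show the mechanics.

```python
import heapq


def a_star(graph, h, start, target):
    """Return (cost, path) for a cheapest path, or None if target is unreachable."""
    # Heap entries are (f, c, node, path) with f = c + h[node].
    heap = [(h[start], 0, start, [start])]
    best_c = {start: 0}  # cheapest known cost to reach each node
    while heap:
        f, c, node, path = heapq.heappop(heap)
        if node == target:
            return c, path
        if c > best_c.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was found already
        for neighbour, weight in graph[node].items():
            new_c = c + weight
            if new_c < best_c.get(neighbour, float("inf")):
                best_c[neighbour] = new_c
                new_f = new_c + h[neighbour]
                heapq.heappush(heap, (new_f, new_c, neighbour, path + [neighbour]))
    return None


# Hypothetical example graph (NOT the exam figure): edge weights and h values.
graph = {
    "Start": {"A": 20, "B": 50},
    "A": {"Target": 90},
    "B": {"Target": 40},
    "Target": {},
}
h = {"Start": 80, "A": 70, "B": 40, "Target": 0}

print(a_star(graph, h, "Start", "Target"))  # (90, ['Start', 'B', 'Target'])
```

In this toy graph A is expanded first because it ties with B on f and has the smaller c, but the route through B still wins once Target is popped from the heap.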
5. Describe an aspect of a heuristic algorithm, compare two algorithms, or answer a similar question.
4 Reinforcement Learning
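[Figure: state-action transition graph with states s1 and s2 and actions a0, a1, a2. Transition probabilities shown on the edges: 0.1, 0.5, 1, 0.9, 0.2, 0.8, 0.3, 0.5, 0.7, 1.]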
7. Give two different valid four-step experiences for the state-action transition graph above. (An
experience is an alternating sequence of states and actions: each action leads to the next state,
which is followed by the next action. Take three actions to arrive at four steps.)
Use all three actions in each of your experiences.
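The shape of such an experience (state, action, state, action, state, action, state) can be illustrated with a small sampler. This sketch is an assumption-laden stand-in: it uses the transition table from Section 2 (states s1, s2 and actions a1, a2, a3) rather than the graph in the figure above, and the names `T` and `sample_experience` are invented for the example.

```python
import random

# Stand-in transition model: the table from Section 2, not the figure above.
T = {
    ("s1", "a1"): {"s1": 0.5, "s2": 0.5},
    ("s1", "a2"): {"s1": 0.3, "s2": 0.7},
    ("s1", "a3"): {"s1": 0.2, "s2": 0.8},
    ("s2", "a1"): {"s1": 0.0, "s2": 1.0},
    ("s2", "a2"): {"s1": 0.1, "s2": 0.9},
    ("s2", "a3"): {"s1": 1.0, "s2": 0.0},
}


def sample_experience(start, actions, rng):
    """Return [s, a, s', a', s'', a'', s'''] by sampling each next state."""
    experience = [start]
    state = start
    for action in actions:
        successors = T[(state, action)]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        experience += [action, state]
    return experience


rng = random.Random(0)  # seeded only so that repeated runs agree
actions = ["a1", "a2", "a3"]  # each of the three actions is used once
exp = sample_experience("s1", actions, rng)
print(exp)
```

An experience is valid when every (state, action, next state) triple in it has a non-zero probability in the transition model.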
5 Game Theory
10. In this two-player game, both players can either cooperate or cheat (just as in the
lecture example and most "basic" games of this kind). This gives the following outcome pairs.
Assume the outcomes are the same for both players (the game is symmetric):
• Cheat/Cheat: 0/0
• Cheat/Cooperate: 0/-3
• (Cooperate/Cheat: -3/0)
• Cooperate/Cooperate: 5/5
Player 1 cheats three times, then cooperates twice. Player 2 cheats twice and cooperates
for the rest of the game. What is the total reward for each player?
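The round-by-round bookkeeping can be sketched as follows. The sketch assumes the game lasts five rounds (three plus two moves for Player 1, so Player 2 cooperates in rounds three to five); the names `PAYOFF` and `totals` are made up for the illustration.

```python
# Payoffs (player 1, player 2) for each pair of moves, as listed in the question.
PAYOFF = {
    ("cheat", "cheat"): (0, 0),
    ("cheat", "cooperate"): (0, -3),
    ("cooperate", "cheat"): (-3, 0),
    ("cooperate", "cooperate"): (5, 5),
}


def totals(moves1, moves2):
    """Sum both players' rewards over all rounds."""
    total1 = total2 = 0
    for m1, m2 in zip(moves1, moves2):
        r1, r2 = PAYOFF[(m1, m2)]
        total1 += r1
        total2 += r2
    return total1, total2


# Assumption: five rounds in total.
p1 = ["cheat"] * 3 + ["cooperate"] * 2
p2 = ["cheat"] * 2 + ["cooperate"] * 3
print(totals(p1, p2))  # (10, 7)
```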
11. How could you maximise your own reward if you were to play versus Player 2 and knew the
actions the player would choose?
How would you play to maximise the overall reward (reward of both players added up)?
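With only two moves per round, both questions can be checked by brute force over every possible move sequence. The sketch below again assumes a five-round game with Player 2 playing cheat, cheat, cooperate, cooperate, cooperate; all names are illustrative.

```python
from itertools import product

# Payoffs (you, player 2) for each pair of moves, as listed in question 10.
PAYOFF = {
    ("cheat", "cheat"): (0, 0),
    ("cheat", "cooperate"): (0, -3),
    ("cooperate", "cheat"): (-3, 0),
    ("cooperate", "cooperate"): (5, 5),
}
MOVES = ("cheat", "cooperate")


def totals(moves1, moves2):
    """Sum both players' rewards over all rounds."""
    pairs = [PAYOFF[(m1, m2)] for m1, m2 in zip(moves1, moves2)]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


# Assumption: five rounds, Player 2's moves are known in advance.
p2 = ("cheat", "cheat", "cooperate", "cooperate", "cooperate")
candidates = list(product(MOVES, repeat=len(p2)))

# Sequence that maximises your own reward.
best_own = max(candidates, key=lambda m: totals(m, p2)[0])
# Sequence that maximises the sum of both rewards.
best_joint = max(candidates, key=lambda m: sum(totals(m, p2)))

print(best_own, totals(best_own, p2))
print(best_joint, totals(best_joint, p2))
```

With these payoffs both searches pick the same sequence: mirror Player 2's move in every round.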
6 Population Generation
Suppose we have genes consisting of int values between 0 and 3, mapped to {west, east, south, north}.
Each individual’s genome consists of 4 genes. We have four individuals:
IndA: 0231, IndB: 2232, IndC: 3112 and IndD: 0033
13. (6 points) Perform one-point crossover between IndA and IndB as well as between IndC and IndD at
crossover point 2 (inclusive, counting starts at 1).
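One-point crossover can be sketched in a few lines. The sketch assumes that "crossover point 2 (inclusive, counting starts at 1)" means each child keeps genes 1 and 2 of one parent and takes genes 3 and 4 from the other; if the lecture defines the cut differently, the slice index changes accordingly. The function name is made up for the example.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two genomes after the first `point` genes."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


# Assumption: the cut lies directly after gene 2.
print(one_point_crossover("0231", "2232", 2))  # ('0232', '2231')
print(one_point_crossover("3112", "0033", 2))  # ('3133', '0012')
```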