Exercises
COMPUTER ENGINEERING
ARTIFICIAL INTELLIGENCE
2nd semester
EXERCISES
2. Consider that we want to build a programming agent, able to write and debug computer programs.
Characterize the environment where this agent “lives”: is it fully observable? Deterministic?
Episodic? Static? Continuous? Please, justify your answer.
3. Consider a taxi-driver agent. Characterize, as we have seen in class, the environment
in which it lives.
4. What are the fundamental differences between a reactive agent and a goal-based agent?
5. Consider the state space below, where the values next to the arcs correspond to the cost of going
from one state to another, the values next to the states correspond to the value of the heuristic, and
states I and K are goal states:
Artificial Intelligence- Computer Engineering Exercises
Answers
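[Figure: search tree rooted at A. Heuristic values: A 10, B 4, C 8, D 10, E 1, F 13, G 2, H 3, I 0, J 8, K 0.
Arc costs: A-B 5, A-C 2, B-D 9, B-E 4, C-F 9, C-G 10, E-H 7, E-I 7, G-J 4, G-K 3.]
A
BC
CDE
DEFG
EFG
FGHI
GHI
HIJK
IJK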
A
BC
DEC
EC
HIC
IC
Limit = 0
A
Limit = 1
A
BC
C
Limit = 2
A
BC
DEC
EC
C
FG
G
Limit = 3
A
BC
DEC
EC
HIC
IC
A10
B4, C8
E1, C8, D10
I0, H3, C8, D10
1e) A* search: f = g + h
A0 + 10 = 10
B5 + 4 = 9, C2 + 8 = 10
C2 + 8 = 10, E9 + 1 = 10, D14 + 10 = 24
E9 + 1 = 10, G12 + 2 = 14, D14 + 10 = 24, F11 + 13 = 24
G12 + 2 = 14, I16 + 0 = 16, H16 + 3 = 19, D14 + 10 = 24, F11 + 13 = 24
K15 + 0 = 15, I16 + 0 = 16, H16 + 3 = 19, D14 + 10 = 24, F11 + 13 = 24, J16 + 8 = 24
Is the heuristic admissible? Yes, because h(n) <= h*(n) for every state n, where h*(n) is the smallest real cost of going
from n to a goal state.
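The admissibility claim can be checked mechanically. The sketch below assumes the tree implied by the A* trace above (arc costs recovered from the g values, for example g(E) = 5 + 4 = 9); the names TREE, H, GOALS and h_star are illustrative and not part of the original exercise.

```python
import math

# Arc costs recovered from the g values in the A* trace (an assumption:
# the original figure is not reproduced in this text).
TREE = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 9, "E": 4},
    "C": {"F": 9, "G": 10},
    "E": {"H": 7, "I": 7},
    "G": {"J": 4, "K": 3},
}
# Heuristic values given next to the states.
H = {"A": 10, "B": 4, "C": 8, "D": 10, "E": 1, "F": 13,
     "G": 2, "H": 3, "I": 0, "J": 8, "K": 0}
GOALS = {"I", "K"}


def h_star(state):
    """Cheapest real cost from state to a goal (infinite if no goal is reachable)."""
    if state in GOALS:
        return 0
    return min((cost + h_star(child)
                for child, cost in TREE.get(state, {}).items()),
               default=math.inf)


# The heuristic is admissible if it never overestimates the real cost.
admissible = all(H[s] <= h_star(s) for s in H)
print(admissible, h_star("A"))
```

Under these assumptions the check succeeds, and h*(A) = 15 agrees with the cost of the solution found by A* (A, C, G, K).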
6. Consider the state space represented in the figure below, where the costs of the operators are
indicated next to the arcs and the values of the heuristic are indicated next to the states. Consider also
that we want to go from state I to one of the O states.
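[Figure: state-space graph with states I, A, B, C, D, E, F, O1 and O2. Heuristic values: h(I) = 45, h(A) = 30,
h(B) = 40, h(C) = 60, h(D) = 10, h(E) = 35, h(F) = 35, h(O1) = 0, h(O2) = 0. Arc costs used in the answers: I-B 2,
I-A 15, I-D 15, B-F 3, F-A 5, F-C 5, A-E 10, C-O1 75, D-O1 35, E-O2 6.]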
Determine the solution found by the following search algorithms, showing the successive states of the
Frontier and indicating, for each node, the value of f (used to keep the frontier ordered):
a) Uniform-cost search.
b) Greedy search.
c) A* search.
d) Is the heuristic admissible?
Answers
a) Uniform-cost search: f = g
I 0
B 2, A 15, D 15
F 5, A 15, D 15
A 10, C 10, D 15
C 10, D 15, E 20
D 15, E 20, O1 85
E 20, O1 50
O2 26, O1 50
Note: A 15 and O1 85 are removed from the frontier when the cheaper entries A 10 and O1 50 are found.
Solution = I, B, F, A, E, O2
b) Greedy search: f = h
I 45
D 10, A 30, B 40
O1 0, A 30, B 40
Solution: I D O1
c) A*: f = g + h
I: 0 + 45 = 45
D: 15 + 10 = 25, B: 2 + 40 = 42, A: 15 + 30 = 45
B: 2 + 40 = 42, A: 15 + 30 = 45, O1: 50 + 0 = 50
F: 5 + 35 = 40, A: 15 + 30 = 45, O1: 50 + 0 = 50
A: 10 + 30 = 40, O1: 50 + 0 = 50, C: 10 + 60 = 70
O1: 50 + 0 = 50, E: 20 + 35 = 55, C: 10 + 60 = 70
Solution: I D O1
d) The heuristic is not admissible because the shortest path between E and O2 has cost 6 and the
heuristic value of E is 35; that is, there is at least one state for which h(n) > h*(n), where h*(n) is the
minimum cost of going from n to a goal state. Furthermore, you may have noticed that A* did not
find the optimal solution: it returned I, D, O1 with cost 50, while uniform-cost search found a path of
cost 26. This can only happen with a non-admissible heuristic.
Note: in the resolution of exercises with A* search, usually, we do not assume the heuristic to be
consistent unless it is explicitly stated that it is. This means that, when we add a state to the frontier,
if it is already in the frontier, we need to verify if the one already in the frontier has a smaller f value
than the one that we want to add. If this is the case, the new one is not added; if the one already in
the frontier has an f value larger than the new one, we must remove the one that is already in the
frontier and add the new one.
This procedure is similar to what we do in uniform-cost and greedy search algorithms. The only
difference is how we compute the value of f.
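The frontier-replacement rule described in the note can be sketched as a single best-first routine in which only f changes. This is a minimal illustration, not the course implementation: the graph contains only the arcs that appear in the worked answers of exercise 6 (the original figure may contain more), states that were already expanded are never re-opened, and the names GRAPH, H, GOALS and best_first are hypothetical.

```python
# Arcs and heuristic values taken from the worked answers of exercise 6.
# Assumption: the original figure may contain arcs that never show up in the traces.
GRAPH = {
    "I": {"B": 2, "A": 15, "D": 15},
    "B": {"F": 3},
    "F": {"A": 5, "C": 5},
    "A": {"E": 10},
    "C": {"O1": 75},
    "D": {"O1": 35},
    "E": {"O2": 6},
    "O1": {},
    "O2": {},
}
H = {"I": 45, "A": 30, "B": 40, "C": 60, "D": 10,
     "E": 35, "F": 35, "O1": 0, "O2": 0}
GOALS = {"O1", "O2"}


def best_first(start, f):
    """Best-first search; f(g, state) gives the value used to order the frontier."""
    frontier = {start: (f(0, start), 0, [start])}  # state -> (f, g, path)
    expanded = set()
    while frontier:
        # Pick the entry with the smallest f (ties broken alphabetically).
        state = min(frontier, key=lambda s: (frontier[s][0], s))
        _, g, path = frontier.pop(state)
        if state in GOALS:
            return path, g
        expanded.add(state)
        for succ, cost in GRAPH[state].items():
            if succ in expanded:
                continue
            entry = (f(g + cost, succ), g + cost, path + [succ])
            # Keep only the entry with the smaller f for each state.
            if succ not in frontier or entry[0] < frontier[succ][0]:
                frontier[succ] = entry
    return None, None


uniform_cost = best_first("I", lambda g, s: g)          # f = g
greedy = best_first("I", lambda g, s: H[s])             # f = h
a_star = best_first("I", lambda g, s: g + H[s])         # f = g + h
print(uniform_cost, greedy, a_star)
```

With these assumptions the three calls reproduce the solutions above: uniform-cost search returns I, B, F, A, E, O2 (cost 26), while greedy search and A* both return I, D, O1 (cost 50).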
7. Consider the state space below, where the values next to the arcs correspond to the cost of going
from one state to another, A is the initial state, F is the goal state and the heuristic h(n) is the
minimum number of steps needed to go from n to state F:
For each of the following algorithms, show the evolution of the frontier when the algorithm is used
to solve the problem above. In exercises c), d) and e) show, for each state, the value of f used to keep
the frontier ordered. Consider that, everything else being equal, the states are expanded in
alphabetical order.
a) Breadth first search:
A
BCE
CE
ED
DF
F
c) Uniform-cost search:
A0
B2, C4, E5
E3, C4
C4, F5, D6
F5, D6
d) Greedy search
A2
E1, B2, C2
F0, D1, B2, C2
e) A* search
A0+2=2
B2+2=4, C4+2=6, E5+1=6
E3+1=4, C4+2=6
F5+0=5, C4+2=6, D6+1=7
8. Consider the vacuuming agent problem, described in the book “AI – A Modern Approach”, whose
state space is shown in the following figure. Initially, both squares are dirty and the agent is in the left
square. The agent can perform three different actions: the L action allows it to move left if possible; the
R action allows it to move right if possible; the S action allows the agent to suck up the garbage that is in
the square where it is located. The agent's goal is to suck up all the existing garbage in its small world
of two squares.
Considering that the agent prefers to perform the L action first, then the R action and only then the S
action, and that it never expands an already expanded state, draw the search tree that would
be generated, indicating the order in which the nodes would be expanded if the algorithm used is:
a) The breadth-first search.
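As a way of checking a hand-drawn tree, the breadth-first case can be simulated. The sketch below rests on several assumptions that the exercise leaves open: a state is the tuple (position, left_dirty, right_dirty), an impossible move leaves the state unchanged, a state that was already generated is never added to the frontier again, and the goal test is applied when a node is expanded. With a different goal-test policy the expansion order may differ from the one printed here.

```python
from collections import deque

ACTIONS = ("L", "R", "S")  # preference order stated in the exercise


def result(state, action):
    """Apply an action to a state (position, left_dirty, right_dirty)."""
    pos, left_dirty, right_dirty = state
    if action == "L":
        return ("left", left_dirty, right_dirty)
    if action == "R":
        return ("right", left_dirty, right_dirty)
    # action == "S": clean the square the agent is standing on
    if pos == "left":
        return (pos, False, right_dirty)
    return (pos, left_dirty, False)


def breadth_first(start):
    frontier = deque([(start, [])])
    seen = {start}        # states already generated are never added again
    expanded = []         # states in the order in which they are expanded
    while frontier:
        state, plan = frontier.popleft()
        expanded.append(state)
        if not state[1] and not state[2]:   # goal test on expansion
            return plan, expanded
        for action in ACTIONS:
            nxt = result(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None, expanded


plan, expanded = breadth_first(("left", True, True))
print(plan)
for number, state in enumerate(expanded, start=1):
    print(number, state)
```

Under these assumptions the plan found is S, R, S, and seven states are expanded before the goal is recognised.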
b) The uniform-cost search. Consider that the cost of actions L and R is 2 and that the cost of action S
is 1.
c) The greedy search. Consider that the agent uses the number of dirty squares as heuristic.
9. Consider the problem of moving a knight on a 3x4 chess board, starting at position S and ending at
position G, as illustrated in the figure below. The letter in each position of the board represents the
name of that position and the value in subscript represents the heuristic value (h(n)) for the
corresponding state. All transitions between states cost 1. Note: in chess a knight can move two places
in one direction and then one place in another or vice versa. For example, a knight placed in position S
can move to position A or to position B.
In what order are the states expanded and what solution is found if
b) the depth-first search algorithm is used?
S
AB Solution: S A C H G
CB
HB
EGB
GB
c) the uniform-cost search is used?
S0
A1 B1
B1 C2 Solution: S A C H G
C2 D2 E2
D2 E2 H3
E2 F3 H3 I3
F3 H3 I3
H3 I3 J4
I3 G4 J4
G4 J4
d) the A* is used? Is the algorithm guaranteed to find the optimal solution? Justify the answer.
S0+3
A1+2=3 B1+2=3 Solution: S A C H G
B1+2=3 C2+1=3
C2+1=3 D2+1=3 E2+2=4
D2+1=3 E2+2=4 H3+1=4
E2+2=4 H3+1=4 I3+1=4 F3+3=6
H3+1=4 I3+1=4 F3+3=6
G4+0=4 I3+1=4 F3+3=6
In exercises b) to d) show the evolution of the frontier. In exercises c) and d) always show, for each
state, the value of f. Consider that, in case of a tie, the states are explored in alphabetical order.
10. The blocks world has been an important study platform for the field of Artificial Intelligence. A set
of blocks is placed on top of a table, some of which may be stacked on top of others. The agent's objective is to
place the blocks according to a given goal configuration. For example, starting from the situation
illustrated on the left in the following figure, it may be possible to reach the situation on the right.
A B
C B A
a) Assuming that the agent uses the operator transfer(X, Y), which transfers block X onto Y (Y can be
one of the blocks or the table), draw the state space of this problem for three blocks.
b) What heuristic would you use if you intended to use an informed algorithm? Is the heuristic
admissible? Why?
11. Suppose h1 and h2 are two admissible heuristics for a given problem. Is the heuristic
max(h1(state), h2(state)), where max(x, y) returns the largest of its input arguments, admissible? Justify
your answer.
12. Why is it that the first depth-first search algorithm that you implemented in class does not use
the set of explored nodes?
13. Does the fact that, in the implementation of the depth-first search algorithm, we first check for the
occurrence of cycles prevent a state from being expanded more than once during the search process?
Justify.
14. Consider a problem where all available operators have the same cost. Can the solution obtained
with the breadth-first algorithm have a different cost than the solution obtained with the A* algorithm if it
uses an admissible heuristic? What if the heuristic is not admissible? What if the heuristic is admissible
but the cost differs from operator to operator? Justify your answers.
15. In the context of the hill-climbing algorithm, what do a local maximum, a plateau and a
horizontal edge have in common? What are the differences between these three situations? What measures can
be taken when the algorithm encounters this problem?
16. In the context of the hill-climbing algorithm, what is common between a local maximum, a plateau
and a horizontal edge? What are the differences between these three situations? What steps can be
taken when the algorithm encounters this problem?
17. “Genetic algorithms are a method of solving problems based on a parallel and stochastic search
process”. Comment on this statement.
18. State the main differences between recombination and mutation genetic operators. To what extent
are both necessary?
19. Explain the importance of using each of the following aspects in a genetic algorithm:
a) Selection method.
b) Recombination operator.
c) Mutation operator.
20. Suppose you want to manufacture a given product (for example paper pulp) and that different
components are used in its manufacture (water, cellulose). During manufacture, samples are taken
from the product and its quality is analysed, obviously depending on some variables (e.g. texture
and impurities). The objective of the factory is to produce a product of the highest quality by
adapting the manufacturing conditions (amount of water, amount of cellulose, pressure,
temperature, etc.).
a) Indicate how you would represent individuals if you used a genetic algorithm to solve the
problem.
b) What could be the shape of the fitness function?
21. The Boolean Satisfiability Problem, also known as SAT, is a classic decision problem and it was
the first problem to be proven to belong to the class of NP-Complete problems. Given a logical
formula such as, for example, (A ∨ B) ∧ (A ∨ C) ∧ (¬C ∨ D), the problem consists of finding a
combination of values of the input variables (A, B, C and D, in the given example) that makes the
formula true. Describe how you would use a genetic algorithm to find solutions to this problem. More
specifically, indicate how you would represent the individuals, how you would evaluate them and
which genetic operators you would use. Note: all propositional logic formulas can be represented in
the form of a conjunction of clauses, where each clause is a disjunction of literals, as in the example
above. When represented in this way, the formula is said to be in conjunctive normal form.
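To make the conjunctive normal form concrete, the example formula can be encoded as a list of clauses and evaluated against a truth assignment. This is only a sketch of the data structure: the encoding (a literal as a pair of variable name and polarity) and the function names are assumptions, and counting satisfied clauses is just one possible measure, not necessarily the evaluation function the exercise is looking for.

```python
# (A or B) and (A or C) and (not C or D) in conjunctive normal form.
# Each clause is a list of literals; a literal is (variable, is_positive).
FORMULA = [
    [("A", True), ("B", True)],
    [("A", True), ("C", True)],
    [("C", False), ("D", True)],
]


def clause_satisfied(clause, assignment):
    """A clause is true if at least one of its literals is true."""
    return any(assignment[var] == positive for var, positive in clause)


def satisfied_clauses(formula, assignment):
    """Number of clauses made true by the assignment."""
    return sum(clause_satisfied(clause, assignment) for clause in formula)


def is_satisfied(formula, assignment):
    """The formula is true only if every clause is true."""
    return satisfied_clauses(formula, assignment) == len(formula)


only_a = {"A": True, "B": False, "C": False, "D": False}
all_false = {"A": False, "B": False, "C": False, "D": False}
print(is_satisfied(FORMULA, only_a), satisfied_clauses(FORMULA, all_false))
```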
22. A Magic Square is an n x n matrix filled with the numbers from 1 to n^2 in which no number repeats and
in which the sum of the numbers in each row, column and diagonal is the same. The figure below
shows an example where the constant or magic sum is 15. Describe how you would use a genetic
algorithm to find solutions to this problem. More concretely, indicate how you would represent
individuals and how you would evaluate them. Can you use the classic recombination and mutation
operators that you learned in class? Justify your answer.
2 7 6
9 5 1
4 3 8
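The definition can be turned into a small checker, which is handy for testing candidate squares by hand. The function name is_magic is illustrative; the magic sum n(n^2 + 1)/2 follows from the fact that the numbers 1 to n^2 are spread over n rows with equal sums.

```python
def is_magic(square):
    """Check the magic-square conditions for a list of n rows of n integers."""
    n = len(square)
    values = sorted(v for row in square for v in row)
    if values != list(range(1, n * n + 1)):
        return False                      # a number is missing or repeated
    target = n * (n * n + 1) // 2         # the magic sum
    rows = [sum(row) for row in square]
    cols = [sum(square[r][c] for r in range(n)) for c in range(n)]
    diagonals = [sum(square[i][i] for i in range(n)),
                 sum(square[i][n - 1 - i] for i in range(n))]
    return all(total == target for total in rows + cols + diagonals)


example = [[2, 7, 6],
           [9, 5, 1],
           [4, 3, 8]]
swapped = [[7, 2, 6],
           [9, 5, 1],
           [4, 3, 8]]
print(is_magic(example), is_magic(swapped))
```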
23. Consider that you want to use a genetic algorithm to generate symmetric binary sequences of size
n. For example, sequences 001100 and 010010 are symmetric, but sequences 011100 and 011010
are not. Formulate an evaluation function for this problem.
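The notion of symmetry used in the statement (the sequence reads the same in both directions) can be stated as a one-line predicate and checked against the four sequences given. The name is_symmetric is an illustrative choice; the predicate only says whether a sequence is symmetric and is not meant as the graded evaluation function the exercise asks for.

```python
def is_symmetric(bits):
    """True if the binary string reads the same from left to right and right to left."""
    return bits == bits[::-1]


for sequence in ("001100", "010010", "011100", "011010"):
    print(sequence, is_symmetric(sequence))
```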
24. The maximum stable set problem is an optimisation problem that consists of finding the largest
subset of vertices in an undirected graph between which there are no edges. Describe how you
would solve this problem using genetic algorithms. More specifically, indicate how you would
represent the individuals, how you would evaluate them and which genetic operators you would use.
25. Consider that you want to define the weights of a neural network to control several robots in a grid
environment so that they roam the space without colliding with each other. You can make the
following assumptions: the agents are able to perceive whether the four adjacent squares (N, S, E,
W) are occupied or not; the agents can move to the four adjacent squares (N, S, E, W); the structure
of the neural network is predefined, i.e. you only need to define the weights of the network. Any
other aspects of the problem not mentioned in this statement can be defined by you. Describe how
you would use a genetic algorithm to evolve a controller with the requirements listed above. More
specifically, indicate how you would represent the individuals, how you would evaluate them and
which genetic operators you would use.
26. Consider that you need to solve a system of non-linear equations given by:
x^2 + y^2 - z = 0
x - y^3 + z^2 = 0
x + yz = 0
x + z^2 = 0
Describe how you would solve a problem like this using genetic algorithms. More specifically, indicate
how you would represent the individuals, how you would evaluate them and which genetic operators
you would use.
Define the values of w1, w2 and θ, such that the perceptron is able to simulate the logic OR function.
Justify your answer.
Let us assume a learning rate α = 1 and that all weights have an initial value of 0 (we could use other
values, since the exercise text does not specify the values that should be used).
Note: we are going to use the sign function because the original perceptron used this function. But you
can also try solving the exercise with the step function.
Sample [-1 1]
f((-1) * 0 + 1 * 0 + (-1) * 0) = f(0) = -1 (Incorrect)
w1 = 0 + 1 * (1 - (-1)) * (-1) = -2
w2 = 0 + 1 * (1 - (-1)) * 1 = 2
θ = 0 + 1 * (1 - (-1)) * (-1) = -2
Sample [1 -1]
f(1 * (-2) + (-1) * 2 + (-1) * (-2)) = f(-2) = -1 (Incorrect)
w1 = -2 + 1 * (1 - (-1)) * 1 = 0
w2 = 2 + 1 * (1 - (-1)) * (-1) = 0
θ = -2 + 1 * (1 - (-1)) * (-1) = -4
Sample [1 1]
f(1 * 0 + 1 * 0 + (-1) * (-4)) = f(4) = 1 (Correct)
As there was a change in the weights in this presentation of the training examples, we have to do a new
presentation.
Sample [-1 1]
f((-1) * 2 + 1 * 2 + (-1) * (-2)) = f(2) = 1 (Correct)
Sample [1 -1]
f(1 * 2 + (-1) * 2 + (-1) * (-2)) = f(2) = 1 (Correct)
Sample [1 1]
f(1 * 2 + 1 * 2 + (-1) * (-2)) = f(6) = 1 (Correct)
As there was a change in the weights in this presentation of the training examples, we have to do a new
presentation.
Note that the input patterns [-1 -1], [-1 1] and [1 -1] already get the correct result with the current
weights (see the 2nd presentation of the training examples), so there is no need to check again whether the result
would be correct for these examples (it would be). Thus, we can conclude the training process,
and the final weights are [w1, w2, θ] = [2, 2, -2].
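The training procedure can be replayed in a few lines. The sketch below makes assumptions that the text above does not state explicitly: the training set is the logic OR with -1/1 encoding, the examples are presented in the order [-1 -1], [-1 1], [1 -1], [1 1], the threshold is treated as a weight on a constant input of -1, and f(0) = -1. The names sign, SAMPLES and train are illustrative.

```python
def sign(x):
    # f(0) is taken as -1, as in the worked answer
    return 1 if x > 0 else -1


# Logic OR with -1/1 encoding (assumed presentation order).
SAMPLES = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 1)]


def train(alpha=1, max_epochs=100):
    w = [0, 0, 0]                          # [w1, w2, theta]
    for _ in range(max_epochs):
        changed = False
        for (x1, x2), target in SAMPLES:
            x = (x1, x2, -1)               # the threshold sees a constant -1 input
            out = sign(sum(wi * xi for wi, xi in zip(w, x)))
            if out != target:
                # perceptron rule: w <- w + alpha * (d - y) * x
                w = [wi + alpha * (target - out) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                    # a full presentation without changes
            return w
    return w


print(train())
```

Under these assumptions the routine stops with [w1, w2, θ] = [2, 2, -2], the same weights as in the answer above.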
30. Given the perceptron with weights (w1, w2, θ)T = (1, 1, 2)T, draw in ℝ² the corresponding line,
which divides the input space in two, and dash the area where the perceptron outputs 1.
(w1, w2, θ)T = (1, 1, 2)T corresponds to the line x1 + x2 – 2 = 0 (x1w1 + x2w2 – θ = 0)
To draw the line, we replace, in turn, the values of x1 and x2 by a concrete value to obtain two points on
the line:
(2, 2)
x1 + x2 - 2 = 0
In that case, (w1, w2, θ)T = (-1, -1, -2)T corresponds to the line -x1 - x2 + 2 = 0 (x1w1 + x2w2 – θ = 0)
That is, in this case the perceptron answers -1 in the dashed area.
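Both weight vectors discussed above can be checked numerically. The sketch assumes the output is 1 when x1w1 + x2w2 – θ > 0 and -1 otherwise; the helper names output and intercepts are illustrative. Setting one coordinate to zero gives the two points where the line crosses the axes.

```python
def output(weights, point):
    """Perceptron output for weights (w1, w2, theta); a net input of 0 gives -1."""
    w1, w2, theta = weights
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 - theta > 0 else -1


def intercepts(weights):
    """Points where the line w1*x1 + w2*x2 - theta = 0 crosses the two axes."""
    w1, w2, theta = weights
    return (theta / w1, 0.0), (0.0, theta / w2)


first = (1, 1, 2)
negated = (-1, -1, -2)
print(intercepts(first), intercepts(negated))   # same line in both cases
print(output(first, (2, 2)), output(negated, (2, 2)))
print(output(first, (0, 0)), output(negated, (0, 0)))
```

The two perceptrons share the same line but give opposite answers on each side of it.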
a) Show that all the perceptrons that define the line -x1 - 4x2 + 2 = 0 are incapable of separating
correctly the examples of the two classes.
This exercise can be solved in two ways: 1) we calculate the perceptron output for each of the
examples until we verify that there is one for which the perceptron gives a result different from
what is expected; 2) we draw the line defined by the perceptron and then verify that the line
cannot correctly separate the points whose result should be 1 from the points whose result
should be -1.
b) Train the perceptron that initially defines the line -x1 - 4x2 + 2 = 0 until it is able to correctly
classify all the examples. Use a learning rate of 0.5.
32. Given a perceptron with weights w1 = 2, w2 = 1, θ = 1, which of the following perceptrons have the
same hyperplane (a line, in this case) as this perceptron and which ones represent exactly the same
classification of the inputs, that is, which output the same values for each input? Justify your answer for the
case(s) that have the same hyperplane but a different output, if any exist.
First of all, we must realize that if two hyperplanes (lines in this case) are not equal, then the
classification will not be the same. If they are equal, the classifications may or may not be the same, as we will see.
The first row of the table corresponds to the perceptron with the line x1 + 0.5x2 – 0.5 = 0, which
corresponds to the original line (the only difference is that we divide all the coefficients by 2, a positive
constant). As the coefficients have the same sign as those of the original line, this perceptron gives exactly the
same results as the original. It wasn't necessary, but if you want to check it, you can draw the line (as
you did in exercise 7) and calculate the neuron output for a point that doesn't belong to the line.
The neuron corresponding to the second line of the table also has the same line and the same
classification because it corresponds to multiplying the coefficients of the original line by a positive
constant (in this case, by 100). You can check as suggested above.
The neuron corresponding to the third line of the table does not have the same line and, therefore, does
not have the same classification. This is because the coefficients are not multiples of the original line.
The neuron corresponding to the fourth line of the table has the same line as the original neuron but the
classification is different because it corresponds to multiplying the coefficients of the original line by a
negative constant (in this case, by -1). You can check as suggested above.
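The effect of multiplying all coefficients by a constant can be verified on a grid of points. The sketch covers the three scaled cases discussed above (division by 2, multiplication by 100 and multiplication by -1); it assumes an output of 1 when x1w1 + x2w2 – θ > 0 and -1 otherwise, and the helper names are illustrative.

```python
from itertools import product


def output(weights, point):
    """Perceptron output for weights (w1, w2, theta); a net input of 0 gives -1."""
    w1, w2, theta = weights
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 - theta > 0 else -1


def scaled(weights, k):
    """Multiply every coefficient (including the threshold) by k."""
    return tuple(k * w for w in weights)


ORIGINAL = (2, 1, 1)
GRID = list(product(range(-3, 4), repeat=2))   # integer points in [-3, 3] x [-3, 3]


def same_classification(a, b):
    return all(output(a, p) == output(b, p) for p in GRID)


print(same_classification(ORIGINAL, scaled(ORIGINAL, 0.5)))   # positive constant
print(same_classification(ORIGINAL, scaled(ORIGINAL, 100)))   # positive constant
print(same_classification(ORIGINAL, scaled(ORIGINAL, -1)))    # negative constant
```

A positive constant leaves every output unchanged, while a negative one swaps the outputs on the two sides of the line.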
Network: inputs x1 and x2, hidden units h1 and h2, output units o1 and o2.
Weight   Value
wh1x1    -0.1
wh1x2    +0.2
wh2x1    +0.2
wh2x2    -0.1
wo1h1    +0.2
wo1h2    -0.1
wo2h1    +0.2
wo2h2    -0.1
a) Use the sigmoid function to compute the activation values of each unit (neuron), when the input
vector is [0, 1].
h1 = f(0 *(-0.1) + 1 * 0.2) = f(0.2) = 1/(1+ e^(-0.2)) = 0.5498
h2 = f(0 *0.2 + 1 * (-0.1)) = f(-0.1) = 1/(1+ e^0.1) = 0.4750
o1 = f(0.5498*0.2 + 0.4750 * (-0.1)) = f(0.0624) = 1/(1 + e^(-0.0624)) = 0.5156
o2 = f(0.5498*0.2 + 0.4750 * (-0.1)) = f(0.0624) = 1/(1 + e^(-0.0624)) = 0.5156
b) Compute the delta errors for each output and hidden unit considering that the desired output is
[1, 1].
delta_o1 = (d – a) * a * (1 – a) = (1 – 0.5156) * 0.5156 * (1 – 0.5156) = 0.12
delta_o2 = (d – a) * a * (1 – a) = (1 – 0.5156) * 0.5156 * (1 – 0.5156) = 0.12
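delta_h1 = a * (1 – a) * S = 0.5498 * (1 - 0.5498) * (0.2 * 0.12 + 0.2 * 0.12) = 0.012
delta_h2 = a * (1 – a) * S = 0.4750 * (1 - 0.4750) * ((-0.1) * 0.12 + (-0.1) * 0.12) = -0.006
(S is the sum of the deltas of the output units, each multiplied by the weight of the connection from the
hidden unit to that output unit.)
c) Using a learning rate of 0.25, compute the new weights’ values.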
w_x1h1 = -0.1 + 0.25 * 0.012 * 0 = -0.1
w_x2h1 = 0.2 + 0.25 * 0.012 * 1 = 0.203
w_x1h2 = 0.2 + 0.25 * (-0.006) * 0 = 0.2
w_h2o1 = -0.1 + 0.25 * 0.12 * 0.4750 = …
…
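The three steps (forward pass, delta errors, weight update) can be reproduced numerically. The sketch below uses the weights, input, desired output and learning rate from the exercise; the variable names and the nested-list layout are illustrative choices. Because it works with unrounded intermediate values, its results can differ from the hand calculation in the last decimal places.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


x = [0.0, 1.0]                 # input vector
target = [1.0, 1.0]            # desired output
rate = 0.25                    # learning rate
# W_hidden[j][i]: weight from input i to hidden unit j
W_hidden = [[-0.1, 0.2], [0.2, -0.1]]
# W_out[k][j]: weight from hidden unit j to output unit k
W_out = [[0.2, -0.1], [0.2, -0.1]]

# a) forward pass
h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W_out]

# b) delta errors: output units first, then hidden units
delta_o = [(d - a) * a * (1 - a) for d, a in zip(target, o)]
delta_h = [h[j] * (1 - h[j]) * sum(W_out[k][j] * delta_o[k] for k in range(2))
           for j in range(2)]

# c) weight update: w <- w + rate * delta(destination) * activation(source)
new_W_hidden = [[W_hidden[j][i] + rate * delta_h[j] * x[i] for i in range(2)]
                for j in range(2)]
new_W_out = [[W_out[k][j] + rate * delta_o[k] * h[j] for j in range(2)]
             for k in range(2)]

print([round(v, 4) for v in h], [round(v, 4) for v in o])
print([round(v, 3) for v in delta_o], [round(v, 3) for v in delta_h])
```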
34. Consider the neural network shown in the figure below, in which all the units use the sigmoid
function as their activation function.
a) Compute the output of the network for the input vector [𝑥1 , 𝑥2 ] = [1, 1].
b) Using the Backpropagation algorithm, compute the delta errors (𝛿) for the network units assuming
that the desired output is 0.
c) Change the weights of the network using the expression for changing the weights derived in the
classes for the gradient descent algorithm. Use a learning rate of 0.5.
35. Given the image and the filter represented in the figure below, compute the output of the
convolution operation assuming stride 1 is used.
36. Consider the following definition of a convolutional neural network. Remember that the first
parameter of the layers.Conv2D class indicates the number of feature maps and the second
parameter indicates the size of the filters. By default, the stride value used is 1. In the
layers.MaxPooling2D class, the parameter used in the code below indicates the size of the
pooling window. In this class, the stride defaults to the size of the pooling window (2 in this case).
In the layers.Dense class, the value of the first parameter corresponds to the number of units in the layer.
Calculate the size of the output of each layer of the neural network, as well as the number of
parameters (weights) as indicated below the network definition code.
# Assumed import: the original listing omits it.
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
Values to calculate:
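The output sizes and parameter counts can be derived layer by layer without running Keras. The sketch below assumes the Keras defaults for this model: no padding in the convolutions ('valid'), stride 1 for convolutions, a pooling stride equal to the window size, and one bias per filter or unit. The helper names are illustrative.

```python
def conv2d(shape, filters, kernel):
    """Output shape and parameter count of a 'valid' convolution with stride 1."""
    height, width, channels = shape
    out = (height - kernel + 1, width - kernel + 1, filters)
    params = (kernel * kernel * channels + 1) * filters   # +1 for the bias
    return out, params


def max_pool(shape, size):
    """Pooling with stride equal to the window size; no trainable parameters."""
    height, width, channels = shape
    return (height // size, width // size, channels), 0


def dense(inputs, units):
    """Fully connected layer: one weight per input plus a bias, for each unit."""
    return units, (inputs + 1) * units


report = []
shape = (28, 28, 1)
shape, p = conv2d(shape, 32, 3)
report.append(("Conv2D(32)", shape, p))
shape, p = max_pool(shape, 2)
report.append(("MaxPooling2D", shape, p))
shape, p = conv2d(shape, 64, 3)
report.append(("Conv2D(64)", shape, p))
shape, p = max_pool(shape, 2)
report.append(("MaxPooling2D", shape, p))
flat = shape[0] * shape[1] * shape[2]
report.append(("Flatten", flat, 0))
units, p = dense(flat, 64)
report.append(("Dense(64)", units, p))
units, p = dense(units, 10)
report.append(("Dense(10)", units, p))

total = sum(params for _, _, params in report)
for name, out, params in report:
    print(name, out, params)
print("total", total)
```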
37. Explain why convolutional neural networks (CNN) are better suited to image classification tasks than
Multilayer Perceptron (MLP) networks.
38. Explain what the convolution operation consists of in a convolutional neural network.
39. Using the ID3 algorithm, build an optimal decision tree that allows you to correctly classify the
following data:
Weight
Height
tall {1-, 2+, 4+, 6-, 7-, 8+} [3+, 3-] I(3, 3) = 1
short {3-, 5-} [0+, 2-] I(0, 2) = 0
Antlers
Yes {2+, 3-, 4+, 5-, 8+} [3+, 2-] I(3, 2) = -3/5 * log2 3/5 - 2/5 * log2 2/5 = 0.971
No {1-, 6-, 7-} [0+, 3-] I(0, 3) = 0
The Color attribute has the highest information gain, so it is the attribute chosen to represent the root of
the tree:
Color
  black: {2+, 6-, 8+} ?
  brown: -
  grey: -
  white: {3-, 4+} ?
Let us first analyse the left node, which has T = {2+, 6-, 8+}
E(T) = 0.918
Weight
light {} [0+, 0-] I(0, 0) = ?
middle {6-, 8+} [1+, 1-] I(1, 1) = 1
heavy {2+} [1+, 0-] I(1, 0) = 0
Height
tall {2+, 6-, 8+} [2+, 1-] I(2, 1) = 0.918
short {} [0+, 0-] I(0, 0) = ?
Antlers
Yes {2+, 8+} [2+, 0-] I(2, 0) = 0
No {6-} [0+, 1-] I(0, 1) = 0
Attribute Antlers has the highest information gain, so it is the attribute chosen to represent the node
under study:
Let us now analyse the right node, which has T = {3-, 4+}
E(T) = 1
Weight
light {3-} [0+, 1-] I(0, 1) = 0
middle {} [0+, 0-] I(0, 0) = ?
heavy {4+} [1+, 0-] I(1, 0) = 0
Height
tall {4+} [1+, 0-] I(1, 0) = 0
short {3-} [0+, 1-] I(0, 1) = 0
Antlers
Yes {3-, 4+} [1+, 1-] I(1, 1) = 1
No {} [0+, 0-] I(0, 0) = ?
The Weight and Height attributes have the greatest information gain. As the Height attribute leads to a
smaller tree (because it has only two values, and the Weight attribute has three values; see also the
resolution note at the end), we chose this attribute. The final tree is as follows:
Color
  black: Antlers
    Yes: +
    No: -
  brown: -
  grey: -
  white: Height
    tall: +
    short: -
Note: If we had chosen the Weight attribute instead of Height on the right node, we would have three
branches instead of two. The leaf for the “middle” branch would have to be labeled with the most
frequent class in the original training set (class -), since there are no training examples with Color =
white and Weight = middle. Note that we really had to resort to the most common class of the original
training set because the set of examples {3-, 4+} has as many positive as negative examples.
40. NASA intends to be able to distinguish between Martians (M) and Humans (H) based on the
following characteristics: Green {Y, N}, Legs {2, 3}, Height {(S)hort, (T)all}, Smelly {Y, N}. The
training set is shown below. Determine the decision tree generated by the ID3 algorithm for this
training set.
Green
Y {1+, 2+, 9-, 10+} [3+, 1-] I(3, 1) = -3/4 * log2 3/4 - 1/4 * log2 1/4 = 0.811
N {3-, 4-, 5-, 6+, 7+, 8-} [2+, 4-] I(2, 4) = -2/6 * log2 2/6 - 4/6 * log2 4/6 = 0.918
Legs
2 {3-, 4-, 5-, 7+, 8-, 9-, 10+} [2+, 5-] I(2, 5) = -2/7 * log2 2/7 - 5/7 * log2 5/7 = 0.863
3 {1+, 2+, 6+} [3+, 0-] I(3, 0) = 0
Height
T {1+, 2+, 4-, 5-, 8-, 10+} [3+, 3-] I(3, 3) = 1
S {3-, 6+, 7+, 9-} [2+, 2-] I(2, 2) = 1
Smelly
Y {5-, 6+, 7+, 8-} [2+, 2-] I(2, 2) = 1
N {1+, 2+, 3-, 4-, 9-, 10+} [3+, 3-] I(3, 3) = 1
The Legs attribute has the highest information gain, so it is the attribute chosen to represent the root of
the tree:
Legs
  2: {3-, 4-, 5-, 7+, 8-, 9-, 10+} ?
  3: +
We now analyse the left node, which has T = {3-, 4-, 5-, 7+, 8-, 9-, 10+}
E(T) = 0.863
Green
Y {9-, 10+} [1+, 1-] I(1, 1) = 1
N {3-, 4-, 5-, 7+, 8-} [1+, 4-] I(1, 4) = -1/5 * log2 1/5 - 4/5 * log2 4/5 = 0.722
Height
T {4-, 5-, 8-, 10+} [1+, 3-] I(1, 3) = -1/4 * log2 1/4 - 3/4 * log2 3/4 = 0.811
S {3-, 7+, 9-} [1+, 2-] I(1, 2) = -1/3 * log2 1/3 - 2/3 * log2 2/3 = 0.918
Smelly
Y {5-, 7+, 8-} [1+, 2-] I(1, 2) = -1/3 * log2 1/3 - 2/3 * log2 2/3 = 0.918
N {3-, 4-, 9-, 10+} [1+, 3-] I(1, 3) = -1/4 * log2 1/4 - 3/4 * log2 3/4 = 0.811
The Green attribute has the highest information gain, so it is the chosen attribute to represent the node
under study:
Legs
  2: Green
    Y: {9-, 10+} ?
    N: {3-, 4-, 5-, 7+, 8-} ?
  3: +
Let us first analyse the left node, which has T = {9-, 10+}
E(T) = 1
Height
T {10+} [1+, 0-] I(1, 0) = 0
S {9-} [0+, 1-] I(0, 1) = 0
Smelly
Y {} [0+, 0-] I(0, 0) = ?
N {9-, 10+} [1+, 1-] I(1, 1) = 1
The Height attribute has the highest information gain, so it is the attribute chosen to represent the node
under study.
Let us now analyse the right node, which has T = {3-, 4-, 5-, 7+, 8-}
E(T) = 0.722
Height
T {4-, 5-, 8-} [0+, 3-] I(0, 3) = 0
S {3-, 7+} [1+, 1-] I(1, 1) = 1
Smelly
Y {5-, 7+, 8-} [1+, 2-] I(1, 2) = -1/3 * log2 1/3 - 2/3 * log2 2/3 = 0.918
N {3-, 4-} [0+, 2-] I(0, 2) = 0
The Height attribute has the highest information gain, so it is the attribute chosen to represent the node
under study.
Legs
  2: Green
    Y: Height
      T: +
      S: -
    N: Height
      T: -
      S: {3-, 7+} ?
  3: +
Let's analyse the right node with T = {3-, 7+}. Although there is only one attribute left, let's calculate its
gain and check how it divides the examples so that we can define the final tree.
E(T) = 1
Smelly
Y {7+} [1+, 0-] I(1, 0) = 0
N {3-} [0+, 1-] I(0, 1) = 0
G(T, Smelly) = 1 - (1/2 * 0 + 1/2 * 0) = 1
Legs
  2: Green
    Y: Height
      T: +
      S: -
    N: Height
      T: -
      S: Smelly
        Y: +
        N: -
  3: +
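The entropy and information-gain calculations used throughout this answer can be automated. The sketch below encodes the ten examples through the partitions listed above (example numbers with + are Martians); the names POSITIVE, SPLITS, entropy, gain and best_attribute are illustrative, and the routine only selects the attribute for a node rather than building the whole tree.

```python
from math import log2

# Example numbers marked with + in the partitions above.
POSITIVE = {1, 2, 6, 7, 10}
ALL = set(range(1, 11))
# For each attribute, the examples that take each value.
SPLITS = {
    "Green": {"Y": {1, 2, 9, 10}, "N": {3, 4, 5, 6, 7, 8}},
    "Legs": {"2": {3, 4, 5, 7, 8, 9, 10}, "3": {1, 2, 6}},
    "Height": {"T": {1, 2, 4, 5, 8, 10}, "S": {3, 6, 7, 9}},
    "Smelly": {"Y": {5, 6, 7, 8}, "N": {1, 2, 3, 4, 9, 10}},
}


def entropy(examples):
    """I(p, n) for a set of example numbers; an empty set contributes 0."""
    if not examples:
        return 0.0
    p = len(examples & POSITIVE) / len(examples)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def gain(examples, attribute):
    """Information gain of splitting the examples on the attribute."""
    remainder = sum(
        len(examples & subset) / len(examples) * entropy(examples & subset)
        for subset in SPLITS[attribute].values())
    return entropy(examples) - remainder


def best_attribute(examples, attributes):
    return max(attributes, key=lambda a: gain(examples, a))


root = best_attribute(ALL, list(SPLITS))
two_legs = SPLITS["Legs"]["2"]
second = best_attribute(two_legs, ["Green", "Height", "Smelly"])
print(root, round(gain(ALL, root), 3), second)
```

The routine selects Legs for the root and Green for the node with two legs, in agreement with the tree above.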
41. Consider the following set of training examples that describe the Reaction concept. Each example
describes the characteristics of a movie and a person's reaction to that movie. The characteristics
considered are: Category, Duration, Year and if it is in color or not.
a) Compute the Entropy of this training set relative to the Reaction concept.
b) Compute the Information Gain for the Category attribute.
c) What is the decision tree generated by the ID3 algorithm for this training set?
E(T) = I(2, 6) = –2/8 * log2 2/8 – 6/8 * log2 6/8 = -0.25 * (-2.0) - 0.75 * (-0.415) = 0.81125
Gender
f {3-, 6-, 7-, 8-} [0+, 4-] I(0, 4) = 0
m {1+, 2+, 4-, 5-} [2+, 2-] I(2, 2) = 1
Age<26
yes {1+, 2+, 3-, 7-} [2+, 2-] I(2, 2) = 1
no {4-, 5-, 6-, 8-} [0+, 4-] I(0, 4) = 0
Has car
yes {2+, 3-, 4-, 8-} [1+, 3-] I(1, 3) = -1/4 * log2 1/4 – 3/4 * log2 3/4 = 0.81125
no {1+, 5-, 6-, 7-} [1+, 3-] I(1, 3) = -1/4 * log2 1/4 – 3/4 * log2 3/4 = 0.81125
The Gender and Age<26 attributes have the highest information gain. In this exercise, we choose Gender to
represent the root of the tree:
Gender
  f: -
  m: {1+, 2+, 4-, 5-} ?
Let's analyse the right node, with T = {1+, 2+, 4-, 5-}
E(T) = 1
Age<26
yes {1+, 2+} [2+, 0-] I(2, 0) = 0
no {4-, 5-} [0+, 2-] I(0, 2) = 0
Has car
yes {2+, 4-} [1+, 1-] I(1, 1) = 1
no {1+, 5-} [1+, 1-] I(1, 1) = 1
The Age<26 attribute has the highest information gain, so it is the attribute chosen to represent the right
branch:
Gender
  f: -
  m: Age<26
    yes: +
    no: -
43. Use the ID3 algorithm to build a decision tree that models the function A ∨ B ∧ C.
Sample  A  B  C  A ∨ B ∧ C
1       F  F  F  F
2       F  F  T  F
3       F  T  F  F
4       F  T  T  T
5       T  F  F  T
6       T  F  T  T
7       T  T  F  T
8       T  T  T  T
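The training set for this exercise is simply the truth table of the function, which can be generated rather than written by hand. The sketch assumes the usual precedence, in which the conjunction binds more tightly than the disjunction, so the function is read as A or (B and C); this reading agrees with every row of the table above. The names f and table are illustrative.

```python
from itertools import product


def f(a, b, c):
    # Conjunction binds tighter than disjunction: A or (B and C).
    return a or (b and c)


# Rows in the same order as the table: A, B, C run from F F F to T T T.
table = [(a, b, c, f(a, b, c)) for a, b, c in product([False, True], repeat=3)]

for number, row in enumerate(table, start=1):
    print(number, *("T" if value else "F" for value in row))
```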
44. Use the ID3 algorithm to build a decision tree that models the function A ∧ B ∨ C ∧ D.
45. The ID3 algorithm can be extended to handle numerical values. The idea is the following: Suppose
that in a given attribute X the supplied values were, for each example, 1, 4 and 9. Then, new boolean
attributes of type X<3 and X<7 are created in such a way that the space of possible values is divided in
two. Use this ID3 variant to calculate the decision tree from the examples:
Assume that the range of values for weight and height is [0..100].
First, we reconstruct the truth table according to the instructions in the statement:
46. Consider the space of instances defined in the rectangular space [0.0, 1.0] x [0.0, 1.0]. Each
instance in this space is represented by a pair of decimal numbers in the range [0.0, 1.0], rounded to
the nearest tenth. In this space, suppose you are given the following training set:
What decision tree would be obtained with the ID3 algorithm (see previous exercise) if each tree node
performs a test of the type 𝑥𝑖 ≥ 𝑧, where 𝑧 is a number in the interval [0.0, 1.0] with one decimal place?