0% found this document useful (0 votes)
215 views4 pages

EXAMPLE Machine Learning (C395) Exam Questions

The document contains sample questions and solutions for a machine learning exam. The questions cover topics like gradient descent, deriving the LMS training rule, calculating information gain, designing neural networks, and applying genetic algorithms to optimize a decision tree for classifying cars. The solutions provide mathematical derivations, diagrams, and pseudocode to thoroughly explain the concepts.

Uploaded by

Nithin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
215 views4 pages

EXAMPLE Machine Learning (C395) Exam Questions

The document contains sample questions and solutions for a machine learning exam. The questions cover topics like gradient descent, deriving the LMS training rule, calculating information gain, designing neural networks, and applying genetic algorithms to optimize a decision tree for classifying cars. The solutions provide mathematical derivations, diagrams, and pseudocode to thoroughly explain the concepts.

Uploaded by

Nithin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

EXAMPLE Machine Learning (C395) Exam Questions

(1) Question: Explain the principle of the gradient descent algorithm. Accompany
your explanation with a diagram. Explain the use of all the terms and constants
that you introduce and comment on the range of values that they can take.

Solution: Training can be posed as an optimization problem, in which the goal


is to optimize a function (usually to minimize a cost function E) with respect to
a number of free variables, usually weights wi. The gradient decent algorithm
begins from an initialization of the weights (e.g. a random initialization) and in
an iterative procedure updates the weights wi by a quantity Δwi, where Δwi = –α
(∂E / ∂wi) and (∂E / ∂wi) is the gradient of the cost function with respect to the
weights, while α is a constant which takes small values in order to keep the
updates low and avoid oscillations.

(2) Question: Derive the gradient descent training rule assuming that the target
function representation is:

od = w0 + w1x1 + … + wnxn.

Define explicitly the cost/error function E, assuming that a set of training


examples D is provided, where each training example d ∈ D is associated with
the target output td.

Solution: The error function: E = ∑d D (td – od)2


The gradient decent algorithm: Δwi = –α (∂E / ∂wi)


First represent (∂E / ∂wi) in terms of the unit inputs xid, outputs od, and target
values td:
(∂E / ∂wi) = (∂∑d D (td – od)2) / ∂wi = ∑d D 2(td – od) (∂(td – od) / ∂wi) =
∈ ∈

∑d D 2(td – od) (–∂od / ∂wi) = –∑d D 2(td – od) (∂(w0 + … + wixid + … + wnxnd) /
∈ ∈

∂wi) = –∑d D 2(td – od) (xid)


=> Δwi = α ∑d D 2(td – od) xid



(3) Question: Prove that the LMS training rule performs a gradient descent to
minimize the cost/error function E defined in (2).

Solution: Given the target function representation


od = w0 + w1x1 + … + wnxn,
LMS training rule is a learning algorithm for choosing the set of weights wi to
best fit the set of training examples {< d, td >}, i.e., to minimize the squared
error E ≡ ∑d D (td – od)2.

LMS training rule works as follows:


(∀ < d, td >) use the current weights wi to calculate od
(∀wi) wi ← wi + η(td – od)xid (*)

From (2) à (∂E / ∂wi) = –∑d D 2(td – od)xid à –(1/2xid)(∂E / ∂wi) = (td – od)

Substitute this in (*) à (∀wi) wi ← wi + (η/2)(–∂E / ∂wi)


This shows that LMS alters weights in the very same proportion as does the
gradient descent algorithm (i.e., –∂E / ∂wi), proving that LMS performs gradient
descent.

(4) Question: Consider the following set of training examples:

Instance Classification a1 a2
1 + T T
2 + T T
3 - T F
4 + F F
5 - F T
6 - F T

What is the information gain of a2 relative to these training examples? Provide


the equation for calculating the information gain as well as the intermediate
results.

Solution:
Entropy E(S) = E([3+, 3-]) = -(3/6) log2 (3/6) - (3/6) log2 (3/6) = 1.
Gain (S, a2) = E(S) – (4/6)E(T) – (2/6)E(F) = 1 – 4/6 – 2/6 ≈ 0.
E(T) = E([2+, 2-]) = 1.
E(F) = E([1+, 1-]) = 1.

(5) Question: Suppose that we want to build a neural network that classifies two
dimensional data (i.e., X = [x1, x2]) into two classes: diamonds and crosses. We
have a set of training data that is plotted as follows:
X2

X1

Draw a network that can solve this classification problem. Justify your choice of
the number of nodes and the architecture. Draw the decision boundary that your
network can find on the diagram.

Solution:
A solution is a multilayer FFNN with 2 inputs, one hidden layer with 4 neurons
and 1 output layer with 1 neuron. The network should be fully connected, that is
there should be connections between all nodes in one layer with all the nodes in
the previous (and next) layer. We have to use two inputs because the input data
is two dimensional. We use an output layer with one neuron because we have 2
classes. One hidden layer is enough because there is a single compact region
that contains the data from the crosses-class and does not contain data from the
diamonds-class. This region can have 4 lines as borders, therefore it suffices if
there are 4 neurons at the hidden layer. The 4 neurons in the hidden layer
describe 4 separating lines and the neuron at the output layer describes the
square that is contained between these 4 lines.

(6) Question: Suppose that we want to solve the problem of finding out what a
good car is by using genetic algorithms. Suppose further that the solution to the
problem can be represented by a decision tree as follows:

size
large small
mid

brand no sport
yes
Volvo BMW SUV engine no
no yes no
F12 V10 V8
no yes no
What is the appropriate chromosome design for the given problem? Which
Genetic Algorithm parameters need to be defined? What would be the suitable
values of those parameters for the given problem? Provide a short explanation
for each.
What is the result of applying a single round of the prototypical Genetic
Algorithm? Explain your answer in a clear and compact manner by providing
the pseudo code of the algorithm.

Solution:
size = {large, mid, small} → 100, 010, 001, 011, …, 111, 000
brand = {Volvo, BMW, SUV} → 100, 010, 001, 011, …, 111, 000
sport = {yes, no} → 10, 01, 11, 00
engine = {F12, V12, V8} → 100, 010, 001, 011, …, 111, 000
GoodCar = {yes, no} → 10, 01, 11, 00
→ chromosome design:
size brand sport engine GoodCar
100 100 11 111 01

Fitness function for the given problem can be defined as a Sigmoid function f(x)
= 1 / (1+ e-x), where x is the percentage of all training examples correctly
classified by a specific solution (chromosome).
Selection method – e.g., rank selection method can be used;
Crossover technique – 2-point crossover can be used for the given problem with
a crossover mask 1111110000011; the reason is that either size + brand or
sport + engine define the solution
Crossover rate – usually k = 60%
Mutation rate – usually 1%
Termination condition – e.g., all training examples are correctly classified

GA pseudo code:
Step 1: Choose initial population.
Step 2: Evaluate the fitness of individuals in the population.
Step 3: Select k individuals to reproduce; breed new generation through
crossover and mutation; evaluate the individual fitness of offspring; replace k
worse ranked part of population with offspring.
Step 4: Repeat step 3 until the termination condition is reached.

s1/2: 1010101001010, 1011111110101, 0100011111101, 0011111001010,


1011011110101
s3: 1010101110110 (fit 1), 0011111001001 (fit 0), 1010101111110 (fit 2),
1011011001001 (fit 1), 0011111110101 (fit 2), 1011011110101 (fit 3)
result: 1011111110101, 1011011110101, 1011011110101, 0011111110101,
1010101111110

You might also like