Introduction To SVM
(15) Define the concept of a linear Support Vector Machine (SVM) and show how a separating hyperplane of maximal margin can be found by solving a quadratic optimization problem.
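For reference, the standard hard-margin formulation this question points to: for data $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the maximal-margin hyperplane solves the quadratic program

```latex
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;\; \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \qquad i = 1, \dots, n.
```

The constraints enforce a margin of $1/\|w\|$ on each side of the hyperplane $\langle w, x \rangle + b = 0$, so minimizing $\|w\|^2$ maximizes the margin.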
(17) Define the notions of Lipschitz continuity, $\alpha$-convexity, and $\beta$-smoothness. Show directly, without using the convergence bound for gradient descent for $\alpha$-convex and $\beta$-smooth functions, that if $\alpha = \beta$ for some function $f$, then gradient descent with a suitable step size converges in one iteration.
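A sketch of the key observation: $\alpha$-convexity and $\beta$-smoothness sandwich $f$ between two quadratics, and when $\alpha = \beta$ the sandwich pinches, so $f$ is itself an exact quadratic.

```latex
% alpha-convexity (lower bound) and beta-smoothness (upper bound) give, for all x, y:
%   f(y) >= f(x) + <\nabla f(x), y - x> + (alpha/2) ||y - x||^2
%   f(y) <= f(x) + <\nabla f(x), y - x> + (beta/2)  ||y - x||^2
% With alpha = beta both hold with equality; taking x = x^* (the minimizer,
% where \nabla f(x^*) = 0) gives
f(y) = f(x^*) + \frac{\alpha}{2}\,\|y - x^*\|^2,
\qquad\text{so}\qquad
\nabla f(y) = \alpha\,(y - x^*).
% One gradient step with step size 1/alpha therefore lands on the minimizer:
y - \tfrac{1}{\alpha}\,\nabla f(y) \;=\; y - (y - x^*) \;=\; x^*.
```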
(18) Describe stochastic gradient descent and explain its benefits and limitations compared with gradient descent.
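As a concrete point of comparison, a minimal sketch of both methods on a least-squares objective (problem size and step sizes are illustrative choices, not taken from the course):

```python
import numpy as np

# f(w) = (1/2n) * ||A w - y||^2; both methods minimize the same objective.
rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.normal(size=(n, d))
y = A @ rng.normal(size=d)

w_gd, w_sgd, eta = np.zeros(d), np.zeros(d), 0.1
for t in range(1, 1001):
    # GD: exact gradient, but every step touches all n samples (cost O(nd)).
    w_gd -= eta * A.T @ (A @ w_gd - y) / n
    # SGD: unbiased one-sample gradient estimate (cost O(d) per step); the
    # estimate is noisy, which is why a decaying step size is used here.
    i = rng.integers(n)
    w_sgd -= eta / np.sqrt(t) * (A[i] @ w_sgd - y[i]) * A[i]
```

The trade-off visible in the loop is the one the question asks about: SGD steps are much cheaper and often make fast initial progress, but the gradient noise prevents exact convergence with a fixed step size.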
3 Deep Learning
(19) Describe the backpropagation algorithm for deep neural networks.
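A minimal sketch for a two-layer ReLU network with squared loss; the architecture, loss, and all names are illustrative, not a prescribed implementation:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def backprop(W1, b1, W2, b2, x, y):
    """Gradients of L = 0.5 * ||W2 @ relu(W1 @ x + b1) + b2 - y||^2."""
    # Forward pass: store the intermediate values the backward pass reuses.
    z1 = W1 @ x + b1          # hidden pre-activation
    a1 = relu(z1)             # hidden activation
    out = W2 @ a1 + b2        # network output
    # Backward pass: push dL/d(out) through each layer by the chain rule.
    delta2 = out - y                      # dL/d(out) for squared loss
    dW2, db2 = np.outer(delta2, a1), delta2
    delta1 = (W2.T @ delta2) * (z1 > 0)   # ReLU derivative: 0/1 indicator
    dW1, db1 = np.outer(delta1, x), delta1
    return dW1, db1, dW2, db2
```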
(20) Consider the logical AND function on $\{0, 1\}^2$, defined as $\mathrm{AND}(x_1, x_2) = 1$ if $x_1 = 1$ and $x_2 = 1$, and $\mathrm{AND}(x_1, x_2) = 0$ otherwise. Specify a neural network with ReLU activation (that is, $\rho(x) = \max\{x, 0\}$) that implements the AND function.
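One ReLU unit already suffices: on $\{0, 1\}^2$, $\mathrm{AND}(x_1, x_2) = \rho(x_1 + x_2 - 1)$, since $x_1 + x_2 - 1 \in \{-1, 0, 1\}$ and the ReLU clips the negative value to $0$. A quick check:

```python
# Weights (1, 1) and bias -1 feed a single ReLU output unit.
relu = lambda t: max(t, 0.0)
for x1 in (0, 1):
    for x2 in (0, 1):
        assert relu(x1 + x2 - 1) == (x1 and x2)
```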
(21) Describe a neural network with ReLU activation (that is, $\rho(x) = \max\{x, 0\}$) that implements this function on the interval $[0, 1]$.
(22) Suppose you are given the data set displayed in Figure ??. Describe the structure of a neural network with logistic sigmoid activation function that separates the triangles from the circles, and sketch a decision boundary. The output should consist of a number $p \in [0, 1]$ such that $p > 1/2$ when the input is a circle and $p < 1/2$ otherwise.
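Since Figure ?? is not reproduced here, the following is only a generic sketch under an assumed geometry: the circles fill a region cut out by three half-planes (the triangle $\{x_1 > 0,\; x_2 > 0,\; x_1 + x_2 < 1\}$, say). One hidden sigmoid unit then tests each half-plane and the output unit softly ANDs the tests:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # half-plane normals (assumed)
b = np.array([0.0, 0.0, 1.0])                          # half-plane offsets (assumed)
s = 20.0                                               # slope: sharpens the boundary

def p(x):
    h = sigmoid(s * (W @ x + b))         # h[j] ~ 1 iff x lies inside half-plane j
    return sigmoid(s * (h.sum() - 2.5))  # ~ 1 iff all three tests pass

# p(x) > 1/2 inside the region (circles), p(x) < 1/2 outside (triangles);
# the decision boundary is a smoothed version of the triangle's edges.
```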
(23) Describe the idea of a Convolutional Neural Network and the features that make
it useful for image classification tasks.
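The key ingredients are local connectivity and weight sharing: one small kernel is slid across the whole image, so the same feature detector is applied at every location. A minimal sketch of the operation itself, independent of any library:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: one shared kernel slides over the image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value depends only on a small local patch,
            # and every patch is weighted by the same shared kernel.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out
```

Weight sharing keeps the parameter count independent of the image size and makes the detected features translation-equivariant, which is what makes the architecture suited to image classification.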
(24) Define the robustness of a classifier with a finite number of classes $\{1, \dots, K\}$ and the distance to misclassification of a point. Given a linear classifier $h$, describe an algorithm that takes any input $x$ and computes the smallest perturbation $r$ such that $x + r$ is in a different class than $x$.
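For the affine case $h(x) = \arg\max_k \langle w_k, x \rangle + b_k$ the answer has a closed form: project $x$ onto the nearest of the $K - 1$ pairwise decision boundaries. A sketch (function and variable names are illustrative):

```python
import numpy as np

def minimal_perturbation(W, b, x):
    """Smallest-norm r with argmax_k W[k] @ (x + r) + b[k] != h(x),
    for the affine classifier h(x) = argmax_k W[k] @ x + b[k]."""
    scores = W @ x + b
    c = int(np.argmax(scores))            # current class
    best_r, best_dist = None, np.inf
    for k in range(len(b)):
        if k == c:
            continue
        w = W[k] - W[c]
        f = scores[k] - scores[c]         # < 0 while class c wins
        norm = np.linalg.norm(w)
        if norm == 0:
            continue
        dist = abs(f) / norm              # distance to the (c, k) boundary
        if dist < best_dist:
            best_dist = dist
            best_r = (abs(f) / norm**2) * w
    return best_r  # lands exactly on the boundary; scale by (1 + eps) to cross it
```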
(25) Suppose we have two generators $G_i : \mathcal{Z}_i \to \mathcal{X}$, $i \in \{0, 1\}$, and one discriminator $D : \mathcal{X} \to \{0, 1\}$. Assume that on $\mathcal{Z}_i$ we have probability densities $\rho_{Z_i}$, and if $Z_i$ is a random variable on $\mathcal{Z}_i$ distributed according to this density, then $X_i = G_i(Z_i)$ is a random variable on $\mathcal{X}$ distributed with density $\rho_{X_i}$ for $i \in \{0, 1\}$. The goal of $D$ is to determine whether an observed random sample $x \in \mathcal{X}$ was generated by $G_0$ or $G_1$. Describe the problem of training $D$ to distinguish data coming from $G_0$ from data coming from $G_1$ as an optimization problem, and characterize an optimal solution.
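For orientation, after relaxing the hard classifier to $D : \mathcal{X} \to [0, 1]$, the training problem can be posed as the standard (GAN-style) objective

```latex
\max_{D : \mathcal{X} \to [0,1]} \;
\mathbb{E}_{x \sim \rho_{X_0}}\!\left[\log D(x)\right]
+ \mathbb{E}_{x \sim \rho_{X_1}}\!\left[\log\left(1 - D(x)\right)\right],
\qquad
D^*(x) = \frac{\rho_{X_0}(x)}{\rho_{X_0}(x) + \rho_{X_1}(x)},
```

where $D^*$ is the pointwise maximizer; thresholding $D^*$ at $1/2$ recovers a hard classifier, which decides by comparing the densities $\rho_{X_0}(x)$ and $\rho_{X_1}(x)$.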