
MA3K1 Mathematics of Machine Learning April 10, 2021

(14) Determine the Lagrange dual of the following problem.

minimize x² + 1 subject to (x − 3)(x − 1) ≤ 0.

Does strong duality hold?
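
A sketch of the computation, assuming the objective reads x² + 1 (the operator is garbled in the source; a constant shift does not affect the analysis). The Lagrangian is

L(x, λ) = x² + 1 + λ(x − 3)(x − 1) = (1 + λ)x² − 4λx + (1 + 3λ),  λ ≥ 0.

For λ ≥ 0 this quadratic in x is strictly convex, so minimizing over x gives the dual function

g(λ) = inf_x L(x, λ) = 1 + 3λ − 4λ²/(1 + λ).

The feasible set is 1 ≤ x ≤ 3, so the primal optimum is p* = 2 at x = 1; maximizing g over λ ≥ 0 gives λ* = 1 and d* = g(1) = 2. Since x = 2 is strictly feasible, Slater's condition holds, consistent with strong duality.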

(15) Define the concept of a linear Support Vector Machine (SVM) and show how a separating hyperplane of maximal margin can be found by solving a quadratic optimization problem.
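
For reference, with linearly separable data (x_i, y_i), y_i ∈ {−1, +1}, the standard hard-margin formulation is the quadratic program

minimize ½‖w‖²  subject to  y_i(⟨w, x_i⟩ + b) ≥ 1,  i = 1, …, n.

The constraints keep every data point outside a slab of width 2/‖w‖ around the hyperplane ⟨w, x⟩ + b = 0, so minimizing ‖w‖² maximizes the margin.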

(16) Determine (with justification or counterexample) which of the following statements are true and which are false.
(a) A convex function has a unique minimizer;
(b) A function f ∈ C²(ℝᵈ) is convex if and only if for all x, y ∈ ℝᵈ,

f(y) ≥ f(x) + ⟨∇f(y), y − x⟩;

(c) Gradient descent converges linearly;
(d) A convex function is differentiable.
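
Standard test functions worth checking against these statements (candidate counterexamples, not a model answer): f(x) = 0 on ℝ is convex with every point a minimizer (statement (a)); f(x) = |x| is convex but not differentiable at 0 (statement (d)); f(x) = x² with x = 0, y = 1 gives a quick numerical check of the inequality in (b); for (c), compare the O(1/k) bound for convex β-smooth functions with the linear rate available under α-convexity.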

(17) Define the notions of Lipschitz continuity, α-convexity, and β-smoothness. Show directly, without using the convergence bound for gradient descent for α-convex and β-smooth functions, that if α = β for some function f, then gradient descent with a suitable step size converges in one iteration.
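
A sketch of the key step, using the standard definitions: α-convexity and β-smoothness sandwich f between two quadratics, so if α = β then for all x, y

f(y) = f(x) + ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖².

The right-hand side is minimized at y = x − (1/α)∇f(x), and since it equals f exactly, this point is the global minimizer: one gradient step with step size 1/α suffices.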

(18) Describe stochastic gradient descent and explain its benefits and limitations compared with gradient descent.
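
A minimal sketch contrasting the two update rules on a least-squares objective (numpy; the data, step sizes, and iteration count are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))                  # illustrative data
b = A @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

def grad_full(x):
    # full gradient of f(x) = (1/2n) ||Ax - b||^2: one pass over all data
    return A.T @ (A @ x - b) / len(b)

def grad_stoch(x, i):
    # gradient of the single term (1/2)(a_i^T x - b_i)^2: one sample only
    return A[i] * (A[i] @ x - b[i])

x_gd, x_sgd = np.zeros(5), np.zeros(5)
for t in range(1000):
    x_gd = x_gd - 0.1 * grad_full(x_gd)         # gradient descent
    i = rng.integers(len(b))
    x_sgd = x_sgd - 0.1 / (1 + t) ** 0.5 * grad_stoch(x_sgd, i)  # SGD, decaying step

Each stochastic step costs a single sample instead of a full pass over the data, at the price of noisy updates that need a decaying step size to converge.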

3 Deep Learning
(19) Describe the backpropagation algorithm for deep neural networks.
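
A compact sketch for a network with one hidden ReLU layer, a linear output, and squared loss (numpy; the layer sizes and the loss are illustrative choices):

import numpy as np

def relu(z):
    return np.maximum(z, 0)

def backprop(x, y, W1, b1, W2, b2):
    # forward pass, caching pre-activations
    z1 = W1 @ x + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2              # linear output layer
    loss = 0.5 * np.sum((z2 - y) ** 2)

    # backward pass: propagate dL/dz layer by layer
    dz2 = z2 - y                   # gradient of squared loss w.r.t. output
    dW2 = np.outer(dz2, a1)
    db2 = dz2
    dz1 = (W2.T @ dz2) * (z1 > 0)  # ReLU derivative is the indicator z1 > 0
    dW1 = np.outer(dz1, x)
    db1 = dz1
    return loss, (dW1, db1, dW2, db2)

The backward pass reuses the quantities cached in the forward pass and propagates dL/dz through the layers in reverse, which keeps the cost of the gradient computation comparable to that of a forward pass.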

(20) Consider the logical AND function on {0, 1}², defined as AND(x1, x2) = 1 if x1 = 1 and x2 = 1, and AND(x1, x2) = 0 otherwise. Specify a neural network with ReLU activation (that is, ρ(x) = max{x, 0}) that implements the AND function.
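
One valid network (a sketch; a single ReLU unit suffices, and these weights are one choice among many):

import numpy as np

def relu(z):
    return np.maximum(z, 0)

def and_net(x1, x2):
    # AND(x1, x2) = relu(x1 + x2 - 1) on {0, 1}^2:
    # input (1, 1) gives relu(1) = 1; every other input gives relu(<= 0) = 0
    return relu(x1 + x2 - 1)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, and_net(*x))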

(21) Consider the following function:


Figure 1: The saw function

Describe a neural network with ReLU activation (that is, ρ(x) = max{x, 0}) that implements this function on the interval [0, 1].
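
The figure itself did not survive extraction, so the exact shape is an assumption here. The sketch below builds a zigzag "saw" with peaks at x = 1/4 and x = 3/4 as one plausible reading, using the general fact that any continuous piecewise-linear function on [0, 1] is a one-hidden-layer ReLU network f(x) = f(0) + Σ_k c_k ρ(x − t_k), where c_k is the change of slope at breakpoint t_k:

import numpy as np

def relu(z):
    return np.maximum(z, 0)

# hypothetical breakpoints; piece slopes 4, -4, 4, -4 give the zigzag
# 0 -> 1 -> 0 -> 1 -> 0, so the coefficients are the slope changes
t = [0.0, 0.25, 0.5, 0.75]
c = [4.0, -8.0, 8.0, -8.0]

def saw(x):
    # one hidden layer of 4 ReLU units, one linear output unit
    return sum(ck * relu(x - tk) for ck, tk in zip(c, t))

xs = np.linspace(0, 1, 9)
print([round(saw(x), 2) for x in xs])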

(22) Suppose you are given the data set displayed in Figure 2. Describe the structure of a neural network with logistic sigmoid activation function that separates the triangles from the circles, and sketch a decision boundary. The output should consist of a number p ∈ [0, 1] such that p > 1/2 when the input is a circle and p < 1/2 otherwise.

Figure 2: A classification problem

(23) Describe the idea of a Convolutional Neural Network and the features that make
it useful for image classification tasks.
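
A minimal sketch of the core operation, a 2-D convolution with a shared kernel (numpy; the kernel and image are illustrative). Weight sharing and local receptive fields are what make the architecture suited to images:

import numpy as np

def conv2d(image, kernel):
    # valid 2-D cross-correlation: the same small kernel is slid over
    # every spatial position (weight sharing + locality)
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

edge = np.array([[1.0, -1.0]])        # responds to vertical edges
img = np.zeros((4, 4)); img[:, 2:] = 1.0
print(conv2d(img, edge))              # nonzero only at the edge column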

(24) Define the robustness of a classifier with a finite number of classes {1, …, K} and the distance to misclassification of a point. Given a linear classifier h, describe an algorithm that takes any input x and computes the smallest perturbation r such that x + r is in a different class than x.
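
For a binary linear classifier the smallest perturbation has a closed form: the distance from x to the hyperplane ⟨w, x⟩ + b = 0 is |⟨w, x⟩ + b|/‖w‖, attained along w. A sketch of the binary case (numpy; the multiclass version repeats this over the pairwise decision boundaries and keeps the smallest r):

import numpy as np

def smallest_perturbation(x, w, b, eps=1e-6):
    # project x onto the hyperplane <w, x> + b = 0, then overshoot slightly
    f = w @ x + b
    r = -(f / (w @ w)) * w          # lands exactly on the boundary
    return r * (1 + eps)            # tiny overshoot to actually change class

w = np.array([1.0, 2.0]); b = -1.0
x = np.array([2.0, 2.0])            # classified positive: f = 5
r = smallest_perturbation(x, w, b)
print(r, np.sign(w @ (x + r) + b))  # perturbed point is on the other side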


(25) Suppose we have two generators G_i : 𝒵_i → 𝒳, i ∈ {0, 1}, and one discriminator D : 𝒳 → {0, 1}. Assume that on 𝒵_i we have probability densities ρ_{Z_i}, and if Z_i is a random variable on 𝒵_i distributed according to this density, then X_i = G_i(Z_i) is a random variable on 𝒳 distributed with density ρ_{X_i} for i ∈ {0, 1}. The goal of D is to determine whether an observed random sample x ∈ 𝒳 was generated by G_0 or G_1. Describe the problem of training D to distinguish data coming from G_0 from data coming from G_1 as an optimization problem, and characterize an optimal solution.
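
One standard way to pose it (a sketch, assuming the two generators are a priori equally likely): choose D to maximize the probability of a correct decision,

maximize over D : 𝒳 → {0, 1} the quantity  ½ E_{x∼ρ_{X_0}}[1{D(x) = 0}] + ½ E_{x∼ρ_{X_1}}[1{D(x) = 1}],

which is maximized pointwise by the likelihood-ratio rule D*(x) = 1{ρ_{X_1}(x) > ρ_{X_0}(x)}, with ties broken arbitrarily; this is the Bayes classifier for the two densities.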
