Homework 1: EE 737 Spring 2019-20 Assigned: 30 Jan Due: Beginning of Class, 06 Feb
Instructions
• You are allowed to discuss these problems with others, but you need to write up the actual
solutions, including the programming assignments, alone.
• Googling for solutions, creative or otherwise, is best avoided. It defeats the purpose of homework.
• Start on your solutions early and visit the TA during office hours (TBA) for help.
Questions
1. Let $X_t$ be a binary 0/1 sequence. There are $N$ experts, and expert $i$ predicts $Y_{i,t}$ for $X_t$. Recall
the MAJORITY algorithm that predicts
$$\hat{X}_t = \mathbb{1}\left\{ \sum_i Y_{i,t} \ge \sum_i (1 - Y_{i,t}) \right\}$$
where $\mathbb{1}\{\cdot\}$ is the indicator function. In class, we showed that if there is a perfect expert, the
regret is upper bounded by $\log_2 N$. Now assume that the best expert will make up to $m$ mistakes
and obtain a bound on the regret for this case.
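The displayed MAJORITY rule can be sketched in a few lines of Python. This is only an illustration of the prediction rule as written above (ties go to 1 because of the "≥"); the function name `majority_predict` is made up for this sketch and is not from the course material.

```python
def majority_predict(expert_predictions):
    """Predict 1 if at least as many experts say 1 as say 0, else 0.

    expert_predictions: list of 0/1 values, one per expert (assumed layout).
    """
    ones = sum(expert_predictions)            # sum_i Y_{i,t}
    zeros = len(expert_predictions) - ones    # sum_i (1 - Y_{i,t})
    return 1 if ones >= zeros else 0


if __name__ == "__main__":
    print(majority_predict([1, 0, 1]))  # two of three experts vote 1
    print(majority_predict([1, 0]))     # a tie is resolved in favour of 1
```

Note that this sketch only evaluates the rule for one round; how the set of experts is updated after a mistake is part of the analysis asked for in the question.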
(a) Consider the oracle which knows the value of p. What should be the oracle’s strategy and
what is the expected number of mistakes it makes in t rounds?
(b) A forecasting algorithm that does not know $p$ applies the majority algorithm on the past
data to predict $\hat{X}_t$. Specifically,
$$\hat{X}_t = \mathbb{1}\left\{ \sum_{s=1}^{t-1} X_s \ge \sum_{s=1}^{t-1} (1 - X_s) \right\}$$
i.e., the forecast for round $t$ is the majority value in rounds $1, \dots, t-1$, with randomisation
if there is no majority. First, obtain the probability of the forecaster making a mistake in
round $t$. Then characterise the expected number of total mistakes that the forecaster makes
in $t$ rounds. Also characterise the regret, computed as the difference between the expected
number of mistakes of the forecaster and that of the oracle.
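A small simulation can be used to sanity-check an analytical answer to this part. The sketch below rests on two assumptions that are not stated in the text above: that the $X_s$ are i.i.d. Bernoulli($p$), and that the oracle always predicts the more likely value. The function name `simulate_mistakes` and its arguments are hypothetical.

```python
import random


def simulate_mistakes(p, rounds, seed=0):
    """Return (forecaster_mistakes, oracle_mistakes) for one sample path.

    Assumption: X_s are i.i.d. Bernoulli(p). The forecaster predicts the
    majority value of the past, flipping a fair coin when there is a tie.
    """
    rng = random.Random(seed)
    ones = 0                                  # number of 1s seen so far
    forecaster_mistakes = 0
    oracle_mistakes = 0
    oracle_guess = 1 if p >= 0.5 else 0       # assumed oracle strategy
    for t in range(1, rounds + 1):
        zeros = (t - 1) - ones
        if ones > zeros:
            guess = 1
        elif ones < zeros:
            guess = 0
        else:
            guess = rng.randint(0, 1)         # randomise when there is no majority
        x = 1 if rng.random() < p else 0      # draw X_t
        forecaster_mistakes += int(guess != x)
        oracle_mistakes += int(oracle_guess != x)
        ones += x
    return forecaster_mistakes, oracle_mistakes


if __name__ == "__main__":
    print(simulate_mistakes(p=0.7, rounds=1000, seed=1))
```

Averaging the difference of the two counts over many seeds gives an empirical estimate of the regret to compare against the closed-form expression.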
3. Consider the Randomized Exponentially Weighted Average (REWA) forecaster that we studied
in class. Suppose one of the experts is a perfect expert but the forecaster does not know that
there is a perfect expert and is applying the REWA algorithm. Characterise the expected number
of mistakes that the REWA forecaster makes in t rounds.
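For experimenting with this question, here is a minimal sketch of one common form of a randomized exponentially weighted forecaster: in each round it samples an expert with probability proportional to exp(−η × cumulative loss) and follows that expert. This may differ in details from the version presented in class; the name `rewa_mistakes`, the time-major data layout, and the 0/1 loss are assumptions of this sketch.

```python
import math
import random


def rewa_mistakes(expert_preds, outcomes, eta, seed=0):
    """Count the mistakes of a randomized exponential-weights forecaster.

    expert_preds[t][i] is expert i's 0/1 prediction in round t (assumed layout);
    outcomes[t] is the true 0/1 value in round t.
    """
    rng = random.Random(seed)
    n = len(expert_preds[0])
    cum_loss = [0.0] * n
    mistakes = 0
    for preds, x in zip(expert_preds, outcomes):
        base = min(cum_loss)  # shift losses so the largest weight is 1 (avoids underflow)
        weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
        chosen = rng.choices(range(n), weights=weights)[0]
        mistakes += int(preds[chosen] != x)
        for i in range(n):
            cum_loss[i] += float(preds[i] != x)   # 0/1 loss of every expert
    return mistakes


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1, 0] * 4
    preds = [(x, 1 - x) for x in outcomes]   # expert 0 is perfect, expert 1 is always wrong
    print(rewa_mistakes(preds, outcomes, eta=1.0, seed=2))
```

Running it for several seeds and values of η gives an empirical mistake count to compare with the expectation derived on paper.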
4. The analysis of the various forecasting algorithms that we did in class assumed that the time
horizon $t$ was known, i.e., the forecasting was to be done for a total of $t$ rounds. This means that the
parameters of the forecasting algorithm could be functions of this time horizon $t$. Now suppose
an online learning algorithm with a parameter $\eta > 0$ has a regret bound of $\frac{\beta}{\eta} + \gamma \eta T$ for a total
of $T$ rounds, where $\beta$ and $\gamma$ are some positive constants (like, for example, the constants that
we used in the Exponential Weights forecasting model). Assume that if the time horizon $T$ is
known in advance, then setting $\eta := \sqrt{\frac{\beta}{\gamma T}}$ minimizes the bound. Now consider the case when the
time horizon is not known and we need to set the parameter $\eta$. Let us use the following tweak.
Time is divided into periods: the $m$-th period is formed by rounds $2^m, 2^m + 1, \dots, 2^{m+1} - 1$,
where $m = 0, 1, 2, \dots$. In every $m$-th period, starting at round $2^m$, the original algorithm is
re-initialized and run with a parameter $\eta_m := \sqrt{\frac{\beta}{\gamma 2^m}}$. Prove that this modified algorithm has a
regret bound which is at most $\frac{\sqrt{2}}{\sqrt{2} - 1}$ times the optimal regret bound. An algorithm that
does not require knowledge of $T$ is called an anytime algorithm.
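The period schedule described above can be checked numerically before attempting the proof. The sketch below assumes the per-run bound has the form β/η + γη × (number of rounds in the run), sums it over the periods covering rounds 1 to T, and compares the total with the optimal known-horizon bound 2√(βγT). All function names are illustrative, and a numerical check is of course not a proof.

```python
import math


def period_index(t):
    """Index m of the period containing round t >= 1, i.e. floor(log2(t))."""
    return t.bit_length() - 1


def eta_for_round(t, beta, gamma):
    """Parameter eta_m = sqrt(beta / (gamma * 2^m)) used in round t."""
    m = period_index(t)
    return math.sqrt(beta / (gamma * 2 ** m))


def doubling_bound(T, beta, gamma):
    """Sum of the per-period bounds beta/eta_m + gamma*eta_m*length over rounds 1..T."""
    total = 0.0
    m = 0
    while 2 ** m <= T:
        start = 2 ** m
        end = min(2 ** (m + 1) - 1, T)        # the last period may be cut short by T
        eta = math.sqrt(beta / (gamma * 2 ** m))
        total += beta / eta + gamma * eta * (end - start + 1)
        m += 1
    return total


def optimal_bound(T, beta, gamma):
    """Value of beta/eta + gamma*eta*T at eta = sqrt(beta / (gamma*T))."""
    return 2.0 * math.sqrt(beta * gamma * T)


if __name__ == "__main__":
    for T in (1, 10, 100, 1000):
        print(T, doubling_bound(T, 1.0, 0.5) / optimal_bound(T, 1.0, 0.5))
```

The printed ratios can be compared against the constant √2/(√2 − 1) claimed in the question.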
5. Consider prediction with expert advice with a convex loss (in the first argument) bounded in
[0, 1]. Suppose you know in advance what the best expert’s total loss is going to be at time T
(this could be much less than O(T ), e.g., a constant). Let us now see how this information a
priori can be used to learn faster and reduce the regret.
(a) First, prove that $\log \mathbb{E}\left[e^{sX}\right] \le (e^s - 1)\,\mathbb{E}[X]$ for any random variable $X \in [0, 1]$.
(b) Let the experts be indexed by $\{1, 2, \dots, N\}$. Use the preceding result to show that for the
Exponential Weights Algorithm, the loss in $T$ rounds, denoted by $L_T$, is upper bounded by
$$L_T \le \frac{\eta L^*_T + \log N}{1 - e^{-\eta}}$$
Here, as in class, $\eta$ is the parameter of the Exponential Weights algorithm, and $L^*_T$ is
the loss of the best of the $N$ experts at the end of round $T$.
(c) Show that for any $\eta$, $\eta \le (e^{\eta} - e^{-\eta})/2$. Use this in the bound of the previous part.
(d) Then, assuming that the value of $L^*_T$ is known beforehand, show that setting
$$\eta = \log\left(1 + \sqrt{2 \log N / L^*_T}\right)$$
gives regret at most $\sqrt{2 L^*_T \log N} + \log N$, which can be significantly small when the best
expert's loss is small.
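Before working through parts (b) to (d) on paper, it can help to evaluate the quantities numerically. The sketch below plugs the tuned η from part (d) into the loss bound from part (b) and compares the implied regret with √(2 L* log N) + log N. The function names are made up for this illustration, natural logarithms are assumed, and the inputs must satisfy N ≥ 2 and L* > 0 so that η > 0.

```python
import math


def tuned_eta(n_experts, best_loss):
    """eta = log(1 + sqrt(2 log N / L*)); requires n_experts >= 2 and best_loss > 0."""
    return math.log(1.0 + math.sqrt(2.0 * math.log(n_experts) / best_loss))


def loss_bound(eta, n_experts, best_loss):
    """Right-hand side of part (b): (eta * L* + log N) / (1 - exp(-eta))."""
    return (eta * best_loss + math.log(n_experts)) / (1.0 - math.exp(-eta))


def regret_target(n_experts, best_loss):
    """Regret level claimed in part (d): sqrt(2 L* log N) + log N."""
    return math.sqrt(2.0 * best_loss * math.log(n_experts)) + math.log(n_experts)


if __name__ == "__main__":
    for n, best in ((10, 1.0), (10, 100.0), (1000, 5.0)):
        eta = tuned_eta(n, best)
        implied_regret = loss_bound(eta, n, best) - best
        print(n, best, round(implied_regret, 4), round(regret_target(n, best), 4))
```

For each pair (N, L*) the first printed number should not exceed the second if the claim in part (d) holds.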
6. Consider the data given in synthetic data.txt and three forecasters which use Auto Regressive
(AR) models to provide predictions. In particular Forecaster 1 uses an AR(1) model, Forecaster 2
uses an AR(2) model and Forecaster 3 uses an AR(3) model. You combine the expert predictions
using the REWA forecaster. The loss between the prediction $\hat{y}_t$ and the true value $y_t$ is given by
$\ell(\hat{y}_t, y_t) = |\hat{y}_t - y_t|/2$. Run the REWA forecaster 100 times and report the average loss.
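One possible way to organise the repeated runs is sketched below. It assumes the experts' one-step predictions have already been computed (for example by AR models fitted with a library of your choice, which is not shown here) and are passed in as plain lists, expert by expert. The loss is $|\hat{y}_t - y_t|/2$ as defined in the question; the function names, the data layout, and the choice of η are assumptions of this sketch.

```python
import math
import random


def abs_loss(pred, truth):
    """Loss |pred - truth| / 2 from the problem statement."""
    return abs(pred - truth) / 2.0


def rewa_total_loss(expert_preds, truths, eta, rng):
    """Total loss of one randomized exponential-weights run.

    expert_preds[i][t] is expert i's prediction for round t (assumed layout).
    """
    n = len(expert_preds)
    cum = [0.0] * n
    total = 0.0
    for t, y in enumerate(truths):
        base = min(cum)  # shift so the best expert has weight 1 (avoids underflow)
        weights = [math.exp(-eta * (c - base)) for c in cum]
        i = rng.choices(range(n), weights=weights)[0]   # sample the expert to follow
        total += abs_loss(expert_preds[i][t], y)
        for j in range(n):
            cum[j] += abs_loss(expert_preds[j][t], y)
    return total


def average_rewa_loss(expert_preds, truths, eta, runs=100, seed=0):
    """Average total loss over independent runs of the randomized forecaster."""
    rng = random.Random(seed)
    return sum(rewa_total_loss(expert_preds, truths, eta, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    truths = [0.0, 1.0, 0.5, 0.2]
    experts = [[0.1, 0.9, 0.5, 0.3], [1.0, 1.0, 1.0, 1.0]]   # toy predictions, not AR output
    print(average_rewa_loss(experts, truths, eta=1.0))
```

The toy lists in the usage example stand in for real AR predictions; with the actual dataset, each inner list would hold one forecaster's predictions for every round.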
7. Consider the data given in stock price dataset.txt and five forecasters which use AR(1), · · · ,
AR(5) models to provide predictions. With 5 experts, repeat the previous exercise for this dataset.
Check this Wikipedia page and this Python tutorial to get started with AR models. You should
use libraries of the programming language to construct the experts. You will have to submit your code
and your output.