
SECTION A.

Answer each of the following questions. [ Marks per question as shown; 25% total ]

1. (12 points) Suppose X and Y are random variables with the following joint distribution:

           Y = 1   Y = 2
   X = 0    0.2     0.1
   X = 1    0.0     0.2
   X = 2    0.3     0.2

Compute each of the following, showing all your working.

(a) (2 points) Compute the marginal distributions of X and Y, i.e. determine the values
of P(X = x) for all x, and P(Y = y) for all y.
(b) (2 points) Compute the conditional distribution of X given Y = 2, i.e. compute the
values of P(X = x | Y = 2) for all x.
(c) (2 points) Compute E[X] and E[Y].
(d) (4 points) Compute E[XY ].
(e) (2 points) Are X and Y independent? Explain your answer.
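
A minimal Python sketch (not part of the exam) showing how the quantities in Question 1 can be sanity-checked numerically; the joint probabilities are taken directly from the table above.

```python
import numpy as np

# Joint distribution from Question 1: rows index X in {0, 1, 2}, columns index Y in {1, 2}.
P = np.array([[0.2, 0.1],
              [0.0, 0.2],
              [0.3, 0.2]])
x_vals = np.array([0, 1, 2])
y_vals = np.array([1, 2])

p_X = P.sum(axis=1)               # marginal P(X = x)
p_Y = P.sum(axis=0)               # marginal P(Y = y)
p_X_given_Y2 = P[:, 1] / p_Y[1]   # conditional P(X = x | Y = 2)

E_X = (x_vals * p_X).sum()
E_Y = (y_vals * p_Y).sum()
E_XY = (np.outer(x_vals, y_vals) * P).sum()

# Independence would require P(x, y) = P(x) P(y) for every cell of the table.
independent = np.allclose(P, np.outer(p_X, p_Y))
print(p_X, p_Y, p_X_given_Y2, E_X, E_Y, E_XY, independent)
```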

2. (5 points) You have a large bucket which contains 999 fair coins and one biased coin. The
biased coin is a two-headed coin (i.e. it always lands heads). Suppose you pick one coin
out of the bucket, flip it 10 times, and get all heads.

(a) (1 point) State Bayes’ rule.


(b) (1 point) State the Law of Total Probability.
(c) (3 points) What is the probability that the coin you choose is the two-headed coin?
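
Part (c) combines parts (a) and (b); a small sketch of that calculation in Python, with the prior and likelihoods from the setup above:

```python
# Prior over coins: one two-headed coin among 1000.
p_biased = 1 / 1000
p_fair = 999 / 1000

# Likelihood of observing 10 heads in 10 flips under each hypothesis.
lik_biased = 1.0          # the two-headed coin always lands heads
lik_fair = 0.5 ** 10

# Law of Total Probability for the evidence, then Bayes' rule for the posterior.
p_evidence = lik_biased * p_biased + lik_fair * p_fair
posterior_biased = lik_biased * p_biased / p_evidence
print(posterior_biased)
```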

3. (5 points)
(a) (1 point) In one or two sentences, explain the difference between the Bayesian
and Frequentist approaches to parameter estimation.
(b) (1 point) In one or two sentences, explain what the maximum likelihood estimate for
a parameter is.
(c) (1 point) In one or two sentences, explain what a prior distribution is in the context
of Bayesian inference.
(d) (1 point) In one or two sentences, explain what the Maximum a posteriori (MAP)
estimate for a parameter is.
(e) (1 point) In one or two sentences, explain the connection between the MLE and
MAP estimates. (Hint: what priors should you use?)
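
One way to see the MLE/MAP connection asked about in part (e) is with a Bernoulli parameter under a Beta(a, b) prior; the counts and hyperparameters below are illustrative only.

```python
# MLE vs. MAP for a Bernoulli parameter with a Beta(a, b) prior.
def mle_bernoulli(heads, flips):
    return heads / flips

def map_bernoulli(heads, flips, a=1.0, b=1.0):
    # Posterior is Beta(heads + a, flips - heads + b); its mode is the MAP estimate.
    return (heads + a - 1) / (flips + a + b - 2)

# With a uniform prior (a = b = 1) the MAP estimate coincides with the MLE.
print(mle_bernoulli(7, 10), map_bernoulli(7, 10), map_bernoulli(7, 10, a=2, b=2))
```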
4. (3 points) A biased coin is flipped until the first head occurs. Denote by Z the number of
flips required, with p being the probability of obtaining a head. Compute P(Z = k), stating
clearly the possible values of k.
(Hint: Write down the probabilities of some possible outcomes.)
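
A simulation sketch for Question 4 (the value p = 0.3 is chosen only for illustration): the empirical frequencies can be compared against whatever expression is derived for P(Z = k).

```python
import random
from collections import Counter

def flips_until_first_head(p):
    """Flip a coin with P(head) = p until the first head; return the number of flips."""
    z = 1
    while random.random() >= p:   # a tail occurs with probability 1 - p
        z += 1
    return z

p = 0.3
n_trials = 100_000
counts = Counter(flips_until_first_head(p) for _ in range(n_trials))
for k in sorted(counts)[:6]:
    print(k, counts[k] / n_trials)
```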



SECTION B.
Answer each of the following questions. [ Marks per question as shown; 25% total ]

5. (4 points) Suppose the random variables X and Y are related by the following probabilities
P(X = x, Y = y):

           Y = 0   Y = 1
   X = 0    1/3     1/3
   X = 1     0      1/3

Compute the following:

(a) (2 points) H(X), H(Y).
(b) (2 points) H(X|Y), H(Y|X).
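
A short sketch for checking the entropies in Question 5 numerically, using the joint table above and the chain rule H(X|Y) = H(X, Y) − H(Y).

```python
import numpy as np

# Joint distribution from Question 5: rows index X in {0, 1}, columns index Y in {0, 1}.
P = np.array([[1/3, 1/3],
              [0.0, 1/3]])

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

H_X = entropy(P.sum(axis=1))
H_Y = entropy(P.sum(axis=0))
H_XY = entropy(P.ravel())        # joint entropy H(X, Y)
H_X_given_Y = H_XY - H_Y         # chain rule
H_Y_given_X = H_XY - H_X
print(H_X, H_Y, H_X_given_Y, H_Y_given_X)
```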

6. (5 points)

(a) (3 points) What is a typical set? How does the typical set relate to the smallest
δ-sufficient subset?
(b) (2 points) Is the most likely sequence always a member of the typical set? Explain
your answer.

7. (3 points) Construct a Huffman code for the ensemble with alphabet A_X = {a, b, c} and
probabilities p = (0.6, 0.3, 0.1). Show all your working.
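
A compact Huffman construction that can be used to check a hand-built code for Question 7 (the exact codewords may differ from a hand construction by swapping 0s and 1s, but the lengths should agree).

```python
import heapq

def huffman(probs):
    """Return a Huffman code (symbol -> codeword) for a dict of symbol probabilities."""
    # Each heap entry is (probability, tie-breaker, partial code for the merged subtree).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman({"a": 0.6, "b": 0.3, "c": 0.1}))
```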

8. (7 points) Suppose a single fair six-sided die is rolled, and let Y be the outcome.

(a) (3 points) Compute the expected value of Y and the variance of Y .


(b) (1 point) Calculate an upper bound for the quantity P(Y ≥ 6) using Markov’s
inequality.
(c) (3 points) Calculate an upper bound for the quantity P(Y ≥ 6) using Chebyshev’s
inequality and compare this to the answer obtained in part (b). Which is closer to
the true value of P(Y ≥ 6)?
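
A numeric sketch of the quantities in Question 8; the bounds follow Markov's inequality P(Y ≥ a) ≤ E[Y]/a and Chebyshev's inequality applied to the event {|Y − E[Y]| ≥ 6 − E[Y]}, which contains {Y ≥ 6}.

```python
import numpy as np

# A fair six-sided die.
y = np.arange(1, 7)
p = np.full(6, 1/6)

E_Y = (y * p).sum()
Var_Y = ((y - E_Y) ** 2 * p).sum()

true_prob = p[y >= 6].sum()                # exact P(Y >= 6)
markov_bound = E_Y / 6                     # Markov: P(Y >= 6) <= E[Y] / 6
chebyshev_bound = Var_Y / (6 - E_Y) ** 2   # Chebyshev on |Y - E[Y]| >= 6 - E[Y]
print(E_Y, Var_Y, true_prob, markov_bound, chebyshev_bound)
```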

9. (6 points)

(a) (1 point) What is the purpose of the sigmoid function in logistic regression?
(b) (2 points) Suppose you have a trained logistic regression model with weights w.
Roman proposes to classify a new point x_new as positive by checking if x_new^T w > 0.
Alice proposes to classify it as positive by checking if σ(x_new^T w) > 0.5, where σ(·) is
the sigmoid function. Will these result in the same prediction? Explain why or why
not.
(c) (3 points) In one or two sentences, explain the relationship between logistic regression
and the maximum entropy principle.
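
For part (b), the key facts are that σ(0) = 0.5 and that σ is strictly increasing; the sketch below compares the two decision rules on an example point (the weights and the point are made up for illustration).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0, 0.3])        # illustrative trained weights (not from the exam)
x_new = np.array([0.4, 0.1, 1.0])     # illustrative new point (not from the exam)

score = x_new @ w
rule_roman = score > 0                # threshold the linear score at 0
rule_alice = sigmoid(score) > 0.5     # threshold the sigmoid output at 0.5
print(score, rule_roman, rule_alice)
```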



SECTION C.
Answer each of the following questions. [ Marks per question as shown; 25% total ]

10. (10 points) Suppose Y is an ensemble equipped with A_Y = {a, b, c} and probabilities
p = (0.5, 0.25, 0.25).

(a) (2 points) Write down the alphabet and probabilities for the extended ensemble Y².
(b) (3 points) Assuming the symbols for Y² are in alphabetical order, so that e.g. aa
appears before ab, what is the binary interval for ab in a Shannon-Fano-Elias code
for Y²?
(c) (3 points) What is the smallest δ-sufficient subset for Y² when δ = 0.45?
(d) (2 points) What is the essential bit content H_δ(Y²) when δ = 0.45?
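
A sketch that builds the extended ensemble Y² and walks through its symbols in alphabetical order, accumulating probabilities; the exact binary-interval convention for a Shannon-Fano-Elias code should follow the lecture definition, so this only lays out the raw probabilities and cumulative intervals.

```python
from itertools import product

# Extended ensemble Y^2 built from Y with alphabet {a, b, c} and p = (0.5, 0.25, 0.25).
base = {"a": 0.5, "b": 0.25, "c": 0.25}
ext = {s1 + s2: p1 * p2 for (s1, p1), (s2, p2) in product(base.items(), repeat=2)}

lo = 0.0
for sym in sorted(ext):               # aa, ab, ac, ba, ... in alphabetical order
    hi = lo + ext[sym]
    print(sym, ext[sym], (lo, hi))    # probability and cumulative interval [lo, hi)
    lo = hi
```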

11. (6 points)

(a) (1 point) Is every prefix-free code uniquely decodable? If yes, explain why. If no,
provide a counter-example.
(b) (1 point) Is the code C = {0, 01, 011} uniquely decodable? Explain your answer.
(c) (2 points) Explain the difference between a lossless code and a uniquely decodable code.
(d) (2 points) Consider a source W = {a, b, c, d, e}. Explain whether it is possible to
construct a prefix code for this source with the proposed lengths l_a = 1, l_b = 2, l_c = 3,
l_d = 4, l_e = 4, without actually giving an example of a code. (Hint: What conditions
should you check?)
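
The hint in part (d) points at a condition on the codeword lengths; a one-line check of that condition (the Kraft inequality) for the proposed lengths:

```python
# Kraft inequality: a binary prefix code with lengths l_i exists iff sum_i 2^(-l_i) <= 1.
lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 4}
kraft_sum = sum(2 ** (-l) for l in lengths.values())
print(kraft_sum, kraft_sum <= 1)
```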

12. (9 points) Let X be an ensemble with alphabet A_X = {a, b, c, d}, with probabilities
p = (1/2, 1/4, 1/8, 1/8), and the code C = (0000, 01, 11, 0).

(a) (2 points) What is the entropy H(X) as a single number?


(b) (2 points) What is the expected code length L(C, X)?
(c) Which of these are Shannon codes? Justify your answers.
i. (1 point) A = {0, 10, 110, 111}
ii. (1 point) B = {000, 001, 010, 111}
iii. (1 point) C = {0, 01, 001, 010}
(d) (2 points) Is the code A in part (c)[i] optimal? Explain why or why not.
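
A small sketch for checking parts (a) and (b) of Question 12 numerically, using the probabilities and codeword lengths given above.

```python
import numpy as np

p = np.array([1/2, 1/4, 1/8, 1/8])       # probabilities of a, b, c, d
codewords = ["0000", "01", "11", "0"]    # the code C, in the same order

H_X = -(p * np.log2(p)).sum()                              # entropy H(X) in bits
L_CX = (p * np.array([len(c) for c in codewords])).sum()   # expected length L(C, X)
print(H_X, L_CX)
```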



SECTION D.
Answer each of the following questions. [ Marks per question as shown; 25% total ]
13. (5 points) Consider a (5, 3) block code C.

(a) (3 points) What is the length of each codeword in C? How many codewords does C
define? Compute the rate of C.
(b) (2 points) Do there exist codes with rate equal to that of C that can achieve
arbitrarily small probability of block error over a channel of capacity 0.4? Justify
your answer.

14. (15 points) Let X = {a, b} and Y = {a, b} be the input and output alphabets for the
following two channels:

    R = [ 1    0.5 ]         R† = [ 0.5   0 ]
        [ 0    0.5 ]              [ 0.5   1 ]
(a) (2 points) Describe the behaviour of R and R† when each of the symbols a and b is
used as input.
(b) Define an arbitrary distribution p_X = (p, q) with p ∈ [0, 1] and p + q = 1 over X.
Express the following for R:
i. (2 points) The probabilities P(Y = a) and P(Y = b) in terms of p, where Y is a
random variable denoting the output of the channel R.
ii. (2 points) The entropy H(Y) in terms of the probability p and the function
H2(·) defined by
H2(ϑ) = −ϑ · log2 ϑ − (1 − ϑ) · log2(1 − ϑ).
iii. (2 points) The mutual information I(X; Y) in terms of p and the function H2
defined above.
(c) (4 points) Using the previous results or otherwise, compute the input distribution
that achieves the channel capacity for R.
(d) (3 points) Suppose you used the channels R and R† to send messages by first flipping
a fair coin and sending a symbol through R if it landed heads and through R† if it
landed tails. Construct a matrix Q that represents the channel defined by this process.
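
The exam asks for an analytic treatment of parts (b) and (c); as a cross-check, the mutual information for R can also be swept numerically over input distributions p_X = (p, 1 − p), assuming the convention that columns of R index inputs and rows index outputs.

```python
import numpy as np

# Channel R, assuming columns index inputs (a, b) and rows index outputs (a, b).
R = np.array([[1.0, 0.5],
              [0.0, 0.5]])

def entropy(probs):
    probs = probs[probs > 0]
    return -(probs * np.log2(probs)).sum()

def mutual_information(p):
    p_x = np.array([p, 1 - p])
    p_y = R @ p_x                                                 # output distribution
    H_Y_given_X = sum(p_x[i] * entropy(R[:, i]) for i in range(2))
    return entropy(p_y) - H_Y_given_X                             # I(X;Y) = H(Y) - H(Y|X)

grid = np.linspace(0.0, 1.0, 1001)
values = [mutual_information(p) for p in grid]
best_p = grid[int(np.argmax(values))]
print(best_p, max(values))
```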
15. (5 points) For an arbitrary noisy channel Q with N input symbols and M output symbols,
show that its capacity ρ satisfies
ρ ≤ min{log2 N, log2 M}.

– – – END OF EXAM – – –

