Dabel Info Theory
I made these notes while taking APMA 1710 at Brown during Fall 2016 (taught by Prof. Govind
Menon1 ), which followed the 2nd edition of the Cover & Thomas Information Theory Textbook [1].
If you find typos, please let me know at the email above. The images are of course based on the
textbook, but are of my own creation.
Contents
1 Chapter Two: Entropy
  1.1 Definitions
  1.2 Identities
  1.3 Convexity
Others: I(X; Y | Z), I(X1 , . . . , Xn ; Y | Z), H(X1 , . . . , Xn | Z), H(X, Y | Z), D(p(y | x) || q(y | x))
1.2 Identities
[Venn diagram: H(X, Y) is the whole region; H(X) and H(Y) overlap in I(X; Y), with H(X | Y) and H(Y | X) as the non-overlapping parts.]
Bounds:
• 0 ≤ H(X) ≤_U log |X |, where ≤_U holds with equality iff p(x) is the uniform distribution.
• 0 ≤ H(X, Y ) ≤_I H(X) + H(Y ), where ≤_I holds with equality iff X and Y are independent.
• I(X; X) = H(X)
• 0 ≤ I(X; Y ) ≤ H(X)
A function f is convex if for all x1, x2 and λ ∈ [0, 1], $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$. Here ① (the left side) is any point on the function between x1 and x2, and ② (the right side) is the corresponding point on the line connecting f(x1) and f(x2). A twice-differentiable function is convex when its second derivative is non-negative.
Weak Law of Large Numbers: if $\{X_i\}_{i=1}^{n}$ are iid random variables with mean µ and variance σ² < ∞, then the sample mean approaches the true mean as you get more samples:
$$\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\;\Pr\;} \mu = E[X] \qquad (6)$$
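A quick numerical sanity check of this (my own sketch; the Bernoulli(0.3) source and sample sizes are arbitrary choices):

```python
# Sketch: the sample mean of iid Bernoulli(0.3) draws drifts toward mu = 0.3.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # true mean of a Bernoulli(p) random variable

for n in [10, 100, 10_000, 1_000_000]:
    samples = rng.binomial(1, p, size=n)
    print(n, samples.mean())  # sample mean gets closer to 0.3 as n grows
```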
2.2 AEP
AEP: Consider a sequence $\{X_i\}_{i=1}^{\infty}$ where each $X_i$ is iid with pmf p(x) and entropy H(X). Then the sample entropy approaches the true entropy as you get more samples. Or:
$$-\frac{1}{n}\log p(X_1, \dots, X_n) \xrightarrow{\;\Pr\;} H(X)$$
Equivalently:
$$\lim_{n\to\infty} \Pr\!\left( \left| -\frac{1}{n}\log p(X_1, \dots, X_n) - H(X) \right| > \varepsilon \right) = 0$$
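A minimal sketch of the same convergence (my own example, again an iid Bernoulli(0.3) source): the sample entropy settles down to H(X).

```python
# Sketch: sample entropy of an iid Bernoulli(p) sequence approaches H(X).
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # true entropy H(X) in bits

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.binomial(1, p, size=n)
    # log2 p(x^n) = sum_i log2 p(x_i), since the X_i are iid
    log_p = np.sum(np.where(x == 1, np.log2(p), np.log2(1 - p)))
    print(n, -log_p / n, "vs H(X) =", H)
```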
(1) The first is that the “sample entropy” is close to the true entropy. That is, for each $x^n \in A_\varepsilon^{(n)}$:
$$H(X) - \varepsilon \le -\frac{1}{n}\log p(x^n) \le H(X) + \varepsilon \qquad (8)$$
Which follows from the AEP, basically. That is, since the sample entropy converges to H(X) in probability, for any δ > 0 there must exist an $n_\varepsilon$ such that for all $n \ge n_\varepsilon$:
$$\Pr\!\left( \left| -\frac{1}{n}\log p(X^n) - H(X) \right| > \varepsilon \right) < \delta \qquad (10)$$
But note that the complement of the event in Equation 10, namely $\left| -\frac{1}{n}\log p(x^n) - H(X) \right| \le \varepsilon$, can be rewritten as $x^n \in A_\varepsilon^{(n)}$:
$$-\varepsilon \le -\frac{1}{n}\log p(x^n) - H(X) \le \varepsilon$$
$$H(X) - \varepsilon \le -\frac{1}{n}\log p(x^n) \le \varepsilon + H(X)$$
$$-n\big(H(X) - \varepsilon\big) \ge \log p(x^n) \ge -n\big(\varepsilon + H(X)\big)$$
$$2^{-n(H(X)-\varepsilon)} \ge p(x^n) \ge 2^{-n(\varepsilon + H(X))}$$
Which is exactly the condition for being in the typical set. Therefore, for $n \ge n_\varepsilon$, Equation 10 says the complementary event has probability less than δ, which means $\Pr(x^n \in A_\varepsilon^{(n)}) > 1 - \delta$. Therefore, the probability of being in the typical set goes to 1 for n sufficiently large.
(3,4) The last two properties give bounds on the size of the typical set:
$$(1-\varepsilon)\,2^{n(H(X)-\varepsilon)} \le |A_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)}$$
Where the left hand side (the lower bound) holds for n sufficiently large.
We know the total number of length-n sequences is $|\mathcal{X}^n|$, but surely the typical set doesn’t contain every length-n sequence. Since each $x^n \in A_\varepsilon^{(n)}$ has bounds on its probability, we can leverage this to bound the size of $A_\varepsilon^{(n)}$:
$$1 = \sum_{x^n \in \mathcal{X}^n} p(x^n) \;\ge\; \sum_{x^n \in A_\varepsilon^{(n)}} p(x^n) \;\ge\; \sum_{x^n \in A_\varepsilon^{(n)}} 2^{-n(H(X)+\varepsilon)} = |A_\varepsilon^{(n)}|\, 2^{-n(H(X)+\varepsilon)}$$
So $|A_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)}$.
Recall that $\Pr(x^n \in A_\varepsilon^{(n)}) > 1 - \varepsilon$ for n large. We bound this probability by:
$$1 - \varepsilon < \Pr(x^n \in A_\varepsilon^{(n)}) = \sum_{x^n \in A_\varepsilon^{(n)}} p(x^n) \;\le_M\; \sum_{x^n \in A_\varepsilon^{(n)}} 2^{-n(H(X)-\varepsilon)} = |A_\varepsilon^{(n)}|\, 2^{-n(H(X)-\varepsilon)} \qquad (13)$$
Where $\le_M$ follows since the right hand side gives the maximal probability of the set: every element is replaced by the maximal possible probability of a typical sequence. Rearranging gives $|A_\varepsilon^{(n)}| \ge (1-\varepsilon)\,2^{n(H(X)-\varepsilon)}$.
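A small check of these bounds (my own sketch, assuming a Bernoulli(0.3) source and n small enough to enumerate every sequence):

```python
# Sketch: enumerate all 2^n binary sequences, collect the typical ones, and
# compare |A_eps^(n)| and P(A_eps^(n)) with the bounds above.
import itertools
import numpy as np

p, n, eps = 0.3, 12, 0.1
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

size, prob_mass = 0, 0.0
for seq in itertools.product([0, 1], repeat=n):
    k = sum(seq)                                      # number of ones
    log_p = k * np.log2(p) + (n - k) * np.log2(1 - p)
    if abs(-log_p / n - H) <= eps:                    # typical set condition
        size += 1
        prob_mass += 2.0 ** log_p

print("|A| =", size, "<= 2^(n(H+eps)) =", 2 ** (n * (H + eps)))
print("P(A) =", prob_mass)   # grows toward 1 as n increases (still small at n = 12)
```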
2.4 Codes
Code: Assigns a unique binary sequence to every sequence in X n .
Need:
• n(H(X) + ε) + 2 bits to make a code for each item in the typical set.
• n log |X | + 2 bits to make a code for each item not in the typical set.
Since $|A_\varepsilon^{(n)}| \le 2^{n(H(X)+\varepsilon)}$, if we just enumerate all items in the typical set, we need at most n(H(X) + ε) bits to index each item. We then add 1 in case that’s not an integer (we could just take the ceiling), and add 1 so we can prefix all typical sequences with a 0. Therefore we have a code that encodes all sequences in the typical set with at most n(H(X) + ε) + 2 bits. The same argument for the non-typical sequences (indexed out of all $|\mathcal{X}|^n$ sequences and prefixed with a 1) gives a code length of at most n log |X | + 2 bits.
If n is sufficiently large so that $\Pr(x^n \in A_\varepsilon^{(n)}) > 1 - \varepsilon$, the expected length of a codeword satisfies:
$$E\!\left[\frac{1}{n}\,\ell(X^n)\right] \le H(X) + \varepsilon \qquad (14)$$
On average, each element of the sequence takes about the entropy of the r.v. to encode. Thus we
can represent sequences X n using around nH(X) bits on average.
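A sketch of the resulting average length (my own example: Bernoulli(0.3) source, n = 200, ε = 0.1; sequences are grouped by their number of ones so nothing needs to be enumerated):

```python
# Sketch: expected per-symbol length of the two-part typical-set code.
import math
import numpy as np

p, n, eps = 0.3, 200, 0.1
A = 2                                        # alphabet size |X| for a binary source
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

len_typ = math.ceil(n * (H + eps)) + 1       # index within the typical set + flag bit
len_atyp = math.ceil(n * math.log2(A)) + 1   # index among all |X|^n sequences + flag bit

prob_typical = 0.0
for k in range(n + 1):                       # all sequences with k ones share one probability
    log_p = k * np.log2(p) + (n - k) * np.log2(1 - p)
    if abs(-log_p / n - H) <= eps:
        prob_typical += math.comb(n, k) * p**k * (1 - p) ** (n - k)

expected_len = prob_typical * len_typ + (1 - prob_typical) * len_atyp
# Per-symbol length lands close to H(X) + eps, up to the extra flag/ceiling bits.
print("E[l(X^n)]/n =", expected_len / n, "vs H(X) =", H)
```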
$$\frac{1}{n}\log |B_\delta^{(n)}| > H(X) - \delta' \qquad (16)$$
Thus, $B_\delta^{(n)}$ must have at least about $2^{nH(X)}$ elements, which is about the same size as $A_\varepsilon^{(n)}$.
Stationary: A process S is stationary if the statistics don’t change as you shift in time:
$$\Pr(X_1 = x_1, \dots, X_n = x_n) = \Pr(X_{1+\ell} = x_1, \dots, X_{n+\ell} = x_n) \quad \text{for every shift } \ell$$
And a start state. Typically we’ll ask for a start distribution. If the distribution after one transition is identical to the start distribution, then we say it’s the stationary distribution:
$$[\mu_1, \mu_2, \dots, \mu_n] = [\mu_1, \mu_2, \dots, \mu_n]\,P \qquad (20)$$
We solve for the stationary distribution using an eigenvalue decomposition with eigenvalue one, or by just solving the system of equations.
That is, to solve for eigenvalues we use $Av = \lambda v$, so $\det(A - \lambda I) = 0$, which gives the characteristic polynomial; its roots are the eigenvalues, and solving $(A - \lambda I)v = 0$ for each root gives the eigenvectors. For the stationary distribution we want the left eigenvector of P (equivalently, an eigenvector of $P^{\top}$) with eigenvalue one, normalized to sum to one.
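A minimal sketch of that computation (the two-state transition matrix is my own toy example):

```python
# Sketch: stationary distribution mu with mu P = mu, found as the left
# eigenvector of P for eigenvalue 1 (i.e. an eigenvector of P^T).
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])             # rows sum to 1: P[i, j] = Pr(next = j | now = i)

eigvals, eigvecs = np.linalg.eig(P.T)  # left eigenvectors of P = eigenvectors of P^T
k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
mu = np.real(eigvecs[:, k])
mu = mu / mu.sum()                     # normalize to a probability vector

print(mu)        # [0.8, 0.2] for this P
print(mu @ P)    # equals mu, so it really is stationary
```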
Entropy Rate: The per-symbol entropy of the n random variables, when the limit exists:
$$H(S) = \lim_{n\to\infty} \frac{1}{n} H(X_1, \dots, X_n) \qquad (21)$$
And a related quantity, the conditional entropy of the last random variable given the past:
$$H'(S) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, \dots, X_1) \qquad (22)$$
Entropy Rates. We have two definitions, H(S) and H′(S). Moreover, they’re equivalent, which is convenient for computing the entropy rate of a stationary Markov chain:
$$H(S) = H'(S) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, \dots, X_1) \;=_M\; \lim_{n\to\infty} H(X_n \mid X_{n-1}) \;=_T\; H(X_2 \mid X_1)$$
Where $=_M$ follows from the Markov property and $=_T$ follows by time invariance.
So the entropy rate of a stationary Markov chain is H(X2 | X1 ). Let µ be the stationary distribution
and P be the transition matrix. Then:
$$H(X_2 \mid X_1) = -\sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1, x_2)\log p(x_2 \mid x_1) \qquad (23)$$
Where by the chain rule of probability, p(x1 , x2 ) = p(x2 | x1 )p(x1 ). Note that the transition matrix
P denotes the probability of going to state x2 given state x1 , and µ denotes the probability of being
in state x1. So:
$$H(X_2 \mid X_1) = -\sum_{i}\sum_{j} \mu_i P_{ij} \log P_{ij} \qquad (24)$$
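A minimal sketch of Equation 24 (reusing the toy two-state chain from the sketch above, with its stationary distribution):

```python
# Sketch: entropy rate of a stationary Markov chain, -sum_ij mu_i P_ij log2 P_ij.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu = np.array([0.8, 0.2])          # stationary distribution of P

rate = -np.sum(mu[:, None] * P * np.log2(P))
print(rate, "bits per symbol")     # a mu-weighted average of the row entropies
```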
3.3 Thermodynamics
Relative entropy between two distributions on the states decreases with time:
$$D(\mu_n \,\|\, \mu'_n) \ge D(\mu_{n+1} \,\|\, \mu'_{n+1})$$
The argument follows from the chain rule for relative entropy (or expanding and using the law of total probability).
From this we also see that the relative entropy between any distribution and the stationary distribution decreases with time. Let $\mu'_n = \kappa$ be stationary, then $\mu'_{n+1} = \mu'_n$, so, applying the previous result:
$$D(\mu_n \,\|\, \kappa) \ge D(\mu_{n+1} \,\|\, \kappa) \qquad (26)$$
• Cesaro Means
• Random Walks
• A Source Code for a r.v. is a mapping from X to D∗ , the set of finite-length strings from
a size D alphabet. C(x) is the codeword of x and `(x) is the length of C(x).
• A code’s expected codeword length is: $L(C) = \sum_{x\in\mathcal{X}} p(x)\,\ell(x)$
2. Uniquely Decodable: Every sequence of coded strings decodes to exactly one message.
[Diagram: nested classes of codes — all codes ⊃ nonsingular codes ⊃ uniquely decodable codes ⊃ instantaneous (prefix) codes.]
Proof.
We can think of a prefix code as a binary tree, where each branch represents choosing one
of the D symbols for the next symbol of the code. Then a prefix code guarantees that each
codeword has no children in the tree.
Consider the length of the longest codeword `max . Now consider all codewords at this level
of the tree.
In the complete tree (so no children are pruned, they’re just listed as a “descendant”), we have:
$$\sum_{i} D^{\ell_{\max} - \ell_i} \le D^{\ell_{\max}} \qquad (30)$$
Converse Proof
Proof.
Given lengths $\ell_1, \dots, \ell_k$ that satisfy the Kraft inequality, we can always come up with a prefix tree.
We care about finding the prefix code with minimum expected length: that is, due to Kraft, we want to find lengths that satisfy the Kraft inequality and minimize the expected codeword length. So:
$$\min_{\ell} L = \min_{\ell} \sum_i p_i \ell_i \quad \text{subject to} \quad \sum_i D^{-\ell_i} \le 1 \qquad (31)$$
Thus, the optimal code lengths are $\ell_i^{*} = \log_D \frac{1}{p_i}$. Later we’ll force this to an integer with the ceiling operator.
Theorem: The expected length L of any instantaneous D-ary code for a r.v. X is bounded below by the entropy $H_D(X)$: $L \ge H_D(X)$.
Proof of lower bound idea: Write out the difference $L - H_D(X)$ and turn the result into a relative entropy quantity plus a non-negative constant; by the information inequality we conclude $L - H_D(X) \ge 0$.
Proof of upper bound idea: Let each length $\ell_i = \lceil \log_D \frac{1}{p_i} \rceil$, so it’s guaranteed to lie between $\log_D \frac{1}{p_i}$ and $\log_D \frac{1}{p_i} + 1$. Then we multiply both sides by $p_i$ and sum over i to get $H_D(X) \le L < H_D(X) + 1$.
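A small check of both bounds (my own example distribution; any pmf works):

```python
# Sketch: Shannon code lengths l_i = ceil(log2(1/p_i)) satisfy the Kraft
# inequality and give H(X) <= L < H(X) + 1.
import math

p = [0.5, 0.25, 0.125, 0.125]
lengths = [math.ceil(math.log2(1 / pi)) for pi in p]

H = -sum(pi * math.log2(pi) for pi in p)         # entropy in bits (D = 2)
L = sum(pi * li for pi, li in zip(p, lengths))   # expected code length
kraft = sum(2 ** -li for li in lengths)

print(lengths, "Kraft sum:", kraft)              # <= 1, so a prefix code exists
print("H =", H, "<= L =", L, "< H + 1 =", H + 1)
```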
(1) Cluster
(2) Rerank
The cluster step takes a probability vector of m elements in order of mass, ⟨p1, p2, …, p_m⟩, and computes a length-(m − 1) vector, also in order, where the two smallest elements are merged: ⟨p1, p2, …, p_{m−1} + p_m⟩.
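A minimal sketch of the procedure for D = 2 (my own implementation; a heap replaces the explicit rerank step, since it keeps the masses in order after each merge):

```python
# Sketch: Huffman coding as repeated "cluster the two smallest masses" steps.
import heapq

def huffman_lengths(probs):
    """Codeword length assigned to each symbol by binary Huffman coding."""
    heap = [(p, [i]) for i, p in enumerate(probs)]  # (mass, symbols under this node)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)                # two smallest masses...
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:                           # ...each pick up one more bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))    # merged mass goes back in order
    return lengths

print(huffman_lengths([0.4, 0.3, 0.2, 0.1]))  # [1, 2, 3, 3] for this distribution
```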
• Examples
• They’re optimal
[Diagram: Message W → Encoder → Xⁿ → Channel p(y | x) → Yⁿ → Decoder → Ŵ (estimate of message).]
Intuition: if we could control exactly how bits are sent, what’s the most info we can send over the channel? How much shared info is there between the output and the input?
I(X; Y ) = 1 − α (39)
Definition 5 ((M, n) Code): An (M, n) code for the channel (X , p(y | x), Y) consists of the
following:
1. First: $\lambda_i$, which is the probability of error in sending message i over the channel: $\lambda_i = \Pr(\hat{W} \ne i \mid W = i)$.
2. The maximal probability of error is just the maximal error term over all $\lambda_i$: $\lambda^{(n)} = \max_i \lambda_i$.
3. The average probability of error $P_e^{(n)}$ for an (M, n) code is:
$$P_e^{(n)} = \frac{1}{M}\sum_i \lambda_i \qquad (43)$$
The rate R of an (M, n) code is:
$$R = \frac{\log M}{n} \text{ bits per transmission} \qquad (44)$$
So the rate measures how many bits of the actual message are conveyed per use of the channel.
A rate R is achievable if there exists a sequence of $(\lceil 2^{nR}\rceil, n)$ codes such that the maximal probability of error $\lambda^{(n)}$ tends to 0 as $n \to \infty$.
Definition 7 (Jointly Typical Set): The jointly typical set for two r.v.’s is:
$$A_\varepsilon^{(n)} \triangleq \{(x^n, y^n) : f_\varepsilon(x^n, y^n)\} \qquad (45)$$
Where:
$$f_\varepsilon(x^n, y^n) = \left| -\frac{1}{n}\log p(\,\cdot\,) - H(\,\cdot\,) \right| < \varepsilon \qquad (46)$$
Where $\cdot$ can be $x^n$, or can be $y^n$, or both at the same time $(x^n, y^n)$, and all three conditions must hold. Where:
$$p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i) \qquad (47)$$
3. Consider $(\tilde{X}^n, \tilde{Y}^n) \sim p(x^n)p(y^n)$: that is, the tilde vars are sampled independently but with the same marginals as $X^n$ and $Y^n$. Then:
$$\Pr\big((\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^{(n)}\big) \le 2^{-n(I(X;Y) - 3\varepsilon)}$$
Takeaway from 3: the probability that the independently sampled pair lands in the jointly typical set is exponentially small, controlled by the mutual information.
Channel Coding Intuition: All rates below capacity C are achievable, and all rates above capacity
are not.
Proof idea: generate a codebook of $2^{nR}$ codewords at random according to p(x) and decode by joint typicality. The received $Y^n$ is jointly typical with the transmitted codeword with high probability, while each other (independent) codeword is jointly typical with $Y^n$ with probability about $2^{-nI(X;Y)}$, so for R < C the error probability vanishes.
Place the 4 information bits into the 4 central intersecting regions. To code, place 1s in each of
the remaining regions so that each circle has an even number of bits. Then when you receive the
message, reconstruct the venn diagrams and you can identify where bits may have been flipped.
The Hamming code is the null space of the parity-check matrix H, whose columns are the possible nonzero binary vectors. That is, the codewords are exactly the set of vectors v such that $Hv = 0 \pmod 2$.
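A minimal sketch of the (7, 4) case (my own construction; the specific codeword is just one assumed element of the null space): column j of H is the binary representation of j, so a single-bit error produces a syndrome that spells out the error position.

```python
# Sketch: (7,4) Hamming code via the parity-check matrix H and syndrome decoding.
import numpy as np

# Column j (1-indexed) of H is the 3-bit binary representation of j.
H = np.array([[(j >> 2) & 1 for j in range(1, 8)],
              [(j >> 1) & 1 for j in range(1, 8)],
              [(j >> 0) & 1 for j in range(1, 8)]])

codeword = np.array([1, 1, 0, 1, 0, 0, 1])  # assumed codeword: H @ codeword = 0 (mod 2)
assert not np.any(H @ codeword % 2)

received = codeword.copy()
received[4] ^= 1                            # channel flips one bit

syndrome = H @ received % 2                 # nonzero syndrome = binary index of the error
error_pos = int("".join(map(str, syndrome)), 2) - 1
received[error_pos] ^= 1                    # flip it back
print(np.array_equal(received, codeword))   # True
```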
Conversely, for any stationary stochastic process, if H(V) > C, it’s not possible to send the
process over the channel with arbitrarily low probability of error.
Takeaway: The separation theorem says that the separate encoder can achieve the same rates as
the joint encoder. That is, the following two are the same:
[Diagram (joint coding): Message Vⁿ → Encoder → Xⁿ(Vⁿ) → Channel p(y | x) → Yⁿ → Decoder → V̂ⁿ (estimate of message).]
[Diagram (separate coding): Message Vⁿ → Source Encoder → Channel Encoder → Channel p(y | x) → Channel Decoder → Source Decoder → V̂ⁿ (estimate of message).]
Proof idea:
• Since the stochastic process satisfies the AEP, it implies there exists a typical set.
• Index all sequences in the typical set.
• There are at most $2^{n(H(\mathcal{V})+\varepsilon)}$ elements in the typical set, so we need at most $n(H(\mathcal{V}) + \varepsilon)$ bits to encode them.
• If H(V) + ε = R < C, we can transmit the sequence with low probability of error:
Fano’s Inequality: For any estimator $\hat{X}$ such that $X \to Y \to \hat{X}$ forms a Markov chain, with $P_e = \Pr(\hat{X} \ne X)$, we have:
$$H(P_e) + P_e \log |\mathcal{X}| \ge H(X \mid Y)$$
Or, in weaker form:
$$1 + P_e \log |\mathcal{X}| \ge H(X \mid Y) \qquad (51)$$
3. Conditional Entropy: $h(X \mid Y) = -\int f(x, y)\log f(x \mid y)\,dx\,dy$.
We can translate between Differential Entropy and Discrete Entropy. Consider quantizing f (x)
according to some fixed step size, ∆. That is, approximate the curve with blocks of width ∆.
Let $H(X_\Delta) = -\sum_{i=-\infty}^{\infty} p_i \log p_i$, where $p_i$ is the mass of those rectangles. Consider a point $x_k$. Then along the interval $[x_k, x_k + \Delta]$, let $p(x_k) = \int_{x_k}^{x_k+\Delta} f(x)\,dx$, so that $H(X_\Delta) = -\sum_{k=-\infty}^{\infty} p(x_k)\log p(x_k)$.
Idea: We’re taking rectangles and putting them over each interval so that the area of the rectangle
is identical to the area of the curved piece of the function.
But p(xk ) = pk = f (xk ) · ∆, since it just describes a box that approximates the pdf for that interval
(width ∆ and height $f(x_k)$). So:
$$H(X_\Delta) \approx h(X) - \log \Delta \qquad (52)$$
Therefore, we add the right term to get $H(X_\Delta) + \log \Delta = -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta\,\log f(x_i)$, which, as $\Delta \to 0$, becomes h(X).
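A numerical check of Equation 52 (my own sketch, using a standard Gaussian so that h(X) is known in closed form):

```python
# Sketch: H(X_Delta) + log2(Delta) approaches h(X) = (1/2) log2(2 pi e sigma^2)
# as the bin width Delta shrinks.
import numpy as np
from math import erf, sqrt

sigma = 1.0
gauss_cdf = lambda x: 0.5 * (1 + erf(x / (sigma * sqrt(2))))
h_true = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)    # about 2.05 bits

for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-10, 10 + delta, delta)          # covers essentially all the mass
    p = np.diff([gauss_cdf(e) for e in edges])         # p_k = mass of the k-th bin
    p = p[p > 0]
    H_delta = -np.sum(p * np.log2(p))
    print(delta, H_delta + np.log2(delta), "vs h(X) =", h_true)
```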
6.1 Examples
Now, some example continuous channels.
6.1.1 Uniform
Let X be a r.v. with uniform probability on the interval [0, a]. Then:
$$h(X) = -\int_0^a f(x)\log f(x)\,dx = -\int_0^a \frac{1}{a}\log\frac{1}{a}\,dx = \log a \qquad (53)$$
6.1.2 Gaussian
Let X be a r.v. with a Gaussian density function:
$$f(x) \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{x^2}{2\sigma^2}\right) \qquad (54)$$
Then its entropy is:
$$h(X) = -\int_{-\infty}^{\infty} \phi \ln \phi = \frac{1}{2}\ln\!\left(2\pi e \sigma^2\right) \qquad (55)$$
we can compute h(X, Y) as the entropy of a (bivariate) multivariate normal, $\frac{1}{2}\log\left[(2\pi e)^2\det(K)\right]$, and we’re done.
6.4 Identities
Key: In general, h(X) + n is the number of bits on the average required to describe X to n-bit
accuracy.
• Venn Diagram: I(X; Y) = h(X) − h(X | Y) = h(Y) − h(Y | X) = h(X) + h(Y) − h(X, Y)
• Information Inequality: D(f || g) ≥ 0. Consequently:
– I(X; Y ) ≥ 0, equality iff independent.
– h(X) ≥ h(X | Y ), equality iff independent.
– $h(X_1, \dots, X_n) \le \sum_{i=1}^{n} h(X_i)$, equality iff independent.
• Hadamard’s Inequality:
$$\det(K) \le \prod_{i=1}^{n} K_{i,i} \qquad (58)$$
• h(X + c) = h(X)
• h(aX) = h(X) + log |a|
If there’s no constraint on the input, then the capacity could be infinite since X can take any
real value, so we can just spread the input values arbitrarily far apart subject to whatever noise
is present in the channel. To avoid this (which is clearly unrealistic) we impose an input power
constraint.
Power Constraint:
$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P \qquad (62)$$
Capacity is the same, but now subject to a power constraint:
$$C = \max_{p(x)\,:\,E[X^2]\le P} I(X; Y) = \frac{1}{2}\log\!\left(1 + \frac{P}{N}\right) \qquad (63)$$
Proof idea is just to write it out: the noise is Z, so I(X; Y) = h(Y) − h(Y | X) = h(Y) − h(Z), where $h(Z) = \frac{1}{2}\log\left[2\pi e N\right]$. Plug and chug.
Note that we know E[Z] = 0 and Z is independent of X, so $E[Y^2] = E[X^2] + E[Z^2] \le P + N$, which bounds $h(Y) \le \frac{1}{2}\log\left[2\pi e (P + N)\right]$.
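A trivial numerical sketch of Equation 63 (the P and N values are arbitrary):

```python
# Sketch: Gaussian channel capacity C = (1/2) log2(1 + P/N) bits per channel use.
import numpy as np

def gaussian_capacity(P, N):
    return 0.5 * np.log2(1 + P / N)

print(gaussian_capacity(P=1.0, N=1.0))    # 0.5 bits per use at 0 dB SNR
print(gaussian_capacity(P=100.0, N=1.0))  # about 3.33 bits per use at 20 dB SNR
```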
7.1 Codes
We can make codes in the same way, only now the encoding function produces codewords such that $\sum_{i=1}^{n} x_i(w)^2 \le nP$.
A rate is achievable if there exists a sequence of codes that satisfy the power constraint and for which the usual notion of achievability (maximal probability of error going to 0) holds.
Channel Coding Theorem: We get the channel coding theorem again for Gaussian channels.
That is, any Rate R < C is achievable, and the converse, that any rate R ≥ C is not achievable.
So a signal is band-limited to W if there is some value, 2πW, for which, outside that interval, there are no frequencies: F(ω) = 0 for |ω| > 2πW. Evaluating the inverse transform at the sample times n/2W gives:
$$f\!\left(\frac{n}{2W}\right) = \frac{1}{2\pi}\int_{-2\pi W}^{2\pi W} F(\omega)\, e^{\,i\omega n/(2W)}\, d\omega \qquad (65)$$
References
[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.