
ELEC1011 Communications and Control

(7/12) Source Coding


Rob Maunder

Communication schemes


Source enCOder and DECoder (codec) (CEP 1.4.3.1 – 1.4.3.2)

• Source encoder converts the information to a format that is suitable for transmission.

• Source decoder converts it back again.

• e.g.
– Multiplexing combines several signals into one. e.g. the left and right channels
of stereo audio, the red, green and blue components of component video or the
audio and video of a television signal.
– Low Pass Filtering (LPF) to limit the bandwidth of the signal in order to avoid
aliasing or to reduce the amount of spectrum required. (Lecture 6)
– Analogue-to-Digital Conversion (ADC) if we want to use digital modulation to
transmit an analogue signal. This uses sampling and quantisation. (Lecture 1)


Source enCOder and DECoder (codec) continued

• More examples of source codec functions


– Encryption to protect the information from being decoded by an unauthorised
receiver. Decoding is performed using a key that is built in to authorised receivers
or SIM cards.
– Watermarking to prove that the information has not been tampered with or sent
from an unauthorised transmitter. The source decoder compares the received
watermark with one that it has built in.
– Compression to reduce the amount of information we have to transmit.


Information and redundancy

• Any message can be considered to contain two things:


– Information, which cannot be removed without harming the integrity of the
message.
– Redundancy, which can be removed without harming the integrity of the message.

• For example, suppose you wanted your friend to know when your evening class
starts. You could say “My evening class starts at 7pm”. Here, the “pm” part is
redundant. It wouldn’t be an evening class if it started at 7am. The message can be
shortened to “My evening class starts at 7” without losing any of the information.


Compression

• Compression is a source coding technique for reducing the length of a message


required to convey some information:
– Lossless compression removes only redundancy from the message. e.g. zip file
to compress a computer program.
– Lossy compression also removes some information, but (hopefully) only the least
important information. e.g. jpeg file to compress an image.


Quantifying information

• The amount of information in a message can be quantified in bits.

• For example, a message that conveys the result of flipping a coin contains k = 1
bit of information. This is because there are N = 2 (equally likely) outcomes of
flipping a coin (heads and tails) and N = 2 values that k = log2(N) = 1 bit can
have (0 and 1).

• If somebody asked me which season I was born in, my reply would contain k = 2 bits
of information. This is because there are N = 4 (equally likely) replies that I could
give (Winter, Spring, Summer, Autumn) and N = 4 values that k = log2(N) = 2
bits can have (00, 01, 10 and 11).

• A message that conveys the result of throwing an N = 6-sided dice would contain
k = log2(N) = 2.59 bits of information. A message doesn’t have to contain an
integer number of bits of information!
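
These values can be checked numerically; a minimal Python sketch (illustrative only):

from math import log2

# Information content of one of N equally likely outcomes: k = log2(N) bits
for name, N in [("coin flip", 2), ("season", 4), ("six-sided dice", 6)]:
    print(name, log2(N))   # 1.0, 2.0 and about 2.59 bits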


Quantifying information cont


• Suppose a message is selected from a set of N possibilities. The number of bits of
information ki that is conveyed by the ith possibility is related to its probability of
being selected pi according to ki = log2(1/pi).

• In the case of the N = 6-sided dice, each possibility (1,2,3,4,5,6) has the same
probability pi = 1/6. When every possibility has the same probability, we get
1/pi = N and ki = log2(N) as on the previous slide.

• When two dice are rolled, different sums occur with different probabilities...

i     pi      ki = log2(1/pi)
2     1/36    5.17
3     2/36    4.17
4     3/36    3.59
5     4/36    3.17
6     5/36    2.85
7     6/36    2.59
8     5/36    2.85
9     4/36    3.17
10    3/36    3.59
11    2/36    4.17
12    1/36    5.17
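
A minimal Python sketch (illustrative only) that reproduces the ki column of this table:

from math import log2

# Probabilities of the sums 2..12 when two fair dice are rolled
p = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}

for s in range(2, 13):
    k = log2(1 / p[s])   # information content ki = log2(1/pi) in bits
    print(s, k)          # ki values as in the table above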


Entropy
The entropy H of a source is equal to the expected (i.e. average) information content
of its messages.
H = Σ(i=1 to N) pi ki = Σ(i=1 to N) pi log2(1/pi)

The entropy of a six-sided dice is H = 2.59 bits.


The entropy of the sum of two six-sided dice is H = 3.27 bits.
Note that this is less than the entropy of two six-sided dice, which is H = 2 × 2.59 =
5.18 bits.
It makes sense for the sum of the two dice rolls to contain less information than the
dice rolls separately. This is because the sum could be calculated if you knew the two
dice rolls, but the two dice rolls couldn’t always be determined if you only knew the
sum.
It is impossible to losslessly compress messages from a particular source using an
average number of bits that is less than the entropy.
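
A minimal Python sketch (illustrative only) of the entropy calculation for both sources:

from math import log2

# Entropy H = sum of pi * log2(1/pi) over all message possibilities
def entropy(probs):
    return sum(pi * log2(1 / pi) for pi in probs)

one_dice = [1 / 6] * 6                                    # a single six-sided dice
two_sums = [(6 - abs(s - 7)) / 36 for s in range(2, 13)]  # sum of two dice

print(entropy(one_dice))   # about 2.59 bits
print(entropy(two_sums))   # about 3.27 bits, less than 2 x 2.59 = 5.18 bits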


Fixed length coding


• Similar to Pulse Coded Modulation (PCM), fixed length coding represents each of
the N message possibilities i with a different codeword ci.

• In order for the codewords to be unique, they need to have a length of
L = ⌈log2(N)⌉.

• The coding efficiency is given by R = H/L and can never be greater than 1.

For the sum of two dice example...

i     pi      ki = log2(1/pi)   ci
2     1/36    5.17              1110
3     2/36    4.17              1101
4     3/36    3.59              1100
5     4/36    3.17              1000
6     5/36    2.85              0100
7     6/36    2.59              0000
8     5/36    2.85              0010
9     4/36    3.17              0001
10    3/36    3.59              0011
11    2/36    4.17              1011
12    1/36    5.17              0111

H = 3.27, L = 4, R = 0.82.
The sequence of sums [2,7,4,8,7,8,3,7,12] is encoded as the sequence of 36 bits
111000001100001000000010110100000111.
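
A minimal Python sketch (illustrative only) of fixed length encoding with this codebook:

from math import ceil, log2

# Fixed length codebook for the two-dice sums (table above)
flc = {2: "1110", 3: "1101", 4: "1100", 5: "1000", 6: "0100", 7: "0000",
       8: "0010", 9: "0001", 10: "0011", 11: "1011", 12: "0111"}

sums = [2, 7, 4, 8, 7, 8, 3, 7, 12]
bits = "".join(flc[s] for s in sums)
print(bits, len(bits))     # the 36-bit sequence above

L = ceil(log2(len(flc)))   # codeword length L = ceil(log2(N)) = 4
H = 3.27                   # source entropy from the earlier slide
print(H / L)               # coding efficiency R, about 0.82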


Variable length encoding


• Variable length coding represents each of the message possibilities i with different
codewords ci having various lengths li.

• No codeword is allowed to be a prefix of any other.

• The average codeword length is given by L = Σ(i=1 to N) pi li.

• The coding efficiency is given by R = H/L and can never be greater than 1.

For the sum of two dice example...

i     pi      ki = log2(1/pi)   ci      li
2     1/36    5.17              10000   5
3     2/36    4.17              0110    4
4     3/36    3.59              1001    4
5     4/36    3.17              001     3
6     5/36    2.85              101     3
7     6/36    2.59              111     3
8     5/36    2.85              110     3
9     4/36    3.17              010     3
10    3/36    3.59              000     3
11    2/36    4.17              0111    4
12    1/36    5.17              10001   5

H = 3.27, L = 3.31, R = 0.99.
The sequence of sums [2,7,4,8,7,8,3,7,12] is encoded as the sequence of 33 bits
100001111001110111110011011110001.
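
A minimal Python sketch (illustrative only) of variable length encoding with this codebook:

# Variable length (prefix) codebook for the two-dice sums (table above)
vlc = {2: "10000", 3: "0110", 4: "1001", 5: "001", 6: "101", 7: "111",
       8: "110", 9: "010", 10: "000", 11: "0111", 12: "10001"}
p   = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}

sums = [2, 7, 4, 8, 7, 8, 3, 7, 12]
bits = "".join(vlc[s] for s in sums)
print(bits, len(bits))                    # the 33-bit sequence above

L = sum(p[s] * len(vlc[s]) for s in vlc)  # average codeword length, about 3.31
print(3.27 / L)                           # coding efficiency R, about 0.99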


Variable length decoding


• The codebook on the previous slide can be represented by this binary tree.
[Figure: binary tree of the codebook, branching on 0 to the left and 1 to the right.
The leaves 10, 5, 9, 6, 8 and 7 lie at depth 3, the leaves 3, 11 and 4 at depth 4, and
the leaves 2 and 12 at depth 5.]

• The sequence of bits 100001111001110111110011011110001 can be decoded by
repeatedly traversing the binary tree from the root node.

• When a leaf node is reached, a sum of two dice is identified and we start at the root
node again.

• We get the same sequence of sums [2,7,4,8,7,8,3,7,12] that we started with.
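
A minimal Python sketch (illustrative only) of this decoding procedure; matching the
accumulated bits against the codebook is equivalent to traversing the tree:

# Prefix decoding: accumulate bits until they match a codeword
vlc = {2: "10000", 3: "0110", 4: "1001", 5: "001", 6: "101", 7: "111",
       8: "110", 9: "010", 10: "000", 11: "0111", 12: "10001"}

def vlc_decode(bits, codebook):
    inverse = {c: s for s, c in codebook.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:           # a leaf node has been reached
            out.append(inverse[word])
            word = ""                 # start again at the root node
    return out

print(vlc_decode("100001111001110111110011011110001", vlc))
# [2, 7, 4, 8, 7, 8, 3, 7, 12]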


Huffman coding

• The example on the previous slides is a Huffman code, which is a special type of
variable length code.

• Huffman codewords are specially allocated in order to minimise the average
codeword length L and therefore maximise the coding efficiency R.

• Short codewords are used to represent frequently occurring message possibilities
and long codewords are used to represent rare ones.

• More specifically, the message possibility i is allocated a codeword ci having an
integer-valued length li that is typically close to the possibility’s information content
ki.

• The design of Huffman codes is not within the scope of ELEC1011, but the use of
them is.
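
Although the design of Huffman codes is not within the scope of ELEC1011, the standard
construction is short enough to sketch. A minimal Python sketch (illustrative only);
because of ties, the codewords it produces may differ from those in the table on the
previous slide, but the average codeword length is the same:

import heapq
from fractions import Fraction

def huffman(probs):
    # Each heap entry is (probability, tie-breaker, {symbol: codeword so far})
    heap = [(pi, n, {s: ""}) for n, (s, pi) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # merge the two least likely subtrees,
        p1, _, c1 = heapq.heappop(heap)   # prefixing their codewords with 0 and 1
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
code = huffman(p)
print(float(sum(p[s] * len(c) for s, c in code.items())))
# average codeword length, about 3.31, as for the codebook on the previous slide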


Arithmetic coding

• The coding efficiency of Huffman coding is limited because it has to use an integer
number of bits li to represent the message possibility i.

• It would have an efficiency of R = 1 if it could use ki bits to represent the message
possibility i.

• This motivates arithmetic coding, which represents a sequence of messages together,
rather than individually.


Arithmetic encoding step 1

• Draw a number line that goes from 0 to 1.

• Divide the number line into N portions, where the ith portion has a width equal to
the probability of the ith message possibility pi.
For the sum of two dice example, the portions are:

i     pi      portion of the number line
2     1/36    0.0000000000 to 0.0277777778
3     2/36    0.0277777778 to 0.0833333333
4     3/36    0.0833333333 to 0.1666666667
5     4/36    0.1666666667 to 0.2777777778
6     5/36    0.2777777778 to 0.4166666667
7     6/36    0.4166666667 to 0.5833333333
8     5/36    0.5833333333 to 0.7222222222
9     4/36    0.7222222222 to 0.8333333333
10    3/36    0.8333333333 to 0.9166666667
11    2/36    0.9166666667 to 0.9722222222
12    1/36    0.9722222222 to 1.0000000000
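
A minimal Python sketch (illustrative only) that computes these portion boundaries:

from fractions import Fraction

# Boundaries of the N portions: portion i spans [cumulative probability so far, + pi)
p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

low = Fraction(0)
for s in range(2, 13):
    print(s, float(low), float(low + p[s]))   # matches the table above
    low += p[s]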

Arithmetic encoding step 2

• Repeatedly select the portion of the number line that represents the next message
in the sequence and divide that portion using the message probabilities.

• Once a portion has been selected for the last message in the sequence, a numerical
range has been identified.

• The sequence of messages [2,7,4,8,7,8,3,7,12] gives the number range 0.0122125384
to 0.0122125387.

[Figure: the successive subdivisions of the number line. Each column shows the
boundaries of the eleven portions (for the sums 2 to 12) after another message has
been encoded, narrowing from the interval 0 to 1 down to the range 0.0122125384 to
0.0122125387.]
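
A minimal Python sketch (illustrative only) of this interval narrowing, using exact
fractions to avoid rounding problems:

from fractions import Fraction

p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
messages = [2, 7, 4, 8, 7, 8, 3, 7, 12]

low, width = Fraction(0), Fraction(1)
for m in messages:
    # skip the portions of the smaller sums, then keep the portion of m
    offset = sum((p[s] for s in range(2, m)), Fraction(0))
    low   += offset * width
    width *= p[m]

print(float(low), float(low + width))
# about 0.0122125384 to 0.0122125387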

Arithmetic encoding step 3

• Find the shortest binary fraction that represents a number in the identified decimal
range.

• 0.0000001100100000010111000110011 is the shortest binary fraction in the decimal
range 0.0122125384 to 0.0122125387.

• This gives 1/2^7 + 1/2^8 + 1/2^11 + 1/2^18 + 1/2^20 + 1/2^21 + 1/2^22 + 1/2^26 +
1/2^27 + 1/2^30 + 1/2^31 = 0.0122125386.

• Since the binary fraction will always start with “0.” we only transmit the bits after
the binary point.

• The sequence of messages [2,7,4,8,7,8,3,7,12] is represented by 31 bits using an
arithmetic code, 33 bits using a Huffman code and 36 bits using a fixed length code.


Arithmetic encoding step 3 cont

We can determine the shortest binary fraction within the desired range by building a
binary tree as follows. At the pth step, the right-hand branch adds 1/2^p to the value
accumulated so far and the left-hand branch does not. Starting at the root node (value
0):

• if the result on the right-hand branch is within the desired range then output a bit
value of 1 and stop,

• else if the result on the right-hand branch is below the desired range then output a
bit value of 1 and follow the right-hand branch,

• else output a bit value of 0 and follow the left-hand branch.

[Figure: the binary tree for the range 0.0122125384 to 0.0122125387, testing the
additions of 1/2^1, 1/2^2, ..., 1/2^31 in turn and stopping when the value
0.0122125386 inside the range is reached.]
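
A minimal Python sketch (illustrative only) of this bit-by-bit search, using exact
fractions for the arithmetic:

from fractions import Fraction

def shortest_binary_fraction(low, high):
    # Bits after the binary point of the shortest binary fraction in [low, high]
    bits, value = "", Fraction(0)
    while True:
        place = Fraction(1, 2 ** (len(bits) + 1))   # weight of the next bit
        if low <= value + place <= high:            # right-hand branch: in range
            return bits + "1"
        if value + place < low:                     # right-hand branch: below range
            bits, value = bits + "1", value + place
        else:                                       # right-hand branch: above range
            bits += "0"

b = shortest_binary_fraction(Fraction("0.0122125384"), Fraction("0.0122125387"))
print(b, len(b))   # 0000001100100000010111000110011, 31 bits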


Arithmetic decoding

Step 1 Convert the received binary sequence into a decimal fraction.

Step 2 Divide a number line from 0 to 1 into portions according to the message
probabilities.

Step 3 Repeatedly select the portion having the range into which the decimal fraction
falls and divide the portion according to the message probabilities. Output the
messages that correspond to the selected portions and stop once the required
number of messages have been output (this assumes that the required number is
known to the receiver).
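
A minimal Python sketch (illustrative only) of these three steps, again using exact
fractions:

from fractions import Fraction

p = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def arithmetic_decode(bits, probs, n_messages):
    # Step 1: convert the received bits into a fraction between 0 and 1
    value = sum(Fraction(int(b), 2 ** (n + 1)) for n, b in enumerate(bits))
    out, low, width = [], Fraction(0), Fraction(1)
    for _ in range(n_messages):
        # Steps 2 and 3: find the portion into which the fraction falls, output
        # its message, then subdivide that portion for the next message
        for s, ps in probs.items():
            if value < low + ps * width:
                out.append(s)
                width *= ps
                break
            low += ps * width
    return out

print(arithmetic_decode("0000001100100000010111000110011", p, 9))
# [2, 7, 4, 8, 7, 8, 3, 7, 12]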


Exercise
Four friends, Hamilton, Button, Schumacher and Alonso have a race every fortnight.
Hamilton tends to win most often and Alonso tends to win least often, as specified
by the probability pi provided for each racer i in the table below. Some source coding
schemes have been devised to transmit the victor of a sequence of races.
i            pi      ci^FLC   ci^Huff
Hamilton     0.5     00       0
Button       0.25    01       10
Schumacher   0.125   10       110
Alonso       0.125   11       111

1. Determine the amount of information ki that is conveyed by messages saying that
each racer i has won.

2. Determine the entropy H of messages saying who has won.

3. Determine the coding efficiencies R associated with the fixed length codewords
ci^FLC and the Huffman codewords ci^Huff.


Exercise continued

4. Why does the Huffman code perform so well in this case?

5. Determine the bit sequences that result when the sequence of victors
[Schumacher, Hamilton, Button, Hamilton] is represented using the fixed length
codewords ci^FLC, the Huffman codewords ci^Huff and an arithmetic code.

6. Determine the sequence of victors that is represented by the bit sequence 00110001,
which was obtained using the fixed length codewords ci^FLC.

7. Draw a binary tree for the Huffman codewords ci^Huff and use it to determine the
sequence of victors that is represented by the bit sequence 11111000.

8. Determine the sequence of four victors that is represented by the bit sequence
010111, which was obtained using an arithmetic code.

