EC02-Information Coding
Aerospace Engineering
Computers
Information coding Informática
Basic concepts
Numbering systems
Encoding signed numbers
Encoding real numbers
Encoding texts
Redundancy and compression
Introduction
Numbering systems
Example:
- In decimal, the base is b = 10 and the set of digits is D = {0, 1, 2, ..., 9}
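Converting from decimal to binary by successive divisions by 2 (the remainders, read from last to first, are the binary digits):
- 26 / 2 = 13, remainder 0
- 13 / 2 = 6, remainder 1
- 6 / 2 = 3, remainder 0
- 3 / 2 = 1, remainder 1
- 1 / 2 = 0, remainder 1 (quotient 0: stop)
Result: $26_{10} = 11010_2$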
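The successive-division procedure for 26 can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `to_binary` is an assumption chosen for this example.

```python
def to_binary(value: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        value, remainder = divmod(value, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The first remainder is the least significant bit, so reverse the list.
    return "".join(reversed(remainders))


print(to_binary(26))  # 11010
```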
Binary arithmetic
Truth tables for operations with one bit:
Overflows:
(**) 1 plus 1 = 0 carry 1
(*) 0 minus 1 = 1 borrow 1
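As an illustration of the two overflow cases, the one-bit addition and subtraction tables can be generated with a short Python sketch. The helper names `add_bits` and `sub_bits` are assumptions made for this example.

```python
def add_bits(a: int, b: int) -> tuple[int, int]:
    """Return (sum bit, carry) for the one-bit addition a + b."""
    return a ^ b, a & b


def sub_bits(a: int, b: int) -> tuple[int, int]:
    """Return (difference bit, borrow) for the one-bit subtraction a - b."""
    return a ^ b, (1 - a) & b


for a in (0, 1):
    for b in (0, 1):
        s, carry = add_bits(a, b)
        d, borrow = sub_bits(a, b)
        print(f"{a} + {b} = {s} carry {carry}    {a} - {b} = {d} borrow {borrow}")
```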
Algorithms for operations with two or more bits: operate column by column, starting from the least significant bit and propagating the carry (or borrow) to the next column.
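A minimal sketch of multi-bit addition with carry propagation, working on bit strings from right to left. The function name `add_binary` and the operands 1011 and 0110 are assumptions chosen for illustration.

```python
def add_binary(x: str, y: str) -> str:
    """Add two unsigned binary strings column by column, propagating the carry."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry = 0
    result = []
    for bit_x, bit_y in zip(reversed(x), reversed(y)):
        total = int(bit_x) + int(bit_y) + carry
        result.append(str(total % 2))  # bit written in this column
        carry = total // 2             # carry passed to the next column
    if carry:
        result.append("1")             # a final carry adds one more bit
    return "".join(reversed(result))


print(add_binary("1011", "0110"))  # 10001 (11 + 6 = 17)
```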
Hexadecimal (base 16) example: $8AC2_{16}$
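Assuming the example above is the hexadecimal number 8AC2, the following sketch shows how each hexadecimal digit maps to a group of four bits. The helper name `hex_to_bit_groups` is hypothetical.

```python
def hex_to_bit_groups(hex_digits: str) -> str:
    """Expand each hexadecimal digit into its 4-bit binary group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(hex_to_bit_groups("8AC2"))  # 1000 1010 1100 0010
print(int("8AC2", 16))            # 35522
```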
Encoding signed numbers
Sign-and-magnitude representation
Allocate the Most Significant Bit (MSB) to represent the sign.
- 0 for positive numbers, 1 for negative numbers.
The remaining bits indicate the magnitude (or absolute value).
Example:
- $00101010_2 = 42_{10}$, $10101010_2 = -42_{10}$
Representation range with n bits: $[-2^{n-1} + 1, ..., 2^{n-1} - 1]$.
- With 8 bits: [−127,...,+127]
Disadvantages:
- Two different zeros: 00000000 (0) and 10000000 (-0).
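A minimal sketch of sign-and-magnitude encoding and decoding on bit strings. The function names are assumptions for this example; it also shows that the two zero codes decode to the same value.

```python
def encode_sign_magnitude(value: int, n: int = 8) -> str:
    """Encode value with a sign bit followed by n-1 magnitude bits."""
    limit = 2 ** (n - 1) - 1
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {n} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n - 1}b")


def decode_sign_magnitude(code: str) -> int:
    """Decode a sign-and-magnitude bit string."""
    magnitude = int(code[1:], 2)
    return -magnitude if code[0] == "1" else magnitude


print(encode_sign_magnitude(42))          # 00101010
print(encode_sign_magnitude(-42))         # 10101010
print(decode_sign_magnitude("10000000"))  # 0 (the "negative zero" code)
```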
Two’s-complement representation
The representation of a negative number -x in n bits is defined as the two’s complement of x.
The two’s complement can be calculated in decimal as $2^n - x$ modulo $2^n$:
‣ $-x \equiv C2(x,n) = (2^n - x) \bmod 2^n$ (with $|x| < 2^n$)
‣ Examples:
‣ $C2(3,4) = (2^4 - 3) \bmod 2^4 = 13_{10} = 1101_2$ → -3
‣ $C2(3,8) = (2^8 - 3) \bmod 2^8 = 253_{10} = 1111\,1101_2$ (sign extension) → -3
‣ $C2(-3,4) = (2^4 + 3) \bmod 2^4 = 3_{10} = 0011_2$ → 3
‣ $C2(0,4) = (2^4 - 0) \bmod 2^4 = 0_{10} = 0000_2$ → 0
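The four examples can be checked with a direct transcription of the formula. The function name `c2` mirrors the C2 notation of the slide; the printing loop is only an illustration.

```python
def c2(x: int, n: int) -> int:
    """Two's complement of x in n bits, computed as (2^n - x) mod 2^n."""
    return (2 ** n - x) % 2 ** n


for x, n in [(3, 4), (3, 8), (-3, 4), (0, 4)]:
    code = c2(x, n)
    print(f"C2({x},{n}) = {code} = {code:0{n}b}")
```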
The two’s complement of a binary n-bit representation is a new representation with range $[-2^{n-1}, ..., 2^{n-1} - 1]$ in which:
- Codes $[0, ..., 2^{n-1} - 1]$ → non-negative integers $[0, ..., 2^{n-1} - 1]$
- Codes $[2^{n-1}, ..., 2^n - 1]$ → negative integers $[-2^{n-1}, ..., -1]$
Example
- n = 4 → range = [-8, 7]
- codes [0, 7] → non-negative integers [0, 7], codes [8, 15] → negative integers [-8, -1]
Binary code  Unsigned  Two’s compl.    Binary code  Unsigned  Two’s compl.
0000         0         0               1000         8         -8
0001         1         1               1001         9         -7
0010         2         2               1010         10        -6
0011         3         3               1011         11        -5
0100         4         4               1100         12        -4
0101         5         5               1101         13        -3
0110         6         6               1110         14        -2
0111         7         7               1111         15        -1
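The table can be reproduced by interpreting each unsigned code as a two's-complement value: codes with the most significant bit set represent the code minus $2^n$. A minimal sketch, with the hypothetical helper `from_twos_complement`:

```python
def from_twos_complement(code: int, n: int) -> int:
    """Interpret an unsigned n-bit code as a two's-complement value."""
    if code >= 2 ** (n - 1):   # most significant bit is 1: negative number
        return code - 2 ** n
    return code


for code in range(16):
    print(f"{code:04b}  {code:2d}  {from_twos_complement(code, 4):3d}")
```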
Converting from decimal to two’s complement:
- if it is a positive number, then convert directly to binary:
‣ Example: 5: 0101
- if it is a negative number, first convert its absolute value to binary and then obtain the two’s complement: 1) invert the bits, 2) add one.
‣ Example: –3: 0011 →(invert)→ 1100 →(add 1)→ 1101
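The conversion rules can be sketched on bit strings as follows. The function name `to_twos_complement` and the default width of 4 bits are assumptions for this example.

```python
def to_twos_complement(value: int, n: int = 4) -> str:
    """Encode value in n-bit two's complement using invert-and-add-one."""
    if not -(2 ** (n - 1)) <= value <= 2 ** (n - 1) - 1:
        raise ValueError(f"{value} does not fit in {n} bits")
    if value >= 0:
        return format(value, f"0{n}b")          # positive: plain binary
    magnitude = format(-value, f"0{n}b")        # binary of the absolute value
    inverted = "".join("1" if bit == "0" else "0" for bit in magnitude)
    plus_one = (int(inverted, 2) + 1) % 2 ** n  # add one, staying within n bits
    return format(plus_one, f"0{n}b")


print(to_twos_complement(5))   # 0101
print(to_twos_complement(-3))  # 1101
```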
Property 1: x - y = x + (-y) = x + C2(y,n)
‣ Example
‣ 2 - 3 = 0010 − 0011 = 1111 = $-1_{10}$
‣ 2 - 3 = 2 + (-3) = 0010 + 1101 = 1111 = $-1_{10}$
- Example 2 (Corollary)
‣ -(-x) = x, i.e., applying the two’s complement twice yields the original number
‣ -3 = 1101; -(-3): 1101 →(invert)→ 0010 →(add 1)→ 0011 (which is 3)
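Both the property and its corollary can be checked with bitwise operations on 4-bit codes. The helper names `negate` and `subtract` are assumptions for this sketch.

```python
def negate(code: int, n: int = 4) -> int:
    """Two's complement of an n-bit code: invert the bits and add one."""
    mask = 2 ** n - 1
    return (~code + 1) & mask


def subtract(x: int, y: int, n: int = 4) -> int:
    """Compute x - y as x + C2(y, n), discarding any carry out of n bits."""
    return (x + negate(y, n)) & (2 ** n - 1)


print(format(subtract(0b0010, 0b0011), "04b"))  # 1111, i.e. -1
print(format(negate(negate(0b0011)), "04b"))    # 0011, i.e. 3 again
```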
Excess-K representation
Excess-K (also called offset binary or biased representation) is an n-bit representation with range $[-K, ..., 2^n - 1 - K]$. It uses a pre-specified number K as a bias that displaces the origin of the representation, so that the most negative number (-K) maps to the all-zeros code (0000 for n = 4). The value of K is the offset.
Example
- Excess-K, K = 8, n = 4 → range = [-8, 7]
- codes [0, 7] → negative integers [-8, -1], codes [8, 15] → non-negative integers [0, 7]
Binary code  Unsigned  Excess-8    Binary code  Unsigned  Excess-8
0000         0         -8          1000         8         0
0001         1         -7          1001         9         1
0010         2         -6          1010         10        2
0011         3         -5          1011         11        3
0100         4         -4          1100         12        4
0101         5         -3          1101         13        5
0110         6         -2          1110         14        6
0111         7         -1          1111         15        7
Converting from decimal to excess-K:
- Add K to x in decimal and then convert it to binary
- Examples: Assume K=8, n=4
‣ x=-3 →(add 8)→ -3+8=5 →(binary)→ 0101
‣ x=3 →(add 8)→ 3+8=11 →(binary)→ 1011
Converting from excess-K to decimal:
- Convert it to decimal and then subtract K
- Examples: Assume K=8, n=4
‣ x=0011 →(decimal)→ 3 →(subtract 8)→ 3-8=-5
‣ x=1011 →(decimal)→ 11 →(subtract 8)→ 11-8=3
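Both conversions amount to adding or subtracting K around a plain binary conversion. A minimal sketch with K = 8 and n = 4 as defaults; the function names are assumptions for this example.

```python
def to_excess_k(value: int, k: int = 8, n: int = 4) -> str:
    """Encode value in excess-K: add K, then write the result in binary."""
    code = value + k
    if not 0 <= code < 2 ** n:
        raise ValueError(f"{value} is outside the excess-{k} range for {n} bits")
    return format(code, f"0{n}b")


def from_excess_k(code: str, k: int = 8) -> int:
    """Decode an excess-K bit string: read it as unsigned, then subtract K."""
    return int(code, 2) - k


print(to_excess_k(-3), to_excess_k(3))               # 0101 1011
print(from_excess_k("0011"), from_excess_k("1011"))  # -5 3
```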
Property 1: the mapping is monotonically increasing (a larger value always has a larger unsigned code), which makes comparisons (>, <, etc.) easy to perform.
Property 2: the excess-K representation with $K = 2^{n-1}$ matches the two’s-complement representation with the most significant bit (sign bit) inverted.
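Both properties can be verified exhaustively for small n. The sketch below is only a check under the stated definitions; the function names are hypothetical.

```python
def excess_matches_twos_complement(n: int) -> bool:
    """Check that excess-2^(n-1) equals two's complement with the MSB flipped."""
    k = 2 ** (n - 1)
    for value in range(-k, k):
        twos = value % 2 ** n   # two's-complement code of value
        excess = value + k      # excess-K code of value
        if excess != twos ^ k:  # XOR with 2^(n-1) flips the sign bit
            return False
    return True


def excess_is_monotonic(n: int) -> bool:
    """Check that larger values always get larger excess-2^(n-1) codes."""
    k = 2 ** (n - 1)
    codes = [value + k for value in range(-k, k)]
    return codes == sorted(codes)


print(excess_matches_twos_complement(4), excess_is_monotonic(4))  # True True
```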
Encoding real numbers
Fixed-point representation
It has a fixed number of digits n after the decimal point and (sometimes) a
fixed number of digits m before the decimal point.
- Examples in decimal: n = 3, m = 4 → 0200.003, 1200.100
- Examples in binary: n = 3, m = 4 → 0100.001, 1100.100
Drawbacks: loss of accuracy and overflow
- Overflow occurs, for example, in a fixed-point multiplication, whose result can have as many bits as the sum of the number of bits of the two operands.
‣ Example: 1000.101 x 0100.001 = ???
- Loss of accuracy occurs, for example, after a sequence of operations with truncated results.
‣ Example with n = 3 and m = 4
‣ 0023.941 x 0000.001 = 0000.023 (the exact result, 0.023941, is truncated)
‣ If we want to recover the original number: 0000.023 x 1000.000 = 0023.000
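The loss-of-accuracy example can be reproduced by storing decimal fixed-point values as integers scaled by 1000 (n = 3 digits after the point) and truncating after each multiplication. This is a sketch for non-negative values only; `SCALE` and `fixed_mul` are names assumed for this example.

```python
SCALE = 1000  # three decimal digits after the point (n = 3)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two scaled fixed-point values and truncate to three decimals."""
    return (a * b) // SCALE


def show(value: int) -> str:
    """Format a scaled value as mmmm.nnn."""
    return f"{value // SCALE:04d}.{value % SCALE:03d}"


x = 23_941                               # 0023.941
shrunk = fixed_mul(x, 1)                 # times 0000.001
restored = fixed_mul(shrunk, 1_000_000)  # times 1000.000
print(show(shrunk))    # 0000.023
print(show(restored))  # 0023.000 (the digits .941 are lost)
```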
Floating-point representation
It consists of a fixed number of significant digits, called the mantissa, which are scaled using an exponent. The base for the scaling is usually 2 or 10:
$mantissa \times base^{exponent}$
- Examples of the same number using different exponents (scaling factors):
$1125.0 \times 10^0$, $112.5 \times 10^1$, $11.25 \times 10^2$, $1.125 \times 10^3$, $0.1125 \times 10^4$
- The point can "float", i.e., be placed anywhere relative to the significant digits of the number.
Normalized representation: the one in which the point immediately follows the most significant non-zero digit: $1.125 \times 10^3$.
Advantage: it supports a much wider range of values with the same number of digits.
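A minimal sketch of decimal normalization using Python's `decimal` module, which avoids binary rounding in the illustration. The function name `normalize` is an assumption; every mantissa and exponent pair for 1125 should normalize to the same result.

```python
from decimal import Decimal


def normalize(mantissa: str, exponent: int) -> tuple[Decimal, int]:
    """Return (mantissa, exponent) with one non-zero digit before the point."""
    value = Decimal(mantissa) * Decimal(10) ** exponent
    if value == 0:
        return Decimal(0), 0
    shift = value.adjusted()  # power of ten of the most significant digit
    return value / Decimal(10) ** shift, shift


for m, e in [("1125.0", 0), ("112.5", 1), ("11.25", 2), ("0.1125", 4)]:
    print(m, e, "->", normalize(m, e))
```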
Smallest positive normalized value and largest finite value of a 64-bit (double-precision) floating-point number: 2.2251e-308 and 1.7977e+308
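The values 2.2251e-308 and 1.7977e+308 match the limits Python reports for its `float` type, assuming the platform uses IEEE 754 double precision (the usual case for CPython). The sketch also uses `math.frexp` to split a number into a binary mantissa and exponent.

```python
import math
import sys

# Limits of the platform float type (assumed to be a 64-bit IEEE 754 double).
print(f"{sys.float_info.min:.4e}")  # smallest positive normalized value
print(f"{sys.float_info.max:.4e}")  # largest finite value

# frexp returns m and e such that value == m * 2**e with 0.5 <= |m| < 1.
mantissa, exponent = math.frexp(1125.0)
print(mantissa, exponent)
```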
Encoding texts
The ASCII code: American Standard Code for Information Interchange
(ASCII table; columns: decimal, hex, character)
- UTF-16 - it uses one 16-bit code unit for the Basic Multilingual Plane (BMP) and two 16-bit code units (surrogate pairs) for the additional, less frequent planes.
- UTF-32 - fixed-length 32-bit encoding, and the simplest of the three.
ASCII was the most common character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8.
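The size differences between the encodings can be observed with Python's built-in codecs. The sample characters and the helper name `encoded_sizes` are assumptions chosen for illustration; the big-endian codec variants are used so that no byte-order mark is added.

```python
def encoded_sizes(character: str) -> tuple[int, int, int]:
    """Return the size in bytes of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(character.encode("utf-8")),
        len(character.encode("utf-16-be")),
        len(character.encode("utf-32-be")),
    )


# 'A' (ASCII), n with tilde, euro sign, and an emoji outside the BMP.
for character in ("A", "\u00f1", "\u20ac", "\U0001F600"):
    print(f"U+{ord(character):04X}", encoded_sizes(character))
```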
Formatted text
A markup language is a way to encode a document which, in addition to the text, includes labels or markings that specify the structure of the text.
Presentational markup: used by traditional text editors. The editor performs the marking and hides it from human users, producing the WYSIWYG (What You See Is What You Get) effect.
Procedural markup: used by LaTeX and some HTML editors. In these systems the user explicitly writes the formatting labels in the source file.
HTML example (web pages)
Redundancy and compression
Sample exercise
A meteorological satellite takes 2 images every second, with a resolution of 1920x1080 pixels (dots), using 8 bits to encode each colour (red, green, blue: RGB). The images have to be transmitted to ground over a wireless link.
- A) Obtain the space required (in bytes) to store one image.
‣ Each pixel requires 24 bits = 3 bytes, so 1920x1080x3 = 6,220,800 bytes, which is roughly 5.9 MiB (about 6.2 MB).
- B) Calculate the transfer rate (in bps) needed to transmit the 2 images every second.
‣ Each image occupies 6,220,800 bytes, which are 49,766,400 bits. Transmitting 2 images per second requires 49,766,400x2 = 99,532,800 bps, approximately 100 Mbps.
In computing, the capacity of the link (transfer rate) is usually called bandwidth.
- Examples: Ethernet 10 Mbps-1 Gbps, WiFi 55-600 Mbps
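The arithmetic of the exercise can be reproduced step by step. The constant names below are assumptions introduced for this sketch.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3  # 8 bits for each of red, green and blue
IMAGES_PER_SECOND = 2

image_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
image_bits = image_bytes * 8
bits_per_second = image_bits * IMAGES_PER_SECOND

print(image_bytes)                        # 6220800
print(round(image_bytes / 2 ** 20, 2))    # 5.93 (MiB)
print(bits_per_second)                    # 99532800
print(round(bits_per_second / 1e6, 1))    # 99.5 (Mbps)
```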
Redundant encoding
Information may get corrupted when it is transmitted through communication lines or stored on disks or other storage devices.
Redundancy is used to detect errors and, with more redundancy, to correct them.
- Error detection: parity bit, checksums
- Error detection and correction: error-correcting codes (ECC), which require higher levels of redundancy.
A parity bit is a redundant bit added to a set of bits to ensure that the total number of bits with value 1 is even (even parity) or odd (odd parity).
- Even parity: 10000111 (four 1s in total, an even number)
- Odd parity: the parity bit takes the opposite value, so that the total number of 1s is odd
Parity bits are often used when transmitting ASCII characters from/to peripherals.
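A minimal sketch of parity generation and checking for a 7-bit ASCII character. Placing the parity bit in front of the data bits is an assumption of this example (conventions vary), as are the function names.

```python
def parity_bit(data_bits: str, even: bool = True) -> str:
    """Return the parity bit that makes the total number of 1s even or odd."""
    bit = data_bits.count("1") % 2  # 1 if the data has an odd number of 1s
    if not even:
        bit ^= 1
    return str(bit)


def parity_ok(word: str, even: bool = True) -> bool:
    """Check whether a received word (parity bit included) has valid parity."""
    return (word.count("1") % 2 == 0) == even


data = format(ord("G"), "07b")                # 1000111, four 1s
print(parity_bit(data) + data)                # 01000111 (even parity)
print(parity_bit(data, even=False) + data)    # 11000111 (odd parity)
print(parity_ok("01000110"))                  # False: a single flipped bit is detected
```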
Information compression
Data compression: the process of re-encoding information using fewer bits than the original representation.
- Goal: to reduce the volume of information and the consumption of expensive resources, such as hard disk space or transmission bandwidth.
It has a cost: extra processing for compressing and decompressing.
- Trade-off between the costs of encoding and decoding: time-consuming compression → time-efficient decompression, and vice versa.
Two types of compression:
- Lossless compression: the encoded data is not distorted or modified, so the original can be reconstructed exactly from the compressed data.
‣ Example: text compression. ZIP format
- Lossy compression: the original data is only approximately represented, so only an approximation of it can be reconstructed.
‣ Example: image/audio/video compression. JPEG, MPEG, MP3 formats
Lossless compression
Lossless algorithms usually exploit statistical redundancy in such a way that
more frequent data are represented with fewer bits.
Huffman coding
- Example: text with only four characters: ’ ’, ’A’, ’B’, ’C’ with frequencies
45%, 35%, 15% and 5% respectively.
- Compression ratio:
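A minimal sketch of Huffman coding for the four-character example, built with a priority queue. The function name `huffman_code` is hypothetical, and the baseline of 2 bits per character (the shortest fixed-length code for four symbols) used for the ratio is an assumption of this sketch, since the original figure is not available. The exact bit patterns depend on how ties and branches are labelled; only the code lengths matter for the ratio.

```python
import heapq
from itertools import count


def huffman_code(frequencies: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least frequent nodes."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(freq, next(tiebreak), {symbol: ""}) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq0, _, codes0 = heapq.heappop(heap)
        freq1, _, codes1 = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in codes0.items()}
        merged.update({symbol: "1" + code for symbol, code in codes1.items()})
        heapq.heappush(heap, (freq0 + freq1, next(tiebreak), merged))
    return heap[0][2]


frequencies = {" ": 45, "A": 35, "B": 15, "C": 5}  # percentages
codes = huffman_code(frequencies)
average_bits = sum(frequencies[s] * len(codes[s]) for s in frequencies) / 100
print(codes)
print(average_bits)      # 1.75 bits per character
print(2 / average_bits)  # ratio against a fixed 2-bit code, about 1.14
```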
Lossy compression
It compresses data by discarding (losing) some of it.
Usually based on perceptual coding: transforming the raw data obtained
from a device to a domain that more accurately reflects the information
content.
- Example: a sound file can be more efficiently represented as the frequency
spectrum over time than as the amplitude levels.
Lossy encoding/decoding programs are usually known as codecs.
Key point: required accuracy or Quality of Service (QoS).
- Example: image qualities for video conference 640x480, 800x600,
1920x1080, ...