EC02-Information Coding

1) Binary encoding represents information using only two digits, 0 and 1. It is commonly used in digital computers. 2) Numbers can be represented in binary using positional notation, with each bit position corresponding to a power of two. Common binary numbering systems include 8-bit bytes and hexadecimal. 3) Signed numbers can be encoded using either sign-and-magnitude or two's-complement representation in binary. Two's-complement avoids having two representations for zero.

Uploaded by

Jaimuchu 13

Informática

Ing. Aeroespacial

Computers

C2: Information coding

DISCA: Departamento de Informática de Sistemas y Computadores


Rev. 04
Introduction Informática

How do we interpret all the information perceived by our senses (mainly sight and hearing)?
What is information?
Do we use codes for all this information?
Finally, how do computers encode information?

Information coding Informática

Basic concepts
Numbering systems
Encoding signed numbers
Encoding real numbers
Encoding texts
Redundancy and compression


Basic concepts (I)


What is information? It is not a simple concept (see Wikipedia).
- Information is data (values, knowledge, facts, etc.) with a meaning.
- We (humans) encode all the perceived information in order to interpret and store it.
All information processed by a digital computer needs to be encoded, that is,
transformed into a form of representation suitable for the computer.
- Numerical values: quantities used in applications related to geometry
(lengths, angles, ...), physics (pressure, temperature, volumes, forces, ...),
mathematics, statistics, finance, etc.
- Text information in different formats, like books, reports, manuals, etc.
- Media information: graphics, images, videos, sounds, etc.
- Computer programs
Computers encode information using binary numbers rather than decimal
numbers.
- Binary encoding will be introduced shortly.

Basic concepts (II)


Information range: the interval between the lower and upper limits of a
magnitude, i.e., the whole set of different values or codes that the
information may take.
- Example: assume that a magnitude such as a length is encoded in decimal with
3 integer digits and 2 fractional digits. The information range is [000.00 ...
999.99].
Information accuracy: the resolution of the information, i.e., the smallest
representable difference between two values.
- Example: in the above representation, the accuracy is 0.01 units.
Information volume: the amount of information, i.e., the number of measurements
(information instances) of a magnitude times the number of digits of each
measurement.
- Example: in the above representation, 1000 length measurements have a
volume of 5000 digits.
- It is useful for computing the capacity required of a storage device.
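
As a rough illustration of the three definitions above, the following sketch computes the range limit, accuracy and volume of a decimal format with a fixed number of digits. The function name and its parameters are hypothetical, chosen only for this example.

```python
from fractions import Fraction

def decimal_format_properties(int_digits, frac_digits, measurements):
    """Return (largest value, accuracy, volume in digits) of a decimal format."""
    # Accuracy (resolution): the weight of the last fractional digit.
    accuracy = Fraction(1, 10 ** frac_digits)
    # Upper limit of the range: every digit set to 9.
    largest = 10 ** int_digits - accuracy
    # Volume: digits per measurement times the number of measurements.
    volume = measurements * (int_digits + frac_digits)
    return largest, accuracy, volume

largest, accuracy, volume = decimal_format_properties(3, 2, 1000)
print(float(largest), float(accuracy), volume)  # 999.99 0.01 5000
```

Exact fractions are used instead of floats so that 0.01 and 999.99 are not affected by binary rounding.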

Basic concepts (III)


Information compression: reduction of the information volume by
- Removing redundancy
‣ Represent repeated items in a compact form. Example: 1000 consecutive white
pixels in a graphic.
- Reducing accuracy
‣ Use a representation with fewer digits.
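
One simple way to remove the redundancy of repeated items is run-length encoding, which stores each run as a (value, length) pair. The slides do not name a specific technique, so this is only an illustrative sketch with hypothetical function names.

```python
from itertools import groupby

def run_length_encode(items):
    """Collapse each run of equal items into a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(items)]

def run_length_decode(pairs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, length in pairs for _ in range(length)]

row = ["white"] * 1000 + ["black"] * 3
encoded = run_length_encode(row)
print(encoded)  # [('white', 1000), ('black', 3)]
```

The 1003 pixels are represented by two pairs, and decoding restores the row exactly.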

Numbering systems Informática

Positional numbering systems


Numbers are represented as a sequence of digits where each digit has a
weight according to its position.
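Numbering in base b uses the set of digits D = {0, 1, 2, ..., b−1}.
Assume X is represented as $X_n X_{n-1} \ldots X_2 X_1 X_0 . X_{-1} X_{-2} \ldots X_{-m}$ in base b. Its value is:

$$X = \sum_{i=-m}^{n} X_i \, b^i \qquad (1)$$

Example:
- In decimal, b = 10 and D = {0, 1, 2, ..., 9}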
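
The weighted sum of a positional system can be sketched in a few lines: each digit is multiplied by the base raised to its position and the products are added. The function name and the list-of-digits input format are assumptions made for this example.

```python
def positional_value(integer_digits, fraction_digits, base):
    """Add digit * base**position over all the digits of a number."""
    value = 0
    # Position 0 is the last integer digit, so walk the integer part right to left.
    for position, digit in enumerate(reversed(integer_digits)):
        value += digit * base ** position
    # Position -1 is the first digit after the point.
    for position, digit in enumerate(fraction_digits, start=1):
        value += digit * base ** -position
    return value

print(positional_value([1, 1, 0, 1, 0], [], 2))            # 26
print(positional_value([1, 1, 1, 0, 1], [1, 0, 1, 1], 2))  # 29.6875
print(positional_value([8, 10, 12, 2], [], 16))            # 35522
```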


The binary system


In binary b =2 and D= {0,1}
Converting from binary to decimal: use eq. (1)

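Converting from decimal to binary: divide successively by 2 and collect the remainders.

26 ÷ 2 = 13, remainder 0
13 ÷ 2 = 6, remainder 1
6 ÷ 2 = 3, remainder 0
3 ÷ 2 = 1, remainder 1
1 ÷ 2 = 0, remainder 1 (stop)

Reading the remainders from last to first: $26_{10} = 11010_2$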
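
The successive-division method can be sketched as follows. The function name `to_base` and its digit alphabet are illustrative choices, not part of the slides; the same loop works for any base up to 16.

```python
DIGITS = "0123456789ABCDEF"

def to_base(number, base=2):
    """Convert a non-negative integer to a digit string by successive division."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, base)
        remainders.append(DIGITS[remainder])
    # The first remainder is the least significant digit, so reverse the list.
    return "".join(reversed(remainders))

print(to_base(26))         # 11010
print(to_base(35522, 16))  # 8AC2
```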


The binary system


A bit is a binary digit.
The range of a representation in base b with n digits is $[0 ... b^n - 1]$.
- The number of different codes is $b^n$: the n-permutations of b elements with
repetition, PR(b, n). In binary, PR(2, n) = $2^n$.
- Example: b = 2, n = 3
‣ [0 ... 7] = [000, 001, 010, 011, 100, 101, 110, 111]
A byte is an 8-bit binary code.
- The range of this representation is $[0_{10} ... 255_{10}]$.
The information volume in the binary system is usually measured as the
number of bytes. Multiples of the byte:
- KB Kilobyte = $2^{10}$ bytes = 1,024 bytes
- MB Megabyte = $2^{20}$ bytes = 1,024 KB
- GB Gigabyte = $2^{30}$ bytes = 1,024 MB
- TB Terabyte = $2^{40}$ bytes = 1,024 GB

The binary system


Binary codes can be used to encode numbers, colors, the products of a vending
machine, etc.

Information volume: it is usual to round up the result to the nearest power of two.
- Example: capacity of a disk to store the sequence of codes of sold products in
a vending machine. Assume that the machine capacity is 1000 products and
there are 16 different products.
‣ 16 products require a 4-bit encoding
‣ 4 bits × 1000 products = 4000 bits = 4000/8 bytes = 500 bytes, rounded up to $2^9$ bytes = 512 bytes
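
The vending machine calculation can be reproduced with a short sketch. The helper names are hypothetical, and the code assumes at least one record is stored.

```python
def bits_per_code(distinct_values):
    """Smallest number of bits that gives each value its own code."""
    return max(1, (distinct_values - 1).bit_length())

def storage_bytes(distinct_values, records):
    """Return (exact bytes, bytes rounded up to a power of two)."""
    total_bits = bits_per_code(distinct_values) * records
    exact_bytes = -(-total_bits // 8)  # ceiling division
    rounded = 1 << (exact_bytes - 1).bit_length()
    return exact_bytes, rounded

print(bits_per_code(16))        # 4
print(storage_bytes(16, 1000))  # (500, 512)
```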

Binary arithmetic
Truth tables for operations with one bit:

a b | a + b  | a − b | a × b
0 0 | 0      | 0     | 0
0 1 | 1      | 1 (*) | 0
1 0 | 1      | 1     | 0
1 1 | 0 (**) | 0     | 1

(**) 1 plus 1 = 0, carry 1
(*) 0 minus 1 = 1, borrow 1

Overflow: an operation yields a result that exceeds the representation range.
- Using a 1-bit representation, 1 + 1 is 10, which requires a 2-bit representation.
- We say that the carry is 1.


Binary arithmetic
Algorithms for operations with two or more bits:

[Figure: worked examples of binary addition (with carry), subtraction (with borrow) and multiplication (with carry)]
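
Since the worked figures are not available, here is a minimal sketch of multi-bit addition that applies the one-bit rule column by column, from right to left, propagating the carry. The function name and the bit-string representation are assumptions made for this example.

```python
def add_binary(a, b):
    """Add two equal-length bit strings; return (sum bits, final carry)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of bits")
    carry = 0
    result = []
    # Process the columns from the least significant bit to the most significant.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))
        carry = total // 2
    return "".join(reversed(result)), carry

print(add_binary("0110", "0011"))  # ('1001', 0)
print(add_binary("1111", "0001"))  # ('0000', 1)
```

A final carry of 1 means that the unsigned result does not fit in the given number of bits.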


Hexadecimal (“hex”) system


In hexadecimal, b = 16 and D = {0, 1, 2, ..., 9, A, B, C, D, E, F}
Converting from hex to decimal: use eq. (1) with b = 16

Converting from decimal to hex: use the algorithm of successive divisions, dividing by 16

Converting between binary and hex
- Group the binary digits in fours, starting from the right. Four binary digits correspond to one hex digit

$1000\ 1010\ 1100\ 0010_2 = 8AC2_{16}$
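
The grouping rule can be sketched as below. Both function names are hypothetical; spaces in the input are ignored and the binary string is padded with leading zeros to a multiple of four bits.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a bit string to hex by grouping the bits in fours."""
    bits = bits.replace(" ", "")
    width = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_binary(hex_digits):
    """Expand every hex digit into its four-bit group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)

print(binary_to_hex("1000 1010 1100 0010"))  # 8AC2
print(hex_to_binary("8AC2"))                 # 1000 1010 1100 0010
```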

Encoding signed numbers Informática

Sign-and-magnitude representation
Allocate the Most Significant Bit (MSB) to represent the sign.
- 0 for positive numbers, 1 for negative numbers.
The remaining bits indicate the magnitude (or absolute value).
Example:
- $00101010_2 = 42_{10}$, $10101010_2 = -42_{10}$
Representation range with n bits: $[-2^{n-1} + 1, ..., 2^{n-1} - 1]$.
- With 8 bits: [−127,...,+127]
Disadvantages:
- Two different zeros: 00000000 (0) and 10000000 (-0).
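
A minimal sketch of sign-and-magnitude encoding and decoding on bit strings follows. The function names are illustrative assumptions.

```python
def sign_magnitude_encode(value, n_bits):
    """Encode an integer as a sign bit followed by an (n_bits - 1)-bit magnitude."""
    limit = 2 ** (n_bits - 1) - 1
    if abs(value) > limit:
        raise OverflowError(f"{value} is outside [-{limit}, {limit}]")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")

def sign_magnitude_decode(code):
    """Decode a sign-and-magnitude bit string into an integer."""
    magnitude = int(code[1:], 2)
    return -magnitude if code[0] == "1" else magnitude

print(sign_magnitude_encode(42, 8))   # 00101010
print(sign_magnitude_encode(-42, 8))  # 10101010
print(sign_magnitude_decode("00000000"), sign_magnitude_decode("10000000"))  # 0 0
```

The last line shows the two codes that both decode to zero.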


Two’s-complement representation
The representation of a negative number -x in n bits is defined as its two’s complement.
The two’s complement can be calculated in decimal as $2^n - x$ modulo $2^n$:
‣ $-x \equiv C2(x,n) = (2^n - x) \bmod 2^n$ (with $|x| < 2^n$)
‣ Examples:
‣ C2(3,4) = $(2^4 - 3) \bmod 2^4 = 13_{10} = 1101_2$ → -3
‣ C2(3,8) = $(2^8 - 3) \bmod 2^8 = 253_{10} = 1111\ 1101_2$ (sign extension) → -3
‣ C2(-3,4) = $(2^4 + 3) \bmod 2^4 = 3_{10} = 0011_2$ → 3
‣ C2(0,4) = $(2^4 - 0) \bmod 2^4 = 0_{10} = 0000_2$ → 0
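
The definition translates directly into code. In this sketch the function name `c2` mirrors the C2(x,n) notation of the slide; the helper `as_bits` is a hypothetical addition used only to print the result.

```python
def c2(x, n_bits):
    """Two's complement of x in n_bits, computed as (2**n - x) mod 2**n."""
    return (2 ** n_bits - x) % 2 ** n_bits

def as_bits(code, n_bits):
    """Show an unsigned code as an n-bit string."""
    return format(code, f"0{n_bits}b")

print(c2(3, 4), as_bits(c2(3, 4), 4))    # 13 1101
print(c2(3, 8), as_bits(c2(3, 8), 8))    # 253 11111101
print(c2(-3, 4), as_bits(c2(-3, 4), 4))  # 3 0011
print(c2(0, 4), as_bits(c2(0, 4), 4))    # 0 0000
```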


Two’s-complement representation
The two’s complement of a binary n-bit representation is a new
representation with range $[-2^{n-1}, ..., 2^{n-1} - 1]$ in which:
- Codes $[0, ..., 2^{n-1} - 1]$ → non-negative integers $[0, ..., 2^{n-1} - 1]$
- Codes $[2^{n-1}, ..., 2^n - 1]$ → negative integers $[-2^{n-1}, ..., -1]$
Example
- n=4 → range = [-8, 7]
- codes [0,7] → non-negative integers [0,7], codes [8,15] → negative integers [-8,-1]

Binary code | Unsigned | Two’s compl. || Binary code | Unsigned | Two’s compl.
0000        | 0        | 0            || 1000        | 8        | -8
0001        | 1        | 1            || 1001        | 9        | -7
0010        | 2        | 2            || 1010        | 10       | -6
0011        | 3        | 3            || 1011        | 11       | -5
0100        | 4        | 4            || 1100        | 12       | -4
0101        | 5        | 5            || 1101        | 13       | -3
0110        | 6        | 6            || 1110        | 14       | -2
0111        | 7        | 7            || 1111        | 15       | -1

Two’s-complement representation
Converting from decimal to two’s complement:
- if it is a positive number, then convert directly to binary:
‣ Example: 5: 0101
- if it is a negative number, then first convert its absolute value to binary, and
then obtain the two’s complement: 1) Invert the bits 2) Add one;
‣ Example: –3: 0011 →(invert)→ 1100 →(add 1)→ 1101

Converting from two’s complement to decimal:
- if it is a positive number (MSB=0), then apply weighted digits eq. (1):
‣ Example: 0111 = $0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 7_{10}$
- if it is a negative number (MSB=1), then compute the two’s complement and
apply weighted digits eq. (1) to get the absolute value. Next, change the sign
of the absolute value.
‣ Example: 1111 →(invert)→ 0000 →(add 1)→ 0001
‣ 0001 = $0 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 1$ → -1
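
The invert-and-add-one procedure can be sketched on bit strings as follows. The function names are hypothetical, and the sketch assumes that the value fits in the given number of bits (no range check is made).

```python
def invert_and_add_one(code):
    """Invert every bit of the code, add one, and keep the same width."""
    n_bits = len(code)
    inverted = "".join("1" if bit == "0" else "0" for bit in code)
    return format((int(inverted, 2) + 1) % 2 ** n_bits, f"0{n_bits}b")

def to_twos_complement(value, n_bits):
    """Decimal integer to an n-bit two's-complement code."""
    magnitude = format(abs(value), f"0{n_bits}b")
    return invert_and_add_one(magnitude) if value < 0 else magnitude

def from_twos_complement(code):
    """n-bit two's-complement code to a decimal integer."""
    if code[0] == "0":
        return int(code, 2)
    return -int(invert_and_add_one(code), 2)

print(to_twos_complement(5, 4), to_twos_complement(-3, 4))          # 0101 1101
print(from_twos_complement("0111"), from_twos_complement("1111"))  # 7 -1
```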


Two’s-complement representation
Property 1: x - y = x + (-y) = x + C2(y,n)
‣ Example
‣ 2 - 3 = 0010 − 0011 = 1111 = $-1_{10}$
‣ 2 - 3 = 2 + (-3) = 0010 + 1101 = 1111 = $-1_{10}$
- Example 2 (corollary)
‣ -(-x) = x, i.e., applying the two’s complement twice yields the original number
‣ -3 = 1101; -(-3): 1101 →(invert)→ 0010 →(add 1)→ 0011 (which is 3)

Property 2: only one zero → 0...0000

Property 3: sign extension. To represent a code with more bits, replicate the MSB to the left.
- Positive numbers have MSB=0
- Negative numbers have MSB=1
- C2(3,4) = $1101_2$ → C2(3,8) = $1111\ 1101_2$ (sign extension)
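
Properties 1 and 3 can be checked with a small sketch: subtraction is performed as an addition of the two's complement, discarding any carry out of the n bits, and sign extension replicates the MSB. Both function names are assumptions made for this example.

```python
def subtract(x_code, y_code):
    """Compute x - y on n-bit codes as x + C2(y, n), keeping only n bits."""
    n_bits = len(x_code)
    modulus = 2 ** n_bits
    minus_y = (modulus - int(y_code, 2)) % modulus
    return format((int(x_code, 2) + minus_y) % modulus, f"0{n_bits}b")

def sign_extend(code, n_bits):
    """Widen a two's-complement code by replicating its most significant bit."""
    return code[0] * (n_bits - len(code)) + code

print(subtract("0010", "0011"))  # 1111
print(sign_extend("1101", 8))    # 11111101
print(sign_extend("0011", 8))    # 00000011
```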


Excess-K representation
Excess-K (also called offset binary or biased representation) of an n-bit
representation is a representation with range $[-K, ..., 2^n - 1 - K]$ that uses a
pre-specified number K as a biasing value to displace the origin of the
representation, so that the most negative number of the representation
(-K) is mapped to the all-zeros code. The value of K is the offset.
Example
- Excess-K, K=8, n=4 → range = [-8, 7]
- codes [0,7] → negative integers [-8,-1], codes [8,15] → non-negative integers [0,7]

Binary code | Unsigned | Excess-8 || Binary code | Unsigned | Excess-8
0000        | 0        | -8       || 1000        | 8        | 0
0001        | 1        | -7       || 1001        | 9        | 1
0010        | 2        | -6       || 1010        | 10       | 2
0011        | 3        | -5       || 1011        | 11       | 3
0100        | 4        | -4       || 1100        | 12       | 4
0101        | 5        | -3       || 1101        | 13       | 5
0110        | 6        | -2       || 1110        | 14       | 6
0111        | 7        | -1       || 1111        | 15       | 7

Excess-K representation
Converting from decimal to excess-K:
- Add K to x in decimal and then convert it to binary
- Examples: Assume K=8, n=4
‣ x=-3 →(add 8)→ -3+8=5 →(binary)→ 0101
‣ x=3 →(add 8)→ 3+8=11 →(binary)→ 1011
Converting from excess-K to decimal:
- Convert it to decimal and then subtract K
- Examples: Assume K=8, n=4
‣ x=0011 →(decimal)→ 3 →(subtract 8)→ 3-8=-5
‣ x=1011 →(decimal)→ 11 →(subtract 8)→ 11-8=3
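
Both conversions are one addition or subtraction away from plain binary, as this sketch shows. The function names are illustrative assumptions.

```python
def excess_k_encode(value, k, n_bits):
    """Add the offset K and write the result as an n-bit unsigned code."""
    code = value + k
    if not 0 <= code < 2 ** n_bits:
        raise OverflowError(f"{value} is outside the excess-{k} range")
    return format(code, f"0{n_bits}b")

def excess_k_decode(code, k):
    """Read the code as unsigned binary and subtract the offset K."""
    return int(code, 2) - k

print(excess_k_encode(-3, 8, 4), excess_k_encode(3, 8, 4))    # 0101 1011
print(excess_k_decode("0011", 8), excess_k_decode("1011", 8))  # -5 3
```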


Excess-K representation
Property 1: it is monotonically increasing, which makes comparisons (>, <, etc.)
easy to perform.
Property 2: the excess-K representation with $K = 2^{n-1}$ matches the two’s-complement
representation with the most significant bit (sign bit) inverted.


Summary of signed representations


Binary code | Unsigned | Sign & magnitude | Two’s compl. | Excess-8
0000        | 0        | 0                | 0            | -8
0001        | 1        | 1                | 1            | -7
0010        | 2        | 2                | 2            | -6
0011        | 3        | 3                | 3            | -5
0100        | 4        | 4                | 4            | -4
0101        | 5        | 5                | 5            | -3
0110        | 6        | 6                | 6            | -2
0111        | 7        | 7                | 7            | -1
1000        | 8        | -0               | -8           | 0
1001        | 9        | -1               | -7           | 1
1010        | 10       | -2               | -6           | 2
1011        | 11       | -3               | -5           | 3
1100        | 12       | -4               | -4           | 4
1101        | 13       | -5               | -3           | 5
1110        | 14       | -6               | -2           | 6
1111        | 15       | -7               | -1           | 7
Encoding real numbers Informática

Fixed-point representation
It has a fixed number of digits n after the decimal point and (sometimes) a
fixed number of digits m before the decimal point.
- Examples in decimal: n = 3, m = 4 → 0200.003, 1200.100
- Examples in binary: n = 3, m = 4 → 0100.001, 1100.100
Drawbacks: loss of accuracy and overflow
- Overflow occurs, for example, in a fixed-point multiplication, because the
result can have as many bits as the sum of the numbers of bits of the two
operands.
‣ Example: 1000.101 x 0100.001 = ???
- Loss of accuracy occurs, for example, after a sequence of operations with
truncated results.
‣ Example with n = 3 and m = 4
‣ 0023.941 x 0000.001 = 0000.023
‣ If we want to recover the original number: 0000.023 x 1000.000 = 0023.000 (the digits .941 are lost)
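
The loss of accuracy in the decimal example can be reproduced by storing each fixed-point number as an integer scaled by $10^n$ and truncating after every multiplication. This is only a sketch for non-negative values; the helper names are hypothetical.

```python
FRACTION_DIGITS = 3  # n = 3 digits after the point
SCALE = 10 ** FRACTION_DIGITS

def to_fixed(text):
    """Parse a decimal string such as '0023.941' into a scaled integer."""
    integer_part, fraction_part = text.split(".")
    return int(integer_part) * SCALE + int(fraction_part.ljust(FRACTION_DIGITS, "0"))

def fixed_multiply(a, b):
    """Multiply two scaled integers and truncate back to n fractional digits."""
    return (a * b) // SCALE

def to_text(value, integer_digits=4):
    """Format a scaled integer with m integer digits and n fractional digits."""
    return f"{value // SCALE:0{integer_digits}d}.{value % SCALE:0{FRACTION_DIGITS}d}"

small = fixed_multiply(to_fixed("0023.941"), to_fixed("0000.001"))
restored = fixed_multiply(small, to_fixed("1000.000"))
print(to_text(small), to_text(restored))  # 0000.023 0023.000
```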


Floating-point representation
It consists of a fixed number of significant digits, called the mantissa, which are
scaled using an exponent. The base for the scaling is usually 2 or 10:
$\text{mantissa} \times \text{base}^{\text{exponent}}$
- Examples of the same number using different exponents (scaling factors):
$1125.0 \times 10^0$, $112.5 \times 10^1$, $11.25 \times 10^2$, $1.125 \times 10^3$, $0.1125 \times 10^4$
- The point can “float”, i.e., be placed anywhere relative to the significant digits
of the number.
Normalized representation: the one in which the point immediately follows the most
significant nonzero digit: $1.125 \times 10^3$.
Advantage: it supports a much wider range of values with the same number
of digits.


IEEE Standard for Floating-Point Arithmetic (IEEE 754)

It describes several formats with different accuracies. A given format comprises:
- Finite numbers, described by three integers (s, c, q). The value of the number is:
$(-1)^s \times c \times b^q$
- Two infinities: +∞ and −∞.
- Two kinds of NaN (Not a Number)
Finite numbers:
- s: the sign (zero or one).
- c: the mantissa (also called “significand” or “coefficient”).
‣ Uses sign-and-magnitude format. The sign of the mantissa is the sign bit.
‣ Normalized format → the point follows the most significant nonzero digit.
In base 2 this digit is always a 1, so it is implied and there is no need to store it.
- q: the exponent.
‣ Uses excess-K representation with $K = 2^{n_e - 1} - 1$, where $n_e$ is the number of bits of
the exponent.
‣ K=15 for IEEE-16, K=127 for IEEE-32, and K=1023 for IEEE-64.
- b: the base, which may be 2 or 10.

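IEEE Standard for Floating-Point Arithmetic (IEEE 754)

[Figure: IEEE 754 formats. Values visible in the figure: 2.2251e-308 and 1.7977e+308, the smallest normalized and the largest finite magnitudes of the double-precision format.]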

IEEE Standard for Floating-Point Arithmetic (IEEE 754)


Converting from floating-point to decimal. Example in single precision: $7F7F\ FFFF_{16}$
- Sign: leading bit → 0 → positive number.
- Exponent: 8 bits after the sign → $1111\ 1110_2 = 254_{10}$.
It is in excess-127 → 254 − 127 = 127
- Mantissa: 23 bits after the exponent plus the implied bit, which is always 1.
It is 1.111...1, represented in sign-and-magnitude.
Use eq. (1) to get the value in decimal:
$1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} + \cdots + 1 \times 2^{-23} = 1.9999998807907_{10} \simeq 2$
Result: $+2 \times 2^{127} \simeq 3.4028 \times 10^{38}$
Example using https://www.h-schmidt.net/FloatConverter/IEEE754.html

Matlab code for converting
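
The Matlab listing is not available in this text. As a stand-in, the decoding steps for a single-precision word can be sketched in Python. The function name is hypothetical, and the sketch assumes a normalized number (exponent field different from 0 and 255).

```python
import struct

def decode_single(hex_word):
    """Split a 32-bit IEEE 754 word into sign, exponent, mantissa and value."""
    bits = int(hex_word.replace(" ", ""), 16)
    sign = bits >> 31                    # leading bit
    biased_exponent = (bits >> 23) & 0xFF  # next 8 bits, in excess-127
    fraction = bits & 0x7FFFFF             # last 23 bits
    exponent = biased_exponent - 127
    mantissa = 1 + fraction / 2 ** 23      # restore the implied leading 1
    value = (-1) ** sign * mantissa * 2.0 ** exponent
    return sign, exponent, mantissa, value

sign, exponent, mantissa, value = decode_single("7F7F FFFF")
print(sign, exponent, mantissa, value)

# Cross-check against the conversion made by the struct module.
reference = struct.unpack(">f", bytes.fromhex("7F7FFFFF"))[0]
print(value == reference)  # True
```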



IEEE Standard for Floating-Point Arithmetic (IEEE 754)

Converting from decimal to floating-point (1). Example: -29.6875 to double
1) Convert the absolute value of the number to binary. Convert the integral and
fractional parts separately:
$29_{10} = 11101_2$
0.6875 × 2 = 1.375 → 1
0.3750 × 2 = 0.750 → 0
0.75 × 2 = 1.5 → 1
0.5 × 2 = 1.0 → 1
$0.6875_{10} = 0.1011_2$ → $29.6875_{10} = 11101.1011_2 = 11101.1011_2 \times 2^0$
2) Normalize the number:
$11101.1011_2 \times 2^0$ → $1.11011011_2 \times 2^4$
3) Generate the mantissa. Omit the implied one. Fill with zeros on the right up to
the 52 bits of the mantissa. Using hex notation:
$1101\ 1011\ 0000\ 0000\ \ldots\ 0000_2 = D\ B000\ 0000\ 0000_{16}$


IEEE Standard for Floating-Point Arithmetic (IEEE 754)

Converting from decimal to floating-point (2). Example: -29.6875 to double
4) Generate the exponent, expressed in excess-1023 (for IEEE-64 the bias is
1023). Add the bias:
$4_{10} + 1023_{10} = 1027_{10} = 100\ 0000\ 0011_2 = 403_{16}$
5) Set the sign bit: 1 → negative
6) Place the sign, exponent, and mantissa into the fields of the IEEE
format:
1100 0000 0011 1101 1011 0000 0000 ... 0000
Result: $-29.6875_{10} = C03D\ B000\ 0000\ 0000_{16}$ (IEEE-64)
Example of Matlab code for converting from decimal to floating-point
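
The Matlab listing is not available in this text. The six steps can instead be sketched in Python as below. The function name is hypothetical, and the sketch handles only nonzero, finite values in the normalized range; the result is cross-checked against the `struct` module.

```python
import struct

def encode_double(value):
    """Return the 16 hex digits of the IEEE-64 code of a normalized number."""
    if value == 0 or value != value or abs(value) == float("inf"):
        raise ValueError("only nonzero finite values are handled in this sketch")
    sign = 1 if value < 0 else 0
    # Steps 1 and 2: normalize |value| to the form 1.xxx * 2**exponent.
    mantissa, exponent = abs(value), 0
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    while mantissa < 1:
        mantissa *= 2
        exponent -= 1
    # Step 3: drop the implied 1 and keep 52 fraction bits.
    fraction = int((mantissa - 1) * 2 ** 52)
    # Step 4: exponent in excess-1023.
    biased_exponent = exponent + 1023
    # Steps 5 and 6: place sign, exponent and mantissa into their fields.
    word = (sign << 63) | (biased_exponent << 52) | fraction
    return f"{word:016X}"

print(encode_double(-29.6875))                      # C03DB00000000000
print(struct.pack(">d", -29.6875).hex().upper())    # C03DB00000000000
```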

Encoding texts Informática

The ASCII code (American Standard Code for Information Interchange)

[Table: ASCII chart with the columns decimal, hex and character]


The ASCII code


The American Standard Code for Information Interchange is a 7-bit coding
scheme that supports the English alphabet and control characters.
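Example: printing the text "An <TAB> ASCII <CR> <LF> text <LF>" (the original statement is an image that is not available in this text) sends the following codes (decimal) to the console:

65 110 32 9 32 65 83 67 73 73 32 13 32 10 32 116 101 120 116 32 10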
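
The code sequence above can be reproduced in Python. The string literal is reconstructed from the codes themselves (9 is a tab, 13 a carriage return, 10 a line feed); it is an assumption, not the statement of the original slide.

```python
# Text reconstructed from the decimal codes listed above.
text = "An \t ASCII \r \n text \n"

# Encoding to ASCII yields one byte (one code) per character.
codes = list(text.encode("ascii"))
print(codes)
```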

Drawback: it lacks symbols from other languages.

Solutions:
- Extended 8-bit ASCII coding: ISO 8859-1 standard, known as ISO Latin 1
- Unicode ...


Handling ASCII codes in Matlab


char() : converts ASCII codes to characters
>> char(65)
ans =
A
>> char([65,66,67,68])
ans =
ABCD
abs() : converts strings to arrays of ASCII codes
>> abs('A')
ans =
65
>> abs('Hello')
ans =
72 101 108 108 111

The Unicode standard (ISO/IEC 10646)


An attempt to create a universal character set with support for most of the
world’s writing systems.
It is not only a character chart; it defines a complete encoding methodology. It
deals with aspects like:
- Character properties (upper and lower case)
- Rules for the composition of characters with different types of accents
- Normalization rules for obtaining equivalent forms, etc.

It specifies a name and a unique numeric identifier, named the code point, for
each character or symbol.
Originally this identifier was intended to be coded as a 16-bit integer, but
over time 16 bits proved to be insufficient.
You can even code emojis! 😀 (code point: 1F600)
- See https://en.wikipedia.org/wiki/Emoticons_(Unicode_block)

The Unicode standard (ISO/IEC 10646)


Unicode defines three encoding forms under the name UTF (Unicode
Transformation Format):
- UTF-8: byte-oriented coding with variable-length symbols (1 to 4 bytes per Unicode
character). ASCII characters, which cover plain English text, use 1 byte per
character. Other characters, such as the accented letters and the ñ of Spanish, require more bytes.
‣ For example, ‘Hello’ is 48 65 6C 6C 6F (the same as its ASCII codes)
‣ Examples of more complex characters.

- UTF-16: it uses one 16-bit code for the characters of the Basic Multilingual Plane (BMP) and two 16-bit
codes (a surrogate pair) for the additional, less frequent planes.
- UTF-32: fixed-length 32-bit encoding, the simplest of the three.
ASCII was the most common character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8.
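
The three encoding forms can be compared with Python's built-in codecs. The helper name `encoded_hex` is hypothetical; the big-endian variants (`utf-16-be`, `utf-32-be`) are chosen here so that no byte-order mark is added to the output.

```python
def encoded_hex(text, encoding):
    """Return the bytes of the encoded text as space-separated hex pairs."""
    return text.encode(encoding).hex(" ").upper()

print(encoded_hex("Hello", "utf-8"))           # 48 65 6C 6C 6F
print(encoded_hex("\u00f1", "utf-8"))          # C3 B1 (the letter ñ)
print(encoded_hex("\U0001F600", "utf-8"))      # F0 9F 98 80
print(encoded_hex("\U0001F600", "utf-16-be"))  # D8 3D DE 00 (surrogate pair)
print(encoded_hex("\U0001F600", "utf-32-be"))  # 00 01 F6 00
```

The emoji with code point 1F600 lies outside the BMP, so it needs 4 bytes in UTF-8 and a surrogate pair in UTF-16.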


Formatted text
A markup language is a way to encode a document which, in addition to the
text, includes labels or markings that specify the structure of the text.
Presentational markup: used by traditional text editors. The marking is performed
by the text editor in such a way that it is hidden from human users,
producing the WYSIWYG (What You See Is What You Get) effect.
Procedural markup: used by LaTeX and some HTML editors. In these systems
the user explicitly writes the formatting labels in the source file.
HTML example (web pages)

Redundancy and compression Informática

Capacity and transmission speed units


Capacity: the unit is the byte (although sometimes the bit is also used)
- Representation: byte = B, bit = b or bit
- Multiples:
‣ When it refers to main memory (RAM): expressed as powers of 2:
1 KB = $2^{10}$ = 1024 bytes; 1 MB = $2^{20}$ bytes = 1024 KB; 1 GB = $2^{30}$ bytes = 1024 MB.
Example: 4 GB of RAM is $4 \times 2^{30}$ = 4,294,967,296 bytes.
‣ When it refers to secondary memory (disk): expressed as powers of 10:
1 KB = $10^3$ = 1000 bytes; 1 MB = $10^6$ bytes = 1000 KB; 1 GB = $10^9$ bytes = 1000 MB
Example: a 2 TB disk is $2 \times 10^{12}$ = 2,000,000,000,000 bytes.

Bit rate (or transmission/transfer speed): it is measured in bits per second.
- Representation: bps, b/s, bit/s
- Multiples:
‣ Always expressed as powers of 10.
1 kbps = $10^3$ = 1000 bps; 1 Mbps = $10^6$ bps = 1000 kbps; 1 Gbps = $10^9$ bps = 1000 Mbps
Example: 300 Mbps (optical fibre) is 300,000,000 bps.

Sample exercise
A meteorological satellite takes 2 images every second, with a resolution
of 1920x1080 pixels (dots), using 8 bits to encode each colour (red, green,
blue: RGB). The images have to be transmitted to ground over a wireless link.
- A) Obtain the space required (in bytes) to store one image.
‣ Each pixel requires 3 × 8 = 24 bits = 3 bytes.
1920 × 1080 × 3 = 6,220,800 bytes, that is, roughly 5.9 MB (with 1 MB = $2^{20}$ bytes).
- B) Calculate the transfer rate (in bps) needed to transmit the 2 images per second.
‣ Each image occupies 6,220,800 bytes, which are 49,766,400 bits. Since 2 images
have to be transmitted per second:
49,766,400 × 2 = 99,532,800 bps, approximately 100 Mbps.
In computing, the capacity of the link (transfer rate) is usually called bandwidth.
- Examples: Ethernet 10 Mbps to 1 Gbps; WiFi 55 to 600 Mbps
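
The arithmetic of the exercise can be checked with a few lines. The constant names are chosen for this sketch only; the image size in MB uses the power-of-two megabyte, and the bit rate uses powers of ten.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3   # 8 bits for each of red, green and blue
IMAGES_PER_SECOND = 2

# A) storage needed for one image
image_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
image_mb = image_bytes / 2 ** 20

# B) bit rate needed to send every image as it is taken
bit_rate = image_bytes * 8 * IMAGES_PER_SECOND

print(image_bytes, round(image_mb, 2), bit_rate)  # 6220800 5.93 99532800
```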

Redundant encoding
Information may get corrupted when it is transmitted through
communication lines or stored on disks or other storage devices.
Redundancy is used to detect errors and, with more redundancy, also to correct them.
- Error detection: parity bit, checksums
- Error detection and correction: error-correcting codes (ECC). They require
higher levels of redundancy.


Redundant encoding
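A parity bit is a redundant bit added to a set of bits to ensure that the number of
bits with value 1 in the outcome is even (even parity) or odd (odd parity).
- Even parity: 100 0011 1 (the data 100 0011 has three 1s, so the parity bit is 1)
- Odd parity: 100 0011 0 (the parity bit is 0, so the number of 1s stays odd)
Parity bits are often used when transmitting ASCII characters from/to
peripherals.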
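
A parity bit can be computed by counting the 1s in the data. In this sketch the parity bit is appended at the end of the code, which is an assumption (it could equally be placed first); the function names are hypothetical.

```python
def add_parity(data_bits, even=True):
    """Append a parity bit so that the total number of 1s is even (or odd)."""
    ones = data_bits.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return data_bits + str(parity)

def parity_ok(code, even=True):
    """Check whether a received code still has the expected parity."""
    return code.count("1") % 2 == (0 if even else 1)

print(add_parity("1000011"))              # 10000111
print(add_parity("1000011", even=False))  # 10000110
print(parity_ok("10000111"))              # True
print(parity_ok("10000101"))              # False (one bit was flipped)
```

A single flipped bit is detected, but two flipped bits cancel out and go unnoticed, which is why stronger codes need more redundancy.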


Information compression
Data compression: the process of encoding information using
fewer bits than the original representation.
- Goal: to reduce the information volume and the consumption of expensive
resources, such as hard disk space or transmission bandwidth.
It has a cost: extra processing for compressing and decompressing.
- There is a trade-off between the costs of encoding and decoding: time-consuming
compression usually allows time-efficient decompression, and vice versa.
Two types of compression:
- Lossless compression: the encoded data is not distorted or modified, so
it can be reconstructed exactly from the compressed data.
‣ Example: text compression. ZIP, PNG and GIF formats
- Lossy compression: the original data is only approximately represented, so
only an approximation of the original data can be reconstructed.
‣ Example: image/audio/video compression. JPEG, MPEG, MP3 formats


Lossless compression
Lossless algorithms usually exploit statistical redundancy in such a way that
more frequent data are represented with fewer bits.
Huffman coding
- Example: text with only four characters: ’ ’, ’A’, ’B’, ’C’ with frequencies
45%, 35%, 15% and 5% respectively.

- Compression ratio:
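
The figure with the Huffman tree and the resulting ratio is not available in this text. The following sketch builds Huffman code lengths for the four-character example with a priority queue and compares the average code length with a fixed 2-bit code. The function name is hypothetical, and the baseline used by the original slide for its ratio is unknown, so the 2-bit baseline is an assumption.

```python
import heapq

def huffman_code_lengths(frequencies):
    """Return a dict mapping each symbol to the length of its Huffman code."""
    # Heap entries: (weight, tie-breaker, {symbol: depth in the tree so far}).
    heap = [(weight, index, {symbol: 0})
            for index, (symbol, weight) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; their symbols get one bit deeper.
        weight_1, _, depths_1 = heapq.heappop(heap)
        weight_2, _, depths_2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**depths_1, **depths_2}.items()}
        heapq.heappush(heap, (weight_1 + weight_2, counter, merged))
        counter += 1
    return heap[0][2]

frequencies = {" ": 45, "A": 35, "B": 15, "C": 5}  # percentages
lengths = huffman_code_lengths(frequencies)
average_bits = sum(frequencies[s] * lengths[s] for s in frequencies) / 100
print(lengths)
print(average_bits, 2 / average_bits)
```

With these frequencies the most frequent character gets a 1-bit code and the two rarest get 3-bit codes, for an average of 1.75 bits per character.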


Lossy compression
It compresses data by discarding (losing) some of it.
Usually based on perceptual coding: transforming the raw data obtained
from a device to a domain that more accurately reflects the information
content.
- Example: a sound file can be more efficiently represented as the frequency
spectrum over time than as the amplitude levels.
Lossy encoding/decoding programs are usually known as codecs.
Key point: required accuracy or Quality of Service (QoS).
- Example: image qualities for video conference 640x480, 800x600,
1920x1080, ...
