Data Representation

The document provides an overview of data representation, focusing on various numbering systems including decimal, binary, octal, and hexadecimal. It explains the concept of positional value and base systems, detailing how to convert between these systems and perform calculations. The document also outlines the methods for converting decimal numbers to other bases and vice versa.

Uploaded by

shitalsingh5612

DATA REPRESENTATION

• What is Data ?
• Number System
• Types of Number System
– Decimal
– Binary
– Octal
– Hexadecimal
Numbering Systems
• A number system is a method to represent
(write) numbers.
• Every number system has a set of unique
characters or literals.
• The count of these literals is called the radix or
base of the number system.
Numbering Systems

• Each number system has a base, also called a radix.
• A decimal number system is a system of base 10;
binary is a system of base 2; octal is a system of
base 8; and hexadecimal is a system of base 16.
• What are these varying bases?
• The answer lies in what happens when we count
up to the maximum number that the numbering
system allows.
• Number systems are also called positional number systems because the value of
each symbol (digit or letter) in a number depends on its position
within the number.
• A number may also have a fractional part, as in the decimal numbers we use every day.
• The symbol at the rightmost position of the integer part has
position number 0.
• In the integer part, the position number increases by 1
from right to left.
• The first symbol of the fractional part has
position number –1, and the position number decreases by 1 from left to right.
• Each symbol in a number has a positional value, which is computed from its
position number and the base of the number system: positional value = base^position.
• For example, the symbol at position number 3 in the decimal system (base 10) has positional
value 10^3.
• Summing the products of each symbol's value and its positional value gives the
number.
Computation of decimal number using its
positional value
Base value of a number
• Base value of a number system is used to
distinguish a number in one number system
from another number system.
• The base value is written as a subscript of the
number, shown here as (number)_base.
• For example, (70)_8 represents 70 as an octal number and (70)_10
denotes 70 as a decimal number.
Decimal Number System
• Base 10
• Consists of the digits 0 to 9
• A number is represented by two values: the
symbol value (any digit from 0 to 9) and the
positional value (in terms of the base value).
• Example: 3652.14

Positional value:  10^3   10^2   10^1   10^0  .  10^-1   10^-2
Digit:               3      6      5      2   .    1       4
                   (MSD)                                 (LSD)

3*10^3 + 6*10^2 + 5*10^1 + 2*10^0 + 1*10^-1 + 4*10^-2


Binary Number System
• The ICs (Integrated Circuits) in a computer are
made up of a large number of transistors which
are activated by the electronic signals (low/high)
they receive.
• The ON (high) and OFF (low) states of a transistor are
represented by the two digits 1 and 0,
respectively.
• These two digits, 1 and 0, form the binary number
system.
• This system is also referred to as the base-2 system because it
has only two digits.
Binary Number System
• Base 2
• Consists of the digits 0 and 1
• Value based on the position of the digits
(positional-value system)
• Example: 1011.01

Positional value:  2^3   2^2   2^1   2^0  .  2^-1   2^-2
Bit:                1     0     1     1   .   0      1
                  (MSB)                            (LSB)

1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 0*2^-1 + 1*2^-2


Binary Number System
Positional value:  2^3   2^2   2^1   2^0  .  2^-1   2^-2
Bit:                1     0     1     1   .   0      1
Weight:             8     4     2     1      0.5    0.25

• (1)_10 = (01)_2
• (2)_10 = (10)_2
• (3)_10 = (11)_2
• (4)_10 = (100)_2
• (6)_10 = (110)_2
• (10)_10 = (1010)_2

• Counting in binary: 00, 01, 10, 11, 100, 101, 110, 111, 1000, ...


Decimal to Binary Conversion
(117)_10 = (1110101)_2

Divisor   Quotient   Remainder
2         117
2          58        1
2          29        0
2          14        1
2           7        0
2           3        1
2           1        1
            0        1

Read the remainders from bottom to top: 1110101.

Solve:
• (43)_10 = (101011)_2
• (85)_10 = (1010101)_2
Decimal Fraction to Binary Conversion

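(0.375)_10 = (0.011)_2

Multiplication      Integer part
0.375 * 2 = 0.750   0
0.75 * 2 = 1.50     1
0.50 * 2 = 1.00     1

After each step, only the fractional part of the result is carried forward.
Read the integer parts from top to bottom: 0.011.

Solve:
• (0.8125)_10 = (0.1101)_2
• (0.15625)_10 = (0.00101)_2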
Binary ↔ Decimal Conversion
(1011.01)_2 = (11.25)_10

Positional value:  2^3   2^2   2^1   2^0  .  2^-1   2^-2
Bit:                1     0     1     1   .   0      1
Weight:             8     4     2     1      0.5    0.25

1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 0*2^-1 + 1*2^-2

8 + 0 + 2 + 1 + 0 + 0.25 = (11.25)_10
Binary to Decimal
(double dabble method)
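• (1011)_2: start with the leftmost bit, then repeatedly double the running total and add the next bit.
Start with 1
(2 * 1) + 0 = 2
(2 * 2) + 1 = 5
(2 * 5) + 1 = 11
So (1011)_2 = (11)_10

Solve:
• (100101)_2 = (37)_10
• (110101)_2 = (53)_10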
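The doubling steps above can be sketched in Python. This is a minimal illustration, not part of the original slides; the function name `binary_to_decimal` is an assumption chosen for this example, and it handles integer bit strings only.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary integer string to decimal by repeated doubling."""
    total = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        # Double the running total, then add the next bit (left to right).
        total = 2 * total + int(bit)
    return total


print(binary_to_decimal("1011"))    # 11
print(binary_to_decimal("100101"))  # 37
print(binary_to_decimal("110101"))  # 53
```

Starting the total at 0 instead of at the first bit gives the same result, because doubling 0 and adding the first bit yields that bit.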
Octal Number System
• Base 8
• Consists of the digits 0 to 7
• Value based on the position of the digits
(positional-value system)
• Example: 3714.01

Positional value:  8^3   8^2   8^1   8^0  .  8^-1   8^-2
Digit:              3     7     1     4   .   0      1
                  (MSD)                            (LSD)

3*8^3 + 7*8^2 + 1*8^1 + 4*8^0 + 0*8^-1 + 1*8^-2


Octal Number System
Positional value:  8^3   8^2   8^1   8^0  .  8^-1    8^-2
Weight:            512   64    8     1       0.125   0.015625

• (1)_10 = (01)_8
• (2)_10 = (02)_8
• (7)_10 = (07)_8
• (8)_10 = (10)_8
• (12)_10 = (14)_8
• (18)_10 = (22)_8

• Counting in octal: 00, 01, 02, 03, 04, 05, 06, 07, 10, 11, 12, 13, 14, 15, 16,
17, 20, 21, ..., 77, 100, 101, 102, ...
Decimal to Octal Conversion
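(117)_10 = (165)_8

Divisor   Quotient   Remainder
8         117
8          14        5
8           1        6
            0        1

Read the remainders from bottom to top: 165.

Solve:
• (673.1)_10 ≈ (1241.06314)_8 (fraction truncated to five octal digits)
• (1498)_10 = (2732)_8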
Octal to Decimal Conversion
(10263)_8 = (4275)_10

Positional value:  8^4    8^3   8^2   8^1   8^0
Digit:              1      0     2     6     3
Weight:            4096   512    64    8     1

1*8^4 + 0*8^3 + 2*8^2 + 6*8^1 + 3*8^0

4096 + 0 + 128 + 48 + 3 = 4275

Solve:
• (372)_8 = (250)_10
• (24.6)_8 = (20.75)_10
• (2020)_8 = (1040)_10
Octal to Binary Conversion
• Converting from octal to binary has a very
straightforward method because octal numbers
are shortened versions of binary strings.
• Remember that each octal digit represents three
binary digits. Therefore, one octal digit should
give three binary digits (bits).
• Octal can be converted to binary indirectly (first
to decimal, then to binary).
Octal to Binary Conversion
• (725)_8 = (111 010 101)_2, since 7 = 111, 2 = 010, 5 = 101
• (372.15)_8 = (011 111 010 . 001 101)_2

Solve:
• (436)_8 = (100 011 110)_2
• (14.1)_8 = (001 100 . 001)_2
Octal to Binary Conversion
A direct method to convert octal to binary is as follows:

• Step 1: Write down the octal number, separating the digits. Each octal digit
represents three binary digits, and each of those three positions corresponds to a power of 2. The
rightmost position equals 2^0 (1), the next one equals 2^1 (2) and the
leftmost one equals 2^2 (4). Write these numbers (4, 2 and 1) below each
octal digit.
• Step 2: Determine which powers of two (4, 2 or 1) sum up to each octal
digit. For example, if one of the octal digits is 6, then 4 and 2
sum up to 6 (and 1 is not used). If the octal digit is 2, only 2 is used; 4
and 1 are not.
• Step 3: Write 1 below each 4, 2 and 1 that is used. Write 0
below those that are not used.
• Step 4: Read the 1's and 0's you just wrote from left to right to get the
binary number.

• Applying these steps to the octal number (456)_8: 4 = 4 (100), 5 = 4 + 1 (101) and
6 = 4 + 2 (110), so (456)_8 = (100 101 110)_2.
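The four steps can be sketched in Python as follows. This is an illustrative sketch only; the function name `octal_to_binary` is an assumption for this example, and it handles octal integers without a fractional part.

```python
def octal_to_binary(octal: str) -> str:
    """Expand each octal digit into three bits using the weights 4, 2 and 1."""
    groups = []
    for digit in octal:
        value = int(digit, 8)  # raises ValueError for 8, 9 or non-digits
        bits = ""
        for weight in (4, 2, 1):
            if value >= weight:
                bits += "1"      # this power of two is used
                value -= weight
            else:
                bits += "0"      # this power of two is not used
        groups.append(bits)
    return " ".join(groups)


print(octal_to_binary("456"))  # 100 101 110
print(octal_to_binary("725"))  # 111 010 101
```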


Binary to Octal Conversion
• Every octal digit represents 3 binary bits
(8 = 2^3).

• (111010101)_2 = (725)_8
• (11111010.01101)_2 = (372.32)_8

Solve:
• (1001011.11011)_2 = (113.66)_8
• (1010111100)_2 = (1274)_8
Hexadecimal Number System
• Base 16
• Hexadecimal Number System is a base-16 numeral system used
in diverse fields, especially in computing and digital electronics.
• Consists of the digits 0 to 9 and the letters A to F
• Value based on the position of the digits (positional-value
system)
• Example: B2C.1

Positional value:  16^2   16^1   16^0  .  16^-1
Digit:              B      2      C    .    1
                  (MSD)                   (LSD)

11*16^2 + 2*16^1 + 12*16^0 + 1*16^-1


Hexadecimal Number System
Positional value:  16^3   16^2   16^1   16^0  .  16^-1    16^-2
Weight:            4096   256    16     1        0.0625   0.00390625

• (1)_10 = (01)_16
• (2)_10 = (02)_16
• (10)_10 = (0A)_16
• (15)_10 = (0F)_16
• (18)_10 = (12)_16
• (26)_10 = (1A)_16

• Counting in hexadecimal: 00, 01, 02, ..., 09, 0A, 0B, 0C, 0D, 0E, 0F, 10, 11, 12,
13, ..., F0, F1, ..., FF, 100, 101, ...
Decimal to Hexadecimal Conversion

(117)_10 = (75)_16

Divisor   Quotient   Remainder
16        117
16          7        5
            0        7

Read the remainders from bottom to top: 75.

Solve:
• (46)_10 = (2E)_16
• (153)_10 = (99)_16
• (2860)_10 = (B2C)_16
Hexadecimal to Decimal Conversion
(12C5B)_16 = (76891)_10

Positional value:  16^4    16^3   16^2   16^1   16^0
Digit:               1       2      C      5      B
Weight:            65536   4096   256     16      1

1*16^4 + 2*16^3 + 12*16^2 + 5*16^1 + 11*16^0

65536 + 8192 + 3072 + 80 + 11 = 76891

Solve:
• (1D9)_16 = (473)_10
• (80E1)_16 = (32993)_10
• (10CE)_16 = (4302)_10
Hexadecimal to Binary Conversion
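Each hexadecimal digit represents 4 binary bits (16 = 2^4).
• (2D5)_16 = (0010 1101 0101)_2, since 2 = 0010, D = 1101, 5 = 0101
• (8EF)_16 = (1000 1110 1111)_2

Solve:
• (4FA)_16 = (0100 1111 1010)_2
• (2C1)_16 = (0010 1100 0001)_2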
Binary to Hexadecimal Conversion
• (1010111010)_2 = (2BA)_16
• (11101100101001.11011)_2 = (3B29.D8)_16

Solve:
• (11110.01011)_2 = (1E.58)_16
• (1110100011010110)_2 = (E8D6)_16
Hexadecimal to Octal Conversion
• (2D5)_16 = (0010 1101 0101)_2
= (001 011 010 101)_2 = (1325)_8
• (8EF)_16 = (1000 1110 1111)_2
= (100 011 101 111)_2 = (4357)_8
Solve:
• (4FA)_16 = (2372)_8
• (A2DE)_16 = (121336)_8
Octal to Hexadecimal Conversion
• (1325)_8 = (001 011 010 101)_2
= (0010 1101 0101)_2 = (2D5)_16
• (4357)_8 = (100 011 101 111)_2
= (1000 1110 1111)_2 = (8EF)_16
Solve:
• (1573)_8 = (37B)_16
• (1301)_8 = (2C1)_16
In a nut-shell
• Conversion from Decimal to other Number Systems

• To convert a decimal number to any other number system
(binary, octal or hexadecimal), use the steps given below:

Step 1: Divide the given number by the base value (b) of the
number system to which it is to be converted.

Step 2: Note the remainder.

Step 3: Keep dividing the quotient by the base value and noting
the remainder until the quotient is zero.

Step 4: Write the noted remainders in reverse order (from
bottom to top).
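The four steps can be sketched in Python as below. This is a minimal sketch for non-negative integers; the names `decimal_to_base` and `DIGITS` are assumptions made for this example.

```python
DIGITS = "0123456789ABCDEF"


def decimal_to_base(number: int, base: int) -> str:
    """Convert a non-negative decimal integer to base 2..16 by repeated division."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        # Steps 1 to 3: divide by the base, note the remainder, repeat on the quotient.
        number, remainder = divmod(number, base)
        remainders.append(DIGITS[remainder])
    # Step 4: the remainders are read in reverse order (bottom to top).
    return "".join(reversed(remainders))


print(decimal_to_base(117, 2))    # 1110101
print(decimal_to_base(117, 8))    # 165
print(decimal_to_base(117, 16))   # 75
print(decimal_to_base(2860, 16))  # B2C
```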
In a nut-shell
• Conversion from other Number Systems to Decimal Number
System
• We can use the following steps to convert a given number with
base value b to its decimal equivalent, where the base value b is
2, 8 or 16 for the binary, octal or hexadecimal number system,
respectively.
Step 1: Write the position number for each alphanumeric symbol in
the given number.
Step 2: Get the positional value of each symbol by raising the base
value b to the power of the symbol's position number.
Step 3: Multiply each symbol's value by its positional value to
get a decimal value.
Step 4: Add all these decimal values to get the equivalent decimal
number.
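A possible Python sketch of these steps is shown below. The names `to_decimal` and `symbol_value` are assumptions for this example, and `fractions.Fraction` is used (rather than floats) so that negative position numbers give exact results.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def symbol_value(symbol: str, base: int) -> int:
    """Return the numeric value of one symbol, checking it is valid in the base."""
    value = DIGITS.index(symbol)  # raises ValueError for unknown symbols
    if value >= base:
        raise ValueError(f"{symbol!r} is not a valid base-{base} digit")
    return value


def to_decimal(text: str, base: int) -> Fraction:
    """Sum symbol value times base**position over every symbol in the number."""
    integer_part, _, fraction_part = text.upper().partition(".")
    total = Fraction(0)
    # Integer part: the rightmost symbol has position 0, increasing to the left.
    for position, symbol in enumerate(reversed(integer_part)):
        total += symbol_value(symbol, base) * Fraction(base) ** position
    # Fractional part: the first symbol has position -1, decreasing to the right.
    for offset, symbol in enumerate(fraction_part, start=1):
        total += symbol_value(symbol, base) * Fraction(base) ** -offset
    return total


print(float(to_decimal("1011.01", 2)))  # 11.25
print(to_decimal("10263", 8))           # 4275
print(to_decimal("12C5B", 16))          # 76891
```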
In a nut-shell
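• Conversion from Binary Number to Octal/
Hexadecimal Number and Vice-Versa

A binary number is converted to an octal or
hexadecimal number by making groups of 3 or 4
bits, respectively, and replacing each group by its
equivalent octal / hexadecimal digit. Grouping starts at the
binary point: the integer part is padded with 0's on the left
and the fractional part with 0's on the right.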
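The grouping rule can be sketched in Python as follows. The function name `regroup_binary` and its helpers are assumptions for this example; a group size of 3 gives octal and 4 gives hexadecimal.

```python
DIGITS = "0123456789ABCDEF"


def _pad(bits: str, group_size: int, left: bool) -> str:
    """Pad with zeros so the length is a multiple of group_size."""
    missing = -len(bits) % group_size
    return "0" * missing + bits if left else bits + "0" * missing


def _encode(bits: str, group_size: int) -> str:
    """Replace each group of bits with its single-digit equivalent."""
    return "".join(
        DIGITS[int(bits[i:i + group_size], 2)]
        for i in range(0, len(bits), group_size)
    )


def regroup_binary(binary: str, group_size: int) -> str:
    """Convert a binary string to octal (group_size=3) or hex (group_size=4)."""
    if group_size not in (3, 4):
        raise ValueError("group_size must be 3 (octal) or 4 (hexadecimal)")
    integer_part, _, fraction_part = binary.partition(".")
    # Integer bits are padded on the left, fraction bits on the right.
    result = _encode(_pad(integer_part, group_size, left=True), group_size) or "0"
    if fraction_part:
        result += "." + _encode(_pad(fraction_part, group_size, left=False), group_size)
    return result


print(regroup_binary("11111010.01101", 3))  # 372.32
print(regroup_binary("1010111010", 4))      # 2BA
```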
In a nut-shell
• Conversion of a Number with Fractional Part
Convert (0.25)_10 to binary.
                     Integer part
0.25 × 2 = 0.50      0
0.50 × 2 = 1.00      1
Therefore, (0.25)_10 = (0.01)_2

Convert (0.675)_10 to octal.
                     Integer part
0.675 × 8 = 5.400    5
0.400 × 8 = 3.200    3
0.200 × 8 = 1.600    1
0.600 × 8 = 4.800    4
0.800 × 8 = 6.400    6
Therefore, (0.675)_10 ≈ (0.53146)_8 (the expansion does not terminate; five digits are shown)
In a nut-shell
• Conversion of a Number with Fractional Part
Convert (0.675)_10 to hexadecimal form.
                       Integer part
0.675 × 16 = 10.800    A (hexadecimal symbol for 10)
0.800 × 16 = 12.800    C (hexadecimal symbol for 12)

Therefore, (0.675)_10 ≈ (0.AC)_16 (truncated to two digits; the digit C repeats)
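The repeated-multiplication method for fractions can be sketched in Python. The function name `fraction_to_base` and the `max_digits` cut-off are assumptions for this example; `fractions.Fraction` is used so the decimal input is exact and the digits are not disturbed by floating-point rounding.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def fraction_to_base(fraction: str, base: int, max_digits: int = 5) -> str:
    """Convert a decimal fraction such as "0.675" to the given base.

    The result is truncated after max_digits digits if it does not terminate.
    """
    value = Fraction(fraction)  # exact value, e.g. "0.675" becomes 27/40
    if not 0 <= value < 1:
        raise ValueError("expected a fraction between 0 and 1")
    digits = []
    while value != 0 and len(digits) < max_digits:
        value *= base
        integer_part = int(value)       # the integer part becomes the next digit
        digits.append(DIGITS[integer_part])
        value -= integer_part           # carry only the fractional part forward
    return "0." + ("".join(digits) or "0")


print(fraction_to_base("0.25", 2))       # 0.01
print(fraction_to_base("0.675", 8))      # 0.53146
print(fraction_to_base("0.675", 16, 2))  # 0.AC
```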


Binary Addition
0+0=0
0+1=1
1+0=1
1 + 1 = 10

1 + 1 + 1 = 11
?
Practice Questions
• Add the following binary numbers:
a. 1010 + 1001
b. 100110 + 110101
c. 101101 + 111101
d. 1110.110 + 1010.011
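The addition rules (0+0=0, 0+1=1, 1+1=10, 1+1+1=11) amount to adding column by column from the right and carrying 1 whenever a column sums to 2 or 3. A minimal Python sketch for binary integers follows; the function name `add_binary` is an assumption for this example, and fractional inputs are not handled.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary integer strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry  # 0, 1, 2 or 3
        result.append(str(total % 2))            # bit written in this column
        carry = total // 2                       # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))


print(add_binary("1", "1"))        # 10
print(add_binary("1011", "0110"))  # 10001
```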
Binary Representation of Integers
• A binary number is written using only
0’s and 1’s, so the sign (-) of a negative number
or the sign (+) of a positive number cannot be
written directly.
• The sign must therefore also be encoded as a bit, either 0 or 1.
• There are three methods to represent signed binary
numbers. They are:

(i) Sign and magnitude method
(ii) One’s complement method
(iii) Two’s complement method
[Diagram: Data Representation branches into Magnitude (Unsigned, Signed) and Complement (1’s complement, 2’s complement)]
Signed Integer Representation
• Sign and Magnitude:
– MSB denotes the sign (0 for +ve, 1 for –ve)

• Consider a 4-bit number:
– (MSB for the sign and 3 bits for the magnitude)

1111 = -7        0111 = +7
Sign & Magnitude
Negative Positive
• 0 0000
• -1 • 1 0001
• -2 • 2 0010
• -3 • 3 0011
• -4 • 4 0100
• -5 • 5 0101
• -6 • 6 0110
• -7 • 7 0111
Addition with S & M
(using 4 bits)
• 7 + (-3)

• (-4) + 1

• 5 + (-5)
Addition with S & M
(using 8 bits)
• 7 + (-3)

• (-24) + 16
Addition / Subtraction
Sign Magnitude Numbers
One’s Complement Representation

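• +ve numbers: true form (normal binary representation)
• -ve numbers:
– Flip every 0 to 1 and every 1 to 0 in the +ve representation
– Read as an unsigned (+ve) number, the flipped pattern has a different value

• Consider a 4-bit number:
• +6 –> 0110
• -6 –> 1001 (which would read as +9 if treated as unsigned)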
One’s Complement
Negative Positive
• -0 1111 • 0 0000
• -1 1110 • 1 0001
• -2 1101 • 2 0010
• -3 1100 • 3 0011
• -4 1011 • 4 0100
• -5 1010 • 5 0101
• -6 1001 • 6 0110
• -7 1000 • 7 0111
Addition with 1’s complement
(using 4 bits)
• Rule:
– If the addition produces a carry out of the MSB, add that carry to the
least significant bit (LSB) of the result (end-around carry)
– If the MSB of the result is 1, the result is negative; take its 1’s complement
to obtain the magnitude

• 7 + (-3)
    0111   (7)
  + 1100   (-3)
  ------
   10011
  +    1   (carry bit)
  ------
    0100   (4)

• 5 + (-5)
    0101   (5)
  + 1010   (-5)
  ------
    1111   (-0)
  1’s complement of the result: 0000 (0)
Addition with 1’s complement (using 8 bits)
Case 1: 24 + 16 Case 2: (-24) + 16

00011000 11100111
+ 00010000 + 00010000
------------------------- ----------------------------

Case 3: 24 + (-16) Case 4: (-24) + (-16)

00011000 11100111
+ 11101111 + 11101111
-------------------------- ----------------------------
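The 1’s complement addition rule, including the end-around carry, can be sketched in Python. The function names below are assumptions for this example; bit patterns are held as unsigned integers and `bits` is the word size.

```python
def ones_complement_pattern(value: int, bits: int = 8) -> int:
    """Return the 1's complement bit pattern of value as an unsigned integer."""
    mask = (1 << bits) - 1
    limit = mask >> 1  # largest magnitude, e.g. 127 for 8 bits
    if not -limit <= value <= limit:
        raise ValueError("value out of range for this word size")
    return value if value >= 0 else mask - (-value)  # (2^n - 1) - N


def add_ones_complement(x: int, y: int, bits: int = 8) -> int:
    """Add two 1's complement patterns, wrapping a carry out of the MSB to the LSB."""
    mask = (1 << bits) - 1
    total = x + y
    if total > mask:                  # a carry came out of the MSB
        total = (total & mask) + 1    # end-around carry
    return total & mask


def decode_ones_complement(pattern: int, bits: int = 8) -> int:
    """Interpret a 1's complement pattern as a signed integer (-0 decodes to 0)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return pattern if not pattern & sign_bit else -(mask - pattern)


# 4-bit example: 7 + (-3)
result = add_ones_complement(ones_complement_pattern(7, 4),
                             ones_complement_pattern(-3, 4), bits=4)
print(format(result, "04b"), decode_ones_complement(result, 4))  # 0100 4
```

This sketch does not detect overflow; it assumes the true sum fits in the chosen word size.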
Two’s Complement Representation
• +ve numbers - True form / normal representation
• -ve numbers –
– Convert to 1’s complement (Flip 0 to 1 and 1 to 0)
– Add 1

• Consider 4 bit number:


• + 6 –> 0110
• - 6 –> 1001 (1’s complement)
+ 1 (Add one)
----------
1010
Two’s Complement
Range: -(2^(n-1)) to 2^(n-1) - 1
Negative Positive
• 0 • 0 0000
• -1 1111 • 1 0001
• -2 • 2 0010
• -3 1101 • 3 0011
• -4 1100 • 4 0100
• -5 • 5 0101
• -6 1010 • 6 0110
• -7 • 7 0111
Addition with 2’s complement
(using 4 bits)
• Rule:
– If the addition produces a carry out of the MSB, ignore it
– If the MSB of the result is 1, the result is negative; take its 2’s complement
to obtain the magnitude

• 7 + (-3)                    • (-5) + 5
    0 1 1 1   (7)                 0 1 0 1   (5)
  + 1 1 0 1   (-3)              + 1 0 1 1   (-5)
  ----------------              ----------------
Addition with 2’s complement (using 8 bits)
Case 1: 24 + 16 Case 2: 24 + (-16)

00011000 00011000
+ 00010000 + 11110000
------------------------- ----------------------------

Case 3: (-24) + 16 Case 4: (-24) + (-16)

11101000 11101000
+ 00010000 + 11110000
-------------------------- ----------------------------
Practice Questions
• Add -25 and -35 using 1’s complement
• Add -46 and 25 using 2’s complement
Signed Binary numbers
in a nutshell
[Diagram: Signed Binary Number branches into Sign & Magnitude, 1’s Complement, 2’s Complement]

• In the binary number system, all integers, both positive and negative,
are represented using 1’s and 0’s.
• These 3 different methods are used to
show negative integers in binary form.
Signed Binary numbers
• In a signed binary number, the MSB is the sign bit.
• So, in an n-bit signed binary number,
the MSB is the sign bit and the remaining (n-1)
bits represent the magnitude of that number.
• Depending on the value of the sign bit, we can
decide whether the number is positive or
negative.
• Although in all three forms the MSB of a –ve number
is 1, the representation of the number itself differs
between the forms.
Sign & Magnitude form
• Straight forward
• MSB for +ve is 0 and is 1 for –ve
• The magnitude of the number is represented in
true binary form.
Eg: +34 (in 8-bit) in true binary form is 0010 0010
-34 is 1010 0010
Sign & Magnitude form
• In sign and magnitude form, there are 2
different ways of representing 0
+0 (in 8 –bit) 0000 0000
-0 1000 0000
Bitwise range of positive numbers
• 4-bit: 0 to 15 (2^4 – 1)
• 5-bit: 0 to 31 (2^5 – 1)
• 6-bit: 0 to 63 (2^6 – 1)
• 7-bit: 0 to 127 (2^7 – 1)
• 8-bit: 0 to 255 (2^8 – 1)
Range of numbers in
Sign & Magnitude form
• In sign & magnitude form using n bits, the range of numbers that
can be represented is:
-(2^(n-1) – 1) to +(2^(n-1) – 1)

• If we have 4 bits to represent a signed binary number (1 bit for
the sign and 3 bits for the magnitude), then the
range of numbers we can represent in sign-magnitude notation
is:
-(2^(4-1) – 1) to +(2^(4-1) – 1)
-(2^3 – 1) to +(2^3 – 1)
-7 to +7
which means there are 7 –ve numbers, 7 +ve numbers and 2 different
representations of 0.
• Decimal values converted into signed binary numbers
using the sign-magnitude format:

• (-15)_10 as a 6-bit number ⇒ (101111)_2
• (+23)_10 as a 6-bit number ⇒ (010111)_2
• (-56)_10 as an 8-bit number ⇒ (10111000)_2
• (+85)_10 as an 8-bit number ⇒ (01010101)_2
• (-127)_10 as an 8-bit number ⇒ (11111111)_2
1’s Complement
• +ve numbers are represented in true binary
form.
• MSB is 0 for +ve and 1 for –ve
• A –ve number is obtained by flipping all 0’s and 1’s of the +ve number (including the sign bit)
34 (in 8-bit) 00100010
-34 (in 8-bit) 11011101
1’s Complement
• Mathematically, for an n-bit number:
1’s complement of N = (2^n – 1) – N
Eg:
5-bit 1’s complement of (12)_10 = (2^5 – 1) – 12
= 11111 – 01100
= 10011, which represents -12
Range of numbers in
1’s complement
• In 1’s complement, using n bits, the range of
numbers that can be represented is:
-(2^(n-1) – 1) to +(2^(n-1) – 1)
For n = 4:
-(2^(4-1) – 1) to +(2^(4-1) – 1)
-(2^3 – 1) to +(2^3 – 1)
-7 to +7
which means there are 7 –ve numbers, 7 +ve numbers
and 2 different representations of 0.
2’s Complement
• +ve numbers are represented in true binary
form.
• MSB is 0 for +ve and 1 for –ve
• A –ve number is obtained by flipping all 0’s and 1’s of the +ve number (including the sign bit)
and adding 1 to the result.
34 (in 8-bit) 0010 0010
-34 (in 8-bit) 1101 1110
2’s Complement
• Mathematically, for an n-bit number:
2’s complement of N = 2^n – N
Eg:
8-bit 2’s complement of (34)_10 = 2^8 – 34
= 100000000 – 00100010
= 11011110, which represents -34
Range of numbers in
2’s complement
• In 2’s complement, using n bits, the range of
numbers that can be represented is:
-2^(n-1) to +(2^(n-1) – 1)
For n = 4:
-2^(4-1) to +(2^(4-1) – 1)
-2^3 to +(2^3 – 1)
-8 to +7
which means there are 8 –ve numbers and 7 +ve numbers.
Of the three forms, only 2’s complement has a
unique representation of 0.
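The three representations and their ranges can be compared with a short Python sketch. The function names are assumptions for this example; each returns the n-bit pattern as a string and rejects values outside the range stated for that form.

```python
def sign_magnitude(value: int, bits: int) -> str:
    """Sign bit in the MSB, magnitude in the remaining bits."""
    limit = (1 << (bits - 1)) - 1            # 2^(n-1) - 1
    if abs(value) > limit:
        raise ValueError("value out of range")
    sign = (1 << (bits - 1)) if value < 0 else 0
    return format(sign | abs(value), f"0{bits}b")


def ones_complement(value: int, bits: int) -> str:
    """Negative N is stored as (2^n - 1) - N."""
    limit = (1 << (bits - 1)) - 1
    if abs(value) > limit:
        raise ValueError("value out of range")
    pattern = value if value >= 0 else ((1 << bits) - 1) - (-value)
    return format(pattern, f"0{bits}b")


def twos_complement(value: int, bits: int) -> str:
    """Negative N is stored as 2^n - N; the range extends to -2^(n-1)."""
    if not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        raise ValueError("value out of range")
    pattern = value if value >= 0 else (1 << bits) - (-value)
    return format(pattern, f"0{bits}b")


print(sign_magnitude(-34, 8))   # 10100010
print(ones_complement(-34, 8))  # 11011101
print(twos_complement(-34, 8))  # 11011110
```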
2's Complement Arithmetic
There are a few scenarios we need to look out for:

• A carry out of the leftmost bit (i.e., one that would make the result
1 bit larger than the original numbers) may be discarded.

• If we are adding two positive numbers and the leftmost
bit in the result is a 1, then an overflow has occurred.

• If we are adding two negative numbers and the leftmost
bit in the result is a 0, then an overflow has occurred. (A
clue that this has happened is that there was a
carry that was discarded.)
2's Complement Arithmetic
• It is only possible to get an overflow if the two
numbers being added are of the
same sign (i.e., both positive or both negative).

• By overflow we mean that the result is a
number larger, or smaller, than can be
represented using the given
number of bits.
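The overflow rules can be sketched in Python as follows. The function name `add_twos_complement` is an assumption for this example; it returns the n-bit result pattern together with an overflow flag.

```python
def add_twos_complement(a: int, b: int, bits: int = 8):
    """Add two signed integers in n-bit 2's complement.

    Returns (result bit pattern as a string, overflow flag).
    """
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (low <= a <= high and low <= b <= high):
        raise ValueError("operand out of range for this word size")
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    pattern_a, pattern_b = a & mask, b & mask   # 2's complement bit patterns
    result = (pattern_a + pattern_b) & mask     # carry out of the MSB is discarded
    # Overflow is only possible when both operands have the same sign
    # and the sign of the result differs from it.
    same_sign = (pattern_a & sign_bit) == (pattern_b & sign_bit)
    overflow = same_sign and (result & sign_bit) != (pattern_a & sign_bit)
    return format(result, f"0{bits}b"), overflow


print(add_twos_complement(7, -3, bits=4))  # ('0100', False)
print(add_twos_complement(5, 6, bits=4))   # ('1011', True)
```

In the second call, 5 + 6 = 11 does not fit in the 4-bit range of -8 to +7, so the sign bit of the result flips and the overflow flag is set.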
Representation of Data Images
• Images are made up of pixels

• Each pixel is stored as a binary value

• A pixel (short for picture element) is one specific colour.

• The number of bits used for each pixel determines how many
colours we can use. This is known as the colour depth.

• The more bits per pixel (bpp) the greater the colour depth and the
more bits we need to store the image

• The resolution is the concentration of pixels, usually measured in
dots per inch (DPI). The higher the resolution, the more pixels.
Representation of Data Images

• 1 bit can represent 2 colours, i.e. black and white

• 2 bits can represent 4 colours

• 8 bits can represent 256 colours

• 16 bits can represent 65536 colours
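The relationship between colour depth and the number of colours is 2^bits, which a short Python sketch can confirm. The function names are assumptions for this example, and `raw_size_bytes` assumes an uncompressed image with no metadata, which real file formats rarely match exactly.

```python
def colour_count(bits_per_pixel: int) -> int:
    """Number of distinct colours that a given colour depth can represent."""
    return 2 ** bits_per_pixel


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Assumed raw size: width * height * bpp bits, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


for bpp in (1, 2, 8, 16):
    print(bpp, "bits per pixel:", colour_count(bpp), "colours")

print(raw_size_bytes(100, 100, 8))  # 10000
```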


Representation of Data Images
• Using the binary data saved and metadata (data about the image) the image
can be reconstructed.

• Metadata includes data such as:


– The resolution

– Width and height

– Colour depth

– Exposure, ISO, Aperture

– File format

• A larger colour depth and resolution = higher quality image + larger file size

• A smaller colour depth and resolution = lower quality image + smaller file size
Representation of Data Sound
• Sound waves are analogue, which means they are
continuously changing.

• Anything stored on a computer has to be stored digitally, as
a series of binary numbers.

• To store sound on a computer we need to convert
the waveform into a numerical representation.
Representation of Data Characters
• In addition to numerical data, a computer must
be able to handle letters and other symbols too.
• To represent letters and symbols, a computer
uses encoding schemes.
• An encoding scheme is a predetermined set of
codes for each recognized letter, number and
symbol.
• Popular encoding schemes include ASCII,
Unicode and ISCII.
ASCII
American Standard Code for Information Interchange
• The most widely used alphanumeric encoding scheme is ASCII.
• ASCII is a 7-bit code, so it has
2^7 = 128 possible different characters.
• Eg: A → 65, a → 97 (case matters)
• The original 7 bits were enough to represent English characters and
punctuation.
• Each character is mapped to a numeric value, often written in hex.

• For eg:
• A ⬄ 065 (decimal) ⬄ 41 (hex) ⬄ 100 0001 (binary)
• 1 ⬄ 049 (decimal) ⬄ 31 (hex) ⬄ 011 0001 (binary)

• ASCII was later extended to 8-bit Extended ASCII to represent
more characters (256 characters)
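The character-to-code mapping can be checked with Python's built-in `ord`. The helper name `describe_ascii` is an assumption for this example.

```python
def describe_ascii(character: str) -> str:
    """Show the decimal, hex and 7-bit binary code of one ASCII character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return f"{character} = {code:03d} (decimal) = {code:02X} (hex) = {code:07b} (binary)"


print(describe_ascii("A"))  # A = 065 (decimal) = 41 (hex) = 1000001 (binary)
print(describe_ascii("a"))  # a = 097 (decimal) = 61 (hex) = 1100001 (binary)
print(describe_ascii("1"))  # 1 = 049 (decimal) = 31 (hex) = 0110001 (binary)
```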
ISCII
Indian Script Code for Information Interchange

• ISCII is an eight-bit code capable of coding 256
characters.
• ISCII retains all ASCII characters and
also offers coding for Indian scripts.
Drawbacks
• Limited character encodings – not enough
characters to cover all the world's languages
• no single encoding was adequate for all the
letters, punctuation, and technical symbols
• Solution – UNICODE – Superset of all character
sets - an encoding system that solves the
space issue of ASCII.
UNICODE
Universal Character Encoding
• Supports many different alphabets and even
emojis.
• Unlike ASCII, Unicode does not define how its
mapping should be implemented.
• Each character in the Unicode system is
assigned to a hexadecimal code called code
point.
• It is represented in the form of U+<hexadecimal
code> ranging from U+0000 to U+10FFFF
– Eg: A → U+0041, $ → U+0024, € → U+20AC
Unicode
UNICODE
Universal Character Encoding

• Computers need a way to translate Unicode into
binary so that its characters can be stored in text
files.
• The Unicode system defines different ways of
encoding characters. They include:
– UTF-8 -> variable length encoding scheme
(code unit of 8 bits, i.e. 1 octet)
– UTF-32 -> fixed length encoding scheme (4 bytes)


UNICODE
Universal Character Encoding
• UTF-8 is an encoding system for Unicode. It can translate any
Unicode character to a matching unique binary string, and can
also translate the binary string back to a Unicode character.
This is the meaning of “UTF”, or “Unicode Transformation
Format.”

• It uses 1 byte to encode English characters (the first 128
characters in Unicode, which match ASCII).
• It uses a sequence of bytes to encode other
characters.
• It is widely used in e-mail systems and on the
internet (especially on the Web).
UTF 8
• UTF-8 is a variable length encoding, which means
characters whose code points have small values, like A,
can be represented with just one byte.
• It is an algorithmic mapping from every Unicode code point
to a unique byte sequence.
• Characters whose code points have larger values are
represented with more bytes as needed,
up to 4 bytes.
UTF-8

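1 octet – U+0000 to U+007F (code points 0 to 127)
2 octets – U+0080 to U+07FF (128 to 2047)
3 octets – U+0800 to U+FFFF (2048 to 65535)
4 octets – U+10000 to U+10FFFF (65536 to 1114111; the 4-octet pattern has 21 data bits,
enough for values up to 2097151, but Unicode code points end at U+10FFFF)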
UTF-8 1-Octet

Layout: 0xxxxxxx (1 control bit, then 7 bits of data)

1. Represent A (U+0041)   Binary: 1000001
01000001

2. Represent ? (U+003F)   Binary: 0111111
00111111
UTF-8 2-Octet
Layout: 110xxxxx 10xxxxxx
(control bits 110 + first 5 bits of data, control bits 10 + last 6 bits of data)

1. Represent © (U+00A9)
Binary: 000 1010 1001
11000010 10101001

2. Represent ɸ (U+0278)
Binary: 010 0111 1000
11001001 10111000
UTF-8 3-Octet
Layout: 1110xxxx 10xxxxxx 10xxxxxx
(control bits 1110 + first 4 bits of data, then control bits 10 + 6 bits of data, twice)

1. Represent ಆ (U+0C86)
Binary: 0000 1100 1000 0110
11100000 10110010 10000110

2. Represent ◑ (U+25D1)
Binary: 0010 0101 1101 0001
11100010 10010111 10010001
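The octet layouts above can be sketched as a small hand-written encoder in Python and compared against the built-in `str.encode("utf-8")`. The names `utf8_encode` and `as_bits` are assumptions for this example; the 4-octet layout (11110xxx followed by three 10xxxxxx octets) and the rejection of surrogate code points follow the UTF-8 definition rather than the slides.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point into UTF-8 octets by hand."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:        # 1 octet: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:       # 2 octets: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:      # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


def as_bits(data: bytes) -> str:
    """Format bytes as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in data)


print(as_bits(utf8_encode(0x00A9)))  # 11000010 10101001
print(as_bits(utf8_encode(0x0C86)))  # 11100000 10110010 10000110
print(utf8_encode(0x25D1) == "\u25d1".encode("utf-8"))  # True
```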
Practice Questions
• Represent the following in UTF-8 encoding:

▪ £ (U-00A3)
▪ ↔ (U-2194)
▪ } (U-007D)
UTF-32
• UTF-32 is a fixed length encoding scheme that
uses exactly 4 bytes to represent all Unicode
code points.
• It directly stores the binary code of any Unicode
code point in 4 bytes.
• Symbol $ [Unicode code point: U+0024, binary code: 00100100]
00000000 00000000 00000000 00100100
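The fixed-length layout can be reproduced in Python with the built-in `utf-32-be` codec. The helper name `utf32_bits` is an assumption for this example, and the big-endian byte order (most significant byte first, no byte order mark) is chosen to match the bit layout shown on the slide.

```python
def utf32_bits(character: str) -> str:
    """Return the 4 UTF-32 bytes of one character as groups of 8 bits."""
    data = character.encode("utf-32-be")  # big-endian, no byte order mark
    return " ".join(f"{byte:08b}" for byte in data)


print(utf32_bits("$"))  # 00000000 00000000 00000000 00100100
print(utf32_bits("A"))  # 00000000 00000000 00000000 01000001
print(len("€".encode("utf-32-be")))  # 4
```

Every character takes exactly 4 bytes here, whereas UTF-8 would need only 1 byte for $ and 3 bytes for €.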
Summary
• ASCII was the first major encoding.
– Because of computer limitations it was only 1 byte.
• Unicode was invented to address the problem
of encoding more languages than just English.
• UTF-8 is a variable length encoding.
• UTF 8 is now the most dominant encoding for
the World Wide Web and accounts for roughly
98% of all web pages.
• UTF-32 is fixed length encoding.
