
International A Level DRAFT VERSION

AS 1 Information
Representation
1.1 Data Representation

DRAFT VERSION
Overview
• Number Systems Introduction
• Binary and Decimal Prefixes
• Binary Coded Decimal
• Binary Addition
• Binary Subtraction
• Hexadecimal
• Characters and Text
General Principles of Number Bases
• The decimal system is so familiar to us that we usually do not even
think about it as a number system
• However, in Computer Science we often need to work with other
number systems, mainly binary and hexadecimal
• So, it is worth briefly going back to basics and looking at what we
mean by a number system
General Principles of Number Bases
• A number base tells us two essential and closely related facts:
• The number of symbols in the base, and
• The place value
• Base 10 (decimal or denary)
• Ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
• Place value: 10
10³     10²     10¹     10⁰
1000s   100s    10s     1s
  9       4       7       3
General Principles of Number Bases
• A number base tells us two essential and closely related facts:
• The number of symbols in the base, and
• The place value
• Base 16 (hexadecimal)
• Sixteen symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
• Place value: 16
16³     16²     16¹     16⁰
4096s   256s    16s     1s
  5       D       A       5        5DA5₁₆ = 23,973₁₀
General Principles of Number Bases
• A number base tells us two essential and closely related facts:
• The number of symbols in the base, and
• The place value
• Base 2 (binary)
• Two symbols: 0, 1
• Place value: 2
2³    2²    2¹    2⁰
8s    4s    2s    1s
1     1     0     1        1101₂ = 13₁₀ = D₁₆


Why binary and hex?

1 0 0 0 1 0 1 0   →   8 A      (1000 1010₂ = 8A₁₆)
• Computers are fundamentally constructed from switches (transistors), which have two states that we can think of as True and False, On and Off, or 1 and 0.
• When we store information on computer systems it is stored in binary
format, as strings of 1s and 0s.
• Binary is therefore the natural number system to use. However, it is cumbersome to write long strings of 1s and 0s (and easy to make mistakes), which is one of the reasons that hexadecimal is used – it maps easily onto groups of four binary digits (bits) and gives us a much more compact form: one byte can be expressed as two hex digits
Significance…
• Bit 0 is the "Least Significant Bit" or LSB
• Bit 7 is the "Most Significant Bit" or MSB
• This was important for serial transmission, where we needed to know
the order of the bits being transmitted
• It is still important for the ordering of bytes, as computers use different
methods of storage:
• Big-Endian (BE) means that the bytes are stored from most-significant
to least-significant
• Little-Endian (LE) means that bytes are stored from least-significant to
most-significant
Significance…
• In a byte, Bit 0 is the "Least Significant Bit" or LSB
• In a byte, Bit 7 is the "Most Significant Bit" or MSB

Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
  1       0       0       1       0       0       1       1
Most Significant Bit                        Least Significant Bit


Converting Decimal to Binary (method 1)
Take the decimal number 147 and subtract each place value in turn. If a positive value (or zero) results, set the binary digit to 1 and carry on with the new value. If a negative value would result, set the binary digit to 0 and carry on with the original value.
Work left to right…

128    64    32    16     8     4     2     1
  1     0     0     1     0     0     1     1

147 − 128 = 19 → 1     19 − 64 < 0 → 0     19 − 32 < 0 → 0     19 − 16 = 3 → 1
  3 − 8 < 0 → 0         3 − 4 < 0 → 0       3 − 2 = 1 → 1       1 − 1 = 0 → 1

Check: 128 + 16 + 2 + 1 = 147
Converting Decimal to Binary (method 2)
Take the decimal number 147, repeatedly divide by two, and set the binary digit to the remainder.
Work right to left.

147 ÷ 2 = 73 r 1     73 ÷ 2 = 36 r 1     36 ÷ 2 = 18 r 0     18 ÷ 2 = 9 r 0
  9 ÷ 2 =  4 r 1      4 ÷ 2 =  2 r 0      2 ÷ 2 =  1 r 0      1 ÷ 2 = 0 r 1

128    64    32    16     8     4     2     1
  1     0     0     1     0     0     1     1

Check: 128 + 16 + 2 + 1 = 147
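Method 2 also maps naturally onto a short Python sketch (the helper name to_binary is just an illustrative choice):

def to_binary(n, bits=8):
    # Repeatedly divide by two; each remainder becomes the next bit, right to left
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits
        n = n // 2
    return digits.rjust(bits, "0")

print(to_binary(147))   # 10010011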
Practice
• Convert 67 to binary
01000011
Check your answer before revealing!!

• Convert 242 to binary


11110010
Check your answer before revealing!!

• Convert 19 to binary
00010011
Check your answer before revealing!!

• Convert 75 to binary
01001011
Check your answer before revealing!!
Practice
• Convert 25 to binary
00011001
Check your answer before revealing!!

• Convert 251 to binary


11111011
Check your answer before revealing!!

• Convert 39 to binary
00100111
Check your answer before revealing!!

• Convert 111 to binary


01101111
Check your answer before revealing!!
Converting Binary to Decimal
Take the binary number 11010010, write it down using the place values for binary and add up the place values
with a '1' binary digit.

128    64    32    16     8     4     2     1
  1     1     0     1     0     0     1     0

128 + 64 + 16 + 2 = 210
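The same place-value idea as a hedged Python sketch (the helper name from_binary is illustrative; Python's built-in int does the same job):

def from_binary(bit_string):
    # Work along the string, doubling the running total and adding each bit
    total = 0
    for bit in bit_string:
        total = total * 2 + int(bit)
    return total

print(from_binary("11010010"))   # 210
print(int("11010010", 2))        # 210 – the built-in equivalent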
Practice
• Convert 01010101 to denary
85
Check your answer before revealing!!

• Convert 00101010 to denary


42
Check your answer before revealing!!

• Convert 00111100 to denary


60
Check your answer before revealing!!

• Convert 11000011 to denary


195
Check your answer before revealing!!
Hex conversions (to binary)
• Converting between hexadecimal and binary is straightforward, as
both number systems are based on a power of two
• (Straightforward, that is, as long as you know the hex digits and their
binary equivalents!)
• To convert from hexadecimal to binary, simply convert each hex digit
in turn into its four binary digit equivalent:

   C       A       5       9        (CA59₁₆)

 1100    1010    0101    1001       (1100101001011001₂)
Hex digits to binary
Hex Binary
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
A 1010
B 1011
C 1100
D 1101
E 1110
F 1111
Hex conversions (from binary)
• To convert from binary to hex, simply group the binary number into
four bit groups, starting at the LSB and padding with zeros, then …
• … convert each four bit binary group into its hex digit:
0110100010₂

0000   0001   1010   0010       (padded into four-bit groups)

  0      1      A      2        (01A2₁₆)
Hex conversions (to and from decimal)
• To convert from hex to decimal, you can take each hex digit, convert it
to decimal, and then multiply it by the place value (in decimal), and
add all the results
• Or, convert to binary, then binary to decimal – this is actually easier
• To convert from decimal to hex, you can repeatedly divide by 16,
noting each remainder as you go, building up the hex number digit by
digit
• Or convert to binary, then binary to hex – this is also easier
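A quick Python sketch of these conversions (illustrative only – bin, hex, int and format are the built-ins doing the work):

print(bin(0xCA59))          # 0b1100101001011001 – hex to binary
print(hex(0b0110100010))    # 0x1a2              – binary to hex
print(int("1E", 16))        # 30                 – hex to denary
print(format(23973, "X"))   # 5DA5               – denary to hex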
Practice
• Convert 23₁₆ to binary using 8 bits
00100011
Check your answer before revealing!!

• Convert 1E₁₆ to denary

30
Check your answer before revealing!!

• Convert 2F₁₆ to binary using 8 bits

00101111
Check your answer before revealing!!

• Convert FF₁₆ to denary

255
Check your answer before revealing!!
Binary and Denary Prefixes
• When we measure the capacity of storage devices, or the speed of
network connections, we need to work in larger units than bytes (for
storage/size) and bits/second or bytes/second (for speed of
connections)
• We conventionally use denary prefixes (kilo, mega, giga and tera)
• However, binary prefixes are also used in some areas
Units of Data Storage
You do not need to memorise these!
Denary Prefixes
Name        Notation   Power of 10   Value                        In 1000s
kilobyte    kB         10³           1,000 bytes                  1,000¹
megabyte    MB         10⁶           1,000,000 bytes              1,000²
gigabyte    GB         10⁹           1,000,000,000 bytes          1,000³
terabyte    TB         10¹²          1,000,000,000,000 bytes      1,000⁴

Binary Prefixes
Name        Notation   Power of 2    Value                        In 1024s
kibibyte    KiB        2¹⁰           1,024 bytes                  1,024¹
mebibyte    MiB        2²⁰           1,048,576 bytes              1,024²
gibibyte    GiB        2³⁰           1,073,741,824 bytes          1,024³
tebibyte    TiB        2⁴⁰           1,099,511,627,776 bytes      1,024⁴
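A quick Python check of how far the two systems drift apart at the terabyte scale (a rough sketch):

denary_tb  = 1000 ** 4          # 1 TB, using the denary prefix
binary_tib = 1024 ** 4          # 1 TiB, using the binary prefix

print(denary_tb)                # 1000000000000
print(binary_tib)               # 1099511627776
print(binary_tib / denary_tb)   # ≈ 1.0995 – nearly a 10% difference at this scale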
Mnemonic
You only need to go up to Tera/Tebi

Burger King Makes Great Tomato Paste


Denary Byte/Bit/base Kilo- Mega- Giga- Tera- Peta-
Binary Byte/Bit/Base Kibi- Mebi- Gibi- Tebi- Pebi-

Exam hint – write out this mnemonic at the start of your exam. You really only need to commit the denary names to
memory, and some of these at least should be familiar from your science GCSEs

You can then fill out the Binary prefixes by replacing the last two letters of each denary prefix with 'bi'. So Giga
becomes Gibi; Tera becomes Tebi etc.
Does it Matter?
• For your exam? Yes
• In the real world? Well, maybe…
• There is widespread confusion – many people and organisations use the denary prefixes to mean either the denary or the binary units.
• To take one example of confusion, disks and files will report different
sizes on different operating systems (some in Terabytes, some in
Tebibytes, both using the abbreviation TB)
• International standards bodies have been trying to get companies to
use denary prefixes purely for denary units, and binary prefixes for
binary units for almost 30 years
Binary-coded decimal
• Binary-coded decimal (BCD) takes a different approach to encoding
decimal values
• It is not widely used in modern computing, although there are some
cases where it is beneficial
• Standard BCD uses one byte per decimal digit (with wastage of 4+
bits)
• Packed BCD uses one nibble per decimal digit (with wastage of 6 bit
patterns)
Binary and BCD Compared
Denary Binary BCD Denary Binary BCD
0 0000 0000 0000 0000 8 0000 1000 0000 1000
1 0000 0001 0000 0001 9 0000 1001 0000 1001
2 0000 0010 0000 0010 10 0000 1010 0001 0000
3 0000 0011 0000 0011 11 0000 1011 0001 0001
4 0000 0100 0000 0100 12 0000 1100 0001 0010
5 0000 0101 0000 0101 13 0000 1101 0001 0011
6 0000 0110 0000 0110 14 0000 1110 0001 0100
7 0000 0111 0000 0111 15 0000 1111 0001 0101

Binary advantages: Storage efficiency, mathematical correspondence


Binary-Coded Decimal advantages: Human-friendly display
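A minimal sketch of packed BCD in Python (the helper name to_packed_bcd is illustrative):

def to_packed_bcd(number):
    # Encode each decimal digit as its own four-bit group (one nibble per digit)
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_packed_bcd(15))    # 0001 0101 – compare with 0000 1111 in pure binary
print(to_packed_bcd(947))   # 1001 0100 0111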

Binary Arithmetic – Addition Rules
0 + 0 = 0              zero plus zero = zero
0 + 1 = 1              zero plus one = one
1 + 0 = 1              one plus zero = one
1 + 1 = 10             one plus one = zero, carry one (two in binary)
1 + 1 + 1 = 11         one plus one plus one = one, carry one (three in binary)
Binary Arithmetic – Addition

  carries:    1 1   1
              0 1 1 0 1 0 1 0     106₁₀
          +   0 1 1 0 1 1 0 0     108₁₀
              1 1 0 1 0 1 1 0     214₁₀
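The same sum can be checked directly in Python (a sketch; format with "08b" just prints the 8-bit pattern):

a = 0b01101010                # 106
b = 0b01101100                # 108

total = a + b
print(format(total, "08b"))   # 11010110
print(total)                  # 214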
Binary Arithmetic – Subtraction

      1 1 1 0 1 1 1 1     239₁₀
  −   0 0 0 1 1 0 0 1      25₁₀
      1 1 0 1 0 1 1 0     214₁₀

(At bit 4, 0 − 1 needs a borrow from bit 5.)
Advice
• Take care setting out additions – make sure that your binary digits line up
and are written clearly
• Allow space for your carried bits – these must be shown to score full
marks
• Check your working twice – it is very easy to make errors, and errors cost
marks, cost grades
• Once - Work through the binary addition a second time, check each bit
• Twice – Convert to denary, add, convert the answer back to binary and check
Practice
• Add 10110101₂ to 00010001₂ using 8 bits
11000110
Check your answer before revealing!!

• Add 00111100₂ to 00000111₂ using 8 bits

01000011
Check your answer before revealing!!

• Add 12₁₆ to 12₁₀ giving the answer in binary using 8 bits

00011110
Check your answer before revealing!!

• Add 11010110₂ to 11101111₂ using 8 bits

[1]11000101₂
Check your answer before revealing!!
Overflow
• If you completed the practice additions on the last slide, you should
have noticed a problem with the last question
• You were asked to use 8 bits for your answer, but the result of the
addition can only be expressed in 9 bits
• This is called an 'overflow' and it occurs when we have a limited
number of bits, and we are trying to work with a number which
cannot be stored in that number of bits
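A hedged sketch of how an 8-bit overflow can be detected in Python (add_8bit is just an illustrative helper name):

def add_8bit(a, b):
    # Add two 8-bit values; anything that needs a ninth bit has overflowed
    total = a + b
    overflow = total > 0b11111111
    return total & 0b11111111, overflow    # keep only the low 8 bits

print(add_8bit(0b11010110, 0b11101111))    # (197, True) – 214 + 239 = 453 needs 9 bits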
Binary Arithmetic – Subtraction

          b7  b6  b5  b4  b3  b2  b1  b0
           1   1   0   0   1   1   1   1     207₁₀
       −   0   0   1   1   1   0   0   1      57₁₀
           1   0   0   1   0   1   1   0     150₁₀

The subtractions for bits 0 to 3 are straightforward. When we get to bit 4, we need to borrow in order to perform the subtraction, but bit 5 is zero, so we need to borrow from bit 6 – this makes the value in bit 5 10₂ (that is, 2 in decimal/denary). We now borrow one from this, reducing it to 1₂ and making bit 4 10₂, so when we perform that subtraction we are subtracting 1₂ from 10₂, resulting in 1₂.

The need for unpredictably long chains of 'borrows' makes subtraction difficult to implement in any electronic (or mechanical) system.
Binary Arithmetic
• Addition of two numbers – by our rules
• Adding more than two numbers? Divide and conquer…
• Subtraction? Well, clearly we have a process, just as we did for
addition, but it would be much more complicated to perform
electronically, so there is a neat trick we use instead…
• Multiplication – repeated addition
• Division – repeated subtraction
Representing negative numbers
• With N bits, we can represent numbers from 0 to 2^N − 1
• For example, with 8 bits, our range is 0 to 255
• What if we want to represent negative numbers too?
• We could, instead, reserve the most significant bit (MSB) for the sign:

Sign bit
0 0 0 0 1 1 1 0      14₁₀
1 0 0 0 1 1 1 0     −14₁₀
Sign and Magnitude
• This approach is called 'sign and magnitude' – the most significant bit is the sign (1 being negative, 0 positive) and the remaining bits give the magnitude in the normal way.
• With N bits, we can now represent numbers from −(2^(N−1) − 1) to +(2^(N−1) − 1)
• For example, with 8 bits, our range is −127 to +127
• …but, we have +0 (00000000) and −0 (10000000)!

Sign bit
0 0 0 0 1 1 1 0      14₁₀
1 0 0 0 1 1 1 0     −14₁₀
Representing negative numbers
• … and what happens if we add 14 and −14?

Sign bit
      0 0 0 0 1 1 1 0      14₁₀
  +   1 0 0 0 1 1 1 0     −14₁₀
      carries:  1 1 1
      1 0 0 1 1 1 0 0     −28₁₀
Two's Complement
• Two's Complement takes a different approach to representing negative binary numbers
• There are two methods to find a Two's Complement representation:
  • Two's Complement of X = 2^N − X
  • Or, "invert the bits and add one"
• NB – convert back in the same way
• Using this representation, we still have a sign bit
• We can represent numbers from −2^(N−1) to 2^(N−1) − 1

0 0 0 0 0 0 1 1      3₁₀
0 0 0 0 0 0 1 0      2₁₀
0 0 0 0 0 0 0 1      1₁₀
0 0 0 0 0 0 0 0      0₁₀
1 1 1 1 1 1 1 1     −1₁₀
1 1 1 1 1 1 1 0     −2₁₀
1 1 1 1 1 1 0 1     −3₁₀
Two's Complement

How to express −14₁₀ in Two's Complement:
          128  64  32  16   8   4   2   1
            0   0   0   0   1   1   1   0     14₁₀ (to be made negative)
Invert      1   1   1   1   0   0   0   1
Add 1       1   1   1   1   0   0   1   0    −14₁₀

How to convert back from Two's Complement:
          128  64  32  16   8   4   2   1
            1   1   1   1   0   0   1   0
Invert      0   0   0   0   1   1   0   1
Add 1       0   0   0   0   1   1   1   0     14₁₀, so the original value was −14₁₀
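A small Python sketch of both directions (the helper names are illustrative; masking with the low N bits performs the 2^N − X step for us):

def to_twos_complement(value, bits=8):
    # Keep the low N bits of the value – for negatives this is 2^N - |value|
    return format(value & ((1 << bits) - 1), "0" + str(bits) + "b")

def from_twos_complement(pattern):
    # A leading 1 is the sign bit: subtract 2^N to recover the negative value
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 1 << len(pattern)
    return value

print(to_twos_complement(-14))            # 11110010
print(from_twos_complement("11110010"))   # -14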


Two's Complement – that 'Add 1' step…

How to express +14₁₀ in Two's Complement:
          128  64  32  16   8   4   2   1
            0   0   0   0   1   1   1   0     14₁₀
We do not make any change – positive values keep their ordinary binary form.

How to express −5.25₁₀ in Two's Complement:
            8   4   2   1   ½   ¼  1/8  1/16
            0   1   0   1   0   1   0   0      5.25₁₀
Invert      1   0   1   0   1   0   1   1
Add 1       1   0   1   0   1   1   0   0     −5.25₁₀

We add the 1 to the Least Significant Bit position.


Two's Complement – where to 'add one'
• To represent −7.5₁₀ in Two's Complement, our first step is to represent the value 7.5 in binary:
111.1
• We can put as many leading and trailing zeroes in as we like:
000000000000000111.10000000000000000000
• To get the Two's Complement representation, we first invert:
111111111111111000.01111111111111111111
• Then we add one in the LSB position:
111111111111111000.10000000000000000000
• So our Two's Complement representation of a negative number has an infinite number of leading ones, and an infinite number of trailing zeroes.
Addition with Two's Complement

          0 0 0 0 1 1 1 0      14₁₀
Invert    1 1 1 1 0 0 0 1
Add 1     1 1 1 1 0 0 1 0     −14₁₀

          0 0 0 0 1 1 1 0      14₁₀
      +   1 1 1 1 0 0 1 0     −14₁₀
carries:  1 1 1 1 1 1 1
        1 0 0 0 0 0 0 0 0       0₁₀   (the ninth bit – the carry out – is discarded)

For Two's Complement, the overflow works in a different way…


So What is One's Complement?
• We have seen how useful Two's Complement is when we need to
represent negative integers in binary
• You have already seen One's Complement – it just means inverting all
of the bits, and is the first step in converting to Two's Complement

          0 0 0 0 1 1 1 0      14₁₀

Invert    1 1 1 1 0 0 0 1     One's Complement

Add 1     1 1 1 1 0 0 1 0     −14₁₀   Two's Complement


Practice
• Convert −27₁₀ to Two's Complement binary using 8 bits
11100101₂
Check your answer before revealing!!

• Convert 11111111₂ from Two's Complement binary to denary

−1₁₀
Check your answer before revealing!!

• Convert −2F₁₆ to Two's Complement binary using 8 bits

11010001₂
Check your answer before revealing!!

• Convert +27₁₀ to Two's Complement binary using 8 bits

00011011₂
Check your answer before revealing!!
Programming Challenges
• There are a series of programming challenges for you on OneNote
• Open the 12D-Cs Notebook
• Go to “AS 1 Information Representation”
• Read the “Number Base Conversions using Python” page
• Paste your solutions into the “Your Programs” page
Boolean
• We will cover Boolean Algebra and Logic Gates in Trinity
• For now, we need to look at some 'bitwise' operations, and that
requires familiarity with some basic Boolean operations
Boolean Operations – NOT, AND, OR, XOR
Q = NOT(X)       Q = X AND Y       Q = X OR Y       Q = X XOR Y
Q = ¬X           Q = X ∧ Y         Q = X ∨ Y        Q = X ⊻ Y

NOT:  X  Q       AND:  X  Y  Q     OR:  X  Y  Q     XOR:  X  Y  Q
      0  1             0  0  0          0  0  0           0  0  0
      1  0             0  1  0          0  1  1           0  1  1
                       1  0  0          1  0  1           1  0  1
                       1  1  1          1  1  1           1  1  0
Bitmasks
• A "bitmask" is data used to set, clear, or select certain bits from other
values
• A bitmask can be applied using a Boolean operator – AND, OR, XOR

value (X) 0 0 1 1 1 0 1 0

AND

bitmask (Y) 1 1 1 1 0 0 0 0

result (Q) 0 0 1 1 0 0 0 0
Bitmasks
• A "bitmask" is data used to set, clear, or select certain bits from other
values
• A bitmask can be applied using a Boolean operator – AND, OR, XOR

value 0 0 1 1 1 0 1 0

OR

bitmask 1 0 0 0 0 0 0 0

result 1 0 1 1 1 0 1 0
Bitmasks
• A "bitmask" is data used to set, clear, or select certain bits from other
values
• A bitmask can be applied using a Boolean operator – AND, OR, XOR

value 0 0 1 1 1 0 1 0

XOR

bitmask 0 0 0 0 1 1 1 1

result 0 0 1 1 0 1 0 1
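These three masks can be tried directly with Python's bitwise operators (a sketch mirroring the slides above):

value = 0b00111010

print(format(value & 0b11110000, "08b"))   # 00110000 – AND clears the bits where the mask is 0
print(format(value | 0b10000000, "08b"))   # 10111010 – OR sets the bits where the mask is 1
print(format(value ^ 0b00001111, "08b"))   # 00110101 – XOR flips the bits where the mask is 1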
Examples
• An AND operation with the mask 10101010 is applied to the binary
number 01010101. Show the result.
Shifts
• Logical shift left and right
• For unsigned binary values = multiply and divide by 2
• Circular shift
• Arithmetic shift left and right
• For signed (2sC) and unsigned binary values ≈ multiply and divide by 2
• Note the weird overflow convention and rounding
Shifts, multiplication and division
• In any base, shifting left is broadly equivalent to multiplying by the
base, and …
• Shifting right is broadly equivalent to dividing by the base
• For fixed size representations, we have to be aware of overflow and
underflow, and…
• For two's complement (or any signed system) representations, we
need to take account of the sign information
10³     10²     10¹     10⁰
 9       4       7       3
Logical Shifts

0 1 0 1 1 0 0 1      89₁₀
Logical shift right – always shift a zero into the MSB (the old LSB is lost):
0 0 1 0 1 1 0 0      44₁₀

0 1 0 1 1 0 0 1      89₁₀
Logical shift left – always shift a zero into the LSB (the old MSB is lost):
1 0 1 1 0 0 1 0     178₁₀
Arithmetic Shift Right (sign preserving)

0 1 0 1 1 0 0 1      89₁₀
Arithmetic shift right – positive number (rounds towards zero as expected):
0 0 1 0 1 1 0 0      44₁₀

1 1 0 1 1 0 0 1     −39₁₀
Arithmetic shift right – negative number (rounds towards negative infinity):
1 1 1 0 1 1 0 0     −20₁₀
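Python's shift operators behave the same way on integers, so the examples above can be checked directly (masking with 0xFF to keep the left-shift result to 8 bits is our own modelling choice):

value = 0b01011001                          # 89

print(value >> 1)                           # 44 – shift right halves a positive value
print(format((value << 1) & 0xFF, "08b"))   # 10110010 – shift left, kept to 8 bits (178)

print(-39 >> 1)                             # -20 – >> on a negative int is arithmetic: it rounds towards negative infinity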
Arithmetic Shift Left (sign preserving)

0 0 0 1 1 0 0 1      25₁₀
If we interpret this as +25 in 2sC, then everything works, and the matching MSB and carry out (both 0) indicate no problem:
carry out 0 ← 0 0 1 1 0 0 1 0      50₁₀

0 1 0 1 1 0 0 1      89₁₀
The difference between the carry out (0) and the new MSB (1) indicates an overflow situation, as we cannot represent +178 in 2sC using 8 bits:
carry out 0 ← 1 0 1 1 0 0 1 0
Arithmetic Shift Left

1 1 0 1 1 0 0 1     −39₁₀
As the carry out and the new MSB are the same (both 1), this ASL has worked with the 2sC number −39, producing the correct result:
carry out 1 ← 1 0 1 1 0 0 1 0     −78₁₀

1 0 0 1 1 0 0 1    −103₁₀
Here, the attempt to ASL the 2sC number −103 has failed, as the result of −206 exceeds the largest negative number we can represent in 8 bits with 2sC. The difference between the carry out (1) and the new MSB (0) indicates this:
carry out 1 ← 0 0 1 1 0 0 1 0
Circular Shift Right

1 1 0 1 1 0 0 1
Circular shift right (rotate right) – the LSB wraps round into the MSB:
1 1 1 0 1 1 0 0

1 1 0 1 1 0 0 1
Circular shift right (rotate right) through carry:
the carry was originally 0, so that gets shifted into bit 7; the value of bit 0 (1) is shifted into the carry (carry 0 → 1)
0 1 1 0 1 1 0 0
Circular Shift Left

1 1 0 1 1 0 0 1
Circular shift left (rotate left) – the MSB wraps round into the LSB:
1 0 1 1 0 0 1 1

1 1 0 1 1 0 0 1
Circular shift left (rotate left) through carry:
the carry was originally 0, so that gets shifted into bit 0; the value of bit 7 (1) is shifted into the carry (carry 0 → 1)
1 0 1 1 0 0 1 0
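A short Python sketch of an 8-bit rotate (rotate_right is an illustrative helper – Python has no built-in rotate for ints):

def rotate_right(value, bits=8):
    # The bit falling off the LSB end re-enters at the MSB end
    lsb = value & 1
    return (value >> 1) | (lsb << (bits - 1))

print(format(rotate_right(0b11011001), "08b"))   # 11101100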
Characters and Strings
• A character (or char) is a single symbol such as 'a', '+' or '&'
• A string is a collection of characters, usually with a terminator (which
will be unprintable)
• String manipulation is an important area that we will examine in more
detail when we move on to some practical programming
• For the remainder of this session, we will focus on how characters are
represented
ASCII
• American Standard Code for Information Interchange
• Based on telegraph codes
• Very basic character set (although Extended ASCII adds more)
• Largely replaced by Unicode
ASCII Conversion in Python

char = 'B'
num = 65
print(ord(char))   # 66
print(chr(num))    # 'A'
ASCII Secrets…
• The ASCII codes for the denary numbers contain their binary
equivalents:
Denary character ASCII Code
'0' 0011 0000
'1' 0011 0001
'2' 0011 0010
'3' 0011 0011
'4' 0011 0100
… …
'9' 0011 1001
ASCII Secrets…
• Conversion between upper case and lower case letters requires one
bit flip (and we can test for upper/lower case by looking at one bit):
Upper case ASCII Code Lower case ASCII Code
character character
'A' 0100 0001 'a' 0110 0001
'B' 0100 0010 'b' 0110 0010
'C' 0100 0011 'c' 0110 0011
'D' 0100 0100 'd' 0110 0100
… … … …
'Y' 0101 1001 'y' 0111 1001
'Z' 0101 1010 'z' 0111 1010
ASCII Secrets…
• The least significant five bits of a character tell us the position of the
letter in the alphabet:
Upper case ASCII Code Lower case ASCII Code
character character
'A' 0100 0001 'a' 0110 0001
'B' 0100 0010 'b' 0110 0010
'C' 0100 0011 'c' 0110 0011
'D' 0100 0100 'd' 0110 0100
… … … …
'Y' 0101 1001 'y' 0111 1001
'Z' 0101 1010 'z' 0111 1010
ASCII – points to note
• All codes are seven bits, leaving one bit for a parity check
• Number characters convert to their numeric equivalents by masking off the upper nibble (clearing bits 4 and 5, e.g. AND with 0000 1111)
• Upper case Latin characters convert to lower case Latin characters by setting bit 5 (the bit with value 32)
• Alphabetic sorting is very simple (and note the careful positioning of
both 'A' and 'a')
• ASCII was created in the early 1960s and based on telegraph code
• Its heritage gives rise to severe limitations, particularly a limitation to
the Latin alphabet, but also US-centricity
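These properties are easy to verify in Python (a sketch using only ord and chr):

print(ord("a") - ord("A"))          # 32 – upper and lower case differ only in bit 5
print(chr(ord("A") | 0b00100000))   # a – setting bit 5 gives the lower-case letter
print(ord("7") & 0b00001111)        # 7 – masking off the upper nibble gives the digit's value
print(ord("C") & 0b00011111)        # 3 – the low five bits give the position in the alphabet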
Unicode
• Unicode is one of the most misunderstood initiatives in Computer
Science, so, strap in… First, we need to understand how ASCII was
used and abused (or ‘Extended’)
• ASCII is a simple standard, with a direct mapping from a code to a
character (for example binary code 1000001 maps to 'A')
• Remember that ASCII only used 7 bits? Well, when there was no
longer a need for a parity bit, all the codes from 128 to 255 became
available. These new code sets were called ‘Extended ASCII’
• And they were used by lots of people for lots of different things,
culminating in much confusion between standards bodies and industry
Microsoft (not ANSI!) Code Pages

Codes 0–127:    identical to ASCII in every code page
Codes 128–255:  differ from code page to code page, for example:
                1252 Western European   1251 Cyrillic   1253 Greek   1255 Hebrew   1256 Arabic
                (based on ISO-8859)
Enter Unicode
• Unicode takes a different approach, separating the coding of
characters from their representation
• The Latin character A has the Unicode identification (or code point)
U+0041 (those are hex digits by the way)
• The upper limit for a Unicode code point is U+10FFFF, which gives us a
theoretical maximum of 1,114,112 characters (the actual maximum is
smaller, and only around 10% are in use, some values are reserved)
• So one part of the Unicode standard is a truly huge list of characters
(Latin letters, pictograms, emojis, symbols, hieroglyphs, …)
Enter Unicode
• The other part of the Unicode standard deals with how these
characters can be represented (in memory, on disk, emails, websites)
• These rules are called Unicode Transformation Formats (UTF), and
there are a number in use, the main ones being:
• UTF-8
• UTF-16
• UTF-32
• These formats do *not* map to 8, 16, and 32 bits directly – do not fall
into that trap!
• They do, however, map to a minimum size
UTF-8
• UTF-8 is the most popular encoding by far, partly because it is fully
backward-compatible with ASCII
• The UTF-8 encoding can use 1, 2, 3, or 4 bytes to encode a character:

1 byte:    0xxxxxxx
2 bytes:   110xxxxx  10xxxxxx
3 bytes:   1110xxxx  10xxxxxx  10xxxxxx
4 bytes:   11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

0 1 0 0 0 0 0 1                                                           41₁₆           U+0041 = 'A' (one byte, same as ASCII)

1 1 1 1 0 0 0 0   1 0 0 1 1 1 1 1   1 0 0 1 0 0 1 0   1 0 1 0 1 0 0 1     F0 9F 92 A9₁₆   U+1F4A9 = 💩 (four bytes)
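Python's encode method shows the variable length directly (a quick sketch; the sample characters are just illustrative):

print("A".encode("utf-8"))    # b'A' – one byte, identical to ASCII
print("é".encode("utf-8"))    # b'\xc3\xa9' – two bytes
print("€".encode("utf-8"))    # b'\xe2\x82\xac' – three bytes
print("💩".encode("utf-8"))   # b'\xf0\x9f\x92\xa9' – four bytes (U+1F4A9)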
UTF-16
• The UTF-16 encoding can use 2 or 4 bytes to encode a character
• Raises the problem of byte ordering (BE v LE)
• Requires the Byte Order Mark (BOM)
• Trade-off between compact format and efficiency
• Less-used than UTF-8

0 0 0 0 0 0 0 0   0 1 0 0 0 0 0 1                                         0041₁₆         U+0041 = 'A' (two bytes)

1 1 0 1 1 0 0 0   0 0 1 1 1 1 0 1   1 1 0 1 1 1 0 0   1 0 1 0 1 0 0 1     D83D DCA9₁₆    U+1F4A9 = 💩 (four bytes, two 16-bit units)
UTF-32
• UTF-32 uses a fixed size of 4 bytes to capture the 21 bit Unicode
character information
• Very straight-forward, but inefficient (4 bytes for all code points, MS
byte always 00000000)
• Same byte ordering problems and solutions as UTF-16
• Very rarely used
Growth of Unicode

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />

Rising to 98% in 2023 [1]

[1] Usage Statistics and Market Share of Character Encodings for Websites, October 2023 (w3techs.com)
Do try this at home…
• Use Notepad to create text files in different encodings
• Use a local or an online hex editor (e.g., hexed.it) to examine the file
Images
• Images, like all other data, have to be represented in binary when
stored on computer systems
• We have two main types of image file:
• Bitmap
• Vector
Photo Images
• When an image is captured on a digital camera, or when a printed photo is scanned, we end up with a bitmap, which is a grid of pixels (short for picture elements)
• When viewed at the intended resolution, the human eye 'fills in' the
image, and we are convinced that we are viewing a realistic image
rather than a grid of coloured dots
Bitmaps – image resolution and colour depth
• Image resolution is the number of pixels that make up a bitmap
• We usually define this in terms of width x height

• Not required for A level:


• Most smartphones, for example, use "Full HD" resolution of 1920 x
1080 pixels, giving us over two million pixels
• "4K" resolution doubles the pixel count in both dimensions, leading to
an image resolution of 3840 x 2160 or over 8 million pixels

Note – there are several different 4K resolutions, but 3840 x 2160 (also known as Ultra HD or UHD) is the most used in consumer electronics; 4096 x 2160 (DCI 4K) is the most used in the movie industry
Bitmaps – image resolution and colour depth
[Example bitmap: 19 pixels wide × 10 pixels high]
Bitmaps – image resolution and colour depth
• Having looked at image resolution, we can now talk about each pixel
in an image
• The "colour depth" of an image is the number of bits used to capture
the colour information in each pixel
• With a colour depth of one (a single bit per pixel), we only have two colours – these would usually map to black and white and our image would be monochrome (although with a high enough resolution, our eyes would interpret this as grayscale)
• Early computer graphics used eight bit colour (so, one byte of colour
information for each pixel)
Colour depth – 8 bit choices
• With most colour information for electronic displays, we specify red,
green and blue or RGB components
• With just 8 bits, many early systems used three bits for red, three bits
for green, and two bits for blue (because the human eye is less
sensitive to the blue end of the spectrum)
• This gave us a very limited range of colours
• An alternative approach with 8 bits used a colour 'palette' – a specific
set of 256 colours chosen for the image that we are displaying
• This led to much higher quality images, but greater complexity
Colour depth – beyond 8 bits
• Most computer systems use 24 bits (three bytes) for colour
information, with one byte each for red, green, and blue
• This leads to over 16 million colours, comfortably beyond the ability
of the human eye to distinguish between colours (depending on
several factors, our eyes can distinguish between 1 million and 10
million colours)
• 24 bit colour depth is often referred to as "True Colour", although
fidelity depends very much on the quality of the display
• Some systems use 48 bit colour depth for technical reasons
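As a rough sketch, a 24-bit pixel can be packed from its three one-byte channels like this (the channel values are arbitrary):

red, green, blue = 255, 128, 0

pixel = (red << 16) | (green << 8) | blue   # one byte per channel, packed into 24 bits
print(format(pixel, "06X"))                 # FF8000
print(2 ** 24)                              # 16777216 – the number of available colours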
Colour depth examples
[Example images at different colour depths, e.g. 2 bit (monochrome)]
Drawn Bitmaps
• Early video games used low-resolution bitmaps
• Most modern games use high-resolution bitmaps to achieve photo-
realistic scenes
Vector Graphics
• The main disadvantage with the bitmap format is that resizing the
image inevitably causes problems with image quality
• Each bitmap is designed to be viewed at a particular resolution
• Vector graphics are more like very compact programs; they contain
instructions for drawing lines and shapes
• Vector graphics look 'perfect' at any resolution so they can be resized
without any loss of image quality
Vector Example (Simple)
Vector Example (Complex)
Vector vs Bitmap
• Vector graphics maintain image quality when resized
• Vector graphics files are much smaller
• It is much easier to create and edit vector graphics
• Vector graphics files have to be processed before they can be
displayed; very high-quality images require significant processing

• Bitmaps can be created at high levels of image quality – it is easy to achieve photo-realism
• Very high-quality bitmaps can be displayed with relatively little processing
Vector vs Bitmaps
Vector Bitmaps
Company logos Photos
Technical drawings Photo-realistic art
Cartoons
Web graphics
Some computer games Most computer games

Simpler images and non-real time display Complex images, real-time display
Sound - Introduction
• Converting analogue information to digital
• Sampling frequency and 'depth' (number of bits per sample)
Analogue to Digital Conversion

Digital file size = Sampling rate × Resolution × Channels × Time
                    (samples per second)   (bits per sample)   (number of channels)   (seconds)

3.5 minute audio = 44.1 kHz × 16 bits × 2 channels × 210 s ≈ 300 million bits (≈ 37 Mbytes)
How to Tackle Calculations
• Notes
• sound file size = sample rate x duration (s) x bit depth
• image file size = colour depth x image height (px) x image width (px)
• text file size = bits per character x number of characters

• Show each of these separately, and break out what each one may look like (e.g., sample rate may be in Hz, kHz, etc.)
• Key point: remember to divide by 8 if we are given bits and want the answer in bytes or KB/MB/GB…
• Broadly, you are going to be multiplying everything else!
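A hedged worked example of these formulas in Python (the media parameters are illustrative, matching the audio example above):

# Sound: 3.5 minutes of stereo audio sampled at 44.1 kHz with 16-bit samples
sound_bits = 44_100 * (3.5 * 60) * 16 * 2
print(sound_bits / 8 / 1_000_000)    # ≈ 37.0 MB

# Image: 1920 x 1080 pixels at 24-bit colour depth, uncompressed
image_bits = 1920 * 1080 * 24
print(image_bits / 8 / 1_000_000)    # ≈ 6.2 MB

# Text: 2,000 characters at 8 bits per character
print(2000 * 8 / 8)                  # 2000.0 bytes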
Representative file sizes
Media Average size
eBook 2.5 Mbytes
MP3 song 3.5 Mbytes
DVD Movie 4 Gbytes
HD Movie 12 Gbytes
Blu-ray Movie 22 Gbytes
4k Movie >100 Gbytes
Use of compression
• Storage
• Transmission
• Measurement of compression
Types of Compression
• Lossless
• Lossless compression is always reversible – we can regenerate the original
digital file in its entirety
• Lossless compression depends on statistical redundancy in data – for example
large areas of the same colour in an image, or repeated patterns of bytes
• Lossy
• Lossy compression is irreversible – it depends on throwing away some (non-
essential) information
• Lossy compression always leads to some degradation of the image or audio
being compressed – there is a trade-off between loss of quality and reduction
in size
Applications
• Lossless compression is used where a reduction in quality is
undesirable (for example, maintaining the original quality of an audio,
image or video file), or unacceptable (for example, documents)
• Lossy compression is used for audio, image and video files in cases
where some loss of fidelity is acceptable – for example, the human
eye is more sensitive to luminance (brightness) than it is to colour
variations; the human ear is most sensitive in the 2kHz to 5kHz band
within the 'absolute' limits of human hearing of roughly 20 Hz to 20 kHz
• Lossy compression can be much, much more effective than lossless,
and we can dynamically choose how much information we are
prepared to lose to get greater compression
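As one concrete illustration of exploiting repeated patterns of bytes, here is a toy run-length encoder in Python (a sketch only – real lossless formats are far more sophisticated):

def run_length_encode(data):
    # Store each run of identical bytes as (count, byte) instead of repeating the byte
    encoded = []
    index = 0
    while index < len(data):
        count = 1
        while index + count < len(data) and data[index + count] == data[index] and count < 255:
            count += 1
        encoded += [count, data[index]]
        index += count
    return bytes(encoded)

sample = b"\xff" * 100 + b"\x00" * 50                 # e.g. a large area of the same colour
print(len(sample), len(run_length_encode(sample)))    # 150 4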
