Lect1_DataStorage
1-1
Bits and their meaning
1-2
Bit patterns
1-3
Boolean operations
1-4
Figure 1.1 The Boolean operations
AND, OR, and XOR (exclusive or)
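(Recap: AND gives 1 only when both inputs are 1; OR gives 1 when at least one input is 1; XOR gives 1 when the inputs differ.)
As a minimal illustrative sketch (not part of the original slides), C's bitwise operators apply these Boolean operations to whole bit patterns at once:

    #include <stdio.h>

    int main(void) {
        unsigned char a = 0x9C;                    /* 1001 1100 */
        unsigned char b = 0x6A;                    /* 0110 1010 */
        printf("AND: %02X\n", a & b);              /* 0000 1000 = 08 */
        printf("OR:  %02X\n", a | b);              /* 1111 1110 = FE */
        printf("XOR: %02X\n", a ^ b);              /* 1111 0110 = F6 */
        printf("NOT: %02X\n", (unsigned char)~a);  /* 0110 0011 = 63 */
        return 0;
    }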
1-5
Gates
1-6
Figure 1.2 A pictorial representation of AND, OR, XOR, and
NOT gates, as well as their input and output values
1-7
Flip-flops
1-8
Figure 1.3 A simple flip-flop circuit
1-9
Figure 1.4 Setting the output of a
flip-flop to 1 (cont’d)
1-10
Hexadecimal notation
• The hexadecimal system is a counting system using
base 16 (cf. decimal, which uses base 10; see later
notes)
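Worked illustration (not from the original slide): the hexadecimal
numeral A3 means 10 x 16 + 3 = 163 in decimal, and because
16 = 2^4 each hex digit stands for exactly four bits:
A3 (hex) = 1010 0011 (binary).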
1-11
Figure 1.6 The hexadecimal
coding system
1-12
Main Memory
1-14
Figure 1.7 The organization of a
byte-size memory cell
(most significant bit, MSB, at the left; least significant bit, LSB, at the right)
1-15
Word: a general term describing several
consecutive bytes in memory (usually 2 or 4).
[Diagram: a 2-byte word with bit positions numbered 15 (MSB) down to 0 (LSB), and a 4-byte word with bit positions numbered 31 down to 0.]
1-16
Main memory addresses
1-18
Main Memory
• Every computer contains both Random Access
Memory (RAM) and Read-Only Memory (ROM).
• RAM is volatile memory that holds the files people
are working on for a short period, whereas ROM is
non-volatile memory that stores instructions for the
computer indefinitely.
• Data stored in RAM can be retrieved and modified,
because RAM can be both read and written. ROM,
however, can only be read.
1-19
Main Memory: Physical view
[Diagram: banks 0 to K, each an array of bytes numbered 0 to N.]
• Memory is composed of a number of banks.
• Each bank has a number of bytes arranged as an array.
1-20
Main Memory: Physical view
[Diagram: one bank divided into ROM, I/O ports 1 to n, and RAM.]
• Some bytes are kept in RAM, some in ROM, and some
in I/O ports (special-purpose registers).
• Bytes in RAM can be read and written.
• Bytes in ROM can only be read.
1-21
Main Memory: Logical view
("Logical" means "As a programmer would imagine it")
[Diagram: main memory as an array of bytes with addresses 0, 1, 2, ...]
• Memory is composed of a number of bytes
• Bytes are arranged as an array
• Each byte has an address (its byte number)
1-22
Measuring memory capacity
1-23
Mass Storage Systems
1-26
Files
• A file is the unit of data stored on a mass storage
system.
– Logical record: a natural grouping of data within a file,
i.e. natural from the programmer's point of view
– Field: a sub-unit of a logical record
• A key field holds an identifying key
• Physical record: a block of data conforming to
the physical characteristics of the storage device.
• Buffer: a memory area used for the temporary
storage of data.
1-27
Figure 1.12: Logical records
versus physical records on a disk
1-28
Representing Data
External representation (for people): to use data
easily, it must be represented (visually,
graphically, ...) on paper or on some other directly
usable medium.
1-29
Representing Data
1-30
Representing Data
In a computer system:
Text data must be represented according to
standard coding table(s),
Numerical data must be represented by some
notation (a number system)
Representing Numeric Data
• Internal number representation:
Always a sequence of 0's and
1's: ...01100111001010 ...
• External number representation:
Many notations (number representation systems)
• Context-dependent notations
• Positional (fixed point) notations
• Floating point notation
1-32
Representing Numeric Data (Externally)
Non-positional notation
Roman Numerals:
• Positions do not carry weights
• Digits are interpreted according to context
Numeral  means
I        1 or -1 (add 1 or subtract 1)
V        5 or -5
X        10 or -10
L        50 or -50
C        100 or -100
D        500 or -500
M        1000
Examples: XIV = 14, MCMLXXIV = 1974
(Negative numbers had not been invented yet!)
1-33
Representing Numeric Data (Externally)
Positional notation
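Worked illustration: in positional notation each digit position
carries a weight that is a power of the base, so
1974 = 1 x 10^3 + 9 x 10^2 + 7 x 10^1 + 4 x 10^0
(compare the Roman numeral MCMLXXIV above).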
1-34
Representing Non-numeric Data
When the computing device has to handle letters
and special symbols, each symbol is coded
separately.
Two dominant coding schemes are:
ASCII (American Standard Code for Information
Interchange)
EBCDIC (Extended Binary Coded Decimal
Interchange Code)
1-35
Representing Strings in ASCII
[Diagram: the string "ANKARA" stored as a sequence of one-byte ASCII codes, each code looked up in the ASCII table (e.g. the code for 'A').]
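A minimal C sketch (illustrative, not from the slides): each
character of a string is stored as a one-byte ASCII code, which
a program can print directly:

    #include <stdio.h>

    int main(void) {
        const char *s = "ANKARA";
        for (int i = 0; s[i] != '\0'; i++)
            printf("'%c' = %d\n", s[i], s[i]);  /* e.g. 'A' = 65 */
        return 0;
    }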
1-37
Figure 1.13 The message “Hello.”
in ASCII
1-38
1-39
1-40
Octal encoding:
for encoding a bit stream in groups of three
Bit pattern Octal representation
000 0
001 1
010 2
011 3
100 4
101 5
110 6
111 7
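Worked illustration: the bit stream 110101, grouped in threes
from the right as 110 101, is written in octal as 65; conversely,
each octal digit expands back into exactly three bits.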
1-41
Representing Images
Two approaches:
(1) Bit map techniques
– Represent image as collection of dots (pixels)
– Binary black-and-white images
• Pixel values are either 0 (black) or 1 (white)
– Greyscale (monochrome) images
• Pixel values range from 0 to 255 (commonly)
– Colour images
• Pixel represented as 3 colour components
– Red, Green & Blue (RGB)
• Problems with resizing, zooming
1-42
Representing Still Images
1-43
Colour depth
[Images: the same picture shown at 4-bit colour depth and at 1-bit colour depth.]
1-44
• The colour depth of an image is determined by the available
palette size
– 1-bit images support 2 “colours”: black/white binary images
– 8-bit images support 256 grey levels
– 24-bit images support 16 million colours
• Many image formats:
– .png - Portable Network Graphics Format
– .bmp - Windows bitmap
– .pic – PC Paint graphics format
– .mac – Macintosh MacPaint format
– .gif (Graphics Interchange Format) – common for graphics on the world wide web
– .jpg – JPEG image (Joint Photographic Experts Group) – platform independent –
used for photos
– .tif – Tagged Image File Format (TIFF)
1-45
Representing numeric values
1-46
base ten (decimal - or denary)
& base two (binary) systems
1-48
Figure 1.16 Decoding the binary
representation 100101
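Worked out: 100101 = 1x32 + 0x16 + 0x8 + 1x4 + 0x2 + 1x1 = 37 (decimal).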
1-49
Figure 1.17 An algorithm for finding
the binary representation of a positive
integer
1-50
Figure 1.18 Applying the algorithm in
Figure 1.17 to obtain the binary representation of
thirteen
1-51
Informal restatement of the algorithm
Finding the binary representation of a positive integer P:
Repeatedly divide P by 2, recording the remainders from
right to left. Stop when the quotient (not the
remainder) becomes 0.
Example: P=13
13 = 2 x 6 + 1          1
 6 = 2 x 3 + 0         01
 3 = 2 x 1 + 1        101
 1 = 2 x 0 + 1       1101   STOP
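A minimal C sketch of this algorithm (names are illustrative, not
from the slides): divide repeatedly by 2 and record the remainders
from right to left by filling a buffer from its end:

    #include <stdio.h>

    /* Print the binary representation of a positive integer. */
    void to_binary(unsigned int p) {
        char buf[33];          /* room for up to 32 bits plus '\0' */
        int i = 32;
        buf[i] = '\0';
        while (p > 0) {
            buf[--i] = '0' + (p % 2);  /* remainder is the next bit */
            p /= 2;                    /* quotient feeds the next step */
        }
        printf("%s\n", &buf[i]);
    }

    int main(void) {
        to_binary(13);   /* prints 1101 */
        return 0;
    }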
1-52
Figure 1.19 The binary addition
facts
1-53
Decimal addition
Addition in base 10:
  1 8 0 9
+ 0 3 2 3
  2 1 3 2
  •   •     (carry)
• We start with a carry value of 0.
• Whenever the sum gets larger than 9, we
add 1 to the carry.
• The carry is added at the next position.
1-54
Binary addition
Addition in base 2:
    1 1 0 1
  + 0 1 0 1
  1 0 0 1 0
  •   •   •   (carry)
• We start with a carry value of 0.
• Whenever the sum gets larger than 1, we add
1 to the carry.
• The carry is added at the next position.
1-55
Fractions in binary:
Decoding the binary representation
101.101
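Worked out: 101.101 (binary) = 4 + 0 + 1 + 1/2 + 0 + 1/8 = 5.625 (decimal).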
1-56
Finding the binary representation of a
fraction
Informal algorithm for encoding:
Repeatedly subtract successive negative powers of 2, recording the
success (1) or failure (0) of the subtraction, from left to right after the
radix point. Stop when subtraction gives result 0.
Example: N = 5/8
5/8 = 1/2 + 1/8          .1
1/8 = 0 x 1/4 + 1/8      .10    subtraction failed
1/8 = 1/8 + 0            .101   STOP
(powers of 2: 2^-1 = 1/2, 2^-2 = 1/2^2 = 1/4, 2^-3 = 1/2^3 = 1/8)
1-57
Binary representation of a fraction
– same method
Example, M = 0.33
0.33 = 0 x 0.5 + 0.33 .0
0.33 = 1 x 0.25 + 0.08 .01
0.08 = 0 x 0.125 + 0.08 .010
0.08 = 1 x 0.0625 + 0.0175 .0101
0.0175 = 0 x 0.03125 + 0.0175 .01010
0.0175 = 1 x 0.015625 + ….. .010101
Process may not terminate.
Stop when available space filled
i.e. truncate the representation
1-58
Representing Integers
1-59
• Signed integers are whole numbers that can be positive or
negative; stored in 4 bytes, they range from
-2147483648 to 2147483647
• An integer can be coded in:
– Sign-magnitude notation
– One's complement notation
– Two's complement notation: the most popular representation
– Excess notation: another, less popular representation, often
used for exponent values in floating point representations
1-60
Representing Integers (Internally)
Sign-magnitude
1-62
Representing Integers (Internally)
Sign-magnitude
2's Complement
Range: -2^(n-1) to 2^(n-1) - 1
e.g. for one byte, n = 8: from -128 to +127
1-65
The 1's complement representation of a positive integer is the
same as the original binary, with the MSB set to 0
– i.e. the same as the sign-magnitude representation
The 1's complement of a negative integer is obtained by
subtracting its magnitude from 2^n - 1, where n is the number
of bits used to store the integer in binary.
e.g. for one byte, n = 8: subtract from 1111 1111
-1 becomes:
1111 1111
-0000 0001
= 1111 1110
1-66
One’s complement
example using 8 bits (1 byte)
• Has two representations of zero: 00000000 (+0) and
11111111 (-0).
• To add two numbers represented in this system, one
does a conventional binary addition, but it is then
necessary to add any resulting carry back into the
resulting sum: “end-around carry”
• e.g. add -1 and 2
11111110 + 00000010 = 00000000
with carry 1.
Add this to LSB, giving result 00000001
1-67
• One's complement was popular in older
computers (e.g. UNIVAC, PDP-1)
• The problems of multiple representations of 0
and the need for the end-around carry are
eliminated using two's complement.
• 2’s Complement of negative integer:
– Subtract each digit of its magnitude from 1
– Add 1 to the result
1-68
Representing Integers (Internally)
Two's complement notation
Representing binary numbers (here, in one byte only)
2's complement of -19
Binary code for +19 = 0001 0011
1's complement => 1111 1111 - 0001 0011
(i.e. flip every bit) = 1110 1100
2's complement = 1110 1100 + 1 = 1110 1101
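A minimal C sketch of the same computation (illustrative, not from
the slides): flip the bits, add 1, and let the machine confirm the
result:

    #include <stdio.h>

    int main(void) {
        unsigned char magnitude = 19;   /* 0001 0011 */
        unsigned char twos = (unsigned char)(~magnitude) + 1;  /* flip, then add 1 */
        printf("%02X\n", twos);            /* ED = 1110 1101 */
        printf("%d\n", (signed char)twos); /* -19 on 2's complement hardware */
        return 0;
    }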
1-69
Two’s Complement
1-70
1-71
Figure 1.22 Coding the value -6
in two’s complement notation using four bits
(using an alternative algorithm)
Start with the binary representation of the positive value:
1-72
Two’s Complement
1-73
Figure 1.21 Two’s complement
notation systems
1-74
Figure 1.23 Addition problems
converted to two’s complement
notation
1-75
Two's-complement addition
• The largest positive integer that can be stored in N bits in 2's
complement form is 2^(N-1) - 1, whose binary bit pattern is one 0 followed
by N-1 1s
– e.g. one byte, N = 8:
largest +ve value = 2^7 - 1 = 127
127 (decimal) = 0111 1111 (binary)
• Therefore, if two positive 2's complement integers are added and their
sum is greater than +127 an overflow will occur.
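Worked illustration: 0111 1111 (+127) + 0000 0001 (+1) = 1000 0000,
which reads as -128 in 2's complement – a carry into the sign bit
with no carry out, signalling overflow.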
1-77
Two’s-complement addition
second example:
e.g. -60 + (-68)
-68 :   1011 1100
-60 : + 1100 0100
      = 1000 0000
One carry into the MSB, but also one carry out
– no overflow: the answer is -128
1-78
Excess-N representation
A value is represented by the unsigned number which is
N greater than the intended value. Thus 0 is represented
by N, and −N is represented by the all-zeros bit pattern.
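Worked illustration for excess-8 with four bits: 3 is stored as
3 + 8 = 11 = 1011, -3 as -3 + 8 = 5 = 0101, and 0 as 0 + 8 = 8 = 1000.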
1-79
Figure 1.24 An excess eight
conversion table
1-80
Figure 1.25 An excess notation system
using bit patterns of length three (excess-4)
1-81
Representing Integers (Internally)
External data is converted into internal form, with the
aid of the programming language being used:
Using C, a programmer would:
• Select a concrete type for the data (according to rules):
int, long, char, double ...
• Select a name for the data (to be able to use it):
int A;
double F;
• Specify the value (in external notation) to be used:
A = 0xFF; or A = 255;
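Put together as a complete program (a minimal sketch; the value of
F is illustrative):

    #include <stdio.h>

    int main(void) {
        int A;
        double F;
        A = 0xFF;    /* same internal value as A = 255; */
        F = 3.14;    /* illustrative value */
        printf("%d %f\n", A, F);   /* prints 255 3.140000 */
        return 0;
    }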
1-82
Representing Integers (Internally)
1-85
Data Compression
• Reduce amount of data needed
– For data storage
– For data transfer
• Compression ratio
Describes ratio of original data size to
compressed size, e.g. 14:1
[Diagram: data is compressed, then stored/transferred, then decompressed.]
1-86
Lossy vs. Lossless Compression
1-87
Generic Data Compression
Techniques I
• Run-length encoding: It works by finding runs of
repeated binary patterns and replacing them with a
single instance of the pattern and a number that
specifies how many times the pattern is repeated, e.g.
888888833333 encoded as 8[7]3[5] (see the sketch below)
• Relative encoding: It records the differences
between consecutive data units rather than
entire units; each unit is encoded in terms
of its relationship to the previous unit.
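A minimal run-length encoder sketch in C (character-based for
readability; names are illustrative, not from the slides):

    #include <stdio.h>

    /* Print each run of repeated characters as char[count]. */
    void rle_encode(const char *s) {
        for (int i = 0; s[i] != '\0'; ) {
            int count = 1;
            while (s[i + count] == s[i])
                count++;               /* extend the current run */
            printf("%c[%d]", s[i], count);
            i += count;                /* jump to the next run */
        }
        printf("\n");
    }

    int main(void) {
        rle_encode("888888833333");    /* prints 8[7]3[5] */
        return 0;
    }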
1-88
Generic Data Compression
Techniques II
• Frequency-dependent encoding: the length of the
bit pattern representing a data item is inversely
related to the frequency of the item's use
(variable-length coding)
– Huffman codes: It is most commonly used to
compress text documents. It works particularly well
when characters appear multiple times within the
file that is going to be compressed.
1-89
[Diagram: the same data encoded with variable-length coding and with fixed-length coding.]
1-90
Adaptive Dictionary Encoding
Or “Dynamic Dictionary Encoding” e.g. Lempel-Ziv
encoding systems (LZW)
“Dictionary” is set of blocks/units from which the
message/file to compress is constructed.
• Message is encoded as a sequence of references to the
dictionary. Instead of storing the whole text as a set of
character codes (letter by letter), the compression
algorithm would allocate a code for each different word
and store the code instead of the word.
• Dictionary is allowed to change during encoding.
1-91
LZW encoding: simplified example
• Encoding
– Message is xyx xyx xyx xyx
– Dictionary initially has 3 entries: “x” “y” “ ”
– First encode xyx as 121 – i.e. as the 1st, 2nd and 1st
dictionary entries.
– Then the space, giving 1213
– Space means that preceding string xyx is recognised as
a word. Add it to dictionary as 4th entry
– Encoding proceeds, using the new entry. The entire message
is encoded as 121343434
1-92
LZW encoding: simplified example
• Decoding
– Encoded message is 121343434
– Coding Dictionary initially has 3 entries: “x” “y” “ ”
– First decode 1213 as xyx with following space
– Space means that preceding string xyx is recognised as
a word. Add it to dictionary as 4th entry
– Decoding proceeds, using new dictionary entry.
– 12134 produces xyx xyx
– Entire message is decoded as xyx xyx xyx xyx
which is the original message
1-93
Compressing Images
• GIF (graphic interchange format)
– lossless compression
– Limits the colours in an image to only 256
– uses Dictionary Encoding
• 256 encodings stored in dictionary (the ‘palette’)
• pixel can be represented by 1 byte (referring to 1 dictionary entry)
• the colour represented is defined by 3 bytes (for Red, Green &
Blue components, RGB)
• can be made adaptive, using LZW techniques
• unsuitable for high precision photography
1-94
• JPEG (Joint Photographic Experts Group)
– encompasses several different modes
• has a lossless mode, but it does not produce high levels of
compression,
– lossy modes are widely used
• often default compression method in digital cameras
– JPEG’s baseline standard (lossy sequential mode) has
become the standard of choice in many applications
• uses a series of lossy and lossless techniques to achieve
significant compression without noticeable loss of image quality
1-95
• JPEG baseline standard (“lossy sequential
mode”) uses a series of compression techniques
– exploits perceptual limits and simplifies colour data,
preserving luminance
• blurs the boundaries between different colours while maintaining all
brightness information
– divides image into 8x8 pixel blocks
– applies discrete cosine transform (DCT) based
compression to each block
– additional compression achieved using run-length
encoding, relative encoding and variable length
encoding
1-96
Communication Errors
1-98
Figure 1.28 The ASCII codes for the
letters A and F adjusted for odd parity
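Worked out: ASCII 'A' is 1000001, which contains an even number of
1s, so the odd-parity bit is 1, giving 11000001; ASCII 'F' is
1000110, which already contains an odd number of 1s, so the parity
bit is 0, giving 01000110.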
1-99
Error-Correcting Codes
• Error-correcting codes are codes that allow
the receiver of a message to detect errors in the
message and to correct them.
• The Hamming Distance between two patterns
of bits is the number of bits in which the
patterns differ
– e.g. distance between 010101 & 011100 is 2
– or, distance between 11100111 & 11011011 is 4
[ XOR the bits then sum bits in the result ]
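A minimal C sketch of the bracketed recipe (illustrative, not from
the slides): XOR the two patterns, then count the 1 bits in the
result:

    #include <stdio.h>

    int hamming_distance(unsigned int a, unsigned int b) {
        unsigned int x = a ^ b;   /* 1 wherever the patterns differ */
        int count = 0;
        while (x != 0) {
            count += x & 1;       /* add the lowest bit */
            x >>= 1;
        }
        return count;
    }

    int main(void) {
        printf("%d\n", hamming_distance(0x15, 0x1C));  /* 010101 vs 011100 -> 2 */
        printf("%d\n", hamming_distance(0xE7, 0xDB));  /* 11100111 vs 11011011 -> 4 */
        return 0;
    }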
1-100
Error-correcting codes
1-101
Figure 1.29 An error-correcting code
1-102
Hamming distances for given code
          B  C  D  E  F  G  H
000000 A  4  3  3  3  3  4  4
001111 B     3  3  3  3  4  4
010011 C        4  4  4  3  3
011100 D           4  4  3  3
100110 E              4  3  3
101001 F                 3  3
110101 G                    4
111010 H
1-104
Decoding the received pattern 010100
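Worked out: the distances from 010100 to the codewords A-H are
2, 4, 3, 1, 3, 5, 2, 4 respectively, so the closest legal pattern
is D (011100, at distance 1) and the received pattern is decoded
as D – the single flipped bit has been corrected.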
1-105
Error-correcting codes
1-106