CH1 Data Storage
Instructor: Tian-Li Yu
Binary World
Bit: binary digit (0/1)
Simple, logical, and unambiguous
Boolean operations & gates
AND
Inputs Output
0 0    0
0 1    0
1 0    0
1 1    1
OR
Inputs Output
0 0    0
0 1    1
1 0    1
1 1    1
XOR
Inputs Output
0 0    0
0 1    1
1 0    1
1 1    0
NOT
Input Output
0     1
1     0
(Figure: Logical Gates)
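As an illustration that is not part of the original slides, the four gates can be modeled with Python's bitwise operators on 0/1 values. The function names (and_gate, or_gate, xor_gate, not_gate, truth_table) are hypothetical helpers chosen for this sketch.

```python
def and_gate(a: int, b: int) -> int:
    return a & b            # 1 only if both inputs are 1

def or_gate(a: int, b: int) -> int:
    return a | b            # 1 if at least one input is 1

def xor_gate(a: int, b: int) -> int:
    return a ^ b            # 1 if exactly one input is 1

def not_gate(a: int) -> int:
    return 1 - a            # inverts a single bit

def truth_table(gate) -> list[tuple[int, int, int]]:
    """Return (input1, input2, output) rows for a two-input gate."""
    return [(a, b, gate(a, b)) for a in (0, 1) for b in (0, 1)]

for name, gate in [("AND", and_gate), ("OR", or_gate), ("XOR", xor_gate)]:
    print(name, truth_table(gate))
print("NOT", [(a, not_gate(a)) for a in (0, 1)])
```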
Flip-Flop
Purpose: to keep the output state until the next input change (excitation).
SR Flip-Flop
Has two input lines: set and reset.
One input sets its stored value to 1.
The other input sets its stored value to 0.
While both inputs are 0, the most recently stored value is preserved.
(Figure: an SR flip-flop with inputs x and y and output z.)
x (set)  y (reset)  z
0        0          unchanged
0        1          0
1        0          1
1        1          undefined
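The table above can be sketched as a behavioral model in Python. This models only the input/output behavior, not the gate-level circuit from the figure; the class name SRFlipFlop, the initial stored value 0, and raising an error for x = y = 1 are assumptions made for illustration.

```python
class SRFlipFlop:
    """Behavioral SR flip-flop: x = set, y = reset, z = stored output."""

    def __init__(self) -> None:
        self.z = 0  # assumed initial stored value

    def apply(self, x: int, y: int) -> int:
        if x == 1 and y == 1:
            # The table marks this combination as undefined.
            raise ValueError("x = y = 1 is undefined for an SR flip-flop")
        if x == 1:
            self.z = 1      # set
        elif y == 1:
            self.z = 0      # reset
        # x = y = 0: the most recently stored value is preserved
        return self.z

ff = SRFlipFlop()
print([ff.apply(1, 0), ff.apply(0, 0), ff.apply(0, 1), ff.apply(0, 0)])
```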
(Figure: memory cells labeled with addresses 3 through 8.)
One dimensional.
Random accessible.
Access the content by the address (in practice, the address is also represented in binary).
Recall the pointer in C/C++.
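A toy model of this view of memory, offered as a sketch rather than how real hardware works: a one-dimensional bytearray whose index plays the role of the address. The memory size (16 cells) and the helper names store/load are hypothetical.

```python
# One-dimensional memory: 16 one-byte cells, addresses 0..15 (size chosen arbitrarily).
memory = bytearray(16)

def store(address: int, value: int) -> None:
    memory[address] = value     # bytearray rejects values outside 0..255

def load(address: int) -> int:
    return memory[address]      # any address can be read directly (random access)

store(5, 0b01101011)
pointer = 5                     # like a C pointer: a variable holding an address
print(f"address {pointer:04b} holds {load(pointer):08b}")
```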
Memory Techniques
Random Access Memory (RAM): memory in which individual cells can be easily accessed in any order.
Static RAM (SRAM): built from flip-flop-like circuits; retains its data as long as power is on.
Dynamic RAM (DRAM): stores bits as charge on tiny capacitors, which a refresh circuit must replenish regularly.
Synchronous DRAM (SDRAM)
Double Data Rate (DDR)
Dual/Triple channel
Capacity
Kilobyte: 2^10 bytes = 1,024 bytes ≈ 10^3 bytes.
Megabyte: 2^20 bytes = 1,048,576 bytes ≈ 10^6 bytes.
Gigabyte: 2^30 bytes = 1,073,741,824 bytes ≈ 10^9 bytes.
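A quick check of the figures above: the power-of-two units drift further from their decimal approximations as the units grow. The function name capacity_table is a hypothetical helper for this sketch.

```python
def capacity_table() -> list[tuple[str, int, int]]:
    """(unit, exact bytes 2^(10k), decimal approximation 10^(3k)) for k = 1, 2, 3."""
    names = ["Kilobyte", "Megabyte", "Gigabyte"]
    return [(name, 2 ** (10 * k), 10 ** (3 * k)) for k, name in enumerate(names, start=1)]

for name, exact, approx in capacity_table():
    # The ratio shows how far the approximation is off.
    print(f"{name}: {exact:,} bytes, about {approx:,} bytes ({exact / approx:.3f}x)")
```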
Mass Storage
Types
Magnetic systems (hard disk, tape)
Optical systems (CD, DVD)
Flash drives
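(Figure: magnetic disk storage, showing the read/write head and the access arm.)
Optical Storage
(Figure: an optical disk, showing the direction of disk motion.)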
Buffer
Purpose: to synchronize (or make compatible) different R/W mechanisms and rates.
A memory area used for the temporary storage of data (usually as a step in transferring the data).
Blocks of data compatible with physical records can be transferred between buffers and the mass storage system.
Data in a buffer can be referenced in terms of logical records.
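A minimal sketch of the idea, assuming the "mass storage" is an in-memory byte stream read in fixed-size physical blocks and that logical records are newline-terminated lines. The block size of 8 bytes and the name logical_records are hypothetical choices for illustration.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 8  # hypothetical physical record (block) size in bytes

def logical_records(device: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Read whole physical blocks into a buffer; hand out newline-terminated logical records."""
    buffer = bytearray()
    while True:
        block = device.read(block_size)      # one physical record per transfer
        if not block:
            break
        buffer.extend(block)
        # Emit every complete logical record currently sitting in the buffer.
        while (end := buffer.find(b"\n")) != -1:
            yield bytes(buffer[:end])
            del buffer[:end + 1]
    if buffer:                               # final record without a trailing newline
        yield bytes(buffer)

disk = io.BytesIO(b"Alice,90\nBob,85\nCarol,77\n")
print(list(logical_records(disk)))
```

Note that record boundaries need not line up with block boundaries; the buffer absorbs the mismatch.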
Representing Text
ASCII (American Standard Code for Information Interchange, by ANSI): 7 bits (or 8 bits with a leading 0).
Unicode: 16 bits (in its original design).
ISO standard (International Organization for Standardization): 32 bits.
ASCII Example
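The original example figure is not reproduced here. As a stand-in, this sketch encodes the sample string "Hello." (chosen only for illustration) as 7-bit ASCII codes written with a leading 0, using Python's built-in ASCII codec; the helper name to_ascii_bits is hypothetical.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Each character's ASCII code as 8 bits (7-bit code plus a leading 0).
    Raises UnicodeEncodeError for characters outside ASCII."""
    return [format(code, "08b") for code in text.encode("ascii")]

print(" ".join(to_ascii_bits("Hello.")))
```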
Representing Images
Vector techniques
Scalable: images can be resized without loss of quality.
TrueType, PostScript, SVG (Scalable Vector Graphics), etc.
Used in CAD and printers.
Representing Sounds
Sampling
Sampling rate
Bit resolution
Bit rate (sampling rate × bit resolution)
MIDI (synthesis)
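A small sketch of sampling and the bit-rate formula above. The parameters are illustrative assumptions: 44,100 samples per second at 16 bits for the bit-rate example, and a 1 kHz tone sampled at 8 kHz with 4-bit resolution for the quantization example. The helper names are hypothetical.

```python
import math

def bit_rate(sampling_rate_hz: int, bit_resolution: int, channels: int = 1) -> int:
    """Bits per second = sampling rate x bit resolution (x number of channels)."""
    return sampling_rate_hz * bit_resolution * channels

def sample_sine(freq_hz: float, sampling_rate_hz: int, bit_resolution: int, n: int) -> list[int]:
    """Take n samples of a sine wave and round each to a signed bit_resolution-bit level."""
    max_level = 2 ** (bit_resolution - 1) - 1
    return [round(max_level * math.sin(2 * math.pi * freq_hz * k / sampling_rate_hz))
            for k in range(n)]

print(bit_rate(44_100, 16))          # bits per second for one channel
print(sample_sine(1000, 8000, 4, 8))
```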
Addition
0 + 0 = 0    0 + 1 = 1    1 + 0 = 1    1 + 1 = 10
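The four single-bit facts above are enough to add numbers of any length, column by column with a carry. A sketch on binary strings; the function name add_binary is hypothetical.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings right to left, propagating the carry."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        result.append(str(total % 2))   # sum bit for this column
        carry = total // 2              # carry into the next column
    return "".join(reversed(result)) or "0"

print(add_binary("111000", "10011"))
```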
Subtraction?
Let’s first define negative numbers.
My way
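For positive x,
  x → binary encoding of x.
  -x → binary encoding of (2^n - x).
(This is two's complement notation.)
Examples with n = 4 (the carry out of the leftmost bit is discarded):
  -3 + (-2):  1101 + 1110 = 1011 → -5
  7 + (-5):   0111 + 1011 = 0010 → 2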
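The rule above (negative x stored as 2^n - |x|, addition done mod 2^n) is two's complement. A sketch reproducing the two examples; the names encode, decode, and add are hypothetical, and n defaults to 4 to match the examples.

```python
def encode(x: int, n: int = 4) -> str:
    """n-bit pattern: x for x >= 0, 2^n - |x| for x < 0 (Python's x % 2**n does both)."""
    if not -(2 ** (n - 1)) <= x < 2 ** (n - 1):
        raise ValueError(f"{x} is out of range for {n} bits")
    return format(x % 2 ** n, f"0{n}b")

def decode(bits: str) -> int:
    """A leading 1 means the pattern stands for value - 2^n."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value

def add(p: str, q: str) -> str:
    """Add two n-bit patterns, discarding the carry out (i.e., mod 2^n)."""
    n = len(p)
    return format((int(p, 2) + int(q, 2)) % 2 ** n, f"0{n}b")

print(add(encode(-3), encode(-2)), decode(add(encode(-3), encode(-2))))
print(add(encode(7), encode(-5)), decode(add(encode(7), encode(-5))))
```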
Excess Notation
Bit pattern (n = 3)  Value represented
111                   3
110                   2
101                   1
100                   0
011                  -1
010                  -2
001                  -3
000                  -4
Conversion: x → (2^(n-1) + x) mod 2^n
Addition: x + y → (2^(n-1) + (2^(n-1) + x) + (2^(n-1) + y)) mod 2^n = (2^(n-1) + x + y) mod 2^n
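A sketch of the conversion and addition rules for excess notation, with n = 3 (excess-4) by default to match the table. The function names are hypothetical; add_excess adds the two stored patterns plus one more 2^(n-1), exactly as in the addition formula.

```python
def to_excess(x: int, n: int = 3) -> str:
    """Pattern for x: (2^(n-1) + x) mod 2^n, written in n bits."""
    if not -(2 ** (n - 1)) <= x < 2 ** (n - 1):
        raise ValueError(f"{x} is out of range for {n} bits")
    return format((2 ** (n - 1) + x) % 2 ** n, f"0{n}b")

def from_excess(bits: str) -> int:
    """Value represented: pattern minus 2^(n-1)."""
    return int(bits, 2) - 2 ** (len(bits) - 1)

def add_excess(p: str, q: str) -> str:
    """(2^(n-1) + (2^(n-1) + x) + (2^(n-1) + y)) mod 2^n = (2^(n-1) + x + y) mod 2^n."""
    n = len(p)
    return format((2 ** (n - 1) + int(p, 2) + int(q, 2)) % 2 ** n, f"0{n}b")

print([(format(v, "03b"), from_excess(format(v, "03b"))) for v in range(7, -1, -1)])
print(add_excess(to_excess(1), to_excess(-3)))   # pattern for 1 + (-3)
```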
Overflow
Overflow occurs when the arithmetic result is out of the range of representation (examples in 3-bit two's complement):
Addition of two positive numbers:
  010 + 011 = 101, i.e., 2 + 3 = 5 → -3 (mod 8)
Addition of two negative numbers:
  110 + 101 = 011, i.e., (-2) + (-3) = -5 → 3 (mod 8)
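Both cases above share one signature: the operands have the same sign bit but the result's sign bit differs. A sketch of that check for n-bit two's complement patterns; the name add_with_overflow is hypothetical.

```python
def add_with_overflow(p: str, q: str) -> tuple[str, bool]:
    """Add two n-bit two's complement patterns mod 2^n and report overflow.
    Overflow iff both operands share a sign bit and the result's sign bit differs."""
    n = len(p)
    s = format((int(p, 2) + int(q, 2)) % 2 ** n, f"0{n}b")
    overflow = p[0] == q[0] and s[0] != p[0]
    return s, overflow

print(add_with_overflow("010", "011"))   # two positives
print(add_with_overflow("110", "101"))   # two negatives
```

Adding a positive and a negative number can never overflow, which is why only the same-sign case is checked.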
Floating-Point Notation
Why? (How to represent 0.000000000000001?)
In IEEE 754 double precision (64 bits), the exponent takes 11 bits and the mantissa takes 52 bits; the remaining bit is the sign.
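To see these fields in a real double, the sketch below reinterprets a Python float's 64 bits with the standard struct module. The stored exponent uses a bias of 1023, and the leading 1 of a normalized mantissa is not stored; the function name double_fields is hypothetical.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into (sign, biased exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))   # same 64 bits as an integer
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11 bits, bias 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, implicit leading 1 omitted
    return sign, exponent, mantissa

print(double_fields(1.0))
print(double_fields(-2.75))              # -1.011 (binary) x 2^1
```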
Decoding Floating-Point
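(Format: sign bit, 3-bit exponent in excess-4, 4-bit mantissa with the radix point at its left.)
01101011
→ (0)(110)(1011)
→ (+)(+2)(1011)
.1011 → 10.11 → 2 + 1/2 + 1/4 = 2 3/4

10010011
→ (1)(001)(0011)
→ (-)(-3)(0011)
.0011 → .0000011 → 1/64 + 1/128 = 3/128, so the value is -3/128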
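A sketch of the decoding steps for this 8-bit format, using exact fractions so the results match the hand calculation. The function name decode_8bit is hypothetical.

```python
from fractions import Fraction

def decode_8bit(byte: int) -> Fraction:
    """Decode (s)(eee)(mmmm): sign bit, excess-4 exponent, mantissa .mmmm."""
    sign = -1 if byte >> 7 else 1
    exponent = ((byte >> 4) & 0b111) - 4          # excess-4 -> actual exponent
    mantissa = Fraction(byte & 0b1111, 16)        # radix point left of the 4 bits
    return sign * mantissa * Fraction(2) ** exponent

print(decode_8bit(0b01101011))
print(decode_8bit(0b10010011))
```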
Truncation Errors
Occur when the required precision exceeds what the mantissa can hold.
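The same effect shows up in ordinary IEEE 754 doubles, used here as a stand-in for the 8-bit format: 0.1 has an infinitely repeating binary expansion, so the 52-bit mantissa can only hold the nearest representable value (IEEE 754 rounds to nearest rather than simply truncating). A small sketch using Python's fractions module to expose the stored value:

```python
from fractions import Fraction

# 0.1 = 0.000110011001100... (binary) never terminates, so only the
# double closest to 0.1 can be stored.
stored = Fraction(0.1)             # exact value of the stored double
print(stored)
print(stored == Fraction(1, 10))   # the stored value is not exactly 1/10
print(0.1 + 0.2 == 0.3)            # the small errors do not cancel
```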
Normalized Form
The most significant bit of the mantissa is 1.
The floating-point representation of 0 is all zeros.
Normalization
01100011 → (0)(110)(0011) → .0011 × 2^2
→ .1100 × 2^0 → (0)(100)(1100) → 01001100
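A sketch of normalization for the same 8-bit format: shift the mantissa left until its leading bit is 1, lowering the exponent by one per shift so the value stays the same. Stopping at the smallest exponent (pattern 000, i.e., -4) and mapping an all-zero mantissa to all zeros are assumptions made for this sketch; normalize_8bit is a hypothetical name.

```python
def normalize_8bit(byte: int) -> int:
    """Normalize (s)(eee)(mmmm) so the mantissa's most significant bit is 1."""
    sign = byte & 0b10000000
    exponent = (byte >> 4) & 0b111     # stored excess-4 pattern, 0..7
    mantissa = byte & 0b1111
    if mantissa == 0:
        return 0                       # zero is represented by all zeros
    while not mantissa & 0b1000:
        if exponent == 0:              # exponent already -4; cannot shift further
            break
        mantissa <<= 1                 # multiply mantissa by 2 ...
        exponent -= 1                  # ... and divide by 2 via the exponent
    return sign | (exponent << 4) | mantissa

print(format(normalize_8bit(0b01100011), "08b"))
```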
IEEE standard
The left-most bit of the mantissa is always 1 → omit it.
An IEEE-style normalized form (keeping this chapter's excess-4 exponent) is (s)(eee)(mmmm)
→ (-1)^s × 1.mmmm × 2^(eee - 4)
01100011 → (0)(110)(0011) → 1.0011 × 2^(6 - 4) = 100.11 = 4 3/4
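A sketch of decoding with the implicit leading 1, following the slide's (-1)^s × 1.mmmm × 2^(eee - 4) formula. This toy decoder does not handle the special patterns (zero, infinities, NaN) that real IEEE 754 reserves; decode_hidden_bit is a hypothetical name.

```python
from fractions import Fraction

def decode_hidden_bit(byte: int) -> Fraction:
    """(s)(eee)(mmmm) -> (-1)^s x 1.mmmm x 2^(eee - 4); the leading 1 is not stored."""
    s = byte >> 7
    eee = (byte >> 4) & 0b111
    mmmm = byte & 0b1111
    significand = 1 + Fraction(mmmm, 16)     # 1.mmmm
    return (-1) ** s * significand * Fraction(2) ** (eee - 4)

print(decode_hidden_bit(0b01100011))
```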
Loss of Digits
4 + 1/4 + 1/4
= 01111000 + 00111000 + 00111000
= 01111000 + 01110000 + 01110000   (aligned to exponent 3, each 1/4 loses all of its mantissa bits)
= 01111000 = 4 !!!

4 + (1/4 + 1/4)
= 01111000 + (00111000 + 00111000)
= 01111000 + 01001000
= 01111000 + 01110001
= 01111001 = 4 1/2 !!!

Just like when you use a calculator to compute 10^99 + 0.123 - 10^99 (the 0.123 is lost).
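The same order dependence appears with IEEE 754 doubles, used here in place of the 8-bit format. Near 10^16 consecutive doubles are 2 apart, so adding 1.0 on its own is rounded away, while adding 2.0 survives. A small sketch:

```python
big = 1e16                          # doubles near 1e16 are spaced 2 apart

left_to_right = (big + 1.0) + 1.0   # each 1.0 is rounded away
small_first = big + (1.0 + 1.0)     # the small terms are combined first

print(left_to_right - big)
print(small_first - big)
print(1e99 + 0.123 - 1e99)          # the calculator example
```

A common remedy is to add the smallest terms first (or use a compensated sum such as Python's math.fsum).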
Data Compression
Huffman Encoding
AAABBBAABCAAAABD
Traditional encoding
A → 00; B → 01; C → 10; D → 11.
00000001010100000110000000000111 (32 bits).
Huffman encoding
Count occurrences: A(9); B(5); C(1); D(1).
Build a Huffman tree.
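A sketch of building a Huffman code with a priority queue (Python's heapq): repeatedly merge the two least frequent subtrees, prefixing 0 to one side's codes and 1 to the other's. The exact bit patterns depend on tie-breaking, but for this message the code lengths are A:1, B:2, C:3, D:3, giving 25 bits instead of 32. The function name huffman_codes is hypothetical.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two least frequent subtrees."""
    counts = Counter(text)
    if len(counts) == 1:                    # degenerate case: a single symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie-breaker id, {symbol: code so far})
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "AAABBBAABCAAAABD"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes, len(encoded))
```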
LZW Encoding
A dictionary encoding that does not need to store or transmit the dictionary: the decoder rebuilds it from the encoded data.
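A minimal LZW sketch, assuming 8-bit characters (initial dictionary of the 256 single-byte codes). The encoder adds each new phrase as it sees it; the decoder rebuilds the same dictionary from the codes alone, including the one tricky case where a code is used in the same step it is defined. Function names are hypothetical.

```python
def lzw_encode(text: str) -> list[int]:
    """Emit the code of the longest known phrase, then add phrase + next char."""
    dictionary = {chr(i): i for i in range(256)}
    phrase, output = "", []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                          # keep extending a known phrase
        else:
            output.append(dictionary[phrase])
            dictionary[phrase + ch] = len(dictionary)
            phrase = ch
    if phrase:
        output.append(dictionary[phrase])
    return output

def lzw_decode(codes: list[int]) -> str:
    """Rebuild the dictionary on the fly; it is never transmitted."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                                     # code defined in this very step
            entry = previous + previous[0]
        result.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(result)

codes = lzw_encode("AAABBBAABCAAAABD")
print(codes, lzw_decode(codes))
```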
Communication Errors
Compression
Remove redundancy.
Error detection & correction
Add redundancy to detect or correct errors.
Error detection: Check code
Cannot correct errors, but can check if errors occur.
ID numbers
ISBN
Parity code
Error correcting
Can correct errors (to some degree).
Taiwan ID
C a1 a2 a3 a4 a5 a6 a7 a8 a9 (a letter C followed by nine digits)
1. Convert the English letter C into a two-digit number xy.
2. d1 = x + 9y
3. d2 = Σ_{i=1}^{8} (9 - i) · a_i = 8 · a1 + 7 · a2 + ... + 1 · a8
4. Check code a9 = (10 - ((d1 + d2) mod 10)) mod 10
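A sketch of the check-code steps. The letter-to-number table is not given in these notes, so the function takes the two-digit number xy as input; the usage line assumes the letter A maps to 10. The final mod 10 turns a result of 10 into 0 so the check code is always a single digit. The function name is hypothetical.

```python
def taiwan_id_check_digit(letter_code: int, digits: list[int]) -> int:
    """Check code a9 from the letter's two-digit number xy and digits a1..a8."""
    x, y = divmod(letter_code, 10)              # tens digit x, units digit y
    d1 = x + 9 * y
    d2 = sum((9 - i) * a for i, a in enumerate(digits, start=1))   # 8*a1 + ... + 1*a8
    return (10 - (d1 + d2) % 10) % 10

# Assumes A -> 10; the digits a1..a8 are 1..8.
print(taiwan_id_check_digit(10, [1, 2, 3, 4, 5, 6, 7, 8]))
```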
ISBN-10
1. Compute S = 0 · 10 + 2 · 9 + 7 · 8 + 3 · 7 + 7 · 6 + 5 · 5 + 1 · 4 + 3 · 3 + 9 · 2 = 193
2. M = S mod 11 = 6
3. N = 11 - M = 5
   If N = 10, the check code is X.
   If N = 11, the check code is 0.
   Otherwise, the check code is the number N.
4. So the whole ISBN is 0-273-75139-5.
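The same four steps as a small function: weights 10 down to 2 on the first nine digits, then N = 11 - (S mod 11) with the X and 0 special cases. Hyphens in the input are ignored; the function name is hypothetical.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Check code for the first nine digits of an ISBN-10."""
    digits = [int(c) for c in first_nine if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected nine digits")
    s = sum(w * d for w, d in zip(range(10, 1, -1), digits))   # weights 10..2
    n = 11 - s % 11
    if n == 10:
        return "X"
    if n == 11:
        return "0"
    return str(n)

print(isbn10_check_digit("0-273-75139"))
```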
Parity Bits
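A sketch of a parity code: one extra bit makes the total number of 1s odd (odd parity) or even (even parity), so any single flipped bit is detected, though not located. The choice of odd parity as the default and the helper names are assumptions for illustration.

```python
def parity_bit(bits: str, odd: bool = True) -> str:
    """Bit that makes the total count of 1s odd (odd parity) or even (even parity)."""
    ones = bits.count("1")
    if odd:
        return "0" if ones % 2 == 1 else "1"
    return "1" if ones % 2 == 1 else "0"

def has_valid_parity(word: str, odd: bool = True) -> bool:
    """Check a received word (data bits followed by the parity bit)."""
    return word.count("1") % 2 == (1 if odd else 0)

data = "1000001"
word = data + parity_bit(data)          # sender appends the parity bit
corrupted = "0" + word[1:]              # one bit flipped in transit
print(word, has_valid_parity(word), has_valid_parity(corrupted))
```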