CH1 Data Storage


Introduction to Computer Science

Lecture 1: Data Storage

Instructor: Tian-Li Yu

Taiwan Evolutionary Intelligence Laboratory (TEIL)


Department of Electrical Engineering
National Taiwan University

[email protected]

Slides made by Tian-Li Yu, Jie-Wei Wu, and Chu-Yu Hsu

Instructor: Tian-Li Yu Data Storage 1/1


Binary World

Binary World
Bit: binary digit (0/1)
Simple, logical, and unambiguous
Boolean operations & gates
AND                   OR
Inputs  Output        Inputs  Output
0 0     0             0 0     0
0 1     0             0 1     1
1 0     0             1 0     1
1 1     1             1 1     1


Logical Gates

XOR                   NOT
Inputs  Output        Input  Output
0 0     0             0      1
0 1     1             1      0
1 0     1
1 1     0
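
As a small illustration, the four gates can be sketched with Python's bitwise operators on the integers 0 and 1. The function names (and_gate, or_gate, xor_gate, not_gate, truth_table) are made up for this sketch and are not part of the lecture.

```python
def and_gate(a: int, b: int) -> int:
    return a & b

def or_gate(a: int, b: int) -> int:
    return a | b

def xor_gate(a: int, b: int) -> int:
    return a ^ b

def not_gate(a: int) -> int:
    return 1 - a

def truth_table(gate):
    """List (input a, input b, output) for every input combination."""
    return [(a, b, gate(a, b)) for a in (0, 1) for b in (0, 1)]

for name, gate in (("AND", and_gate), ("OR", or_gate), ("XOR", xor_gate)):
    print(name, truth_table(gate))
print("NOT", [(a, not_gate(a)) for a in (0, 1)])
```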

Logical vs. real world


To be or not to be → always true.



Flip-Flop

Flip-Flop
Purpose: to keep the state of the output until the next excitation.
SR Flip-Flop
Has two input lines: set and reset.
One input sets its stored value to 1.
The other input sets its stored value to 0.
While both inputs are 0, the most recently stored value is preserved.

[Figure: flip-flop block with inputs x and y and output z]

x  y  z
0  0  unchanged
0  1  0
1  0  1
1  1  undefined
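
The table can be mimicked by a tiny behavioral model. This is only a sketch of the input/output behavior, not of the gate-level circuit; the class name SRFlipFlop and the method name step are assumptions made for the example.

```python
class SRFlipFlop:
    """Behavioral model of the x/y/z table: x sets, y resets, 0/0 holds."""

    def __init__(self, initial: int = 0) -> None:
        self.z = initial

    def step(self, x: int, y: int) -> int:
        if x == 1 and y == 1:
            # The table lists this input combination as undefined.
            raise ValueError("x = y = 1 is undefined for an SR flip-flop")
        if x == 1:
            self.z = 1      # set
        elif y == 1:
            self.z = 0      # reset
        return self.z       # both inputs 0: stored value is preserved

ff = SRFlipFlop()
print(ff.step(1, 0), ff.step(0, 0), ff.step(0, 1), ff.step(0, 0))
```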


A Simple SR Flip-Flop Circuit


Another SR Flip-Flop Circuit



Main Memory

Hexadecimal Coding (Hex)

Bit pattern  Hexadecimal representation
0000         0
0001         1
0010         2
0011         3
0100         4
0101         5
0110         6
0111         7
1000         8
1001         9
1010         A
1011         B
1100         C
1101         D
1110         E
1111         F

Binary is usually too long for humans to remember.
Binary to hex is straightforward: 0010111010110101 → 2EB5.
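
A minimal sketch of the conversion: split the bit string into groups of four and look each group up in the table. The function name bits_to_hex is an assumption for this example.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a bit string whose length is a multiple of 4 to hexadecimal."""
    if len(bits) % 4 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 0s and 1s, length a multiple of 4")
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

print(bits_to_hex("0010111010110101"))  # 2EB5
```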

Main Memory Cells

Cell: A unit of main memory (typically 8 bits which is one byte)

High-order end 0 1 0 0 1 1 1 1 Low-order end

Most significant bit Least significant bit


Main Memory and Address

One-dimensional.
Randomly accessible.
Access the content by the address (in practice, also in binary).
Recall pointers in C/C++.

[Figure: a column of memory cells labeled with addresses 3 to 8]


Memory Techniques
Random Access Memory (RAM): Memory in which individual cells can be easily accessed
in any order.
Static Memory (SRAM): like flip-flop.
Dynamic Memory (DRAM): Tiny capacitors replenished regularly by refresh circuit.
Synchronous DRAM (SDRAM)
Double Data Rate (DDR)
Dual/Triple channel

Capacity
Kilobyte: $2^{10}$ bytes = 1,024 bytes $\approx 10^{3}$ bytes.
Megabyte: $2^{20}$ bytes = 1,048,576 bytes $\approx 10^{6}$ bytes.
Gigabyte: $2^{30}$ bytes = 1,073,741,824 bytes $\approx 10^{9}$ bytes.



Mass Storage

Mass Storage

Properties (compared with main memory)


Larger capacity
Less volatility
Slower
On-line or off-line

Types
Magnetic systems (hard disk, tape)
Optical systems (CD, DVD)
Flash drives


Magnetic Disk Storage System


[Figure: magnetic disk with tracks divided into sectors, a cylinder, the read/write head, and the access arm; arrows show disk motion and arm motion]

Head, track, sector, cylinder


Access time = seek time + rotation delay (latency time).
Transfer rate (SATA 1.5/3/6 Gbit/s, etc.)


Optical Storage

Data is recorded on a single track, consisting of individual sectors, that spirals toward the outer edge.

[Figure: CD with an arrow showing disk motion]


Physical vs. Logical Records


Logical records correspond to natural divisions within the data.
Physical records correspond to the size of a sector.

Files and file systems
Fragmentation problem
We will talk about this later, when we cover operating systems (OS).


Buffer

Purpose: To synchronize (or to make compatible) different R/W mechanisms and rates.
A memory area used for the temporary storage of data (usually as a step in transferring the data).
Blocks of data compatible with physical records can be transferred between buffers and the mass storage system.
Data in a buffer can be referenced in terms of logical records.



Representing Text

Representing Text

ASCII (American Standard Code for Information Interchange, by ANSI): 7 bits (or 8 bits with a leading 0).
Unicode: 16 bits.
ISO standard (International Organization for Standardization): 32 bits.


ASCII Example

01001000 01100101 01101100 01101100 01101111 00101110


H e l l o .
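
The example can be checked with a short sketch that treats each 8-bit group as one ASCII code. The function names decode_ascii and encode_ascii are assumptions for this example.

```python
def decode_ascii(bit_string: str) -> str:
    """Decode space-separated 8-bit groups into text."""
    return "".join(chr(int(byte, 2)) for byte in bit_string.split())

def encode_ascii(text: str) -> str:
    """Encode text as space-separated 8-bit groups (leading 0 plus 7-bit code)."""
    return " ".join(format(ord(ch), "08b") for ch in text)

bits = "01001000 01100101 01101100 01101100 01101111 00101110"
print(decode_ascii(bits))      # Hello.
print(encode_ascii("Hello."))
```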



Representing Numeric Values

Representing Numeric Values


From Binary to Decimal


From Decimal to Binary


Just as in decimal, keep dividing the number by 2 and record the remainders.
Be careful about the order: the first remainder is the least significant bit, so the remainders are read in reverse.
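
A minimal sketch of both directions, assuming non-negative integers. The function names to_binary and to_decimal are made up for this example.

```python
def to_binary(n: int) -> str:
    """Repeatedly divide by 2; the remainders, read in reverse, are the bits."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))          # least significant bit comes first
    return "".join(reversed(remainders))   # so reverse the order at the end

def to_decimal(bits: str) -> int:
    """Accumulate from the most significant bit: value = value * 2 + bit."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value

print(to_binary(13))       # 1101
print(to_decimal("1101"))  # 13
```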



Representing Images & Sounds

Representing Images

Bit map techniques


Pixel: picture element.
Colors: RGB, HSV, etc.
LCD, scanner, digital cameras, etc.

Vector techniques
Scalable
TrueType, PostScript, SVG (scalable vector graphics), etc.
CAD, printers.


Representing Sounds
Sampling
Sampling rate
Bit resolution
Bit rate (sampling rate × bit resolution)
MIDI (synthesis)
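
As a worked instance of the bit-rate formula, here is a sketch using CD-quality audio (44,100 samples per second, 16 bits per sample) as the example input. The channels parameter is an extension added for illustration and is not part of the formula above; the function name bit_rate is likewise an assumption.

```python
def bit_rate(sampling_rate_hz: int, bit_resolution: int, channels: int = 1) -> int:
    """Bits per second = sampling rate x bit resolution (x number of channels)."""
    return sampling_rate_hz * bit_resolution * channels

print(bit_rate(44_100, 16))              # 705600 bits per second, one channel
print(bit_rate(44_100, 16, channels=2))  # 1411200 bits per second, stereo
```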



Negative Numbers

Binary System Revisited

Addition
  0      0      1      1
+ 0    + 1    + 0    + 1
---    ---    ---    ---
  0      1      1     10

Subtraction?
Let’s first define negative numbers.


Two’s Complement Notation


Range: $-2^{n-1} \sim 2^{n-1} - 1$

3-bit patterns
Bit pattern  Value represented
011           3
010           2
001           1
000           0
111          -1
110          -2
101          -3
100          -4

4-bit patterns
Bit pattern  Value represented
0111          7
0110          6
0101          5
0100          4
0011          3
0010          2
0001          1
0000          0
1111         -1
1110         -2
1101         -3
1100         -4
1011         -5
1010         -6
1001         -7
1000         -8


Two’s Complement Encoding


Textbook’s way

My way
For positive x,
$x \to$ binary encoding of $x$.
$-x \to$ binary encoding of $(2^{n} - x)$.
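
The rule above (a negative value −x is stored as the binary encoding of 2^n − x) can be sketched as follows. The function names encode_twos and decode_twos are assumptions for this example.

```python
def encode_twos(x: int, n: int) -> str:
    """Encode x as an n-bit two's complement pattern."""
    if not -(1 << (n - 1)) <= x <= (1 << (n - 1)) - 1:
        raise ValueError(f"{x} does not fit in {n} bits")
    # For negative x, x % 2**n equals 2**n - |x|, which is the rule above.
    return format(x % (1 << n), f"0{n}b")

def decode_twos(bits: str) -> int:
    """Decode an n-bit two's complement pattern."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value

print(encode_twos(-3, 4))   # 1101
print(decode_twos("1011"))  # -5
```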


Subtraction in 2’s Complement


Do it as usual in binary.

    3          0011
+   2   ⇒    + 0010   ⇒   5
    ?          0101

   -3          1101
+  -2   ⇒    + 1110   ⇒   -5
    ?          1011

    7          0111
+  -5   ⇒    + 1011   ⇒   2
    ?          0010
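
A short sketch of the same idea: add the two bit patterns as ordinary unsigned numbers and discard any carry out of the top bit. The function name add_twos is an assumption for this example.

```python
def add_twos(a_bits: str, b_bits: str) -> str:
    """Add two n-bit two's complement patterns, discarding the carry out."""
    n = len(a_bits)
    if len(b_bits) != n:
        raise ValueError("both patterns must have the same length")
    total = (int(a_bits, 2) + int(b_bits, 2)) % (1 << n)  # drop the carry
    return format(total, f"0{n}b")

print(add_twos("0011", "0010"))  # 0101  (3 + 2 = 5)
print(add_twos("1101", "1110"))  # 1011  (-3 + -2 = -5)
print(add_twos("0111", "1011"))  # 0010  (7 + -5 = 2)
```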


Excess Notation

Bit pattern  Value represented
111           3
110           2
101           1
100           0
011          -1
010          -2
001          -3
000          -4

Conversion
$x \to (2^{n-1} + x) \bmod 2^{n}$

Addition
$x + y \to (2^{n-1} + (2^{n-1} + x) + (2^{n-1} + y)) \bmod 2^{n} = (2^{n-1} + x + y) \bmod 2^{n}$
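
The conversion and addition rules can be sketched directly. The function names encode_excess, decode_excess, and add_excess are assumptions for this example.

```python
def encode_excess(x: int, n: int) -> str:
    """Store x as the unsigned number 2**(n-1) + x."""
    bias = 1 << (n - 1)
    if not -bias <= x < bias:
        raise ValueError(f"{x} does not fit in {n}-bit excess notation")
    return format(x + bias, f"0{n}b")

def decode_excess(bits: str) -> int:
    return int(bits, 2) - (1 << (len(bits) - 1))

def add_excess(a_bits: str, b_bits: str) -> str:
    """Add two excess-notation patterns: add one extra bias, then reduce mod 2**n."""
    n = len(a_bits)
    bias = 1 << (n - 1)
    total = (bias + int(a_bits, 2) + int(b_bits, 2)) % (1 << n)
    return format(total, f"0{n}b")

print(encode_excess(-1, 3))      # 011
print(add_excess("101", "110"))  # 111  (1 + 2 = 3)
```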


Overflow

Overflow occurs when the arithmetic result is out of the range of representation.

Addition of two positive numbers
$2 + 3 = 5 \to -3 \pmod{8}$
  010
+ 011
  101

Addition of two negative numbers
$(-2) + (-3) = -5 \to 3 \pmod{8}$
  110
+ 101
  011
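
A minimal sketch of detecting overflow in n-bit two's complement addition: compute the wrapped result and compare it with the true sum. The function name add_with_overflow is an assumption for this example.

```python
def add_with_overflow(a: int, b: int, n: int):
    """Return (wrapped n-bit two's complement result, overflow flag)."""
    modulus = 1 << n
    raw = (a + b) % modulus                        # keep only the low n bits
    result = raw - modulus if raw >= modulus // 2 else raw
    return result, result != a + b                 # overflow if the sum changed

print(add_with_overflow(2, 3, 3))    # (-3, True)
print(add_with_overflow(-2, -3, 3))  # (3, True)
print(add_with_overflow(2, 1, 3))    # (3, False)
```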



Real Numbers

Fraction in Binary (Fixed-Point)


Floating-Point Notation
Why? (How to represent 0.000000000000001?)

In the IEEE 754 double-precision format used on most current 64-bit computers, the sign takes 1 bit, the exponent takes 11 bits, and the mantissa takes 52 bits.
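
To see the fields, a Python float (an IEEE 754 double on common platforms) can be reinterpreted as a 64-bit integer with the standard struct module. The function name double_fields is an assumption for this sketch.

```python
import struct

def double_fields(x: float):
    """Split a 64-bit double into (sign, 11-bit exponent field, 52-bit mantissa field)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # same 8 bytes, read as an integer
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF
    mantissa = raw & ((1 << 52) - 1)
    return sign, exponent, mantissa

print(double_fields(1.0))   # (0, 1023, 0)
print(double_fields(-2.5))  # (1, 1024, 1125899906842624)
```

The exponent field is stored in excess notation (bias 1023), which is why 1.0 shows an exponent field of 1023.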


Decoding Floating-Point

01101011
→ (0)(110)(1011)
→ (+)(+2)(1011)
.1011 → 10.11 → $2 + \frac{1}{2} + \frac{1}{4} = 2\frac{3}{4}$

10010011
→ (1)(001)(0011)
→ (−)(−3)(0011)
.0011 → .0000011 → $\frac{1}{64} + \frac{1}{128} = \frac{3}{128}$, so the value is $-\frac{3}{128}$
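
The decoding steps can be sketched for this 8-bit teaching format (1 sign bit, 3 exponent bits in excess-4 notation, 4 mantissa bits read as .mmmm). The function name decode_float8 is an assumption; exact fractions are used so that no rounding hides the result.

```python
from fractions import Fraction

def decode_float8(bits: str) -> Fraction:
    """Decode (s)(eee)(mmmm): value = sign x 0.mmmm x 2**(eee - 4)."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 4         # excess-4 exponent
    mantissa = Fraction(int(bits[4:], 2), 16)  # .mmmm as a fraction
    return sign * mantissa * Fraction(2) ** exponent

print(decode_float8("01101011"))  # 11/4, i.e. 2 3/4
print(decode_float8("10010011"))  # -3/128
```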


Truncation Errors
Required precision is beyond the limitation of the mantissa.

The computer can only represent it as $2\frac{1}{2}$.


Normalized Form
The most significant bit of the mantissa is 1.
The floating-point representation of 0 is all zeros.

Normalization
01100011 → (0)(110)(0011) → $.0011 \times 2^{2}$
→ $.1100 \times 2^{0}$ → (0)(100)(1100) → 01001100

IEEE standard
The left-most bit of the mantissa is always 1 → omit it.
An IEEE standard normalized form is (s)(eee)(mmmm)
→ $(-1)^{s} \times 1.mmmm \times 2^{(eee - 4)}$
01100011 → (0)(110)(0011) → $1.0011 \times 2^{(6 - 4)}$


Loss of Digits
$4 + \frac{1}{4} + \frac{1}{4}$
= 01111000 + 00111000 + 00111000
= 01111000 + 01110000 + 01110000
= 01111000 = 4 !!!

$4 + \frac{1}{4} + \frac{1}{4}$
= 01111000 + (00111000 + 00111000)
= 01111000 + 01001000
= 01111000 + 01110001
= 01111001 = $4\frac{1}{2}$ !!!

Just like when you use a calculator to do $10^{99} + 0.123 - 10^{99}$.
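
The same effect shows up with ordinary 64-bit doubles. The sketch below is an analogy with larger numbers, not the 8-bit format of the slide: near 1e16 the spacing between representable doubles is 2, so adding 1.0 twice is lost while adding 2.0 once is not. The variable names are made up for this example.

```python
big = 1e16

one_at_a_time = (big + 1.0) + 1.0  # each 1.0 is too small to register
small_first = big + (1.0 + 1.0)    # 2.0 is large enough to register

print(one_at_a_time == big)        # True
print(small_first - big)           # 2.0
print(1e99 + 0.123 - 1e99)         # 0.0
```

Adding the small values together first, before combining them with the large one, is the usual way to reduce this kind of loss.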



Compression

Data Compression

Lossy vs. lossless

Run-length encoding
Frequency-dependent encoding
Huffman encoding
Relative encoding / difference encoding
Dictionary encoding
Adaptive dictionary encoding
LZW encoding


Huffman Encoding

AAABBBAABCAAAABD
Traditional (fixed-length) encoding
A → 00; B → 01; C → 10; D → 11.
00000001010100000110000000000111 (32 bits).
Huffman encoding
Count occurrences: A(9); B(5); C(1); D(1).
Build a Huffman tree.
A → 0; B → 10; C → 110; D → 111.
0001010100010110000010111 (25 bits)
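
A minimal sketch of building the codes with a priority queue (Python's heapq). The function name huffman_codes is an assumption. Which branch gets 0 and which gets 1 is arbitrary, so the exact bit patterns may differ from the ones above, but the code lengths (1, 2, 3, 3) and the 25-bit total are the same.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two lightest subtrees."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {sym: "0" for sym in counts}
    # Heap entries: (weight, tie_breaker, {symbol: code so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

message = "AAABBBAABCAAAABD"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes, len(encoded))
```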

LZW Encoding
A dictionary encoding which does not need to store the dictionary.

Message: xyx xyx xyx xyx

Symbol  Code
x       1
y       2
space   3
xyx     4

Encoding, step by step:
1
12
121
1213 → (knowing xyx forms a word, add xyx to the dictionary as code 4)
12134
121343434

In reality, simply use ASCII code, so no additional dictionary is needed.
Decoding is similar.
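
The sketch below follows the simplified, word-based scheme of this example (a word is added to the dictionary when it is first seen, and spaces delimit words). It is not a full LZW implementation, which grows its dictionary one character at a time. The function names lzw_words_encode and lzw_words_decode are assumptions. Note that the decoder rebuilds the dictionary by itself, which is why the dictionary does not need to be stored.

```python
def lzw_words_encode(text: str, alphabet: list[str]) -> list[int]:
    codes = {s: i + 1 for i, s in enumerate(alphabet)}
    out = []
    for i, word in enumerate(text.split(" ")):
        if i > 0:
            out.append(codes[" "])
        if not word:
            continue
        if word in codes:
            out.append(codes[word])               # known word: one code
        else:
            out.extend(codes[ch] for ch in word)  # spell it out once
            codes[word] = len(codes) + 1          # then remember it
    return out

def lzw_words_decode(code_list: list[int], alphabet: list[str]) -> str:
    table = {i + 1: s for i, s in enumerate(alphabet)}
    known = set(alphabet)
    pieces, word = [], ""
    for code in code_list:
        s = table[code]
        pieces.append(s)
        if s == " ":
            if word and word not in known:        # a new word just ended
                table[len(table) + 1] = word
                known.add(word)
            word = ""
        else:
            word += s
    return "".join(pieces)

alphabet = ["x", "y", " "]
encoded = lzw_words_encode("xyx xyx xyx xyx", alphabet)
print(encoded)  # [1, 2, 1, 3, 4, 3, 4, 3, 4]
print(lzw_words_decode(encoded, alphabet))
```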


Images, Audio, and Video

GIF: 256 colors, dictionary encoding
JPEG
Lossy or lossless.
Discrete cosine transform.
Discard high-frequency information to which human eyes are insensitive.
MP3
Temporal masking
Frequency masking
MPEG
Relative encoding & other techniques.



Error Handling

Communication Errors

Compression
Remove redundancy.
Error detection & correction
Add redundancy to prevent errors.
Error detection: Check code
Cannot correct errors, but can check if errors occur.
ID numbers
ISBN
Parity code
Error correcting
Can correct errors (to some degree).


Taiwan ID

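ID format: $C\,a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9$
1 Convert the English letter C into a two-digit number xy.
2 $d_1 = x + 9y$
3 $d_2 = \sum_{i=1}^{8} (9 - i) \cdot a_i = 8 \cdot a_1 + 7 \cdot a_2 + \ldots + 1 \cdot a_8$
4 Check code $a_9 = 10 - ((d_1 + d_2) \bmod 10)$ (taken as 0 when this gives 10)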
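
Steps 2 to 4 can be sketched as below. The letter-to-number table is not part of this text, so the sketch takes the two-digit number xy as a parameter rather than guessing the mapping; the value 10 in the usage line is only an example input. The function name taiwan_id_check_digit and the final reduction mod 10 (so that a result of 10 becomes 0) are assumptions of this sketch.

```python
def taiwan_id_check_digit(letter_number: int, digits: str) -> int:
    """letter_number is the two-digit number xy; digits holds a1..a8."""
    if not 10 <= letter_number <= 99:
        raise ValueError("letter_number must have two digits")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("digits must be exactly eight decimal digits")
    x, y = divmod(letter_number, 10)
    d1 = x + 9 * y
    d2 = sum((9 - i) * int(a) for i, a in enumerate(digits, start=1))
    return (10 - (d1 + d2) % 10) % 10  # extra % 10 turns a result of 10 into 0

print(taiwan_id_check_digit(10, "12345678"))  # 9
```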


ISBN-10

The first 9 digits of the ISBN-10 of the textbook are

0-273-75139

1 Compute $S = 0 \cdot 10 + 2 \cdot 9 + 7 \cdot 8 + 3 \cdot 7 + 7 \cdot 6 + 5 \cdot 5 + 1 \cdot 4 + 3 \cdot 3 + 9 \cdot 2 = 193$
2 $M = S \bmod 11 = 6$
3 $N = 11 - M = 5$
If N = 10, the check code is X.
If N = 11, the check code is 0.
Otherwise, the check code is the number N.
4 So the whole ISBN is 0-273-75139-5.
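
The four steps can be sketched as follows. The function name isbn10_check_code is an assumption for this example; hyphens in the input are ignored.

```python
def isbn10_check_code(first9: str) -> str:
    """Compute the ISBN-10 check code from the first nine digits."""
    digits = [int(c) for c in first9 if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    s = sum(d * w for d, w in zip(digits, range(10, 1, -1)))  # weights 10 down to 2
    n = 11 - s % 11
    if n == 10:
        return "X"
    if n == 11:
        return "0"
    return str(n)

print(isbn10_check_code("0-273-75139"))  # 5
```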


Parity Bits

Add an additional bit so that the whole pattern has an odd number of 1s (odd parity).


Communication
RAID (redundant array of independent disks) techniques
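
A minimal sketch of odd parity. Placing the parity bit in front of the data bits is a choice made for this example, and the function names add_odd_parity and check_odd_parity are assumptions.

```python
def add_odd_parity(bits: str) -> str:
    """Prepend a parity bit so that the total number of 1s is odd."""
    parity = "0" if bits.count("1") % 2 == 1 else "1"
    return parity + bits

def check_odd_parity(word: str) -> bool:
    """True if the word still has an odd number of 1s."""
    return word.count("1") % 2 == 1

word = add_odd_parity("1000001")     # 7-bit ASCII 'A' has two 1s
print(word)                          # 11000001
print(check_odd_parity(word))        # True
print(check_odd_parity("11000011"))  # False: one bit was flipped
```

A single flipped bit is detected, but two flipped bits cancel out and go unnoticed.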


An Error-Correcting Code (ECC)


(3,1)-repetition code (can correct 1-bit errors)

Triplet received Interpret as


000 0 (error free)
001 0
010 0
100 0
111 1 (error free)
110 1
101 1
011 1
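
The table amounts to a majority vote within each triplet, which can be sketched as below. The function names encode_repetition and decode_repetition are assumptions for this example.

```python
def encode_repetition(bits: str) -> str:
    """Send every bit three times."""
    return "".join(b * 3 for b in bits)

def decode_repetition(received: str) -> str:
    """Decode each triplet by majority vote, correcting any single-bit error."""
    if len(received) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    triplets = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if t.count("1") >= 2 else "0" for t in triplets)

print(encode_repetition("01"))      # 000111
print(decode_repetition("001110"))  # 01
```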


Another Error-Correcting Code (ECC)


Maximized Hamming distances among symbols (at least 3).

Symbol  Code
A       000000
B       001111
C       010011
D       011100
E       100110
F       101001
G       110101
H       111010

Received 010100.

Symbol  Distance
A       2
B       4
C       3
D       1
E       3
F       5
G       2
H       4

010100 → D.
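
Decoding by nearest code word can be sketched as follows, using the code table from this slide. The names CODEBOOK, hamming_distance, and decode_nearest are assumptions for this example.

```python
CODEBOOK = {
    "A": "000000", "B": "001111", "C": "010011", "D": "011100",
    "E": "100110", "F": "101001", "G": "110101", "H": "111010",
}

def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def decode_nearest(received: str) -> str:
    """Pick the symbol whose code word is closest to the received pattern."""
    return min(CODEBOOK, key=lambda s: hamming_distance(CODEBOOK[s], received))

print({s: hamming_distance(c, "010100") for s, c in CODEBOOK.items()})
print(decode_nearest("010100"))  # D
```

Because any two code words differ in at least 3 positions, a pattern with a single flipped bit is still closer to the original code word than to any other.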


License

Page  File / Licensing / Source and author

17    ASCII code chart. Wikimedia, Author: Anomie, Source: http://commons.wikimedia.org/wiki/File:ASCII_Code_Chart.svg, Date: 2012/02/05. This file is ineligible for copyright and therefore in the public domain.
