Big Data
And The Next Wave of InfraStress
1. Big data: storage growing bigger faster
   DRAM: 1.6X/year (4X/3 years) continues
   Disk density:
     1.3X/year CAGR: historical trendline
     1.6X/year since ~1990
     2.0X/year leap ~1998/1999
2. Net continues raising user expectations
   More data (images, graphics, models)
   (Some) more difficult data (audio, video)
   Pressure on the net, especially the last mile
=> Explosion of WIDELY-accessible data
Create, understand, store, move it ... or else ...
drown in a wave of Infrastructure Stress
General references: John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach,
Second Edition, Morgan Kaufmann, San Francisco, 1996. ISBN 1-55860-329-8.
Also: Computer Organization and Design, Morgan Kaufmann, San Francisco, 1994. ISBN 1-55860-281-X.
Also: thanks to Glenn Stettler of SGI, "Disk Drive Futures", 1/20/99.
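A quick compound-growth sketch of what those rates mean (my illustration; the per-year factors are the ones quoted above, the 10-year horizon is arbitrary):

# Sketch: what the growth rates above compound to over a decade.
def project(start, factor_per_year, years):
    return start * factor_per_year ** years

for name, factor in [("Disk density, historical", 1.3),
                     ("Disk density, since ~1990", 1.6),
                     ("Disk density, ~1998/1999", 2.0),
                     ("DRAM", 1.6)]:
    print(f"{name:28s}: {project(1.0, factor, 10):7.0f}X in 10 years")
# 1.3X/yr -> ~14X per decade; 1.6X/yr -> ~110X; 2.0X/yr -> ~1000X.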
InfraStress
= Infrastructure Stress

in.fra.stress n.
1. Bad effects of faster change in computer
   subsystems & usage:
     CPUs, memory, disks, demand ...
   than in underlying infrastructure:
     bandwidths, addressability & naming,
     scalability of interconnect,
     operating systems, file systems, backup ...
Symptoms: bottlenecks, odd limits, workarounds,
instability, unpredictability, nonlinear surprise,
over-frequent releases, multiple versions,
hardware obsolete before depreciated
[Diagram: who the net connects - #1 IntraNet (employees), #2 and #3 partners
and customers, #4 the public.]
http://www.botham.co.uk
Predict this?
[Chart, 1980-2007: % of 32-bit systems shipped (vs 16-bit), and % of 64-bit
systems shipped (vs 32-bit) - the 64-bit curve is the row of question marks.
Marked: 1st 64-bit micro (MIPS R4000).]
2. Big Memory & Micros
[Chart, 1980-2007: waves of InfraStress - 16-bit micros OK; change as
minis -> micros and 16 -> 32; 32-bit micros OK; change as 32 -> 64, 64/32;
64-bit micros OK.]
3. Big Net
Everybody knows this one!
InfraStress on networks, organizations, procedures.
[Chart, 1980-2007: growth of The Net, WWW.]
Note: does not mean effects stop, just that most organizations
will have BIG NET: web-ized operations by 2002.
4. Bigger (Disk) Data
[Chart, 1980-2007: 3.5" disk density, growing at 1.3X, then 1.6X, then 2X per year.]
InfraStress on disk file systems, backups, I/O systems.
BIGGER DATA: many must rewrite critical software.
http://www.quantum.com/src/history, http://www.disktrend.com
http://www.ibm.com/storage/microdrive: 340MB Microdrive, 1999. 1.7"x1.4"x.19"
5. HUGE Data (Maybe)
Like bigger, but worse: the storage hierarchy.
1) Tapes, near-line storage
2) Laser-enhanced magnetics for removables, maybe fixed disks
   10X: TeraStor
     NFR: "Near-Field Recording"
     5.25", removable, 2400 RPM, 18ms
     2Q99: 10GB, 6 MB/sec, <$800
     4Q99: 20GB, 11 MB/sec, <$1200
     ??:   40GB, 2-sided
   3-5X: Quinta (Seagate), demo 11/98
     OAW: Optically Assisted Winchester
[Chart, 1980-2007.]
Technology Change Rates
Example: Large Server*

                        Years      # Revisions in 6 years
H/W chassis             4..6       0
Interconnects
  I/O bus (PCI...)      4-6+       0-(1)
  CPU==mem              3-5        0-(1)
  Backplane             3-5        0
  Network               varies     1-2
Subsystems
  CPU MHz               .75-1.5    4-8
  4X DRAM               3          2-(3)
  Disks                 1          6
  Graphics              1.5-2.5
Software
  File system           8-10       0-1
  OS release            1-2        2-6
  App release           1-2        2-6
Data                    forever
Media                   not long

*Desktops & other access devices cycle faster, maybe
Technology Trends
Capacities - Great News
Bandwidths - InfraStress
Interactions - Surprises
1"x 3.5" Disk Capacity
Capacity 1.3X 1.6X 2X >4X / 3 years
90 GB
Traditional "Fear is not
80 GB
disk density an option ..."
70 GB growth 72 1.6X
60 GB These are 1" (LP)
50 GB
drives only.
10 KB DRAM
Bytes/chip
1 KB
1980 1983 1986 1989 1992 1995 1998 2001 2004 2007
See: John R. Mashey, Darryl Ramm, "Databases on RISC: still The Future",
4/25/98 page 1 7 UNIX Review, September 1996, 47−54.
3.5" Disk Review
Height (1" or 1.6") X (4" X 5.75")
Capacity (1MB = 1,000,000 B)
Seek Times (msecs)
Track−to−track (Read/Write) Controller
Average (Read/Write)
Typical < Average (OS & controllers)
Maximum (Read/Write)
Rotational latency (msecs)
Average Latency = .5 * rev = 30000/RPM
Bandwidths (MB/sec)
Internal Formatted Transfer
ZBR range
External Rate (Bus)
Density (Gbit/sq inch)
See:http://www.quantum.com/src/basic_resources
See "Disk Performance Background for Tables/Graphs", SGI internal, Radek Aster, Jeremey Higdon, Carl Rigg, June 27, 1997.
3.5" Disk Review
− Capacity/drive ~ # platters (varies)
− Capacity/platter ~ areal density
− Bandwidth ~ RPM * Linear density
− Seek time ... improves slowly
− Combine several drives onto one:
take care, may lose seeks/second
− IOPS vs MB/s applications
System (OS)
I/O Bus (~PCI)
Peripheral Connect (~SCSI)
Embedded Disk Controller
Disk Seek
Rotate
Read
Time −>
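A first-order service-time sketch of that IOPS-vs-MB/s split (my illustration; the drive numbers are assumptions in the spirit of the 1998 figures later in this talk):

# One request = seek + half-rotation + transfer.
def service_time_ms(block_kb, avg_seek_ms=8.0, rpm=7200, transfer_mb_s=13.0):
    rotation_ms = 30000.0 / rpm                       # average rotational latency
    transfer_ms = block_kb / 1024.0 / transfer_mb_s * 1000.0
    return avg_seek_ms + rotation_ms + transfer_ms

for kb in (4, 64, 1024, 4096):
    t = service_time_ms(kb)
    print(f"{kb:5d} KB: {t:6.1f} ms/request -> {1000.0 / t:5.1f} requests/sec")
# Small random blocks: dominated by seek + rotation (IOPS-bound).
# Large blocks: dominated by transfer time (MB/s-bound).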
Common Disk Types
1. By capacity
   A. Large (1.6" x 3.5", HH), ~8-10 platters
   B. Medium (1" x 3.5", LP), ~4-5 platters
   C. "Depopulated", 1 platter
   D. Smaller platters ...
   E. "Microdrive", 1 small platter
2. By target
   - High-performance (B: high RPM)
   - High-capacity (A)
       Bad: huge disks => long backup times
       Good for archive-like applications
   - By IOPS (multiples of C & D)
   - By cost [ATA, IDE versions of A, B, C]
   - By physical size (mobile, consumer)
Storage Densities
[Chart, 1980-2007: density per square inch, 1 Tb to 10,000 Tb, with tape
density (~300 Gb/in2), atomic-force microscope(?) recording, atoms/in2,
and ~1 TB/in3 holographic storage as reference points.]

"IBM and other vendors, universities, and the government are working on a holographic
storage system they say will achieve 100Gb per square inch and data transfer rates of
30Mb per second by November 1998. Future targets are 100Gb per square inch and
100Mb per second data rates by January 1999, and 100Gb per square inch and 1Gb per
second transfer by April 1999.
OptiTek, in Mountain View, Calif., is developing holography products, promising 5.25"
disk capacities of 100GB with cartridges backward-compatible to current automated
libraries. The company will release evaluation models in the second half of 1999,
and plans to release "write-once" products for use in archiving applications by early 2000."
InfoWorld Electric, "When Data Explodes", http://www.idg.net

See: Merrit E. Jones, The MITRE Corp, "The Limits That Await Us", THIC Meeting, April 23, 1997, Falls Church, Va.
See: http://www.terastor.com on near-field recording.
Disk Issues
Workloads converge:
"IOPS" - transaction processing, seeks/second
  Classic OLTP, small blocks
3. BACKUP ... (sketch below)
  Must run many tapes, full-speed, in parallel
  Sometimes use HSM, RAID, mirror
  New cartridge disks may be useful
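A small arithmetic sketch of the backup squeeze (the 1998/1999 drive figures are from this talk; the ~72GB 2001 capacity and the 10 MB/s tape rate are my assumptions):

def minutes_to_copy(capacity_gb, rate_mb_s):
    return capacity_gb * 1000.0 / rate_mb_s / 60.0

disks = [("1998:  9GB @ 13 MB/s", 9, 13),
         ("1999: 18GB @ 28 MB/s", 18, 28),
         ("2001: 72GB @ 40 MB/s (guesses)", 72, 40)]
for name, gb, mb_s in disks:
    full  = minutes_to_copy(gb, mb_s)   # reading the disk at full speed
    taped = minutes_to_copy(gb, 10.0)   # streaming to one ~10 MB/s tape
    print(f"{name}: {full:5.1f} min to read, {taped:6.1f} min to one tape")
# Capacity grows ~2X/year, bandwidth much slower, so the minutes keep
# climbing - hence many tapes running full-speed in parallel.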
Disk Rotational Latencies
[Chart, 1980-2007, log frequency scale: half-rotation time of high-performance disks.]
Faster rotation arrives roughly every 2-3 years.
Average latency = .5 * (60/RPM)
Disk Total Latencies
1/2 rotation + average seek
[Chart, 1980-2007, log frequency scale: total latency.]
Faster rotation arrives roughly every 2-3 years.
Average latency = .5 * (60/RPM)
1/2 rotation is faster than the average seek ...
but of course, short seeks are faster.
Short random blocks are dominated by seeks;
large blocks are dominated by transfer time.
CPU Latency, Performance
[Chart, 1980-2007, log scale 10 KHz - 10 GHz:
- CPU cycle / peak issue: effective instruction latency improving 1.4X-1.6X/year
  (125ns -> 40ns -> 10ns -> 4ns -> 1ns), 1.4X CAGR.
- Raw DRAM: 1.1X CAGR (120ns -> 100ns -> 80ns -> 60ns -> 40ns).
  Upper edge = raw DRAM access time; lower edge = lean memory system,
  including overhead, for an actual load. 2000: 40ns nominal -> 150ns+ real.]
CPU:DRAM ratio: 40X (cycle), 100X (real), 400X (instructions); soon 1000X (instructions).
Latency & Performance
[Chart, 1980-2007, log scale 10 Hz - 10 GHz: CPU effective instruction latency
(1.4X CAGR) vs raw DRAM (1.1X CAGR) vs disk latency (1.1X CAGR:
24, 23, 20, 18, 15, 13, 12, 11, 9, 7, 5.5 msec ... then ~1X) vs humans.]
CPU:DRAM: ~1000X (instructions).
CPU:Disk: 1986, ~200K instructions per disk access; >5M now; >30M soon.
(Back-of-envelope check below.)
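That CPU:Disk ratio is easy to check (my sketch; the clock and issue rates are illustrative assumptions, the disk latencies are from the chart above):

# Instructions "lost" waiting out one disk access.
eras = [("1986", 8e6,   1.0, 24e-3),   # ~8 MHz, ~1 instr/cycle, 24 ms disk
        ("now",  500e6, 2.0, 9e-3),    # ~500 MHz, ~2 instr/cycle, 9 ms disk
        ("soon", 2e9,   4.0, 5.5e-3)]  # ~2 GHz, ~4 instr/cycle, 5.5 ms disk
for era, hz, instr_per_cycle, disk_s in eras:
    lost = hz * instr_per_cycle * disk_s
    print(f"{era:4s}: ~{lost / 1e6:5.1f}M instructions per disk access")
# 1986: ~0.2M; now: ~9M; soon: ~44M - the ratio only gets worse.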
Latencies - Implications
1. CPU <-> DRAM <-> disk
   Latency ratios already bad, getting worse.
   "Money can buy bandwidth, but latency is forever."
Input/Output: A Sad History
"I/O certainly has been lagging in the last decade."
    - Seymour Cray, public lecture (1976)
I/O Single-Channel Bandwidth
I/O busses falling behind 4X/3 growth; need faster I/O.
[Chart, 1980-2007, log scale 1 MB/s - 1000 GB/s, 4X/3 reference line; GB/s:
ISA (.007 peak), EISA (.033 peak), GIO32 [.1] (Indigo), PCI32 [.1],
Sun SBUS64 [.1], GIO64 [.2] (Indigo2, Indy), PCI64 [.2], PCI64-66 [.4],
XIO (4Q96), GigaRing [1.2 GB/s (2X .64)].]
Bus-Based SMP Bandwidth Wall
SMP busses falling behind 4X/3 growth; need change.
Data gap big, growing. Laws of physics ... are laws.
[Chart, 1980-2007, log scale, 4X/3 reference line; SMP bus, memory, total
I/O bandwidth in GB/s: Sequent bus 4Q87 (.053), SGI Power Series 4Q88 (.064),
SGI Challenge 1Q93 (1.22), Sun SC2000 2Q93 (.5), Sequent Highly Scalable
Bus 1994 (.107, [.240 peak]), DEC 8400 2Q95 (1.6), Sun UE X000 2Q96 (2.5),
Intel SHV 2Q96 (.534 peak). Tops out ~2.5 GB/s; 2X/3 growth, slowing.]
Bandwidths (ccNUMA, XBAR)
Why ccNUMA? A: a central XBAR costs $$.
128p Origin, Onyx2: up to 80 GB/s I/O, 40 GB/s memory, 20 GB/s bisection.
LAN, Interconnect Bandwidths
Networks improving faster than SMP busses & I/O busses;
networks must improve to stay ahead of disks.
[Chart, 1980-2007, log scale 1 MB/s - 1000 GB/s, 4X/3 reference line:
Ethernet 10BT, 100BT, 1000BT (coming faster); ATM OC3, OC12; HIPPI 800;
Gigabyte System Network (GSN); high-end SMP bus bandwidth; Origin ccNUMA I/O.]
Beyond the LAN
(Different scale!)
[Chart, 1980-2007, log scale 1 KB/s - 1 GB/s:
T1 (1.544 Mb/s), *DSL (2 Mb/s - 7 Mb/s), cable modem (3 Mb/s, 375 KB/s),
T3 (43.2 Mb/s, 5.4 MB/s), DS-4 (274 Mb/s), Ethernet 10BT / 100BT / 1000BT,
ATM OC3, OC12, HIPPI 800, Gigabyte System Network (GSN).]
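To put those last-mile rates in perspective, a small sketch (link rates from the chart above; the 80% efficiency factor and the 18GB payload are my assumptions):

# Time to move one 1999-era 18GB disk's contents across various links.
def transfer_hours(gigabytes, mbits_per_s, efficiency=0.8):
    return gigabytes * 8e9 / (mbits_per_s * 1e6 * efficiency) / 3600.0

links = [("T1", 1.544), ("cable modem", 3.0), ("T3", 43.2),
         ("100BaseT", 100.0), ("ATM OC12", 622.0)]
for name, mbps in links:
    print(f"{name:12s}: {transfer_hours(18, mbps):6.1f} hours for 18 GB")
# T1: ~32 h; cable modem: ~17 h; T3: ~1.2 h; OC12: ~5 min.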
Disk Bandwidths (Highest)
[Chart, 1980-2007, log scale: bandwidth of the fastest 1" x 3.5" disks,
singly and striped across 2, 3, 4 disks.
1998: 9GB, 7200 RPM, 13 MB/s; 10000 RPM, 15 MB/s.
1999: 18GB, 10000 RPM, 28 MB/s.
2001: guess 40 MB/s.]
Fast Disk Bandwidth
vs Peripheral Connections
Disk bandwidth growth overpowers peripheral connection growth!

# Disks    FW SCSI (20 MB/s)   F20W (40 MB/s)   FC100 (100 MB/s)
   1             10                  10                10
   2             18*                 20                20
   3              *                  30                30
   4              *                  32*               40
  ...            ...                ...               ...
  10              *                   *                95*
(delivered MB/s, with ~10 MB/s disks)
* Already saturated on bandwidth tasks, like backup or striped-disk I/O.

[Chart, 1980-2007: bandwidth of 1-4 striped disks vs peripheral connections
(MB/s): 10 F SCSI, 20 FW SCSI, 40 F20W SCSI, 80 SCSI LV, 100 FC100,
160 SCSI, 200 FC200; x marks where 4 disks exhaust the bus in bandwidth apps.]
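The table above rounds to a simple min() model; here is a sketch of it (the ~90% usable-bus fraction is my assumption, chosen to match the starred entries approximately):

DISK_MB_S = 10.0          # per-disk sequential rate, as in the table
BUS_USABLE = 0.90         # assumed usable fraction of a bus's rating
buses = {"FW SCSI": 20, "F20W": 40, "FC100": 100}

for bus, rating in buses.items():
    cap = rating * BUS_USABLE
    for n in (1, 2, 3, 4, 10):
        offered = n * DISK_MB_S
        mark = "*" if offered > cap else " "   # * = bus saturated
        print(f"{bus:8s} {n:2d} disks: {min(offered, cap):5.1f} MB/s{mark}")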
Fast Disk Bandwidth
vs Networks & Peripheral Connections
10BaseT   = .1 of a 1997 fast disk (bottleneck)
100BaseT  = 1 1997 fast disk
1000BaseT = 2 2001 fast disks (2 x 40 MB/s)
          = 1 2001 dual-head fast disk (80 MB/s)
GSN       = many disks, still not enough for all!
Bandwidths - Summary
Networks and disks put pressure on the I/O bus and SMP bus.
[Chart: high-end SMP bus bandwidth, Origin network bandwidth, disk bandwidth;
disks + networks => InfraStress on the I/O bus; disks => InfraStress on networks.]
Bandwidths - Implications
1. SMP busses not growing at 4X/3
   Interconnect and memory bandwidth limits
   ==> Crossbars
       Centralized (mainframe)
       Distributed (ccNUMA)
2. Some I/O busses, peripheral connects,
   and especially networks, under pressure
   to keep up with disk bandwidth
Interactions: Distributed Data
Shape of solution driven by shape of hardware?
"Natural" distribution of work: cost-effective
"Unnatural" data distribution: very painful
High bandwidth, low latency, or else...
Better: make hardware match the shape of the problem.
[Diagram: Problem Shape vs Solution Shape? Labels: Good Fit,
(technology) growth??, Centralize (allocation),
Decentralize (partitioning, administration).]
Interactions:
Bandwidths vs Latencies
High-bandwidth, low-latency => "never having to say you're sorry".
[Chart: bandwidth (1 MB/s - 1000 GB/s) vs latency (1 ns - 10000 sec);
faster toward the upper left, cheaper toward the lower right.
Practical shared-memory systems: CRAY T932; Origin 128; Sun UE10000 (16..64).
Bus SMPs: Sun UltraSMP [2.5], DEC 8400 [1.6], SHV [.5], Sequent NUMA-Q.
Memory systems / clustering: DEC Memory Channel, ServerNet (.035-.060
[.1 total], 2X .04 = .08; 2.9us 1-way best, +.3 per hop; S.N.2 = 2.5X);
IBM SP2 switch (.036 GB/s, 39us 1-way, .048 GB/s full-duplex, [.1 GB/s] MPI);
dedicated switch networks.
General networks: HIPPI-6400 (.8), HIPPI 32-bit (.09, [.1]),
ATM OC12 (90% eff) (.062), ATM OC3 (90% eff) (.0155),
FDDI (95% efficiency) (.012), Ethernet (90% eff) (.001).
Disk I/O: typical time to read an entire 1" x 3.5" disk.]
Interactions:
Disk Technology Trends
Capacities
  Grow very fast
Latencies
  Barely improve for small blocks
  Improve moderately for large blocks
Bandwidths
  Improve, but not as fast as capacity
  Capacity/bandwidth ratios get worse
  Pressure -> more, smaller disks
Interactions
  100BaseT, PCI32, F+W SCSI overrun
  Backup rethinking:
    Desktop & 2 half-empty disks?
    Backup servers?
Technology Summary
           Good     Bad           Ugly
CPU        MHz      Parallelism   Latency
Software                          Work!
Sysadmin            Technology    Exciting
Conclusion: InfraStress
Wishlist for Overcoming It
1. Find/understand: insight
   Tools: navigate, organize, visualize
2. Input: creativity
   Tools: create content from ideas
5. Change: survive!
   Incremental scalability, headroom
   Infrastructure already upgraded
References
1. http://www.storage.ibm.com/hardsoft/diskdrdl/library/technolo.htm - IBM storage web page