Big Data
And The Next Wave of InfraStress
1. Big data: storage growing bigger faster
   DRAM: 1.6X/year (4X/3 years) continues
   Disk density:
     1.3X/year CAGR: historical trendline
     1.6X/year since ~1990
     2.0X/year leap ~1998/1999
2. Net continues raising user expectations
   More data (images, graphics, models)
   (Some) more difficult data (audio, video)
   Pressure on the net, especially the last mile
=> Explosion of WIDELY-accessible data
Create, understand, store, move it ... or else ...
drown in a wave of Infrastructure Stress
General references: John L. Hennessy, David A. Patterson, Computer Architecture: A Quantitative Approach,
Second Edition, Morgan Kaufmann, San Francisco, 1996. ISBN 1-55860-329-8.
Also: Computer Organization and Design, Morgan Kaufmann, San Francisco, 1994. ISBN 1-55860-281-X.
Also: thanks to Glenn Stettler of SGI, "Disk Drive Futures", 1/20/99.
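A quick compound-growth sketch of what those rates mean (my illustration; the per-year factors are the ones quoted above, the 10-year horizon is arbitrary):

# Sketch: what the growth rates above compound to over a decade.
def project(start, factor_per_year, years):
    return start * factor_per_year ** years

for name, factor in [("Disk density, historical", 1.3),
                     ("Disk density, since ~1990", 1.6),
                     ("Disk density, ~1998/1999", 2.0),
                     ("DRAM", 1.6)]:
    print(f"{name:28s}: {project(1.0, factor, 10):7.0f}X in 10 years")
# 1.3X/yr -> ~14X per decade; 1.6X/yr -> ~110X; 2.0X/yr -> ~1000X.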
InfraStress
= Infrastructure Stress

in.fra.stress n.
1. Bad effects of faster change in computer
   subsystems & usage:
     CPUs, memory, disks, demand ...
   than in underlying infrastructure:
     bandwidths, addressability & naming,
     scalability of interconnect,
     operating systems, file systems, backup ...
Symptoms: bottlenecks, odd limits, workarounds,
instability, unpredictability, nonlinear surprise,
over-frequent releases, multiple versions,
hardware obsolete before depreciated
[Diagram: who the net connects - #1 IntraNet (employees), #2 and #3 partners
and customers, #4 the public.]
http://www.botham.co.uk
Predict this?
[Chart, 1980-2007: % of 32-bit systems shipped (vs 16-bit), and % of 64-bit
systems shipped (vs 32-bit) - the 64-bit curve is the row of question marks.
Marked: 1st 64-bit micro (MIPS R4000).]
2. Big Memory & Micros
[Chart, 1980-2007: waves of InfraStress - 16-bit micros OK; change as
minis -> micros and 16 -> 32; 32-bit micros OK; change as 32 -> 64, 64/32;
64-bit micros OK.]
3. Big Net
Everybody knows this one!
InfraStress on networks, organizations, procedures.
[Chart, 1980-2007: growth of The Net, WWW.]
Note: does not mean effects stop, just that most organizations
will have BIG NET: web-ized operations by 2002.
4. Bigger (Disk) Data
[Chart, 1980-2007: 3.5" disk density, growing at 1.3X, then 1.6X, then 2X per year.]
InfraStress on disk file systems, backups, I/O systems.
BIGGER DATA: many must rewrite critical software.
http://www.quantum.com/src/history, http://www.disktrend.com
http://www.ibm.com/storage/microdrive: 340MB Microdrive, 1999. 1.7"x1.4"x.19"
5. HUGE Data (Maybe)
Like bigger, but worse: the storage hierarchy.
1) Tapes, near-line storage
2) Laser-enhanced magnetics for removables, maybe fixed disks
   10X: TeraStor
     NFR: "Near-Field Recording"
     5.25", removable, 2400 RPM, 18ms
     2Q99: 10GB, 6 MB/sec, <$800
     4Q99: 20GB, 11 MB/sec, <$1200
     ??:   40GB, 2-sided
   3-5X: Quinta (Seagate), demo 11/98
     OAW: Optically Assisted Winchester
[Chart, 1980-2007.]
Technology Change Rates
Example: Large Server*

                        Years      # Revisions in 6 years
H/W chassis             4..6       0
Interconnects
  I/O bus (PCI...)      4-6+       0-(1)
  CPU==mem              3-5        0-(1)
  Backplane             3-5        0
  Network               varies     1-2
Subsystems
  CPU MHz               .75-1.5    4-8
  4X DRAM               3          2-(3)
  Disks                 1          6
  Graphics              1.5-2.5
Software
  File system           8-10       0-1
  OS release            1-2        2-6
  App release           1-2        2-6
Data                    forever
Media                   not long

*Desktops & other access devices cycle faster, maybe
Technology Trends
Capacities - Great News
Bandwidths - InfraStress
Interactions - Surprises
1"x 3.5" Disk Capacity
Capacity 1.3X 1.6X 2X >4X / 3 years
90 GB
Traditional "Fear is not
80 GB
disk density an option ..."
70 GB growth 72 1.6X
60 GB These are 1" (LP)
50 GB
drives only.
10 KB DRAM
Bytes/chip
1 KB
1980 1983 1986 1989 1992 1995 1998 2001 2004 2007
See: John R. Mashey, Darryl Ramm, "Databases on RISC: still The Future",
4/25/98 page 1 7 UNIX Review, September 1996, 47−54.
3.5" Disk Review
Height (1" or 1.6") X (4" X 5.75")
Capacity (1MB = 1,000,000 B)
Seek Times (msecs)
Track−to−track (Read/Write) Controller
Average (Read/Write)
Typical < Average (OS & controllers)
Maximum (Read/Write)
Rotational latency (msecs)
Average Latency = .5 * rev = 30000/RPM
Bandwidths (MB/sec)
Internal Formatted Transfer
ZBR range
External Rate (Bus)
Density (Gbit/sq inch)
See:http://www.quantum.com/src/basic_resources
See "Disk Performance Background for Tables/Graphs", SGI internal, Radek Aster, Jeremey Higdon, Carl Rigg, June 27, 1997.
3.5" Disk Review
− Capacity/drive ~ # platters (varies)
− Capacity/platter ~ areal density
− Bandwidth ~ RPM * Linear density
− Seek time ... improves slowly
− Combine several drives onto one:
take care, may lose seeks/second
− IOPS vs MB/s applications
System (OS)
I/O Bus (~PCI)
Peripheral Connect (~SCSI)
Embedded Disk Controller
Disk Seek
Rotate
Read
Time −>
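A first-order service-time sketch of that IOPS-vs-MB/s split (my illustration; the drive numbers are assumptions in the spirit of the 1998 figures later in this talk):

# One request = seek + half-rotation + transfer.
def service_time_ms(block_kb, avg_seek_ms=8.0, rpm=7200, transfer_mb_s=13.0):
    rotation_ms = 30000.0 / rpm                       # average rotational latency
    transfer_ms = block_kb / 1024.0 / transfer_mb_s * 1000.0
    return avg_seek_ms + rotation_ms + transfer_ms

for kb in (4, 64, 1024, 4096):
    t = service_time_ms(kb)
    print(f"{kb:5d} KB: {t:6.1f} ms/request -> {1000.0 / t:5.1f} requests/sec")
# Small random blocks: dominated by seek + rotation (IOPS-bound).
# Large blocks: dominated by transfer time (MB/s-bound).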
Common Disk Types
1. By capacity
   A. Large (1.6" x 3.5", HH), ~8-10 platters
   B. Medium (1" x 3.5", LP), ~4-5 platters
   C. "Depopulated", 1 platter
   D. Smaller platters ...
   E. "Microdrive", 1 small platter
2. By target
   - High-performance (B: high RPM)
   - High-capacity (A)
       Bad: huge disks => long backup times
       Good for archive-like applications
   - By IOPS (multiples of C & D)
   - By cost [ATA, IDE versions of A, B, C]
   - By physical size (mobile, consumer)
Storage Densities
[Chart, 1980-2007: density per square inch, 1 Tb to 10,000 Tb, with tape
density (~300 Gb/in2), atomic-force microscope(?) recording, atoms/in2,
and ~1 TB/in3 holographic storage as reference points.]

"IBM and other vendors, universities, and the government are working on a holographic
storage system they say will achieve 100Gb per square inch and data transfer rates of
30Mb per second by November 1998. Future targets are 100Gb per square inch and
100Mb per second data rates by January 1999, and 100Gb per square inch and 1Gb per
second transfer by April 1999.
OptiTek, in Mountain View, Calif., is developing holography products, promising 5.25"
disk capacities of 100GB with cartridges backward-compatible to current automated
libraries. The company will release evaluation models in the second half of 1999,
and plans to release "write-once" products for use in archiving applications by early 2000."
InfoWorld Electric, "When Data Explodes", http://www.idg.net

See: Merrit E. Jones, The MITRE Corp, "The Limits That Await Us", THIC Meeting, April 23, 1997, Falls Church, Va.
See: http://www.terastor.com on near-field recording.
Disk Issues
Workloads converge:
"IOPS" - transaction processing, seeks/second
  Classic OLTP, small blocks
3. BACKUP ... (sketch below)
  Must run many tapes, full-speed, in parallel
  Sometimes use HSM, RAID, mirror
  New cartridge disks may be useful
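A small arithmetic sketch of the backup squeeze (the 1998/1999 drive figures are from this talk; the ~72GB 2001 capacity and the 10 MB/s tape rate are my assumptions):

def minutes_to_copy(capacity_gb, rate_mb_s):
    return capacity_gb * 1000.0 / rate_mb_s / 60.0

disks = [("1998:  9GB @ 13 MB/s", 9, 13),
         ("1999: 18GB @ 28 MB/s", 18, 28),
         ("2001: 72GB @ 40 MB/s (guesses)", 72, 40)]
for name, gb, mb_s in disks:
    full  = minutes_to_copy(gb, mb_s)   # reading the disk at full speed
    taped = minutes_to_copy(gb, 10.0)   # streaming to one ~10 MB/s tape
    print(f"{name}: {full:5.1f} min to read, {taped:6.1f} min to one tape")
# Capacity grows ~2X/year, bandwidth much slower, so the minutes keep
# climbing - hence many tapes running full-speed in parallel.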
Disk Rotational Latencies
[Chart, 1980-2007, log frequency scale: half-rotation time of high-performance disks.]
Faster rotation arrives roughly every 2-3 years.
Average latency = .5 * (60/RPM)
Disk Total Latencies
1/2 rotation + average seek
[Chart, 1980-2007, log frequency scale: total latency.]
Faster rotation arrives roughly every 2-3 years.
Average latency = .5 * (60/RPM)
1/2 rotation is faster than the average seek ...
but of course, short seeks are faster.
Short random blocks are dominated by seeks;
large blocks are dominated by transfer time.
CPU Latency, Performance
[Chart, 1980-2007, log scale 10 KHz - 10 GHz:
- CPU cycle / peak issue: effective instruction latency improving 1.4X-1.6X/year
  (125ns -> 40ns -> 10ns -> 4ns -> 1ns), 1.4X CAGR.
- Raw DRAM: 1.1X CAGR (120ns -> 100ns -> 80ns -> 60ns -> 40ns).
  Upper edge = raw DRAM access time; lower edge = lean memory system,
  including overhead, for an actual load. 2000: 40ns nominal -> 150ns+ real.]
CPU:DRAM ratio: 40X (cycle), 100X (real), 400X (instructions); soon 1000X (instructions).
Latency & Performance
[Chart, 1980-2007, log scale 10 Hz - 10 GHz: CPU effective instruction latency
(1.4X CAGR) vs raw DRAM (1.1X CAGR) vs disk latency (1.1X CAGR:
24, 23, 20, 18, 15, 13, 12, 11, 9, 7, 5.5 msec ... then ~1X) vs humans.]
CPU:DRAM: ~1000X (instructions).
CPU:Disk: 1986, ~200K instructions per disk access; >5M now; >30M soon.
(Back-of-envelope check below.)
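That CPU:Disk ratio is easy to check (my sketch; the clock and issue rates are illustrative assumptions, the disk latencies are from the chart above):

# Instructions "lost" waiting out one disk access.
eras = [("1986", 8e6,   1.0, 24e-3),   # ~8 MHz, ~1 instr/cycle, 24 ms disk
        ("now",  500e6, 2.0, 9e-3),    # ~500 MHz, ~2 instr/cycle, 9 ms disk
        ("soon", 2e9,   4.0, 5.5e-3)]  # ~2 GHz, ~4 instr/cycle, 5.5 ms disk
for era, hz, instr_per_cycle, disk_s in eras:
    lost = hz * instr_per_cycle * disk_s
    print(f"{era:4s}: ~{lost / 1e6:5.1f}M instructions per disk access")
# 1986: ~0.2M; now: ~9M; soon: ~44M - the ratio only gets worse.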
Latencies - Implications
1. CPU <-> DRAM <-> disk
   Latency ratios already bad, getting worse.
   "Money can buy bandwidth, but latency is forever."
Input/Output: A Sad History
"I/O certainly has been lagging in the last decade."
    - Seymour Cray, public lecture (1976)
I/O Single-Channel Bandwidth
I/O busses falling behind 4X/3 growth; need faster I/O.
[Chart, 1980-2007, log scale 1 MB/s - 1000 GB/s, 4X/3 reference line; GB/s:
ISA (.007 peak), EISA (.033 peak), GIO32 [.1] (Indigo), PCI32 [.1],
Sun SBUS64 [.1], GIO64 [.2] (Indigo2, Indy), PCI64 [.2], PCI64-66 [.4],
XIO (4Q96), GigaRing [1.2 GB/s (2X .64)].]
Bus-Based SMP Bandwidth Wall
SMP busses falling behind 4X/3 growth; need change.
Data gap big, growing. Laws of physics ... are laws.
[Chart, 1980-2007, log scale, 4X/3 reference line; SMP bus, memory, total
I/O bandwidth in GB/s: Sequent bus 4Q87 (.053), SGI Power Series 4Q88 (.064),
SGI Challenge 1Q93 (1.22), Sun SC2000 2Q93 (.5), Sequent Highly Scalable
Bus 1994 (.107, [.240 peak]), DEC 8400 2Q95 (1.6), Sun UE X000 2Q96 (2.5),
Intel SHV 2Q96 (.534 peak). Tops out ~2.5 GB/s; 2X/3 growth, slowing.]
Bandwidths (ccNUMA, XBAR)
Why ccNUMA? A: a central XBAR costs $$.
128p Origin, Onyx2: up to 80 GB/s I/O, 40 GB/s memory, 20 GB/s bisection.
LAN, Interconnect Bandwidths
Networks improving faster than SMP busses & I/O busses;
networks must improve to stay ahead of disks.
[Chart, 1980-2007, log scale 1 MB/s - 1000 GB/s, 4X/3 reference line:
Ethernet 10BT, 100BT, 1000BT (coming faster); ATM OC3, OC12; HIPPI 800;
Gigabyte System Network (GSN); high-end SMP bus bandwidth; Origin ccNUMA I/O.]
Beyond the LAN
(Different scale!)
[Chart, 1980-2007, log scale 1 KB/s - 1 GB/s:
T1 (1.544 Mb/s), *DSL (2 Mb/s - 7 Mb/s), cable modem (3 Mb/s, 375 KB/s),
T3 (43.2 Mb/s, 5.4 MB/s), DS-4 (274 Mb/s), Ethernet 10BT / 100BT / 1000BT,
ATM OC3, OC12, HIPPI 800, Gigabyte System Network (GSN).]
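To put those last-mile rates in perspective, a small sketch (link rates from the chart above; the 80% efficiency factor and the 18GB payload are my assumptions):

# Time to move one 1999-era 18GB disk's contents across various links.
def transfer_hours(gigabytes, mbits_per_s, efficiency=0.8):
    return gigabytes * 8e9 / (mbits_per_s * 1e6 * efficiency) / 3600.0

links = [("T1", 1.544), ("cable modem", 3.0), ("T3", 43.2),
         ("100BaseT", 100.0), ("ATM OC12", 622.0)]
for name, mbps in links:
    print(f"{name:12s}: {transfer_hours(18, mbps):6.1f} hours for 18 GB")
# T1: ~32 h; cable modem: ~17 h; T3: ~1.2 h; OC12: ~5 min.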
Disk Bandwidths (Highest)
[Chart, 1980-2007, log scale: bandwidth of the fastest 1" x 3.5" disks,
singly and striped across 2, 3, 4 disks.
1998: 9GB, 7200 RPM, 13 MB/s; 10000 RPM, 15 MB/s.
1999: 18GB, 10000 RPM, 28 MB/s.
2001: guess 40 MB/s.]
Fast Disk Bandwidth
vs Peripheral Connections
Disk bandwidth growth overpowers peripheral connection growth!

# Disks    FW SCSI (20 MB/s)   F20W (40 MB/s)   FC100 (100 MB/s)
   1             10                  10                10
   2             18*                 20                20
   3              *                  30                30
   4              *                  32*               40
  ...            ...                ...               ...
  10              *                   *                95*
(delivered MB/s, with ~10 MB/s disks)
* Already saturated on bandwidth tasks, like backup or striped-disk I/O.

[Chart, 1980-2007: bandwidth of 1-4 striped disks vs peripheral connections
(MB/s): 10 F SCSI, 20 FW SCSI, 40 F20W SCSI, 80 SCSI LV, 100 FC100,
160 SCSI, 200 FC200; x marks where 4 disks exhaust the bus in bandwidth apps.]
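The table above rounds to a simple min() model; here is a sketch of it (the ~90% usable-bus fraction is my assumption, chosen to match the starred entries approximately):

DISK_MB_S = 10.0          # per-disk sequential rate, as in the table
BUS_USABLE = 0.90         # assumed usable fraction of a bus's rating
buses = {"FW SCSI": 20, "F20W": 40, "FC100": 100}

for bus, rating in buses.items():
    cap = rating * BUS_USABLE
    for n in (1, 2, 3, 4, 10):
        offered = n * DISK_MB_S
        mark = "*" if offered > cap else " "   # * = bus saturated
        print(f"{bus:8s} {n:2d} disks: {min(offered, cap):5.1f} MB/s{mark}")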
Fast Disk Bandwidth
vs Networks & Peripheral Connections
10BaseT   = .1 of a 1997 fast disk (bottleneck)
100BaseT  = 1 1997 fast disk
1000BaseT = 2 2001 fast disks (2 x 40 MB/s)
          = 1 2001 dual-head fast disk (80 MB/s)
GSN       = many disks, still not enough for all!
Bandwidths - Summary
Networks and disks put pressure on the I/O bus and SMP bus.
[Chart: high-end SMP bus bandwidth, Origin network bandwidth, disk bandwidth;
disks + networks => InfraStress on the I/O bus; disks => InfraStress on networks.]
Bandwidths - Implications
1. SMP busses not growing at 4X/3
   Interconnect and memory bandwidth limits
   ==> Crossbars
       Centralized (mainframe)
       Distributed (ccNUMA)
2. Some I/O busses, peripheral connects,
   and especially networks, under pressure
   to keep up with disk bandwidth
Interactions: Distributed Data
Shape of solution driven by shape of hardware?
"Natural" distribution of work: cost-effective
"Unnatural" data distribution: very painful
High bandwidth, low latency, or else...
Better: make hardware match the shape of the problem.
[Diagram: Problem Shape vs Solution Shape? Labels: Good Fit,
(technology) growth??, Centralize (allocation),
Decentralize (partitioning, administration).]
Interactions:
Bandwidths vs Latencies
High-bandwidth, low-latency => "never having to say you're sorry".
[Chart: bandwidth (1 MB/s - 1000 GB/s) vs latency (1 ns - 10000 sec);
faster toward the upper left, cheaper toward the lower right.
Practical shared-memory systems: CRAY T932; Origin 128; Sun UE10000 (16..64).
Bus SMPs: Sun UltraSMP [2.5], DEC 8400 [1.6], SHV [.5], Sequent NUMA-Q.
Memory systems / clustering: DEC Memory Channel, ServerNet (.035-.060
[.1 total], 2X .04 = .08; 2.9us 1-way best, +.3 per hop; S.N.2 = 2.5X);
IBM SP2 switch (.036 GB/s, 39us 1-way, .048 GB/s full-duplex, [.1 GB/s] MPI);
dedicated switch networks.
General networks: HIPPI-6400 (.8), HIPPI 32-bit (.09, [.1]),
ATM OC12 (90% eff) (.062), ATM OC3 (90% eff) (.0155),
FDDI (95% efficiency) (.012), Ethernet (90% eff) (.001).
Disk I/O: typical time to read an entire 1" x 3.5" disk.]
Interactions:
Disk Technology Trends
Capacities
  Grow very fast
Latencies
  Barely improve for small blocks
  Improve moderately for large blocks
Bandwidths
  Improve, but not as fast as capacity
  Capacity/bandwidth ratios get worse
  Pressure -> more, smaller disks
Interactions
  100BaseT, PCI32, F+W SCSI overrun
  Backup rethinking:
    Desktop & 2 half-empty disks?
    Backup servers?
Technology Summary
           Good     Bad           Ugly
CPU        MHz      Parallelism   Latency
Software                          Work!
Sysadmin            Technology    Exciting
Conclusion: InfraStress
Wishlist for Overcoming It
1. Find/understand: insight
   Tools: navigate, organize, visualize
2. Input: creativity
   Tools: create content from ideas
5. Change: survive!
   Incremental scalability, headroom
   Infrastructure already upgraded
References
1. http://www.storage.ibm.com/hardsoft/diskdrdl/library/technolo.htm - IBM storage web page