An Introduction: Prof. Thomas Sterling Department of Computer Science Louisiana State University January 18, 2011



Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
January 18, 2011

HIGH PERFORMANCE COMPUTING:
MODELS, METHODS, & MEANS
AN INTRODUCTION

CSC 7600 Lecture 1 : Introduction


Spring 2011
Aerial & Satellite Images of Hurricane Katrina

Devastation from Hurricane Katrina

Simulating Katrina

Evolution of HPC

(Timeline chart. Performance scale, in operations per second: One OPS = 1, KiloOPS = 10^3, MegaOPS = 10^6, GigaOPS = 10^9, TeraOPS = 10^12, PetaOPS = 10^15.)
• 1823 Babbage Difference Engine
• 1943 Harvard Mark 1
• 1949 Edsac
• 1951 Univac 1
• 1959 IBM 7094
• 1964 CDC 6600
• 1976 Cray 1
• 1982 Cray XMP
• 1988 Cray YMP
• 1991 Intel Delta
• 1996 T3E
• 1997 ASCI Red
• 2001 Earth Simulator
• 2003 Cray X1
• 2006 BlueGene/L
• 2009 Cray XT5

New Fastest Computer in the World

2nd Fastest Computer in the World
Jaguar (Cray XT5-HE)
• Owned by Oak Ridge National Laboratory
• Breaks Petaflops processing barrier (1.759e+15 flops, Linpack Rmax)
• Contains 224,162 cores in six-core AMD x86_64 Opteron chips running at 2600 MHz

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Synergy Drives Supercomputing Evolution
• Technology
– Enables digital technology
– Defines balance of capabilities
– Establishes relationship of relative costs
• Architecture
– Creates interface between computation and technology
– Determines structures of technology-based components
– Establishes low-level semantics of operation
– Provides low-cost mechanisms
• Model of Computation
– Paradigm by which computation is manifest
– Provides governing principles of architecture operation
– Implies programming model and languages

Where Does Performance Come From?
• Device Technology
– Logic switching speed and device density
– Memory capacity and access time
– Communications bandwidth and latency
• Computer Architecture
– Instruction issue rate
• Execution pipelining
• Reservation stations
• Branch prediction
• Cache management
– Parallelism
• Parallelism – number of operations per cycle
per processor
– Instruction level parallelism (ILP)
– Vector processing
• Parallelism – number of processors per node
• Parallelism – number of nodes in a system
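The parallelism factors above multiply: a system's theoretical peak is its clock rate times the operations each processor completes per cycle, times processors per node, times nodes. The Python sketch below is illustrative only. The function name and the example machine are hypothetical, and it assumes every processor completes the same fixed number of operations on every cycle, which real workloads seldom sustain.

```python
def peak_ops_per_second(nodes: int, processors_per_node: int,
                        ops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak: every level of parallelism multiplied together.

    Assumes each processor completes `ops_per_cycle` operations on every
    cycle (through ILP or vector units), an upper bound, not a measurement.
    """
    return nodes * processors_per_node * ops_per_cycle * clock_hz


# Hypothetical machine: 100 nodes, 8 processors each, 4 ops/cycle, 2.5 GHz.
peak = peak_ops_per_second(nodes=100, processors_per_node=8,
                           ops_per_cycle=4, clock_hz=2.5e9)
print(f"{peak:.3e} ops/s")  # 8.000e+12 ops/s, i.e. 8 TeraOPS
```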

Major Technology Generations
(dates approximate)
• Electromechanical
– 19th century through 1st half of 20th century
• Digital electronic with vacuum tubes
– 1940s
• Core memory
– 1950
• Transistors
– 1947
• SSI & MSI RTL/DTL/TTL semiconductor
– 1970
• DRAM
– 1970s
• CMOS VLSI
– 1990
• Multicore
– 2006

The SIA ITRS Roadmap

(Figure: SIA ITRS roadmap projections, 1997 to 2012, on a log scale from 1 to 100,000: MB per DRAM chip, logic transistors per chip (millions), and microprocessor clock (MHz), plotted against year of technology availability.)
Classical DRAM
• Memory mats: ~ 1 Mbit each
• Row Decoders
• Primary Sense Amps
• Secondary sense amps & “page” multiplexing
• Timing, BIST, Interface
• Kerf
(Figures: Gbits per chip, 1970 to 2020, log scale, and % chip overhead, 1970 to 2020; each compares historical data with SIA/ITRS projections at production and at introduction.)

Density per chip has dropped below 4X every 3 years, and 45% of the die is non-memory.

Peak Logic Clock Rates

(Figures: peak logic clock rate in MHz, log scale from 10 to 100,000, plotted against year (1975 to 2020) and against feature size; historical data compared with the ITRS maximum clock rate (12 inverters), with "Moore's Law" and "Classical" trend curves and a 3 GHz reference line.)

The 2005 projection was for 5.2 GHz, and we didn't make it in production.
Further, we're still stuck at 3+ GHz in production.
Classes of Architecture for
High Performance Computers
• Parallel Vector Processors (PVP)
– NEC Earth Simulator, SX-6
– Cray-1, Cray-2, XMP, YMP, C90, T90, X1
– Fujitsu 5000 series
• Massively Parallel Processors (MPP)
– Intel Touchstone Delta & Paragon
– TMC CM-5
– IBM SP-2 & SP-3, Blue Gene/L
– Cray T3D, T3E, Red Storm/Strider
• Distributed Shared Memory (DSM)
– SGI Origin
– HP Superdome
• Single Instruction stream Multiple Data stream (SIMD)
– Goodyear MPP, MasPar 1 & 2, TMC CM-2
• Commodity Clusters
– Beowulf-class PC/Linux clusters
– Constellations
– HP Compaq SC, Linux NetworX MCR

Top 500 : System Architecture

Driving Issues/Trends
• Multicore
– Now: 8 cores per chip (AMD Opteron, Intel Xeon)
– Possibly 100s of cores per chip
– Systems will have million-way parallelism
• Heterogeneity
– GPGPU
– ClearSpeed
– Cell SPE
• Component I/O Pins
– Off chip bandwidth not increasing with demand
• Limited number of pins
• Limited bandwidth per pin (pair)
– Cache size per core may decline
– Shared cache fragmentation
• System Interconnect
– Node bandwidth not increasing proportionally
to core demand
• Power
– Megawatts at the high end cost millions of dollars per year
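To make the power bullet concrete: a machine drawing P megawatts around the clock uses P × 1,000 × 8,760 kWh per year. This Python sketch assumes continuous full-power operation and ignores cooling and facility overheads. The 5 MW load and the $0.10 per kWh price are assumed values for illustration, not figures from the slides.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def annual_energy_cost(power_mw: float, price_per_kwh: float) -> float:
    """Dollars per year for a load drawing power_mw megawatts continuously."""
    kwh_per_year = power_mw * 1_000 * HOURS_PER_YEAR  # MW -> kW, times hours
    return kwh_per_year * price_per_kwh


# Assumed values: a 5 MW system and electricity at $0.10 per kWh.
cost = annual_energy_cost(power_mw=5.0, price_per_kwh=0.10)
print(f"${cost:,.0f} per year")  # $4,380,000 per year
```

Cooling energy, listed as a separate cost under Practical Constraints, would add to this figure.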

Multi-Core
• Motivation for Multi-Core
– Exploits improved feature-size and density
– Increases functional units per chip (spatial efficiency)
– Limits energy consumption per operation
– Constrains growth in processor complexity
• Challenges resulting from multi-core
– Relies on effective exploitation of multiple-thread
parallelism
• Need for parallel computing model and parallel
programming model
– Aggravates memory wall
• Memory bandwidth
– Way to get data out of memory banks
– Way to get data into multi-core processor array
• Memory latency
• Fragments L3 cache
– Pins become a choke point
• Rate of pin growth projected to slow and flatten
• Rate of bandwidth per pin (pair) projected to grow slowly
– Requires mechanisms for efficient inter-processor
coordination
• Synchronization
• Mutual exclusion
• Context switching
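The coordination mechanisms listed above can be illustrated with Python's standard threading module: a lock provides mutual exclusion around a shared counter, and join() synchronizes the main thread with its workers. This is a sketch under stated assumptions. Threads stand in for cores, and because CPython's global interpreter lock serializes bytecode, it shows the semantics of coordination rather than any parallel speedup. Context switching is left to the operating system and is not shown.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # mutual exclusion for the shared counter


def worker(increments: int) -> None:
    global counter
    for _ in range(increments):
        with counter_lock:  # only one thread updates the counter at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # synchronization: wait until every worker has finished

print(counter)  # 40000
```

Without the lock, the read-modify-write in `counter += 1` can interleave between threads and lose updates, so the final count would no longer be guaranteed.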

Heterogeneous Multicore Architecture

• Combines different types of processors


– Each optimized for a different operational modality
• Performance > nX better than other n processor types
– Synthesis favors superior performance
• For complex computation exhibiting distinct modalities
• Conventional co-processors
– Graphical processing units (GPU)
– Network controllers (NIC)
– Efforts underway to apply existing special purpose
components to general applications
• Purpose-designed accelerators
– Integrated to significantly speed up some critical aspect
of one or more important classes of computation
– IBM Cell architecture
– ClearSpeed SIMD attached array processor

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Definitions: “supercomputer”

Supercomputer: A computing system exhibiting high-end performance capabilities and resource capacities within practical constraints of technology, cost, power, and reliability. (Thomas Sterling, 2007)

Supercomputer: a large very fast mainframe used especially for scientific computations. (Merriam-Webster Online)

Supercomputer: any of a class of extremely powerful computers. The term is commonly applied to the fastest high-performance systems available at any given time. Such computers are used primarily for scientific and engineering work requiring exceedingly high-speed computations. (Encyclopedia Britannica Online)

Moore’s Law
Moore's Law describes a long-term trend in the history of computing hardware, in which the number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years.
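Stated as a formula, a doubling period of T years gives N(t) = N_0 × 2^(t/T). A short Python check, assuming T = 2 years as stated above and a hypothetical starting count of 1, shows why two decades of doubling amount to roughly a thousandfold increase:

```python
def transistor_count(n0: float, years: float, doubling_period: float = 2.0) -> float:
    """Project a transistor count forward, assuming a fixed doubling period."""
    return n0 * 2 ** (years / doubling_period)


print(transistor_count(1.0, 20))  # 1024.0
```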

Top 500 List

Performance
• Performance:
– A quantifiable measure of rate of doing (computational) work
– Multiple such measures of performance
• Delineated at the level of the basic operation
– ops – operations per second
– ips – instructions per second
– flops – floating-point operations per second
• Rate at which a benchmark program executes
– A carefully crafted and controlled code used to compare systems
– Linpack Rmax (Linpack flops)
– gups (billion updates per second)
– others
• Two perspectives on performance
– Peak performance
• Maximum theoretical performance possible for a system
– Sustained performance
• Observed performance for a particular workload and run
• Varies across workloads and possibly between runs
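Peak and sustained performance can be compared using the Jaguar figures quoted earlier: a Linpack result of 1.759 × 10^15 flops, and 224,162 cores (a core count, not a chip count) in six-core Opteron processors clocked at 2600 MHz. The sketch additionally assumes each core completes 4 double-precision floating-point operations per cycle. That value is an assumption not stated in the slides, so treat the resulting efficiency as an estimate.

```python
cores = 224_162        # Jaguar core count
clock_hz = 2.6e9       # 2600 MHz Opteron clock
flops_per_cycle = 4    # ASSUMPTION: double-precision flops per cycle per core
rmax = 1.759e15        # sustained Linpack performance (Rmax)

rpeak = cores * clock_hz * flops_per_cycle  # peak: theoretical maximum
efficiency = rmax / rpeak                   # fraction of peak actually sustained

print(f"Rpeak = {rpeak:.3e} flops, Linpack efficiency = {efficiency:.1%}")
# Rpeak = 2.331e+15 flops, Linpack efficiency = 75.5%
```

Linpack is a dense, compute-bound benchmark, so sustained performance on other workloads is typically a smaller fraction of peak.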

Scalability
• The ability to deliver proportionally greater sustained performance through
increased system resources
• Strong Scaling
– Fixed size application problem
– Application size remains constant with increase in system size
• Weak Scaling
– Variable size application problem
– Application size scales proportionally with system size
• Capability computing
– In its purest form: strong scaling
– Marketing claims tend toward this class
• Capacity computing
– Throughput computing
• Includes job-stream workloads
– In its simplest form: weak scaling
• Cooperative computing
– Interacting and coordinating concurrent processes
– Not a widely used term
– Also: coordinated computing
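Strong and weak scaling are usually quantified from measured run times. A minimal Python sketch with hypothetical function names and made-up timings: under strong scaling the problem is fixed, so speedup is T_1/T_p and efficiency is speedup divided by p; under weak scaling the problem grows with p, so the ideal run time stays flat and efficiency is T_1/T_p.

```python
def strong_scaling(t1: float, tp: float, p: int) -> tuple[float, float]:
    """Fixed problem size: return (speedup T1/Tp, efficiency speedup/p)."""
    speedup = t1 / tp
    return speedup, speedup / p


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Problem size grows with p, so ideal Tp equals T1; efficiency is T1/Tp."""
    return t1 / tp


# Hypothetical timings in seconds, not measurements from any real system.
print(strong_scaling(t1=120.0, tp=20.0, p=8))       # (6.0, 0.75)
print(weak_scaling_efficiency(t1=120.0, tp=150.0))  # 0.8
```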

Machine Parameters affecting Performance
• Peak floating point performance
• Main memory capacity
• Bisection bandwidth
• I/O bandwidth
• Secondary storage capacity
• Organization
– Class of system
– # nodes
– # processors per node
– Accelerators
– Network topology
• Control strategy
– MIMD
– Vector, PVP
– SIMD
– SPMD
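Bisection bandwidth depends on network topology: it is the number of links cut when the machine is split into two equal halves, times the per-link bandwidth. The Python sketch below uses the standard bisection widths for a few textbook topologies. It assumes an even node count, square meshes and tori, a power-of-two hypercube, and a hypothetical 10 Gb/s per link with each cut link counted once; the topology labels are illustrative strings, not names from any library.

```python
import math


def bisection_links(topology: str, n: int) -> int:
    """Links cut when an n-node network is split into two equal halves."""
    if topology == "ring":
        return 2                  # any halving cut severs two ring links
    if topology == "2d_mesh":
        return math.isqrt(n)      # k x k mesh: the cut crosses one link per row
    if topology == "2d_torus":
        return 2 * math.isqrt(n)  # wraparound links double the mesh cut
    if topology == "hypercube":
        return n // 2             # every node has one neighbor across the cut
    raise ValueError(f"unknown topology: {topology}")


def bisection_bandwidth(topology: str, n: int, link_gbps: float) -> float:
    """Aggregate bandwidth across the bisection, counting each cut link once."""
    return bisection_links(topology, n) * link_gbps


# Hypothetical 64-node machine with 10 Gb/s links.
for topo in ("ring", "2d_mesh", "2d_torus", "hypercube"):
    print(f"{topo:>9}: {bisection_bandwidth(topo, 64, 10.0):6.1f} Gb/s")
```

The same node count yields bisection bandwidths differing by more than an order of magnitude, which is why topology appears alongside bandwidth in the parameter list.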

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

A Brief History of Supercomputing
• Mechanical Computing
– Babbage, Hollerith, Aiken
• Electronic Digital Calculating
– Atanasoff, Eckert, Mauchly
• von Neumann Architecture
– Turing, von Neumann, Eckert, Mauchly, Foster, Wilkes
• Semiconductor Technologies
• Birth of the Supercomputer
– Cray, Watanabe
• The Golden Age
– Batcher, Dennis, S. Chen, Hillis, Dally, Blank, B. Smith
• Common Era of Killer Micros
– Scott, Culler, Sterling/Becker, Goodhue, A. Chen, Tomkins
• Petaflops
– Messina, Sterling, Stevens, P. Smith,

Practical Constraints and Limitations
• Cost
– Deployment
– Operational support
• Power
– Energy required to run the computer
– Energy for support facilities
– Energy for cooling (remove heat from
machine)
• Size
– Floor space
– Access way for power and signal cabling
• Reliability
– One factor of availability
• Generality
– How well does it perform across a range of problems?
• Usability
– How hard is it to program and manage?

Historical Machines

• Leibniz Stepped Reckoner
• Babbage Difference Engine
• Hollerith Tabulator
• Harvard Mark 1
• Univ. of Pennsylvania ENIAC
• Cambridge Edsac
• MIT Whirlwind
• Cray 1
• TMC CM-2
• Intel Touchstone Delta
• Beowulf
• IBM Blue Gene/L

Golden Age of Parallel Architecture
• 1975 – 1992
• Vector
– Cray-1 & 2, NEC SX, Fujitsu VPP
(Image: Cray 1, 1976)
• SIMD
– MasPar, CM-2
• Systolic
– Warp
• Dataflow
– Manchester, Sigma,
Monsoon
• Multithreaded
– HEP, MTA
• Actor-based
– J-Machine

Dark Ages of Parallel Computing
Technology drivers
• 1992 to present
• Killer Micro and mass market
PCs
• High density DRAM
• High cost of fab lines
• CSP
– Message passing
• Economy of scale S-curve
• MPP
• Weak scaling
– Gustafson et al
• Beowulf, NOW Clusters
• MPI
• Ethernet, Myrinet
• Linux

Supercomputer Points of Transition

• Automated calculating
– 17th century
• Stored program digital electronic
– 1948
• Vector
– 1975
• SIMD
– 1980s
• MPPs
– 1991
• Commodity Clusters
– 1993/4
• Multicore
– 2006

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Driving Factors for HPC
• Technology trends
– Multicore components
– Heterogeneous structures and accelerators
• The 4 Horsemen of the Apocalypse (SLOW)
– Starvation (insufficient parallelism or poor load balancing)
– Latency (idle time due to round trip delays)
– Overhead (critical path support mechanisms)
– Waiting for contention (inadequate bandwidth)
• Reliability
– Single point failure modes cannot be tolerated
– Reduced feature size and increased component count
• Power consumption
– Just too much!
– Dominating practical growth in mission critical domains
• Changing application workload characteristics
– Data (meta-data) intensive for sparse numerics and symbolics
• Programmability & ease of use
– System complexity, scale and dynamics defy optimization by
hand
Sources of Performance Degradation
(SLOW)
• Starvation
– Not enough work to do due to insufficient parallelism or poor load
balancing among distributed resources
• Latency
– Waiting for access to memory or other parts of the system
• Overhead
– Extra work that has to be done to manage program concurrency
and parallel resources, in addition to the real work you want to perform
• Waiting for Contention
– Delays due to fighting over what task gets to use a shared
resource next. Network bandwidth is a major constraint.
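Starvation from poor load balancing can be quantified: a parallel phase ends only when its busiest worker finishes, so the other workers idle for the difference. A minimal Python sketch with a hypothetical function name and made-up work units; it models only this imbalance and ignores latency, overhead, and contention.

```python
def load_balance_efficiency(work_per_worker: list[float]) -> float:
    """Useful work divided by the capacity paid for.

    The phase lasts as long as the busiest worker, so each other worker
    starves (idles) for (busiest - its own work).
    """
    busiest = max(work_per_worker)
    return sum(work_per_worker) / (len(work_per_worker) * busiest)


print(load_balance_efficiency([10, 10, 10, 10]))  # 1.0 (perfect balance)
print(load_balance_efficiency([10, 10, 10, 30]))  # 0.5 (three workers starve)
```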

The Memory Wall

(Figure: CPU clock period (ns) and memory system access time (ns), 1997 to 2009, on a log scale, with the memory-to-CPU ratio plotted on a secondary axis; the widening gap is labeled "THE WALL".)
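The memory-to-CPU ratio on this chart can presumably be read as the number of processor cycles that elapse during one memory access: access time divided by clock period, or equivalently access time in nanoseconds times clock rate in GHz. The values in this Python sketch (a 100 ns access and a 3 GHz clock) are assumed round numbers, not data read from the chart.

```python
def stall_cycles(memory_access_ns: float, clock_ghz: float) -> float:
    """Processor cycles that elapse during one memory access.

    A clock of clock_ghz completes clock_ghz cycles per nanosecond.
    """
    return memory_access_ns * clock_ghz


# Assumed values: 100 ns memory access, 3 GHz processor clock.
print(stall_cycles(100.0, 3.0))  # 300.0
```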
Microprocessors no longer realize the
full potential of VLSI technology

(Figure: processor performance in ps per instruction, log scale from 10^-4 to 10^7, versus year, 1980 to 2020, with a linear trend fit; annotated growth rates of 52%/year, 74%/year, and 19%/year, and gaps of 30:1, 1,000:1, and 30,000:1.)

Amdahl’s Law
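(Timing diagram: the non-accelerated run takes $T_O$ from start to end and includes a portion $T_F$ that can be accelerated; in the accelerated run that portion shrinks to $T_F/g$, and the whole run takes $T_A$.)
$T_O \equiv$ time for non-accelerated computation
$T_A \equiv$ time for accelerated computation
$T_F \equiv$ time of portion of computation that can be accelerated
$g \equiv$ peak performance gain for accelerated portion of computation
$f \equiv$ fraction of non-accelerated computation to be accelerated
$S \equiv$ speedup of computation with acceleration applied
$$S = \frac{T_O}{T_A} \qquad f = \frac{T_F}{T_O}$$
$$T_A = (1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O$$
$$S = \frac{T_O}{(1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O}$$
$$S = \frac{1}{1 - f + \left(\frac{f}{g}\right)}$$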
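The final form of Amdahl's Law translates directly into code. In this Python sketch (the function name is hypothetical), f and g carry the meanings defined above. Setting g very large shows the 1/(1 - f) ceiling that the non-accelerated fraction imposes no matter how fast the accelerated portion runs.

```python
def amdahl_speedup(f: float, g: float) -> float:
    """S = 1 / (1 - f + f/g) for accelerated fraction f and peak gain g."""
    return 1.0 / ((1.0 - f) + f / g)


print(round(amdahl_speedup(0.9, 10), 3))    # 5.263
print(round(amdahl_speedup(0.9, 1e12), 3))  # 10.0, the 1/(1 - f) ceiling
```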

Amdahl’s Law with Overhead
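(Timing diagram: the accelerable time is split into $n$ segments $t_{F_i}$; in the accelerated run each segment takes $v + t_F/g$, and the whole run takes $T_A$.)
$$T_F = \sum_{i=1}^{n} t_{F_i}$$
$v \equiv$ overhead of accelerated work segment
$V \equiv$ total overhead for accelerated work
$$V = \sum_{i=1}^{n} v_i = n \times v$$
$$T_A = (1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v$$
$$S = \frac{T_O}{T_A} = \frac{T_O}{(1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v}$$
$$S = \frac{1}{1 - f + \frac{f}{g} + \frac{n \times v}{T_O}}$$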
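The overhead version adds the term n × v / T_O to the denominator, so splitting the accelerated work into more segments can erode the speedup. This Python sketch evaluates the final expression for Amdahl's Law with overhead; the function name and the run parameters (T_O = 10 s, f = 0.9, g = 10, v = 10 ms per segment) are hypothetical.

```python
def amdahl_with_overhead(f: float, g: float, n: int, v: float, t_o: float) -> float:
    """S = 1 / (1 - f + f/g + n*v/T_O): n accelerated segments, overhead v each."""
    return 1.0 / ((1.0 - f) + f / g + n * v / t_o)


# Hypothetical run: T_O = 10 s, f = 0.9, g = 10, v = 0.01 s per segment.
for n in (1, 10, 100):
    print(n, round(amdahl_with_overhead(0.9, 10, n, 0.01, 10.0), 3))
```

With these numbers the overhead term grows from 0.001 at n = 1 to 0.1 at n = 100, cutting the speedup from about 5.24 to about 3.45.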

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Supercomputing System Stack
• Device technologies
– Enabling technologies for logic, memory, & communication
– Circuit design
• Computer architecture
– semantics and structures
• Models of computation
– governing principles
• Operating systems
– Manages resources and provides virtual machine
• Compilers and runtime software
– Maps application program to system resources, mechanisms, and
semantics
• Programming
– languages, tools, & environments
• Algorithms
– Numerical techniques
– Means of exposing parallelism
• Applications
– End user problems, often in sciences and technology

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview – Goals & Content
• Course Administration
• Summary Materials for Test

Addressing the Big Questions
• How to integrate technology into computing engines?
• How to push the performance to extremes?
– What are the enabling conditions?
– What are the inhibiting factors?
• How to manage supercomputer resources to deliver useful
computing capabilities?
– What are the hardware mechanisms?
– What are the software policies?
• How do users program such systems?
– What languages and in what environments?
– What are the semantics and strategies?
• What grand challenge applications demand these capabilities?
• What are the computational models and algorithms that can map
the innate application properties to the physical medium of the
machine?

Goals of the Course
• A first overview of the entire field of HPC
• Basic concepts that govern the capability and
effectiveness of supercomputers
• Techniques and methods for applying HPC systems
• Tools and environments that facilitate effective
application of supercomputers
• Hands-on experience with widely used systems and
software
• Performance measurement methods, benchmarks,
and metrics
• Practical real-world knowledge about the HPC
community
• Access by students outside the HPC mainstream

Student Objectives

• Computational Scientists
• HPC Researchers
• System Administrators
• Design Engineers

Course Overview: Multiple Segments
• Introduction
– An Overview
– Parallel Computer Architecture
– Commodity Clusters
– Benchmarking
– Throughput Computing
• Distributed Memory – MPI
– Communicating sequential processes (CSP)
– Enabling Technologies – Networks
– MPI programming
– Performance measurement (2)
• Shared Memory – OpenMP
– Single Node Architecture
– Enabling Technologies – Memory, Core Architectures,..
– Parallel thread computing
– OpenMP programming
– Performance factors and measurement (1)
• System Software
– Operating Systems
– Schedulers and Middleware
– Parallel file I/O
• Advanced Techniques
– Visualization
– Parallel Algorithms
– HPC Libraries
• Conclusions
– What's beyond the scope of this course
– What form will the future of HPC take

Introduction & Throughput Computing

January Tu 18 Introduction
  Th 20 Parallel Computer Architecture, Quiz1
  Tu 25 Commodity Cluster
  Th 27 Benchmarking, Quiz2
February Tu 1 Throughput Computing

*Project walkthroughs will be held during


office hours.

Distributed Memory & MPI

  Th 3 CSP / Parallelism, Quiz3


  Tu 8 MPI 1
  Th 10 MPI 2 / Performance Measurement (TAU), Quiz4
  Tu 15 Shared Memory / Parallelization, Sample Project Overview


Shared Memory & OpenMP

  Th 17 Enabling Technologies (memory, architecture, multicore, cache coherence), Quiz5
  Tu 22 Pthreads
  Th 24 OpenMP , Quiz6
March Tu 1 Performance Measurement (PAPI…)
  Th 3 Visualization, Quiz7, Project Abstract Due
  Tu 8 Mardi Gras Holidays
  Th 10 Parallel Algorithms 1, Quiz8


Advanced Techniques

  Th 17 Parallel Algorithms 2, Quiz9


  Tu 22 Parallel Algorithms 3, Project Walkthroughs*

  Th 24 Parallel Algorithms 4, Project Walkthroughs*, Quiz10


  Tu 29 Libraries 1
  Th 31 Libraries 2, Quiz11
April Tu 5 Parallel File I/O 1
  Th 7 Parallel File I/O 2, Quiz12
  Tu 12 Operating Systems 1
  Th 14 Operating Systems 2, Quiz13
*Project walkthroughs will be held during
office hours.

System Software

  Tu 19 Spring Break
  Th 21 Spring Break
  Tu 26 Scheduling / Workload Management Systems
  Th 28 Checkpointing/System Administration, Project Due, Quiz14
May Tu 3 Beyond and Beyond
  Th 5 Class Summary / Final Exam Review
  Th 12 FINAL EXAM (7:30 – 9:30 AM)


Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Demo 1 : Performance Scalability
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Course Website
• HPC Course Website can be accessed at:
http://www.cct.lsu.edu/csc7600
• Course Info:
– Syllabus
– Schedule
• Contact information (email, IM, phone, etc.) is in the People section.
• All course announcements will be made via email and Website.
• Lecture Slides will be made available on the course website (Course
Material Section)
• Videos of Lectures will be made available on the course website
(Course Material Section) after every lecture.

Contact Information

Prof. Thomas Sterling
[email protected]
(225) 578-8982 (CCT Office)
Johnston Hall 320, (225) 578-3320
Office Hours: Tu (1:00 - 3:00 PM) & Th (9:00 – 10:00 AM)

Teaching Assistant: Daniel Kogler
[email protected]
Office Hours: Johnston 318
Tuesday 1:40 – 3:00 PM
Thursday 9:00 – 10:00 AM

Course Secretary: Ms. Terrie Bordelon
[email protected]
302 Johnston Hall
(225) 578-5979

Grading Policy
Grading Policy for Graduate Students :
• Midterm – 20 %
• Final – 30 %
• Problem Sets – 25 %
• Quizzes – 5 %
• Project – 20 %

Grading Policy for Undergraduate Students:
• Midterm – 30 %
• Final – 35 %
• Problem Sets – 30 %
• Quizzes – 5 %

Assignments
• There will be appropriately sized assignments during
this course.
– Assignments should be turned in as PRINTOUTS to the TA the following
TUESDAY BEFORE CLASS.
– Assignments should be turned in in Word or PDF format. NO
handwritten assignments will be accepted.
– Assignments involving programming problems should have source code
printed and attached, and all solution-relevant materials (e.g. PBS scripts,
commands used for performance measurement, etc.) must be well
documented and attached.
– Source code and all relevant files for programming assignments
need to be submitted according to the guidelines given in each
problem set, and are due at the same time as the assignment
(the late policy for source code submissions is the same as that
for assignments).

Assignments
• LATE POLICY:
– All assignments should be turned in on the due date BEFORE the
CLASS.
– Assignments turned in on the same day by 5 PM (Central) will incur a
penalty of 30% of the assignment grade.
– Assignments turned in BEYOND 5PM (Central) of the due date will
receive 0 points irrespective of the work quality.
• IMPORTANT :
– Most of the assignments will need to be run on local
supercomputing resources that are shared among several users.
– Jobs that you submit WILL get stuck in a queue.
– “Queue ate my homework” is NOT an acceptable excuse for not
turning homework in.
– You are strongly encouraged to start working on assignments as
soon as they are assigned to avoid inevitable queue wait times.

Graduate Student Projects

• Term projects are required for Graduate Students


• Sample Topics
– Parallel Image Processing
– Application performance measurement
– Advanced visualization techniques
– Parallel Programming
• LATE POLICY:
– Abstracts turned in later than the assigned date will incur
an overall project penalty of 5%
– Walkthroughs done later than the assigned date will incur
an overall project penalty of 15%
– Projects turned in later than the assigned date will NOT be
considered for grading and will have an automatic score
of 0.

Graduate Student Project Topics

• Application Scaling: detailed analysis & performance
profiling of application(s) based on parameters such as
number of processors, application performance
bottlenecks, etc.
• Application Development : design and develop new
parallel applications with simple performance profiling
analysis.
• Architecture Comparative Studies: alternative networks,
processors, accelerators

Reference Material
• No Required Textbook
• Lecture notes (slides), required reading lists
(URLs) provided at the end of lectures, some
additional notes (on web site), and assignments
will be the primary sources of material for exams.
• Students are strongly encouraged to pursue
additional reading material available on the
internet (and as part of projects).

DEMO: Computing Resources Overview

presented by Adam Yates


Computing Resources

Arete [arete.cct.lsu.edu]


64 compute nodes x 8 cores

Quad-core AMD Opteron Processor @ 2.4 GHz

8 GB RAM per Node

24TB of shared storage

1 Gb/s Ethernet network interface

10 Gb/s InfiniBand interconnect

Plagiarism
• The LSU Code of Student Conduct defines plagiarism in Section
5.1.16:
– "Plagiarism is defined as the unacknowledged inclusion of someone else's words, structure, ideas, or
data. When a student submits work as his/her own that includes the words, structure, ideas, or data
of others, the source of this information must be acknowledged through complete, accurate, and
specific references, and, if verbatim statements are included, through quotation marks as well.
Failure to identify any source (including interviews, surveys, etc.), published in any medium (including
on the internet) or unpublished, from which words, structure, ideas, or data have been taken,
constitutes plagiarism;"

• Plagiarism will not be tolerated and will be dealt with in accordance with and as outlined by the LSU Code of Student Conduct:

http://appl003.lsu.edu/slas/dos.nsf/$Content/Code+of+Conduct?OpenDocument

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Demo 1 : Performance Scalability
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test

Summary Materials for Test

• Defining Supercomputer – slide 21 (Definitions: "supercomputer")
• Performance Issues in HPC – slide 24 (Performance)
• Scalability – slide 25
• Machine parameters affecting performance – slide 26
• Driving factors for HPC – slide 35
• Sources of performance degradation – slide 36
• Supercomputing system stack – slide 42

ENIAC
(Electronic Numerical Integrator and Computer)

• Eckert and Mauchly, 1946.
• Vacuum tubes.
• Numerical solutions to
problems in fields such
as atomic energy and
ballistic trajectories.

EDSAC
(Electronic Delay Storage Automatic Calculator)

• Maurice Wilkes, 1949.
• Mercury delay lines for
memory and vacuum
tubes for logic.
• Used one of the first
assemblers called Initial
Orders.
• Calculation of prime
numbers, solutions of
algebraic equations,
etc.

MIT Whirlwind

• Jay Forrester, 1949.
• Fastest computer of its time.
• First computer to use
magnetic core memory.
• Displayed real time text
and graphics on a large
oscilloscope screen.

CRAY-1

• Cray Research, 1976.


• Pipelined vector
arithmetic units.
• Unique C-shape kept wiring short,
reducing signal travel time from one
end to the other.

CM-2

• Thinking Machines
Corporation, 1987.
• Hypercube architecture
with 65,536 processors.
• SIMD.
• Performance in the
range of GFLOPS.

INTEL Touchstone Delta
• INTEL, 1990.
• MIMD, 2D mesh interconnect.
• LINPACK rating of 13.9
GFLOPS.
• Enough computing
power for applications
like real-time
processing of satellite
images and molecular
models for AIDS
research.

Beowulf
• Thomas Sterling and
Donald Becker, 1994.
• Cluster formed of one
head node and one/more
compute nodes.
• Nodes and network
dedicated to the Beowulf.
• Compute nodes are
mass produced
commodities.
• Uses open source
software, including Linux.

Earth Simulator

• Japan, 2002 (development began in 1997).
• Fastest supercomputer
from 2002-2004: 35.86
TFLOPS.
• 640 nodes with eight
vector processors and
16 gigabytes of
computer memory at
each node.

BlueGene/L

• IBM, 2004.
• First supercomputer
ever to run over 100
TFLOPS sustained on a
real-world application,
namely a three-dimensional molecular
dynamics code
(ddcMD).
