An Introduction
Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
January 18, 2011
[Figure: a timeline of representative machines spanning roughly 10^3 to 10^15 operations per second: EDSAC (1949), IBM 7094 (1959), Cray-1 (1976), Intel Delta (1991), Cray T3E (1996), Cray X1 (2003), Cray XT5 (2009).]
[Figure: ITRS technology roadmap, 1997–2012 (log scale): MB per DRAM chip, logic transistors per chip (millions), and microprocessor clock (MHz) vs. year of technology availability.]
Classical DRAM
• Memory mats: ~ 1 Mbit each
• Row Decoders
• Primary Sense Amps
• Secondary sense amps & “page” multiplexing
• Timing, BIST, Interface
• Kerf
[Figure: DRAM capacity in Gbits per chip (log scale, 10^-6 to 10^3) and percent chip overhead, 1970–2020: historical data vs. SIA and ITRS projections, at production and at introduction.]
[Figure: microprocessor clock rate (MHz, log scale) vs. year (1975–2020) and vs. feature size: classical Moore's Law scaling against historical data and the ITRS max clock rate (12 inverters); both views flatten at about 3 GHz.]
The 2005 ITRS projection called for 5.2 GHz, which production parts never reached; production clock rates remain stuck at 3+ GHz.
Classes of Architecture for High Performance Computers
• Parallel Vector Processors (PVP)
– NEC Earth Simulator, SX-6
– Cray-1, 2, X-MP, Y-MP, C90, T90, X1
– Fujitsu VPP5000 series
• Massively Parallel Processors (MPP)
– Intel Touchstone Delta & Paragon
– TMC CM-5
– IBM SP-2 & 3, Blue Gene/L
– Cray T3D, T3E, Red Storm/Strider
• Distributed Shared Memory (DSM)
– SGI Origin
– HP Superdome
• Single Instruction stream, Multiple Data stream (SIMD)
– Goodyear MPP, MasPar 1 & 2, TMC CM-2
• Commodity Clusters (see the MPI sketch after this list)
– Beowulf-class PC/Linux clusters
– Constellations
– HP Compaq SC, Linux NetworX MCR
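All of these classes are programmed today chiefly through explicit message passing. As a concrete taste of the distributed-memory model used on MPPs and commodity clusters, here is a minimal MPI sketch (mine, not from the slides); every call shown is standard MPI:

/* Minimal MPI program: each process reports its rank.
   Build: mpicc hello.c -o hello    Run: mpirun -np 8 ./hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id, 0..size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                       /* shut the runtime down cleanly */
    return 0;
}

Each rank is an independent process with its own address space, typically one or more per cluster node; all sharing of data happens through explicit messages.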
• Automated calculating
– 17th century
• Stored program digital electronic
– 1948
• Vector
– 1975
• SIMD
– 1980s
• MPPs
– 1991
• Commodity Clusters
– 1993/4
• Multicore
– 2006
[Figure: THE WALL: memory access time vs. CPU time, 1997–2009, with their diverging ratio (log scale).]
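The wall can be measured directly with the standard pointer-chasing technique. Below is a minimal sketch (mine, not from the slides): a working set far larger than cache forces every dependent load to pay main-memory latency, typically tens to hundreds of CPU cycles.

/* Pointer-chasing sketch of the memory wall (compile: cc -O2 -std=c99). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 24)   /* 16M entries (~128 MB): far larger than any cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;
    for (size_t i = 0; i < N; i++) next[i] = i;
    /* Sattolo's algorithm: shuffle into one big cycle, so the chase below
       visits every entry exactly once, in cache-hostile random order. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = ((size_t)rand() << 16 ^ (size_t)rand()) % i; /* crude wide index */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    clock_t t0 = clock();
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];  /* each load depends on the last */
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("ended at %zu: %.1f ns per dependent load\n", p, 1e9 * sec / N);
    free(next);
    return 0;
}

On hardware of this era the result is typically on the order of 100 ns per load, a few hundred cycles of a 3 GHz core: the ratio the figure above calls THE WALL.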
Microprocessors no longer realize the full potential of VLSI technology
[Figure: processor performance in ps per instruction (log scale), 1980–2020: the historical 52%/year improvement against the 74%/year pace the underlying technology allowed, slowing to 19%/year; the widening gaps are marked 30:1, 1,000:1, and 30,000:1.]
[Diagram: timeline of the non-accelerated computation (total time T_O) and the accelerated computation (total time T_A), in which the acceleratable portion T_F runs in time T_F / g.]

T_O ≡ time for non-accelerated computation
T_A ≡ time for accelerated computation
T_F ≡ time of portion of computation that can be accelerated
g ≡ peak performance gain for accelerated portion of computation
f ≡ fraction of non-accelerated computation to be accelerated
S ≡ speedup of computation with acceleration applied

    S = \frac{T_O}{T_A}, \qquad f = \frac{T_F}{T_O}

    T_A = (1 - f)\,T_O + \frac{f}{g}\,T_O

    S = \frac{T_O}{(1 - f)\,T_O + \frac{f}{g}\,T_O} = \frac{1}{1 - f + \frac{f}{g}}
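A quick worked example (values chosen here for illustration, not from the slides): if 90% of the computation can be accelerated (f = 0.9) by a factor of ten (g = 10),

    S = \frac{1}{1 - 0.9 + \frac{0.9}{10}} = \frac{1}{0.19} \approx 5.26

Even with a 10x accelerator the speedup stays well below 10, and as g grows without bound it saturates at 1/(1 - f) = 10: the non-accelerated fraction dominates.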
[Diagram: the accelerated computation, with the acceleratable work divided into n segments t_F, each incurring start-up overhead v.]

n ≡ number of accelerated work segments
v ≡ overhead incurred per accelerated segment

    T_F = \sum_{i=1}^{n} t_{F_i}

    T_A = (1 - f)\,T_O + \frac{f}{g}\,T_O + n\,v

    S = \frac{T_O}{T_A} = \frac{T_O}{(1 - f)\,T_O + \frac{f}{g}\,T_O + n\,v}

    S = \frac{1}{1 - f + \frac{f}{g} + \frac{n\,v}{T_O}}
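The effect of the overhead term is easy to see numerically; the following sketch (illustrative values, not from the course materials) evaluates both forms:

/* Amdahl's law, ideal and with per-segment overhead (illustrative sketch). */
#include <stdio.h>

/* Ideal speedup: fraction f of the work accelerated by peak gain g. */
static double amdahl(double f, double g)
{
    return 1.0 / (1.0 - f + f / g);
}

/* Overhead-adjusted speedup: n segments, overhead v each, baseline time T_O. */
static double amdahl_overhead(double f, double g, int n, double v, double T_O)
{
    return 1.0 / (1.0 - f + f / g + n * v / T_O);
}

int main(void)
{
    double f = 0.9, g = 10.0;   /* 90% of the work, 10x peak gain */
    double T_O = 100.0;         /* baseline run time, arbitrary units */
    printf("ideal:         S = %.2f\n", amdahl(f, g));                          /* 5.26 */
    /* 100 accelerated segments, each paying 0.05 units of start-up overhead */
    printf("with overhead: S = %.2f\n", amdahl_overhead(f, g, 100, 0.05, T_O)); /* 4.17 */
    return 0;
}

Here 100 segments at 0.05 units each add 0.05 to the denominator, cutting the speedup from 5.26 to about 4.17; finer-grained acceleration (larger n) makes this worse, which is why overhead, not peak gain g, often dominates in practice.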
• Computational Scientists
• HPC Researchers
• System Administrators
• Design Engineers
January   Tu 18   Introduction
          Th 20   Parallel Computer Architecture, Quiz 1
          Tu 25   Commodity Clusters
          Th 27   Benchmarking, Quiz 2
February  Tu 1    Throughput Computing
April     Tu 19   Spring Break
          Th 21   Spring Break
          Tu 26   Scheduling / Workload Management Systems
          Th 28   Checkpointing / System Administration, Project Due, Quiz 14
May       Tu 3    Beyond and Beyond
          Th 5    Class Summary / Final Exam Review
          Th 12   FINAL EXAM (7:30 – 9:30 AM)
Arete [arete.cct.lsu.edu]
● 64 compute nodes × 8 cores
● Quad-core AMD Opteron processors @ 2.4 GHz
● 8 GB RAM per node
● 24 TB of shared storage
● 1 Gb/s Ethernet network interface
● 10 Gb/s InfiniBand interconnect
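As a back-of-the-envelope check (my arithmetic, not a figure from the slides), assuming this Opteron generation retires 4 double-precision floating-point operations per core per cycle, the machine's aggregate peak is

    64\ \text{nodes} \times 8\ \text{cores} \times 2.4\ \text{GHz} \times 4\ \text{FLOPs/cycle} \approx 4.9\ \text{TFLOPS}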
Code of Conduct: http://appl003.lsu.edu/slas/dos.nsf/$Content/Code+of+Conduct?OpenDocument
Thinking Machines CM-2
• Thinking Machines Corporation, 1987.
• Hypercube architecture with 65,536 processors.
• SIMD.
• Performance in the GFLOPS range.

Earth Simulator
• Japan, 1997.
• Fastest supercomputer from 2002 to 2004: 35.86 TFLOPS.
• 640 nodes, each with eight vector processors and 16 gigabytes of memory.

IBM Blue Gene/L
• IBM, 2004.
• First supercomputer to sustain over 100 TFLOPS on a real-world application, a three-dimensional molecular dynamics code (ddcMD).