19 Job Schedulers
Apr 1, 2024
IBM Blue Gene/Q
• November 2011
• 4,096-node BG/Q (Sequoia)
• #17 on top500 at 677.10 TF
• #1 Graph 500 at 254 GTEPS (giga traversed edges per second)
• #1 on Green 500 list at 2.0 Gflops/W
• June 2012
• #1 Sequoia at Lawrence Livermore National Laboratory (#13 in 2019)
• 96K nodes, 16.3 PF Max, 20 PF Peak, 7.8 MW
• #3 Mira at Argonne National Laboratory (#24 in 2019)
• 48K nodes, 8.1 PF Max, 10 PF Peak, 3.9 MW
• Decommissioned in Dec 2019
Real Applications on Sequoia
BG/Q Compute Node Board (32 nodes)
BG/Q Hierarchy
1 rack (1,024 nodes) -> 2 midplanes (512 nodes each) -> 16 node boards per midplane (32 nodes each)
Interconnects in BG
Why 5D torus?
- Lower diameter, higher bisection width, and lower latency than a 3D torus (see the worked comparison below)
- High nearest-neighbour bandwidth
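As a hypothetical comparison (node count chosen for round numbers, not a BG/Q configuration): the farthest node along a ring of length k is k/2 hops away, so for 32,768 nodes a 32x32x32 3D torus has diameter 3 x 16 = 48 hops, while an 8x8x8x8x8 5D torus of the same size has diameter only 5 x 4 = 20 hops.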
BG/Q Messaging Unit and Network Logic
• A, B, C, D, E dimensions (5D torus)
• Last dimension E is of size 2 (reduces wiring)
• Link chips on each node board connect via optics to node boards on other midplanes
• Dimension-order routing (example below)
• Injection and reception FIFOs (more than half of the latency is incurred here)
• Packets arriving on the A- receiver are always placed in the A- reception FIFO
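For example, with dimension-order routing a packet whose destination differs in the A, C, and E coordinates first completes all of its A hops, then its C hops, then its E hops; the fixed A, B, C, D, E order is what makes the routing deterministic.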
BG/Q Network Device
Supercomputer Job Allocation
status.alcf.anl.gov -> Theta (retired)
Resources Required
• Number of nodes
• Wall-clock time
• Users are charged for node-hours
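For example (numbers purely illustrative), a job that requests 2 nodes for 5 hours of wall-clock time is charged 2 x 5 = 10 node-hours.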
User Jobs
• Different types of applications
• Interactive vs. batch jobs
• Debug in interactive mode (see the example below)
• Exclusive vs. shared access
• Charged based on total resource usage
• Job is killed when requested wall-clock time is over
• Need to plan resource usage a priori
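On a PBS-managed cluster an interactive job can typically be requested as follows (the queue name and limits here are assumptions for illustration):

qsub -I -q small -l nodes=1:ppn=8 -l walltime=00:30:00

The scheduler allocates the nodes and opens a shell on the first one; a batch job instead runs a submitted script with no user interaction.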
David Lifka, The ANL/IBM SP Scheduling System, JSSPP 1995.
ANL IBM SP System Observations (Typical User Requirement)
FCFS with Backfilling
• FCFS scheduling
• Poor system utilization
• Backfilling – to overcome the inefficiency of FCFS
• Scan the queue for a job that does not cause the first queued job to wait any longer than it otherwise would (see the sketch after the example below)
• Improves system utilization
• Lowers queue waiting times
Backfilling – 128-node Example
(Figure: backfilling on a 128-node machine with jobs requesting 96, 32, and 8 nodes.)
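A minimal sketch of the backfilling check, loosely based on the 128-node example (the shadow time and job sizes below are assumptions, not values from the slide): a waiting job may start out of order only if it fits in the currently idle nodes and finishes before the reserved start time of the blocked head-of-queue job.

#!/bin/bash
# Backfilling admission check (illustrative sketch, not a real scheduler).
total_nodes=128
running_nodes=96                               # one job currently occupies 96 nodes
free_nodes=$((total_nodes - running_nodes))    # 32 idle nodes
shadow_minutes=60                              # assumed time until the head-of-queue
                                               # job (needing all 128 nodes) can start

can_backfill () {
  local nodes=$1 walltime=$2
  if [ "$nodes" -le "$free_nodes" ] && [ "$walltime" -le "$shadow_minutes" ]; then
    echo "backfill: ${nodes}-node job, ${walltime} min -> starts now"
  else
    echo "wait: ${nodes}-node job, ${walltime} min -> would delay the head job"
  fi
}

can_backfill 8 30     # fits in the 32 idle nodes and ends before the reservation
can_backfill 32 90    # fits in the idle nodes but runs past the head job's reserved start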
Scheduler Queues
• Jobs are submitted to a queue
• Different queuing policies, decided by the administrator (see the configuration sketch below)
• Multiple queues in some systems
• Based on the usage
• Queue waiting times differ across queues
• Static vs. dynamic partitioning
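A hypothetical sketch of how an administrator might define such queues with the PBS/Torque qmgr utility (queue names and limits are invented for illustration):

qmgr -c "create queue debug queue_type=execution"
qmgr -c "set queue debug resources_max.walltime=01:00:00"
qmgr -c "set queue debug resources_max.nodect=8"
qmgr -c "set queue debug enabled=true"
qmgr -c "set queue debug started=true"
qmgr -c "create queue production queue_type=execution"
qmgr -c "set queue production resources_max.walltime=24:00:00"
qmgr -c "set server default_queue=production"

A short, small-job debug queue like this typically has short waits, while a large production queue may hold jobs for much longer.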
Anomaly
Henderson, “Job Scheduling Under the Portable Batch System”, JSSPP 1995.
An Example Scheduler Script
Henderson, “Job Scheduling Under the Portable Batch System”, JSSPP 1995.
What is missing?
Network Utilization in Different Applications
Petrini and Feng, Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements, JSSPP 2000.
Network Utilization in FFT
Batch Queueing Systems
• Schedules jobs based on queues
• Has full knowledge of queued and running jobs
• Has full knowledge of resource usage
• Often a combination of best-fit, fair-share, and priority-based policies (toy example below)
• Designed to be generic, can be customized
• Tuned to meet the scheduling goals of the centre
• Typically FIFO/FCFS with backfilling
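As a purely illustrative sketch (not any particular scheduler's formula), a priority-based policy might combine job aging with a fair-share factor:

#!/bin/bash
# Toy job-priority score: the weights and inputs are invented for illustration.
wait_minutes=90      # how long the job has been queued (aging)
fairshare=40         # 0-100; lower means the user has consumed more than their share
weight_age=2
weight_fs=10
priority=$(( weight_age * wait_minutes + weight_fs * fairshare ))
echo "priority score: $priority"    # a higher score is scheduled earlier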
Workload managers/Schedulers
• Portable Batch System (PBS)
• LoadLeveler
• Application Level Placement Scheduler (ALPS)
• Moab/Torque
• Simple Linux Utility for Resource Management (SLURM)
Example Batch Scheduler
• Network Queueing System (NQS), developed at NASA
• Supported multiple queues of several types
• Disable/enable each queue
• Tune the number of jobs running in each queue
Henderson, “Job Scheduling Under the Portable Batch System”, JSSPP 1995.
Portable Batch System (PBS)
• Genesis of PBS at NASA (as a successor to NQS)
• Client commands for submitting, modifying, and monitoring jobs (examples below)
• Daemons running on service nodes, compute nodes, and servers
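For reference, the standard PBS client commands look like this (the job ID and script name are hypothetical):

qsub job.sh                          # submit a batch job script
qstat -u $USER                       # monitor your queued and running jobs
qalter -l walltime=00:10:00 1234     # modify a queued job's resource request
qdel 1234                            # delete/cancel a job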
PBS daemons
• pbs_server (service node)
• Handles PBS commands
• Creates batch jobs
• Sends jobs for execution
• pbs_sched (service node): the scheduling daemon, decides which queued job runs next
• pbs_mom (compute nodes): starts and monitors jobs on the execution hosts
Application Level Placement Scheduler (Cray)
(Figure: ALPS daemons spread across the login node, service node, and compute nodes; an apinit daemon runs on each compute node.)
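Under ALPS, the batch system reserves the compute nodes and the application is then launched onto them with aprun; the option values below are an illustrative assumption, not taken from the slide:

aprun -n 32 -N 16 ./a.out    # 32 ranks, 16 per node, on the reserved compute nodes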
SLURM
• Central controller daemon (slurmctld):
• Monitors the states of nodes
• Accepts job requests
• Maintains the queue of requests
• Schedules jobs
• Initiates job execution and cleanup
• Polls slurmd (the per-node daemon) periodically
• Maintains complete state information
• User commands squeue, scancel, and sbatch correspond to qstat, qdel, and qsub in PBS
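A minimal SLURM batch script that mirrors the PBS example on the next slide (the directive values and the executable name are assumptions for illustration):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=small
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=00:05:00
cd "$SLURM_SUBMIT_DIR"
srun ./a.out

It would be submitted with sbatch, monitored with squeue, and cancelled with scancel.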
HPC2010 example job script:
#!/bin/bash
#PBS -N test
#PBS -q small
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:05:00
cd $PBS_O_WORKDIR
# launch the application from here, e.g. with mpiexec/mpirun