
Lecture 16: Threads

William Gropp
www.cs.illinois.edu/~wgropp
Add to the (Model) Architecture
• What do you do with a billion transistors?
♦ For a long time, try to make an individual processor (what we now call a core) faster
♦ Increasingly complicated hardware yielded less and less benefit (speculation, out of order execution, prefetch, …)
• An alternative is to simply put multiple processing elements (cores) on the same chip
• Thus the “multicore processor” or “multicore chip”
2
Adding Processing Elements

[Figure: single-core memory hierarchy - Core, L1 Cache, L2 Cache, Memory]
• Here’s our model so far, with the vector and pipelining part of the “core”
♦ (Most systems today have an L3 cache as well)
• We can (try to) replicate everything…
3
Adding Processing Elements

[Figure: four fully replicated cores, each with its own L1 Cache, L2 Cache, and Memory]
• Something like this would be simple
• But in practice, some resources are shared, giving us…
4
Adding Processing Elements

[Figure: four cores, each with a private L1 Cache; pairs of cores share an L2 Cache; all cores share one Memory]
5
Notes on Multicore

• Some resources are shared
♦ Typically the larger (slower) caches, path to memory
♦ May share functional units within the core (variously called simultaneous multithreading (SMT) or hyperthreading)
♦ Rarely enough bandwidth for shared resources (cache, memory) to supply all cores at the same time
• Variations trade complexity of core with number of cores
♦ Manycore vs. Multicore
6
Programming Models for Multicore Processors
• Parallelism within a process
♦ Compiler-managed parallelism
  • Transparent to programmer
  • Rarely successful
• Threads
♦ Within a process, all memory shared
♦ Each “thread” executes “normal” code
♦ Many subtle issues (more later)
• Parallelism between processes within a node covered later
7
What are Threads?

• Executing program (process) is defined by
♦ Address space
♦ Program Counter
• Threads are multiple program counters

8
Inside a Thread’s Memory

9
Kinds of Threads

• Almost a process
♦ Kernel (Operating System) schedules
♦ Each thread can make independent system calls
• Co-routines and lightweight processes
♦ User schedules (sort of…)
• Memory references
♦ Hardware schedules
10
Kernel Threads

• System calls (e.g., read, accept) block calling thread but not process
• Alternative to “nonblocking” or “asynchronous” I/O (sketch after this slide):
♦ create_thread
  thread calls blocking read
• Can be expensive (many cycles to start, switch between threads)
11
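A minimal sketch of the idea above (my illustration, not code from the lecture): a kernel thread issues a blocking read() while the main thread keeps running, so only the calling thread blocks, not the whole process. The function name, buffer, and use of standard input are illustrative; compile with cc -pthread.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static char buf[256];

/* Runs in its own kernel thread: read() blocks this thread only. */
static void *reader(void *arg)
{
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    printf("reader thread got %zd bytes\n", n);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, reader, NULL);
    /* The main thread can keep computing here while the read is pending. */
    pthread_join(t, NULL);
    return 0;
}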
User Threads

• System calls (may) block all threads in process
• Allows multiple cores to cooperate on data operations (see the sketch after this slide)
♦ loop: create # threads = # cores - 1
  each thread does part of loop
• Cheaper than kernel threads
♦ Still must save registers (if in same core)
♦ Parallelism requires OS to schedule threads on different cores
12
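A sketch of the loop pattern above, written with POSIX threads for concreteness (the slide does not prescribe a particular thread package, so the names and the choice of NTHREADS are mine): each thread sums its own slice of an array and the main thread combines the partial results.

#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4                 /* stands in for "# cores - 1" */

static double a[N];
static double partial[NTHREADS];

/* Each thread sums one contiguous slice of the array. */
static void *partial_sum(void *arg)
{
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += a[i];
    partial[id] = s;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < N; i++) a[i] = 1.0;
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, partial_sum, (void *)id);
    double sum = 0.0;
    for (long id = 0; id < NTHREADS; id++) {
        pthread_join(t[id], NULL);
        sum += partial[id];        /* combine after each worker finishes */
    }
    printf("sum = %g\n", sum);     /* expect 1000000 */
    return 0;
}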
Hardware Threads

• Hardware controls threads
• Allows single core to interleave memory references and operations
♦ Unsatisfied memory reference changes thread
♦ Separate registers for each thread
• Single cycle thread switch with appropriate hardware
♦ Basis of Tera MTA computer http://www.tera.com (now YarcData Urika)
♦ Like kernel threads, replaces nonblocking hardware operations - multiple pending loads
♦ Even lighter weight—just change program counter (PC)
13
Simultaneous Multithreading (SMT)
• Share the functional units in a single core
♦ Remember the pipelining example – not all functional units (integer, floating point, load/store) are busy each cycle
♦ SMT idea is to have two threads sharing a single set of functional units
♦ May be able to keep more of the hardware busy (thus improving throughput)
• Each SMT thread takes more time than it would if it were the only thread
• Almost entirely managed by hardware
14
Why Use Threads?

• Manage multiple points of interaction
• Latency Hiding
♦ Low overhead steering/probing
♦ Background checkpoint save
• Alternate method for nonblocking operations
♦ CORBA method invocation (no funky nonblocking calls)
• Hiding memory latency
• Fine-grain parallelism
♦ Compiler parallelism
15
Common Thread Programming Models
• Library-based (invoke a routine in a separate thread)
♦ pthreads (POSIX threads)
♦ See “Threads cannot be implemented as a library,” H. Boehm, http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf
• Separate enhancements to existing languages
♦ OpenMP, OpenACC, OpenCL, CUDA, … (contrast sketched after this slide)
• Within the language itself
♦ Java, C11, others
16
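To make the contrast above concrete, here is a small sketch (mine, not from the slides): the library approach would use explicit pthread_create/pthread_join calls as in the earlier sum example, while the language-extension approach annotates ordinary C with an OpenMP pragma. Compile with -fopenmp (GCC/Clang) or the compiler's equivalent flag; without it the pragma is simply ignored and the loop runs serially.

/* OpenMP: the pragma asks the compiler and runtime to split the loop
   iterations across a team of threads.  No omp.h include is needed
   unless omp_* runtime calls are used. */
void scale(double *x, int n, double alpha)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        x[i] *= alpha;
}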
Thread Issues

• Synchronization
♦ Avoiding conflicting operations (memory references) between threads
• Variable Name Space
♦ Interaction between threads and the language
• Scheduling
♦ Will the OS do what you want?
17
Synchronization of Access

• Read/write model (thread 1 on the left, thread 2 on the right)
  Thread 1:              Thread 2:
  a = 1;                 b = 1;
  barrier();             barrier();
  b = 2;                 while (a==1) ;
  a = 2;                 printf( "%d\n", b );
  What does thread 2 print?
• Take a few minutes and think about the possibilities
18
Synchronization of Access

• Read/write model (same code as the previous slide)
  Thread 1:              Thread 2:
  a = 1;                 b = 1;
  barrier();             barrier();
  b = 2;                 while (a==1) ;
  a = 2;                 printf( "%d\n", b );
  What does thread 2 print?
• Many possibilities:
♦ 2 (what the programmer expected)
♦ 1 (thread 1 reorders stores so a=2 is executed before b=2; valid in the language)
♦ Nothing: a never changes in thread 2
♦ Some other value from thread 1 (value of b before this code starts)
19
How Can We Fix This?

• Need to impose an order on the memory updates
♦ OpenMP has FLUSH (more than required)
♦ Memory barriers (more on this later)
• Need to ensure that data updated by another thread is reloaded
♦ Copies of memory in cache may update eventually
♦ In this example, a may be (is likely to be) in a register, never updated
♦ volatile in C, Fortran indicates value might be changed outside of program
• A sketch of one fix, using the C11 atomics mentioned on the next slide, follows this slide
20
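Here is a sketch of one fix (my illustration) using the C11 atomics mentioned on the next slide, with the same variable names as the example two slides back; initialization is done statically for brevity instead of with barrier(). Making a atomic with release/acquire ordering both keeps the store to b ahead of the store to a and forces thread 2 to reload a, so the printf is guaranteed to print 2.

#include <stdatomic.h>
#include <stdio.h>

static atomic_int a = 1;    /* the shared flag, now atomic */
static int        b = 1;

void thread1(void)
{
    b = 2;
    /* release: earlier writes (b = 2) become visible before a = 2 */
    atomic_store_explicit(&a, 2, memory_order_release);
}

void thread2(void)
{
    /* acquire: once a != 1 is observed, thread 1's earlier writes are visible */
    while (atomic_load_explicit(&a, memory_order_acquire) == 1)
        ;
    printf("%d\n", b);      /* prints 2 */
}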
Synchronization of Access

• Often need to ensure that updates happen atomically (all or nothing)
♦ Critical sections, lock/unlock, and similar methods (mutex sketch after this slide)
• Java has “synchronized” methods (procedures)
• C11 provides atomic memory operations
21
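A minimal sketch of the lock/unlock style named above, using a pthreads mutex (my choice; the slide does not prescribe a library): the increment is effectively atomic because only one thread can hold the lock at a time. The C11 alternative would be a single atomic_fetch_add on an atomic counter.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

void increment(void)
{
    pthread_mutex_lock(&lock);
    counter++;                     /* no other thread can interleave here */
    pthread_mutex_unlock(&lock);
}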
Variable Names
• Each thread can access all of a process's memory (except for the thread's stack*)
♦ Named variables refer to the address space—thus visible to all threads
♦ Compiler doesn't distinguish A in one thread from A in another
♦ No modularity
♦ Like using Fortran blank COMMON for all variables
• “Thread private” extensions are becoming common
♦ “Thread local storage” (tls) is becoming common as an attribute (sketch after this slide)
♦ NEC has a variant where all variable names refer to different variables unless specified
  • All variables are on thread stack by default (even globals)
  • More modular
22
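A sketch of the thread-local-storage attribute mentioned above, using the C11 keyword (GCC and Clang also accept the older __thread spelling); the variable names are illustrative.

/* One instance of tls_counter per thread; shared_counter is the ordinary
   case where one name refers to one location visible to all threads. */
_Thread_local int tls_counter = 0;
int shared_counter = 0;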
Scheduling Threads

• If threads used for latency hiding
♦ Schedule on the same core
  • Provides better data locality, cache usage
• If threads used for parallel execution
♦ Schedule on different cores using different memory pathways (a core-pinning sketch follows this slide)
♦ Appropriate for data parallelism
♦ Appropriate for certain types of task parallelism
23
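If you do want to steer where a thread runs, here is a sketch of one way to pin the calling thread to a particular core. This is Linux/glibc-specific and not covered in the slides; the function name is mine.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Restrict the calling thread to the given core; returns 0 on success. */
int pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}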
The Changing Computing Model
• More interaction
♦ Threads allow low-overhead agents on any computation
  • OS schedules if necessary; no overhead if nothing happens (almost…)
♦ Changes the interaction model from batch (give commands, wait for results) to constant interaction
• Fine-grain parallelism
♦ Simpler programming model
• Lowering the Memory Wall
♦ CPU speeds increasing much faster than memory
♦ Hardware threads can hide memory latency
24
Node Execution Models

• Where do threads run on a node?
♦ Typical user expectation: the user's application uses all cores and has complete access to them
• Reality is complex. Common cases include:
♦ OS pre-empts core 0; or cores 0, 2
♦ OS pre-empts user threads, distributes across cores
♦ Hidden core (BG/Q)
25
Blue Gene/Q Processor

• 1 spare core for yield
• 1 core reserved for system (OS, services)
26
Performance Models

• Easiest: Everything independent
♦ Usually appropriate for L1 cache
♦ L2 may be shared, L3 almost certainly shared
♦ Two limits on performance: Maximum performance per thread and maximum overall (aggregate)
27
Performance Models: Memory
• Assume the time to move a unit of memory is t_m
♦ Due to latency in hardware; clock rate of data paths
♦ Rate is 1/t_m = r_m
• Also assume that there is a maximum rate r_max
♦ E.g., width of data path * clock rate
• Then the rate at which k threads can move data is
♦ min(k/t_m, r_max) = min(k*r_m, r_max)   (sketch after this slide)
28
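A tiny sketch of the model above (the function and variable names are mine): the aggregate rate grows linearly with the number of threads k until it reaches the shared ceiling r_max.

/* Aggregate memory bandwidth for k threads: min(k * r_m, r_max). */
double aggregate_rate(int k, double r_m, double r_max)
{
    double r = k * r_m;
    return (r < r_max) ? r : r_max;
}

/* Example: with r_m = 5 GB/s per thread and r_max = 20 GB/s, four threads
   saturate memory; with eight threads each gets about 2.5 GB/s on average. */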
Limits on Thread Performance
[Figure: aggregate bandwidth vs. number of threads - linear growth that flattens at Rmax]
• Threads share memory resources
• Performance is roughly linear with additional threads until the maximum bandwidth is reached
• At that point each thread receives a decreasing fraction of available bandwidth
29
Questions

• How do you expect a multithreaded STREAM to perform as you add threads? Sketch a graph.
• What's the difference between a software thread and a hardware thread?
• What happens if there are more threads than cores? Can programs run faster in that case?
30
