Lecture 25
This document is copyright (C) Stanford Computer Science and Nick Troccoli, licensed under
Creative Commons Attribution 2.5 License. All rights reserved.
Based on slides and notes created by John Ousterhout, Jerry Cain, Chris Gregg, and others.
NOTICE RE UPLOADING TO WEBSITES: This content is protected and may not be shared, uploaded, or distributed without express written permission.
Key question: How do hardware advances impact the design of operating systems?
CS111 Topic 4: Virtual Memory
Modern Technologies and OSes - How do hardware advances impact the design of operating systems?
Learning Goals
• Learn about multicore CPUs and how they change scheduling and lock implementations
• Understand the benefits and drawbacks of flash storage and how flash storage can impact filesystem design
Plan For Today
• Example 1: Multicore CPUs
• Example 2: Flash Storage
Plan For Today
• Example 1: Multicore CPUs
• Multicore scheduling
• Multicore locks
• Example 2: Flash Storage
Multicore CPUs
• True multitasking: multiple cores let us run multiple threads simultaneously
• Starting in the mid-2000s, multicore processors became more common in consumer devices
• The OS manages these cores - new challenges!
Multicore CPUs
[Picture of a Snapdragon processor]
Multicore Challenges
OS management of multiple cores surfaces new challenges:
• Example: how does scheduling work with multiple CPUs?
• Example: how can we implement mutexes when there are multiple CPUs?
Plan For Today
• Example 1: Multicore CPUs
• Multicore scheduling
• Multicore locks
• Example 2: Flash Storage
Scheduling
Key Question: How does the operating system decide which thread to run next (e.g., when there are many ready threads)?
Previously: First-Come-First-Serve, Round-Robin, SRPT, Priority-Based
What about when we have multiple cores to schedule threads on? (Assume all cores are equal.)
Multicore Scheduling
Initial idea: one ready queue shared by k cores
• Share the ready queue data structure across cores, with a lock to synchronize access
• One dispatcher per core
• Separate timer interrupts for each core
• Run the k highest-priority threads on the k cores
• When a new thread is marked "ready", compare its priority against the lowest-priority running thread; preempt if the new thread has higher priority
• This works fine for 2 cores but breaks down with many more cores. What is the main bottleneck with this approach when used with many cores?
Respond on PollEv: pollev.com/cs111 or text CS111 to 22333 once to join.
Multicore Scheduling
The single ready queue is a huge bottleneck - cores must wait for access!
Better: give each core its own ready queue. An idle core can "steal" work from a busy core's queue, but moving a thread to a new core discards the cache state it built up on its old core.
Tension between work stealing (want to move threads often) and core affinity (don't want to move threads often)
Gang Scheduling
How should we approach scheduling if one process has several threads?
• Threads may be coordinating / exchanging info
• "Gang scheduling" - run all of the process's threads together on different cores
• Why? Thread progress may be intertwined. E.g., one thread holds a lock and then gets descheduled; another thread runs but soon needs to wait for that same lock.
Multicore Scheduling
In general, these systems all have good and bad situations - e.g., the Linux scheduler had problems for many years; it is better now, but still has some issues with load balancing and with moving threads too rapidly between cores.
Plan For Today
• Example 1: Multicore CPUs
• Multicore scheduling
• Multicore locks
• Example 2: Flash Storage
Single-Core Locks
So far: our Mutex implementation relied on disabling interrupts to prevent race conditions.

class Lock {
  int locked = 0;
  ThreadQueue q;
};

void Lock::lock() {
  IntrGuard guard;
  if (!locked) {
    locked = 1;
  } else {
    q.add(currentThread);
    blockThread();
  }
}

void Lock::unlock() {
  IntrGuard guard;
  if (q.empty()) {
    locked = 0;
  } else {
    unblockThread(q.remove());
  }
}
Multicore Locks
Problem: this only works on single-core processors! With multiple cores, even if interrupts are disabled, some other thread could be running on another core.
How do we approach this on multicore systems?
• Turn off all other cores? Not a great option.
Key Idea: we must use a (small) amount of busy waiting (!!). We need a mechanism for cores to sync up before proceeding, and setting/checking a shared value is the only option.
• There's no other way to synchronize with the other cores; until we have synchronized, we can't even put a thread to sleep.
Multicore Locks, V1
class Lock {
  int locked = 0;
  ThreadQueue q;
  int sync = 0;
};

void Lock::lock() {
  // try to change sync from 0 to 1
  while (true) {
    int old = sync;
    sync = 1;
    if (old == 0) break;
  }
  // we are the only one proceeding now
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    sync = 0;
    blockThread();
  }
}
Multicore Locks, V1
class Lock {
  int locked = 0;
  ThreadQueue q;
  std::atomic<int> sync{0};
};

void Lock::lock() {
  // try to change sync from 0 to 1
  while (sync.exchange(1)) {}
  // we are the only one proceeding now
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    sync = 0;
    blockThread();
  }
}

exchange: atomically read a memory value, replace it with a given value, and return the old value.
Busy waiting is unavoidable! However, it's very short - just long enough to manipulate the lock structure.
Multicore Locks, V1

class Lock {
  int locked = 0;
  ThreadQueue q;
  std::atomic<int> sync{0};
};

void Lock::lock() {
  while (sync.exchange(1)) {}
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    sync = 0;        // <-- air gap: sync released, but we haven't blocked yet
    blockThread();
  }
}

void Lock::unlock() {
  while (sync.exchange(1)) {}
  if (q.empty()) {
    locked = 0;
  } else {
    unblockThread(q.remove());
  }
  sync = 0;
}

Problem: there's an air gap between releasing sync and blocking. Another thread could call unlock() in that gap, unblocking us before we have blocked - and then we block forever.
Multicore Locks
We won't worry about these details, but a few more steps/tweaks are needed (specifically, tweaking how we block to fix this race condition, and continuing to use IntrGuard to disable interrupts). (See the optional slides at the end if you're interested!)
Plan For Today
• Example 1: Multicore CPUs
• Example 2: Flash Storage
Flash Storage
• Much faster than hard disks: no moving parts (no delays from platters/heads!), smaller, faster
• Flash storage became more common with the rise of mobile devices, and is nowadays common in PCs too
• Can buy separately, or some devices have non-removable storage (e.g., many mobile devices)
• New opportunities and challenges in designing filesystems for flash - it has its own quirks
[Picture of a Samsung 980 Pro SSD, which is a small chip/board with a connector on the right side to insert into a computer or other device.]
Flash Storage Quirks
Quirk #1: Writing Data: flash storage doesn't support simply writing arbitrary data to a portion of the storage. Instead, it supports two operations that, combined, allow us to write data:
• Erase: set all bits of an erase unit to 1. The storage is divided up into erase units, typically 256 KB in size.
• Write: modify one page; can only clear bits to 0. The storage is also divided up into pages, typically 512 bytes or 4 KB in size.
Flash Storage Quirks
Quirk #2: Wear-out: after an erase unit has been erased many times, it no longer reliably stores data (!). Typically around 100K erase cycles.
Wear Leveling: we want erase units to wear out at the same rate everywhere (rather than having some parts wear out before others). One family of ideas: move "hot" (short-lived) and "cold" (long-lived) data around to even out storage usage.
Flash Storage and Filesystem Design
• A common approach has been to abstract away these quirks by including software in the flash storage that makes it look like a hard disk.
• "Flash Translation Layer" (FTL) - software that manages the flash device, built into the drive, typically mimicking a disk interface (read/write blocks)
• The OS has no visibility into erase units, etc. - it looks like a disk! Virtualization.
• Advantage: can use existing filesystem software
• Disadvantages: sacrifices performance, no direct access to the raw hardware, unnecessary layers / duplication
• Lots of interesting questions about what filesystems would look like if designed with flash storage in mind, without an FTL.
• Other storage technologies in the future?
Recap
• Example 1: Multicore CPUs
• Multicore scheduling
• Multicore locks
• Example 2: Flash Storage

Lecture 25 takeaway: Operating systems and hardware changes are tightly intertwined; multicore processors and flash storage provide two examples of the impact of hardware changes on OS implementations.
Extra Slides
Multicore Locks, V2
Somehow, we need to block and then unlock the lock??
• Key insight: we don't need to block prior to unlocking the lock; we just need to be marked as blocked.
• Solution (awkward): change the interface of our thread scheduler/dispatcher to let us separately mark a thread as blocked and then context switch. (Linux does something like this.)
Multicore Locks, V2
class Lock {
  int locked = 0;
  ThreadQueue q;
  std::atomic<int> sync{0};
};

void Lock::lock() {
  while (sync.exchange(1)) {}
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    currentThread->state = BLOCKED;
    sync = 0;
    blockThreadIfNecessary();
  }
}

void Lock::unlock() {
  while (sync.exchange(1)) {}
  if (q.empty()) {
    locked = 0;
  } else {
    unblockThread(q.remove());
  }
  sync = 0;
}
Multicore Locks, Final Version
One last change - we must disable interrupts.
• E.g., if the timer fires right after we acquire sync (the internal spin flag), another thread trying to get it would just busy wait, wasting resources.

void Lock::lock() {
  while (sync.exchange(1)) {}
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    currentThread->state = BLOCKED;
    sync = 0;
    blockThreadIfNecessary();
  }
}
Multicore Locks, Final Version
class Lock {
  int locked = 0;
  ThreadQueue q;
  std::atomic<int> sync{0};
};

void Lock::lock() {
  IntrGuard guard;
  while (sync.exchange(1)) {}
  if (!locked) {
    locked = 1;
    sync = 0;
  } else {
    q.add(currentThread);
    currentThread->state = BLOCKED;
    sync = 0;
    blockThreadIfNecessary();
  }
}

void Lock::unlock() {
  IntrGuard guard;
  while (sync.exchange(1)) {}
  if (q.empty()) {
    locked = 0;
  } else {
    unblockThread(q.remove());
  }
  sync = 0;
}