Peterson's Algorithm for Mutual Exclusion | Set 2 (CPU Cycles and Memory Fence)
Last Updated: 07 Apr, 2025
Problem: Given two processes i and j, write a program that guarantees mutual exclusion between them without any additional hardware support.
Wastage of CPU clock cycles
In layman's terms, when a thread is waiting for its turn, it ends up in a long while loop that tests the condition millions of times per second, doing unnecessary computation. There is a better way to wait, and it is known as "yield".
To understand what it does, we need to dig into how the process scheduler works in Linux. The idea described here is a simplified version of the scheduler; the actual implementation has many more complications.
Consider the following example,
There are three processes, P1, P2 and P3. Process P3 has a while loop similar to the one in our code, doing not-so-useful computation, and it exits the loop only when P2 finishes its execution. The scheduler puts all of them in a round-robin queue. Now, say the clock speed of the processor is 1,000,000 cycles per second, and it allocates 100 clock cycles to each process in each iteration. Then P1 runs for 100 clocks (0.0001 seconds), then P2 (0.0001 seconds), followed by P3 (0.0001 seconds). Since there are no more processes, this cycle repeats until P2 ends, after which P3 runs and eventually terminates.
Every slice that P3 spends spinning is a complete waste of 100 CPU clock cycles. To avoid this, we voluntarily give up the CPU time slice, i.e. yield, which essentially ends the current time slice and lets the scheduler pick the next process to run. Now we test our condition once, then give up the CPU. Considering that our test takes 25 clock cycles, we save 75% of the computation in a time slice. To put this graphically,

Considering a processor clock speed of 1 MHz, this is a lot of saving!
Different operating systems provide different functions for this; Linux provides sched_yield().
C
void lock(int self)
{
    flag[self] = 1;
    turn = 1 - self;

    // Only change is the addition of
    // the sched_yield() call
    while (flag[1 - self] == 1 && turn == 1 - self)
        sched_yield();
}
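For reference, sched_yield() is declared in <sched.h>, and the matching unlock() (repeated in the full listings below) simply clears the flag. A minimal usage sketch, assuming flag and turn are the shared variables from Set 1:
C
#include <sched.h>          // declares sched_yield()

// Matching release: withdrawing interest lets the waiting thread leave its loop.
void unlock(int self)
{
    flag[self] = 0;
}

// In each thread (self is 0 or 1):
//     lock(self);
//     /* critical section */
//     unlock(self);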
Memory fence
The code in the earlier tutorial (Set 1) might have worked on most systems, but it was not 100% correct. The logic was perfect, but most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behaviour in concurrent programs.
Consider this example,
C
while (f == 0);
// Memory fence required here
printf("%d\n", x);
In the above example, the compiler (and the CPU at run time) may treat the two statements as independent of each other and reorder them to improve efficiency, which can lead to problems in concurrent programs. To avoid this, we place a memory fence between them, which tells the compiler and the hardware that memory operations must not be moved across the barrier.
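As a minimal sketch of where the fence would go in this fragment (using the gcc builtin discussed below; f and x are assumed to be shared variables written by another thread):
C
#include <stdio.h>

extern int f;   // set to 1 by the writing thread after it has written x
extern int x;

void reader(void)
{
    while (f == 0)
        ;                     // wait for the flag

    __sync_synchronize();     // fence: the read of x must not be moved above the wait

    printf("%d\n", x);
}

// The writing thread needs a matching fence between its write of x
// and its write of f for the pattern to be reliable.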
So the order of statements,
flag[self] = 1;
turn = 1 - self;
while (turn condition check)
    yield();
has to be preserved exactly for the lock to work; if the stores to flag and turn are reordered past the loads in the while condition, both threads can slip into the critical section at the same time.
To ensure this, compilers provide an instruction that prevents reordering of memory operations across this barrier. In the case of gcc, it is __sync_synchronize().
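On toolchains that support C11, the portable equivalent is atomic_thread_fence() from <stdatomic.h> (the C++ listing below uses the analogous std::atomic_thread_fence). A one-line sketch:
C
#include <stdatomic.h>

static inline void memory_fence(void)
{
    // Full (sequentially consistent) fence; same effect as gcc's __sync_synchronize()
    atomic_thread_fence(memory_order_seq_cst);
}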
So the modified code becomes,
Full implementation:
C++
// Filename: peterson_yieldlock_memoryfence.cpp
// Use below command to compile:
// g++ -pthread peterson_yieldlock_memoryfence.cpp -o peterson_yieldlock_memoryfence

#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> flag[2];
std::atomic<int> turn;
const int MAX = 1e9;
int ans = 0;

void lock_init()
{
    // Initialize lock by resetting the desire of
    // both the threads to acquire the lock,
    // and giving the turn to one of them.
    flag[0] = flag[1] = 0;
    turn = 0;
}

// Executed before entering critical section
void lock(int self)
{
    // Set flag[self] = 1 saying you want
    // to acquire the lock
    flag[self] = 1;

    // But first give the other thread the
    // chance to acquire the lock
    turn = 1 - self;

    // Memory fence to prevent the reordering
    // of instructions beyond this barrier.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    // Wait until the other thread loses the
    // desire to acquire the lock or it is your
    // turn to get the lock.
    while (flag[1 - self] == 1 && turn == 1 - self)

        // Yield to avoid wastage of resources.
        std::this_thread::yield();
}

// Executed after leaving critical section
void unlock(int self)
{
    // You do not desire to acquire the lock any more.
    // This will allow the other thread to acquire it.
    flag[self] = 0;
}

// A sample function run by the two threads created in main()
void func(int s)
{
    int self = s;
    std::cout << "Thread Entered: " << self << std::endl;

    lock(self);

    // Critical section (only one thread
    // can enter here at a time)
    for (int i = 0; i < MAX; i++)
        ans++;

    unlock(self);
}

// Driver code
int main()
{
    // Initialize the lock
    lock_init();

    // Create two threads (both run func)
    std::thread t1(func, 0);
    std::thread t2(func, 1);

    // Wait for the threads to end.
    t1.join();
    t2.join();

    std::cout << "Actual Count: " << ans
              << " | Expected Count: " << MAX * 2 << std::endl;
    return 0;
}
C
// Filename: peterson_yieldlock_memoryfence.c
// Use below command to compile:
// gcc -pthread peterson_yieldlock_memoryfence.c -o peterson_yieldlock_memoryfence

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include "mythreads.h"

int flag[2];
int turn;
const int MAX = 1e9;
int ans = 0;

void lock_init()
{
    // Initialize lock by resetting the desire of
    // both the threads to acquire the lock,
    // and giving the turn to one of them.
    flag[0] = flag[1] = 0;
    turn = 0;
}

// Executed before entering critical section
void lock(int self)
{
    // Set flag[self] = 1 saying you want
    // to acquire the lock
    flag[self] = 1;

    // But first give the other thread the
    // chance to acquire the lock
    turn = 1 - self;

    // Memory fence to prevent the reordering
    // of instructions beyond this barrier.
    __sync_synchronize();

    // Wait until the other thread loses the
    // desire to acquire the lock or it is your
    // turn to get the lock.
    while (flag[1 - self] == 1 && turn == 1 - self)

        // Yield to avoid wastage of resources.
        sched_yield();
}

// Executed after leaving critical section
void unlock(int self)
{
    // You do not desire to acquire the lock any more.
    // This will allow the other thread to acquire it.
    flag[self] = 0;
}

// A sample function run by the two threads created in main()
void* func(void *s)
{
    int i = 0;

    // The thread id (0 or 1) is smuggled in through the void* argument.
    int self = (int)(intptr_t)s;
    printf("Thread Entered: %d\n", self);

    lock(self);

    // Critical section (only one thread
    // can enter here at a time)
    for (i = 0; i < MAX; i++)
        ans++;

    unlock(self);
    return NULL;
}

// Driver code
int main()
{
    pthread_t p1, p2;

    // Initialize the lock
    lock_init();

    // Create two threads (both run func)
    Pthread_create(&p1, NULL, func, (void*)0);
    Pthread_create(&p2, NULL, func, (void*)1);

    // Wait for the threads to end.
    Pthread_join(p1, NULL);
    Pthread_join(p2, NULL);

    printf("Actual Count: %d | Expected Count: %d\n", ans, MAX * 2);
    return 0;
}
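A side note, not part of the original listing: the plain int shared variables above technically still race, and the code relies on the fence plus the opaque sched_yield() call to keep the compiler from caching them. On a C11 toolchain the same lock can be written with _Atomic shared state, a sketch of which is shown below; with sequentially consistent atomic stores and loads, the explicit fence becomes unnecessary.
C
#include <sched.h>
#include <stdatomic.h>

_Atomic int flag[2];
_Atomic int turn;

void lock(int self)
{
    atomic_store(&flag[self], 1);      // I want the lock ...
    atomic_store(&turn, 1 - self);     // ... but you go first.

    // Sequentially consistent stores/loads already order these operations,
    // so no explicit atomic_thread_fence(memory_order_seq_cst) is needed.

    while (atomic_load(&flag[1 - self]) == 1 && atomic_load(&turn) == 1 - self)
        sched_yield();                 // give up the time slice while waiting
}

void unlock(int self)
{
    atomic_store(&flag[self], 0);      // withdraw interest
}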
Java
import java.util.concurrent.atomic.AtomicInteger;

public class PetersonYieldLockMemoryFence {
    static AtomicInteger[] flag = new AtomicInteger[2];
    static AtomicInteger turn = new AtomicInteger();
    static final int MAX = 1000000000;
    static int ans = 0;

    static void lockInit() {
        // Reset the desire of both threads and give the turn to one of them.
        flag[0] = new AtomicInteger(0);
        flag[1] = new AtomicInteger(0);
        turn.set(0);
    }

    // Executed before entering the critical section
    static void lock(int self) {
        // Say you want to acquire the lock ...
        flag[self].set(1);
        // ... but first give the other thread the chance to acquire it.
        turn.set(1 - self);

        // No explicit fence is needed here: AtomicInteger.set()/get() have
        // volatile semantics, which already prevent the problematic reordering.

        // Wait until the other thread loses the desire to acquire the lock
        // or it is your turn to get it.
        while (flag[1 - self].get() == 1 && turn.get() == 1 - self)
            Thread.yield();
    }

    // Executed after leaving the critical section
    static void unlock(int self) {
        flag[self].set(0);
    }

    static void func(int self) {
        System.out.println("Thread Entered: " + self);
        lock(self);
        // Critical section (only one thread can enter here at a time)
        for (int i = 0; i < MAX; i++)
            ans++;
        unlock(self);
    }

    public static void main(String[] args) {
        // Initialize the lock
        lockInit();

        // Create two threads (both run func)
        Thread t1 = new Thread(() -> func(0));
        Thread t2 = new Thread(() -> func(1));

        // Start the threads
        t1.start();
        t2.start();

        try {
            // Wait for the threads to end.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println("Actual Count: " + ans + " | Expected Count: " + (MAX * 2));
    }
}
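If an explicit fence is preferred in Java rather than relying on the volatile semantics of AtomicInteger, VarHandle.fullFence() (Java 9+) provides one. A minimal sketch, meant to sit inside the class above and reuse its flag and turn fields:
Java
import java.lang.invoke.VarHandle;

// Inside PetersonYieldLockMemoryFence:
static void lockWithFence(int self) {
    flag[self].set(1);
    turn.set(1 - self);

    // Explicit full fence, analogous to atomic_thread_fence(seq_cst) in C/C++.
    VarHandle.fullFence();

    while (flag[1 - self].get() == 1 && turn.get() == 1 - self)
        Thread.yield();
}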
Python
import threading

flag = [0, 0]
turn = 0
MAX = 10**9
ans = 0

def lock_init():
    # Initialize the lock by resetting the flags and the turn.
    global flag, turn
    flag = [0, 0]
    turn = 0

def lock(self):
    # Executed before entering the critical section: set the flag for the
    # current thread and give the turn to the other thread.
    global flag, turn
    flag[self] = 1
    turn = 1 - self
    while flag[1 - self] == 1 and turn == 1 - self:
        pass

def unlock(self):
    # Executed after leaving the critical section: reset the flag for the
    # current thread.
    global flag
    flag[self] = 0

def func(s):
    # Run by each thread: lock the critical section, increment the shared
    # variable, then unlock the critical section.
    global ans
    self = s
    print(f"Thread Entered: {self}")
    lock(self)
    for _ in range(MAX):
        ans += 1
    unlock(self)

def main():
    # Create and start the threads, then wait for them to finish.
    lock_init()
    t1 = threading.Thread(target=func, args=(0,))
    t2 = threading.Thread(target=func, args=(1,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(f"Actual Count: {ans} | Expected Count: {MAX*2}")

if __name__ == "__main__":
    main()
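The Python version above spins with pass, which burns its whole time slice exactly as described in the first section. A sketch of a lock() that yields instead is shown below; time.sleep(0) gives up the rest of the slice portably, and os.sched_yield() (Unix, Python 3.3+) maps directly to the sched_yield() used in C. No explicit fence appears here because CPython's GIL already serializes the accesses to flag and turn.
Python
import time

def lock(self):
    global flag, turn
    flag[self] = 1
    turn = 1 - self
    while flag[1 - self] == 1 and turn == 1 - self:
        time.sleep(0)  # yield the rest of the time slice instead of spinning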
JavaScript
class PetersonYieldLockMemoryFence {
    static flag = [0, 0];
    static turn = 0;
    static MAX = 1000000000;
    static ans = 0;

    // Function to acquire the lock
    static async lock(self) {
        PetersonYieldLockMemoryFence.flag[self] = 1;
        PetersonYieldLockMemoryFence.turn = 1 - self;

        // Asynchronous loop with a small delay to yield
        while (PetersonYieldLockMemoryFence.flag[1 - self] == 1 &&
               PetersonYieldLockMemoryFence.turn == 1 - self) {
            await new Promise(resolve => setTimeout(resolve, 0));
        }
    }

    // Function to release the lock
    static unlock(self) {
        PetersonYieldLockMemoryFence.flag[self] = 0;
    }

    // Function representing the critical section
    static func(self) {
        console.log("Thread Entered: " + self);

        // Lock the critical section
        PetersonYieldLockMemoryFence.lock(self).then(() => {
            // Critical section (only one "thread" can enter here at a time)
            for (let i = 0; i < PetersonYieldLockMemoryFence.MAX; i++) {
                PetersonYieldLockMemoryFence.ans++;
            }
            // Release the lock
            PetersonYieldLockMemoryFence.unlock(self);
        });
    }

    // Main function
    static main() {
        // Create two simulated threads (both run func)
        const t1 = new Thread(() => PetersonYieldLockMemoryFence.func(0));
        const t2 = new Thread(() => PetersonYieldLockMemoryFence.func(1));

        // Start the threads
        t1.start();
        t2.start();

        // Wait for a while so the simulated threads finish, then print.
        setTimeout(() => {
            console.log("Actual Count: " + PetersonYieldLockMemoryFence.ans +
                        " | Expected Count: " + PetersonYieldLockMemoryFence.MAX * 2);
        }, 1000);
    }
}

// A simple Thread class for simulation (JavaScript itself is single-threaded,
// so this only mimics the structure of the other versions)
class Thread {
    constructor(func) {
        this.func = func;
    }
    start() {
        this.func();
    }
}

// Run the main function
PetersonYieldLockMemoryFence.main();
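The JavaScript listing above only simulates threads: a single JavaScript realm is single-threaded, so there is no real contention and nothing for a fence to do. As a rough sketch, not part of the original article, the same lock can be expressed with real parallelism in Node.js using worker_threads and a SharedArrayBuffer; Atomics loads and stores are sequentially consistent, so they take the place of the memory fence. The filename and iteration count here are illustrative.
JavaScript
// peterson_workers.js (hypothetical filename) -- run with: node peterson_workers.js
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
    // Shared memory layout: [flag0, flag1, turn, counter]
    const sab = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
    const shared = new Int32Array(sab);

    const workers = [0, 1].map(self =>
        new Worker(__filename, { workerData: { sab, self } }));

    let finished = 0;
    workers.forEach(w => w.on('exit', () => {
        if (++finished === 2)
            console.log('Actual Count:', shared[3], '| Expected Count:', 2000000);
    }));
} else {
    const { sab, self } = workerData;
    const shared = new Int32Array(sab);       // [flag0, flag1, turn, counter]

    const lock = () => {
        Atomics.store(shared, self, 1);       // flag[self] = 1
        Atomics.store(shared, 2, 1 - self);   // turn = 1 - self
        // Atomics operations are sequentially consistent: no explicit fence needed.
        while (Atomics.load(shared, 1 - self) === 1 &&
               Atomics.load(shared, 2) === 1 - self) {
            // busy-wait (Atomics.wait could block instead of spinning)
        }
    };
    const unlock = () => Atomics.store(shared, self, 0);

    lock();
    for (let i = 0; i < 1000000; i++)         // critical section
        shared[3] = shared[3] + 1;
    unlock();
}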
C++
// Filename: mythreads.h (a wrapper header file with assert statements)
#ifndef __MYTHREADS_h__
#define __MYTHREADS_h__

#include <pthread.h>
#include <cassert>
#include <sched.h>

// Function to lock a pthread mutex
void Pthread_mutex_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0); // Assert that the mutex was locked successfully
}

// Function to unlock a pthread mutex
void Pthread_mutex_unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0); // Assert that the mutex was unlocked successfully
}

// Function to create a pthread
void Pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                    void *(*start_routine)(void*), void *arg)
{
    int rc = pthread_create(thread, attr, start_routine, arg);
    assert(rc == 0); // Assert that the thread was created successfully
}

// Function to join a pthread
void Pthread_join(pthread_t thread, void **value_ptr)
{
    int rc = pthread_join(thread, value_ptr);
    assert(rc == 0); // Assert that the thread was joined successfully
}

#endif // __MYTHREADS_h__
C
// Filename: mythreads.h (a wrapper header file with assert statements)
#ifndef __MYTHREADS_h__
#define __MYTHREADS_h__

#include <pthread.h>
#include <assert.h>
#include <sched.h>

void Pthread_mutex_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
}

void Pthread_mutex_unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
}

void Pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                    void *(*start_routine)(void*), void *arg)
{
    int rc = pthread_create(thread, attr, start_routine, arg);
    assert(rc == 0);
}

void Pthread_join(pthread_t thread, void **value_ptr)
{
    int rc = pthread_join(thread, value_ptr);
    assert(rc == 0);
}

#endif // __MYTHREADS_h__
Python
import threading

# Function to lock a thread lock
def Thread_lock(lock):
    lock.acquire()  # Acquire the lock; acquire() raises an exception if it fails

# Function to unlock a thread lock
def Thread_unlock(lock):
    lock.release()  # Release the lock; release() raises an exception if it fails

# Function to create (and start) a thread
def Thread_create(target, args=()):
    thread = threading.Thread(target=target, args=args)
    thread.start()  # Start the thread; start() raises an exception if it fails
    return thread   # Return the thread so the caller can join it later

# Function to join a thread
def Thread_join(thread):
    thread.join()  # Wait for the thread to finish
Output:
Thread Entered: 1
Thread Entered: 0
Actual Count: 2000000000 | Expected Count: 2000000000