
Implement barrier and double barrier #11725


Merged 1 commit on Jan 25, 2014

Conversation

derekchiang
Contributor

No description provided.


pub fn enter(&self) {
    self.barrier.enter();
    self.barrier.exit();
Member

I think this exit may just be unnecessary synchronization? You'll have all the tasks get into enter, but then they'll all immediately stampede to exit, and then they'll all be allowed out. The code above is pretty small, so perhaps this could just be reimplemented in terms of the enter above? I do think that the barrier should be able to get re-used after tasks have exited the barrier, though.

What do you think about calling this function wait instead?

@brson
Contributor

brson commented Jan 23, 2014

Are there other libraries we can look to for the names of these types? These don't do what I expected from the names: I figured a 'barrier' would be a memory barrier and I wouldn't be able to guess what a double barrier is. Googling doesn't turn up hits for 'double barrier'. Seems like the documentation could be beefed up a lot too since I had to read the code to understand what these do.

@derekchiang
Contributor Author

For reference, these are the two places I have seen that discuss double barriers:

  1. The Zookeeper documentation and the paper (page 6)
  2. The Little Book of Semaphores (section 3.5)

Per @alexcrichton's previous comments, I added some comments and code examples. Also, my previous implementation was flatly wrong, as I overlooked the issue of spurious wakeups.

@alexcrichton
Member

What do you think about only implementing Barrier, but allowing re-use as you do now? I would think that a DoubleBarrier is just trivially two waits on a single barrier.

@derekchiang
Contributor Author

Do you mean: rename DoubleBarrier to Barrier, and implement DoubleBarrier such that both enter and exit just call wait?

@alexcrichton
Member

Not quite. I would imagine this sequence of events:

  • Remove Barrier
  • Rename DoubleBarrier to Barrier
  • Remove exit
  • Rename enter to wait
  • Modify the broadcast() in wait to also reset the count.
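As a rough illustration of the "two waits on a single barrier" idea, here is a minimal sketch that uses today's `std::sync::Barrier` rather than the types in this PR. The helper name `run_double_barrier` and the shared counter are hypothetical and exist only for the example. The first `wait` plays the role of `enter`, the second plays the role of `exit`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Spawns `n` threads that each call `wait` twice on one reusable barrier.
/// Returns, per thread, how many threads it saw "inside" after the first wait.
fn run_double_barrier(n: usize) -> Vec<usize> {
    let barrier = Arc::new(Barrier::new(n));
    let inside = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let barrier = Arc::clone(&barrier);
            let inside = Arc::clone(&inside);
            thread::spawn(move || {
                inside.fetch_add(1, Ordering::SeqCst);
                // First wait ("enter"): returns once all n threads have arrived.
                barrier.wait();
                let seen = inside.load(Ordering::SeqCst);
                // Second wait ("exit"): nobody leaves until everyone has read.
                barrier.wait();
                inside.fetch_sub(1, Ordering::SeqCst);
                seen
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", run_double_barrier(4));
}
```

Without the second `wait`, a fast thread could decrement the counter before a slower thread reads it, which is the situation the exit phase is meant to rule out.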

@c-a
Contributor

c-a commented Jan 23, 2014

Not quite. I would imagine this sequence of events:

Remove Barrier
Rename DoubleBarrier to Barrier
Remove exit
Rename enter to wait
Modify the broadcast() in wait to also reset the count.

FWIW, this seems to be the behaviour the Pthreads barrier has: http://linux.die.net/man/3/pthread_barrier_wait

@derekchiang
Contributor Author

@alexcrichton The standard implementation of a barrier, like the one given in The Little Book of Semaphores (page 44), has two phases (which correspond to enter and exit in my implementation) and has wait() call both phases. Note that the implementation in the book uses semaphores, while mine uses condition variables, but conceptually they are the same.

If you think about it, you can't simply reset the count. Consider this code:

pub fn enter(&self) {
    self.arc.access_cond(|state, cond| {
        state.count += 1;
        if state.count < self.num_tasks {
            cond.wait();
        } else {
            state.count = 0;
            cond.broadcast();
        }
    });
}

The problem with this code is that it doesn't guard against spurious wakeup. As a standard practice, condition variables should wait only in a while loop which checks for a condition, to prevent a thread from accidentally waking up when the condition has not become true yet. So, the code above could be rewritten like this:

pub fn enter(&self) {
    self.arc.access_cond(|state, cond| {
        state.count += 1;
        if state.count < self.num_tasks {
            while state.count < self.num_tasks {
                cond.wait();
            }
        } else {
            state.count = 0;
            cond.broadcast();
        }
    });
}

However, since we are resetting the counter to 0, the tasks being woken up might not be able to escape the while loop.
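To make the stuck-waiter case concrete, the following sketch replays that schedule deterministically, without real threads or condition variables. The names `State`, `must_keep_waiting`, and `first_waiter_stuck_after_reset` are hypothetical. It only models the counter and the loop condition from the snippet above.

```rust
/// The barrier bookkeeping from the snippet, reduced to its counter.
struct State {
    count: usize,
}

/// The waiter's loop condition: the task keeps sleeping while this is true.
fn must_keep_waiting(state: &State, num_tasks: usize) -> bool {
    state.count < num_tasks
}

/// Replays the schedule: all tasks but one arrive and wait, then the last
/// task arrives, resets the counter, and broadcasts. Returns whether a woken
/// waiter would go straight back to sleep when it rechecks the condition.
fn first_waiter_stuck_after_reset(num_tasks: usize) -> bool {
    let mut state = State { count: 0 };

    // Every task except the last increments the counter and starts waiting.
    for _ in 0..num_tasks - 1 {
        state.count += 1;
    }

    // The last task arrives, sees the barrier is full, and resets it.
    state.count += 1;
    if state.count >= num_tasks {
        state.count = 0; // cond.broadcast() would happen here
    }

    // A woken waiter re-evaluates its while condition against the reset count.
    must_keep_waiting(&state, num_tasks)
}

fn main() {
    println!("waiter stuck: {}", first_waiter_stuck_after_reset(2));
}
```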

@alexcrichton
Member

It's true that condition variables are often susceptible to spurious wakeups, but remember that this is an implementation detail that isn't necessarily true in all circumstances. Notably, our condition variables are not susceptible to spurious wakeups.

Additionally, I believe that it's still possible to write a generic wait function:

fn wait(&mut self) {
    self.arc.access_cond(|state, cond| {
        let id = state.generation_id;
        state.count += 1;
        if state.count < self.num_tasks {
            while state.generation_id == id && state.count < self.num_tasks {
                cond.wait();
            }
        } else {
            state.count = 0;
            state.generation_id += 1;
            cond.broadcast();
        }
    });
}

The generation id ensures that a thread only waits in one usage of a barrier. With our condition variables as-is I don't believe that this is necessary, but we may be susceptible to spurious wakeups at some point.
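For readers on current Rust, the same generation-id idea can be sketched with `std::sync::Mutex` and `std::sync::Condvar` in place of the `access_cond` API used in this thread. This is an assumption-laden sketch, not the code merged in this PR: `GenBarrier`, the `bool` return value, and `run_rounds` are hypothetical, and the loop checks only the generation, since the count never reaches `num_tasks` while a waiter holds the lock in the same generation.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct BarrierState {
    count: usize,
    generation_id: usize,
}

/// Hypothetical reusable barrier keyed on a generation counter.
pub struct GenBarrier {
    state: Mutex<BarrierState>,
    cvar: Condvar,
    num_tasks: usize,
}

impl GenBarrier {
    pub fn new(num_tasks: usize) -> Self {
        GenBarrier {
            state: Mutex::new(BarrierState { count: 0, generation_id: 0 }),
            cvar: Condvar::new(),
            num_tasks,
        }
    }

    /// Blocks until `num_tasks` callers have arrived. Returns true for the
    /// single caller that completed the generation.
    pub fn wait(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        let id = state.generation_id;
        state.count += 1;
        if state.count < self.num_tasks {
            // Loop guards against spurious wakeups: leave only once the
            // generation recorded on entry has been completed.
            while state.generation_id == id {
                state = self.cvar.wait(state).unwrap();
            }
            false
        } else {
            state.count = 0;
            state.generation_id = state.generation_id.wrapping_add(1);
            self.cvar.notify_all();
            true
        }
    }
}

/// Reuses one barrier for `rounds` rounds and counts how many times any
/// thread was the one that completed a generation.
fn run_rounds(num_tasks: usize, rounds: usize) -> usize {
    let barrier = Arc::new(GenBarrier::new(num_tasks));
    let handles: Vec<_> = (0..num_tasks)
        .map(|_| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || (0..rounds).filter(|_| b.wait()).count())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("generations completed: {}", run_rounds(4, 10));
}
```

Each round has exactly one completing caller, so the total should equal the number of rounds if the barrier is reused correctly.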

@derekchiang
Contributor Author

From my understanding, spurious wakeup is an inherent problem of pthread, so I'm not sure how Rust could avoid it. Could you explain a bit?

Anyway, I changed the code according to your instruction. Note that instead of using generation_id, I'm using a boolean flag to make the barrier truly reusable.

@alexcrichton
Member

We can get around spurious wakeups in two ways:

  1. Don't use pthreads. We have scheduling primitives on the tasks themselves, so we can use those to implement a cvar
  2. Have fine-grained knowledge about how a cvar is being used. For example, native tasks right now use a pthread cvar in order to implement deschedule, and they're able to protect against spurious wakeups because they have detailed knowledge about the usage pattern of the wakeup procedure (reawaken)

I think we may want to stick with an integer generation, though. Consider a sequence of events like this with a barrier of count 2:

  1. Thread A blocks with the generation of false.
  2. Thread B finishes the generation, waking up A, the generation is now true.
  3. Threads C and D sleep on the barrier, flipping it back to false.
  4. Thread A wakes up, sees the generation is still false, and the count is 0, so it goes back to sleep.

Essentially you don't know how many generations have passed between when you were signaled and when you actually woke up. If we use an integer, it's pretty unlikely that 4 billion generations pass in that span of time, but it is much more likely that a single extra generation does, which is all it takes to fool a boolean flag.
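The difference between the two encodings can be checked with a small, thread-free model. Both function names are hypothetical, and the counter is assumed to be a 32-bit integer to match the "4 billion" figure. Each function answers one question: if `elapsed` generations completed while a waiter slept, does the waiter still see the value it recorded on entry?

```rust
/// Boolean generation flag: every completed generation flips the flag.
/// Returns true if the waiter would wrongly conclude nothing has happened.
fn bool_flag_misses(elapsed: u32) -> bool {
    let recorded = false;
    let mut current = recorded;
    for _ in 0..elapsed {
        current = !current;
    }
    current == recorded
}

/// Integer generation counter: every completed generation adds one, wrapping
/// on overflow. Starts at u32::MAX so the wraparound path is exercised too.
fn counter_misses(elapsed: u32) -> bool {
    let recorded: u32 = u32::MAX;
    let current = recorded.wrapping_add(elapsed);
    current == recorded
}

fn main() {
    // Two generations passed (B finished one, then C and D finished another).
    println!("bool flag misses wakeup: {}", bool_flag_misses(2));
    println!("counter misses wakeup:   {}", counter_misses(2));
}
```

The counter only collides after a full wrap of the integer range, which is why overflow itself is harmless here.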

@derekchiang
Contributor Author

You are absolutely correct. I missed the fact that it doesn't matter if generation_id overflows. Fixed.

// The inner state of a double barrier
struct BarrierState {
    priv count: uint,
    // For each usage of the barrier, we flip the flag
Member

This comment is a little out of date now.

@derekchiang
Contributor Author

Thanks for reviewing. Fixed.

@alexcrichton
Member

Could you rebase these commits into one? Other than that, this looks good to me!

@derekchiang
Contributor Author

@alexcrichton done.

@derekchiang
Contributor Author

@alexcrichton Fixed formatting issues. Retry?

@bors bors closed this Jan 25, 2014
@bors bors merged commit a937d18 into rust-lang:master Jan 25, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 25, 2022
Revert rust-lang#11490

Closes rust-lang#11725

rust-lang#11490 was a little misguided. Quoting the test name should be a client concern, since it's the client that actually runs `cargo`.
5 participants