x/sync/errgroup: Group: document that Go may not be concurrent with Wait unless semaphore > 0 #70284

haaawk · 2024-11-11T13:53:38Z

(Edit: skip down to #70284 (comment); this is now a doc change request. --@adonovan)

Go version

go version go1.23.1 darwin/amd64

Output of `go env` in your module/workspace:

```
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/haaawk/Library/Caches/go-build'
GOENV='/Users/haaawk/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/haaawk/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/haaawk/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/haaawk/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/_y/nj5hcsh93l12tmsszxgx7ntr0000gn/T/go-build2396141927=/tmp/go-build -gno-record-gcc-switches -fno-common'
```

What did you do?

When the following code is run:

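```go
package main

import (
	"runtime"

	"golang.org/x/sync/errgroup"
)

func main() {
	runtime.GOMAXPROCS(1)
	g := &errgroup.Group{}
	g.SetLimit(1) // at most one active goroutine in the group
	ch := make(chan struct{})
	wait := make(chan struct{}, 2)
	// First task: holds the only slot until ch is closed.
	g.Go(func() error {
		<-ch
		wait <- struct{}{}
		return nil
	})
	// Second task: submitted from a new goroutine, because calling g.Go
	// here directly would block while the first task holds the slot.
	go g.Go(func() error {
		println("I'm not blocked")
		wait <- struct{}{}
		return nil
	})
	println("Ok let's play")
	close(ch)
	g.Wait()
	println("It's over already?")
	<-wait
	<-wait
}
```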

https://go.dev/play/p/xTIsT1iouTd

What did you see happen?

The program printed:

```
Ok let's play
It's over already?
I'm not blocked
```

What did you expect to see?

The program printing:

```
Ok let's play
I'm not blocked
It's over already?
```


gabyhelp · 2024-11-11T13:53:55Z

Related Issues

Related Discussions

Potential Issue in errgroup


haaawk · 2024-11-11T14:02:13Z

I sent a patch here: https://go-review.googlesource.com/c/sync/+/627075

gopherbot · 2024-11-11T14:02:49Z

Change https://go.dev/cl/627075 mentions this issue: x/sync/errgroup: ensure all goroutines finish before Wait returns

jrick · 2024-11-11T15:57:57Z

> go g.Go(

I don't think this is what you meant to write.

haaawk · 2024-11-11T16:21:41Z

> > go g.Go(
>
> I don't think this is what you meant to write.

It is exactly what I meant to write. Otherwise g.Go would block.

jrick · 2024-11-11T16:36:30Z

Your program contains a data race, because it's not valid to increment a waitgroup with 0 count while waiting on it. Your submitted patch doesn't change this.

haaawk · 2024-11-11T16:42:08Z

> Your program contains a data race, because it's not valid to increment a waitgroup with 0 count while waiting on it. Your submitted patch doesn't change this.

The WaitGroup has a count of 1 when go g.Go is called.

jrick · 2024-11-11T16:44:47Z

No. You would need additional synchronization to guarantee this.

```go
package main

import (
	"runtime"

	"golang.org/x/sync/errgroup"
)

func main() {
	runtime.GOMAXPROCS(1)
	g := &errgroup.Group{}
	g.SetLimit(1)
	ch := make(chan struct{})
	wait := make(chan struct{}, 2)
	insideGo := make(chan struct{}) // closed once the second task starts running
	g.Go(func() error {
		<-ch
		wait <- struct{}{}
		return nil
	})
	go g.Go(func() error {
		close(insideGo)
		println("I'm not blocked")
		wait <- struct{}{}
		return nil
	})
	println("Ok let's play")
	close(ch)
	<-insideGo // the second g.Go has already done its wg.Add(1) by this point
	g.Wait()
	println("It's over already?")
	<-wait
	<-wait
}
```

cherrymui · 2024-11-11T17:00:09Z

@jrick is right. go g.Go(...) is not what you want. g.Go is intentionally blocking to limit the number of active goroutines. You've called g.SetLimit(1), so that limits it to 1 active goroutine a time. If you don't want that limit, you can remove that line, or set a higher limit.

In your case, there is no synchronization between g.Wait and the g.Go call in the new goroutine created by go g.Go(...). It is possible that g.Wait has finished while the new goroutine created by go g.Go(...) hasn't run yet. So this works as intended. Your CL doesn't seem to change it either: g.Wait can still finish even before the new goroutine reaches g.Go.

In general, if you have questions about how to use the API, please see https://go.dev/wiki/Questions. Thanks.

adonovan · 2024-11-11T17:01:23Z

Perhaps I misunderstand, but I think the Group is working as intended. SetLimit may prevent the Group from accepting new work, and the client must deal with that. It definitely cannot call Go asynchronously as there would be no guarantee that it happened before the later call to Wait.

Perhaps the documentation could be improved, but I don't think there's a bug.

adonovan · 2024-11-11T17:02:01Z

Ah, @cherrymui beat me to it!

haaawk · 2024-11-11T17:12:26Z

This was just my failed attempt at a simple reproducer for an issue that's still there.
In the code below the order is always right and the program still prints the same result. The problem is that g.Go can wake up from sem in a group that was already waited on.
Let me think of a less artificial example and come back. It will need to be more complex, unfortunately.

```go
package main

import (
	"runtime"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	runtime.GOMAXPROCS(1)
	g := &errgroup.Group{}
	g.SetLimit(1)
	ch := make(chan struct{})
	wait := make(chan struct{}, 2)
	g.Go(func() error {
		<-ch
		wait <- struct{}{}
		return nil
	})
	go g.Go(func() error {
		println("I'm not blocked")
		wait <- struct{}{}
		return nil
	})
	println("Ok let's play")
	time.Sleep(1 * time.Second)
	close(ch)
	g.Wait()
	println("It's over already?")
	<-wait
	<-wait
}
```

adonovan · 2024-11-11T17:40:32Z

> In the code below the order is always right and the program still prints the same result.

This program has a race: the second call to Go races with Wait. Consider: the goroutine created by the first Go function could complete (along with println, Sleep, close) before the goroutine created by the go statement even begins to execute the second call to g.Go.

haaawk · 2024-11-11T21:29:19Z

> > In the code below the order is always right and the program still prints the same result.
>
> This program has a race: the second call to Go races with Wait. Consider: the goroutine created by the first Go function could complete (along with println, Sleep, close) before the goroutine created by the go statement even begins to execute the second call to g.Go.

I know it's not properly synchronized, and I never said it is. I said that the order is right. I wouldn't expect the runtime to stay idle during the sleep instead of executing the runnable goroutine.

I guess a race-free reproducer could be:

```go
package main

import (
	"runtime"
	"runtime/pprof"
	"strings"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	runtime.GOMAXPROCS(1)
	g := &errgroup.Group{}
	g.SetLimit(1)
	ch := make(chan struct{})
	wait := make(chan struct{}, 2)
	go func() {
		println("Starting task scheduler")
		println("Scheduling first task")
		g.Go(func() error {
			<-ch
			println("First task ran")
			wait <- struct{}{}
			return nil
		})
		println("Scheduling second task")
		g.Go(func() error {
			println("What about me?")
			wait <- struct{}{}
			return nil
		})
	}()
	time.Sleep(1 * time.Second)
	// Dump all goroutine stacks to check whether the second g.Go call is
	// blocked inside errgroup (the check assumes that is errgroup.go:71).
	var b strings.Builder
	pprof.Lookup("goroutine").WriteTo(&b, 1)
	close(ch)
	if strings.Contains(b.String(), "errgroup.go:71") {
		g.Wait()
		println("Game over")
	} else {
		println("Second task not blocked yet")
	}
	<-wait
	<-wait
}
```

https://go.dev/play/p/Zg5GpZJz2bK

But I have to admit there's probably no way to reproduce the problem in a completely race-free way. If g.Go and g.Wait run in two different goroutines, the runtime can always preempt g.Wait in the middle, then schedule g.Go, which calls g.wg.Add(1) while g.Wait is running with the counter equal to 0.

haaawk · 2024-11-11T21:43:10Z

I guess it would be cleaner if the docs said explicitly that g.Wait has to be called only after all calls to g.Go finish. This is not obvious without looking into the implementation. One could imagine an implementation that does not have a data race when g.Wait and g.Go are called concurrently.

adonovan · 2024-11-11T23:23:13Z

> I guess it would be cleaner if the docs were saying explicitly that g.Wait has to be called only after all calls to g.Go finish. This is not obvious without looking into the implementation. One could imagine an implementation that does not have a data race when g.Wait and g.Go are called concurrently.

It's fine to call Go concurrently with Wait, and indeed useful, if one item of work might add others to the queue. But you can't use SetLimit in this scenario or else the Group will stop accepting work. We could certainly document that, but we should not try to disallow concurrent calls to Go.

haaawk · 2024-11-12T08:28:21Z

> > I guess it would be cleaner if the docs were saying explicitly that g.Wait has to be called only after all calls to g.Go finish. This is not obvious without looking into the implementation. One could imagine an implementation that does not have a data race when g.Wait and g.Go are called concurrently.
>
> It's fine to call Go concurrently with Wait, and indeed useful, if one item of work might add others to the queue. But you can't use SetLimit in this scenario or else the Group will stop accepting work. We could certainly document that, but we should not try to disallow concurrent calls to Go.

It is a very unintuitive requirement which is really driven by implementation details, not the best API design, but you're right.

adonovan · 2024-11-12T10:53:00Z

> It is a very unintuitive requirement which is really driven by implementation details not the best API design

I think your criticism of the design is unfair. The doc comment for SetLimit says that it "limits the number of active goroutines", and that "any subsequent call to the Go method will block ...". The alternative that you propose would remove the blocking behavior, which would be an incompatible change (since it removes happens-before edges), and it would require that Group maintain an arbitrarily large queue of functions provided to Go before a goroutine is available to run them, which could cause an application to use unbounded memory when dealing with very long streams of tasks.

haaawk · 2024-11-12T11:58:43Z

> > It is a very unintuitive requirement which is really driven by implementation details not the best API design
>
> I think your criticism of the design is unfair. The doc comment for SetLimit says that it "limits the number of active goroutines", and that "any subsequent call to the Go method will block ...". The alternative that you propose would remove the blocking behavior, which would be an incompatible change (since it removes happens-before edges), and it would require that Group maintain an arbitrarily large queue of functions provided to Go before a goroutine is available to run them, which could cause an application to use unbounded memory when dealing with very long streams of tasks.

So I'm not advocating for my change any more at all. My criticism is about the fact that calling Wait and Go concurrently with no synchronization is a data race, and the docs don't say a word about it. Calling Go from a goroutine running inside a group is a form of synchronization. The requirement is really specific: "You can call Go concurrently with Wait only if you have guaranteed that at least 1 other goroutine that is already running inside the group won't finish until after either Go or Wait (or both) finishes first."

adonovan · 2024-11-12T12:09:06Z

> My criticism is about the fact that calling Wait and Go concurrently with no synchronization is a data race and the docs don't say a word about it. Calling Go from a goroutine running inside a group is a form of synchronization. The requirement is really specific - "You can call Go concurrently with Wait only if you have guaranteed that at least 1 other goroutine that is already running inside the group won't finish until after either Go or Wait finishes first - or both.

True, but the exact same principle applies to plain old WaitGroups: to add an item to the group you call Add(1) and later Done. In the simple case you make all the Add calls in sequence and all the Dones happen asynchronously. In more complex cases a task begets more tasks, so it calls Add (for the child) before calling Done for itself, preserving the invariant. But you get into trouble if you make the Add asynchronous to the Wait without otherwise ensuring that the semaphore is nonzero.
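The same invariant with a plain sync.WaitGroup, as a minimal sketch: each task calls wg.Add(1) for its child before its own deferred wg.Done runs, so every positive Add except the first starts while the counter is above zero and may legally overlap with wg.Wait. The spawnChain helper is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnChain (hypothetical helper) starts a task that spawns a child, which
// spawns a child, and so on, depth times.
func spawnChain(depth int) int64 {
	var wg sync.WaitGroup
	var ran atomic.Int64
	var task func(remaining int)
	task = func(remaining int) {
		defer wg.Done() // runs after the child's Add below
		ran.Add(1)
		if remaining > 0 {
			wg.Add(1) // counter is > 0: this task has not called Done yet
			go task(remaining - 1)
		}
	}
	wg.Add(1) // counter is 0 here, so this Add must happen before Wait
	go task(depth)
	wg.Wait()
	return ran.Load()
}

func main() {
	fmt.Println(spawnChain(4)) // prints 5
}
```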

Perhaps we could add some clarifying documentation.

haaawk · 2024-11-12T12:35:18Z

> > My criticism is about the fact that calling Wait and Go concurrently with no synchronization is a data race and the docs don't say a word about it. Calling Go from a goroutine running inside a group is a form of synchronization. The requirement is really specific - "You can call Go concurrently with Wait only if you have guaranteed that at least 1 other goroutine that is already running inside the group won't finish until after either Go or Wait finishes first - or both.
>
> True, but the exact same principle applies to plain old WaitGroups: to add an item to the group you call Add(1) and later Done. In the simple case you make all the Add calls in sequence and all the Dones happen asynchronously. In more complex cases a task begets more tasks, so it calls Add (for the child) before calling Done for itself, preserving the invariant. But you get into trouble if you make the Add asynchronous to the Wait without otherwise ensuring that the semaphore is nonzero.
>
> Perhaps we could add some clarifying documentation.

Yes, but WaitGroup has the following in its docs:

> Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned.

which gives a user a chance to use the API correctly without looking into its implementation.

haaawk · 2024-11-12T12:39:01Z

BTW, I came across this issue because I found the following code, called outside of any group goroutine, in the codebase I'm working on:

```go
	if !w.g.TryGo(f) {
		go w.g.Go(f)
	}
```

and it seemed fishy from a concurrency point of view, so I kept digging. Apparently someone was misled by the errgroup.Group docs/API.

adonovan · 2024-11-13T00:42:51Z

Reopening as a doc change request.

berbreik · 2025-01-06T10:32:18Z

I'd like to work on this issue. I'll update the documentation to clarify the behavior when the semaphore limit is set to zero.

gopherbot · 2025-03-21T19:53:15Z

Change https://go.dev/cl/660075 mentions this issue: errgroup: document calling Go before Wait

gopherbot added this to the Unreleased milestone Nov 11, 2024

cherrymui closed this as not planned Nov 11, 2024

adonovan reopened this Nov 13, 2024

adonovan changed the title ~~x/sync/errgroup: Group.Wait may return before all Group.Go calls are finished~~ x/sync/errgroup: Group: document that Go may not be concurrent with Wait unless semaphore > 0 Nov 13, 2024

gopherbot added the Documentation label Nov 13, 2024

ianlancetaylor added the help wanted and NeedsFix labels Nov 13, 2024

ianlancetaylor added the gopls label Nov 13, 2024

findleyr removed the gopls label Nov 13, 2024

gopherbot closed this as completed in golang/sync@396f3a0 Apr 2, 2025
