fix(cluster): fix negative WaitGroup counter panic on node restart #74
+8
−2
Problem
After a Node is started, stopped, and then restarted, the process panics with "sync: negative WaitGroup counter".
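The panic message "sync: negative WaitGroup counter" comes from Go's sync.WaitGroup: Done() is equivalent to Add(-1), and Add panics as soon as the counter would drop below zero. The minimal standalone illustration below shows this. doneTwice is a hypothetical helper unrelated to the cluster code; it recovers the panic so its message can be printed.

```go
package main

import (
	"fmt"
	"sync"
)

// doneTwice calls Done() twice on a WaitGroup that only saw Add(1),
// so the counter goes 1 -> 0 -> -1. The resulting panic is recovered
// and its message returned instead of crashing the program.
func doneTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done() // counter: 1 -> 0
	wg.Done() // counter would become -1: panics here
	return ""
}

func main() {
	fmt.Println(doneTwice()) // prints the recovered panic message
}
```

Any code path that can reach Done() without a matching Add() is exposed to this panic.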
The stack trace points to the doneWait() function at /cluster/node/node.go:473.
Root cause
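The addWait()/doneWait() calls in BindNode and UnbindNode are not strictly paired:
- Both addWait() and doneWait() are gated on the same node state check (n.getState() != cluster.Shut).
- If the node is Shut when BindNode runs (so addWait is skipped) but no longer Shut when UnbindNode runs (so doneWait executes), Done() is called more often than Add() and the counter goes negative.
- Repeated BindNode calls for the same user each call Add, but unbinding calls Done only once.
Fix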
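Add a boundUsers sync.Map field to the Proxy struct to track which users are bound, so that every addWait is matched by exactly one doneWait:
- BindNode: use LoadOrStore so that addWait is called only once per user.
- UnbindNode: use LoadAndDelete so that doneWait is called only for users that are actually bound.
Test steps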
Start a Node, stop it, then restart it, and confirm that the "sync: negative WaitGroup counter" panic no longer occurs.
Scope
cluster/node/proxy.go
Compatibility
sync.Map is part of the Go standard library, so no new dependencies are introduced.
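The boundUsers pairing described in the fix can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in cluster/node/proxy.go: the Proxy type, the State constants, the int64 user IDs, and calling wg.Add/wg.Done directly in place of addWait()/doneWait() are all hypothetical. In this sketch the state check only decides whether a bind is recorded; UnbindNode decides on Done purely from boundUsers.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a hypothetical stand-in for the node state; only the
// distinction between Shut and not-Shut matters here.
type State int

const (
	Work State = iota
	Shut
)

// Proxy is a hypothetical, simplified model (not the real Proxy from
// cluster/node/proxy.go) with just enough fields to show Add/Done pairing.
type Proxy struct {
	mu         sync.Mutex
	state      State
	wg         sync.WaitGroup
	boundUsers sync.Map // uid (int64) -> struct{}; users that contributed one Add
}

func (p *Proxy) setState(s State) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state = s
}

func (p *Proxy) getState() State {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.state
}

// BindNode adds to the WaitGroup at most once per user. While the node
// is Shut nothing is recorded, so there is no Add for UnbindNode to undo.
func (p *Proxy) BindNode(uid int64) {
	if p.getState() == Shut {
		return
	}
	if _, loaded := p.boundUsers.LoadOrStore(uid, struct{}{}); !loaded {
		p.wg.Add(1) // first bind for this uid
	}
}

// UnbindNode calls Done only if BindNode recorded this user, regardless
// of the node state at unbind time.
func (p *Proxy) UnbindNode(uid int64) {
	if _, loaded := p.boundUsers.LoadAndDelete(uid); loaded {
		p.wg.Done()
	}
}

func main() {
	p := &Proxy{state: Shut}
	p.BindNode(1) // skipped: node is Shut
	p.setState(Work)
	p.UnbindNode(1) // no Done: user 1 was never recorded
	p.BindNode(2)
	p.BindNode(2)   // duplicate bind: no second Add
	p.UnbindNode(2) // exactly one Done
	p.wg.Wait()     // returns immediately: the counter is back to 0
	fmt.Println("no panic, counter balanced")
}
```

main replays the sequences from the root cause: a bind while Shut followed by an unbind after the state changes, and a duplicate bind. One detail matters in this sketch: a user is stored in boundUsers only on the path that actually calls Add; storing it on a path where Add was skipped would let LoadAndDelete trigger an unmatched Done again.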