raft: kill TODO about behavior when snapshot fails #3609
@@ -263,7 +263,8 @@ func (r *raft) sendAppend(to uint64) {
m.Type = pb.MsgSnap
snapshot, err := r.raftLog.snapshot()
if err != nil {
panic(err) // TODO(bdarnell)
r.logger.Debugf("%x failed to send snapshot to %x because snapshot is unavailable (%v)", r.id, to, err)
Can we still check the error before logging? We can still panic in other cases. But if @bdarnell thinks it is OK, I think we can just do the logging. Just try to avoid unnecessary logic changes.
LGTM, but I'd rather use
The snapshot message may be sent every heartbeat interval in probe state. If the heartbeat interval is 0.1s, that would print 10 messages per second, so I don't want to print it when an unavailable snapshot is expected.
Sounds good. Will do. As for the panic, I would keep the original panic in other cases, since I don't know how it will be used in the future.
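The handling agreed on in this thread (log and skip when the snapshot is expectedly unavailable, keep the panic for anything else) could be sketched roughly as below. This is only an illustration, not the code in this PR: `prepareSnapshot`, `snapshotFunc`, and the string-valued snapshot are hypothetical stand-ins for `sendAppend` and `r.raftLog.snapshot()`, and the sentinel error uses the name proposed later in this review.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSnapshotTemporarilyUnavailable uses the name proposed in this review.
var ErrSnapshotTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")

// snapshotFunc is a hypothetical stand-in for r.raftLog.snapshot().
type snapshotFunc func() (string, error)

// prepareSnapshot is a hypothetical helper. It returns the snapshot and true
// when a MsgSnap can be sent, or false when sending should be skipped for now.
// Any error other than the sentinel still panics, as in the original code.
func prepareSnapshot(id, to uint64, get snapshotFunc, logf func(format string, args ...interface{})) (string, bool) {
	snap, err := get()
	if err != nil {
		if err == ErrSnapshotTemporarilyUnavailable {
			// Expected while Storage is busy: log at debug level and give up.
			logf("%x failed to send snapshot to %x because snapshot is temporarily unavailable", id, to)
			return "", false
		}
		panic(err) // unexpected failure: keep the original behavior
	}
	return snap, true
}

func main() {
	logf := func(format string, args ...interface{}) { fmt.Printf(format+"\n", args...) }
	busy := func() (string, error) { return "", ErrSnapshotTemporarilyUnavailable }
	ready := func() (string, error) { return "snapshot@10", nil }

	_, ok := prepareSnapshot(1, 2, busy, logf)
	fmt.Println("sent:", ok)

	snap, ok := prepareSnapshot(1, 2, ready, logf)
	fmt.Println("sent:", ok, snap)
}
```

Comparing against a sentinel keeps the log quiet for the expected case while still surfacing real Storage failures loudly.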
9496e4b to 6b482af
Updated. PTAL
@@ -31,6 +31,8 @@ const noLimit = math.MaxUint64

var errNoLeader = errors.New("no leader")

var ErrTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")
Either name the variable ErrSnapshotTemporarilyUnavailable or remove the word "snapshot" from the message.
ErrSnapshotTemporarilyUnavailable seems to be better.
LGTM
@@ -57,6 +57,9 @@ type Storage interface {
// first log entry is not available).
FirstIndex() (uint64, error)
// Snapshot returns the most recent snapshot.
// If snapshot is temporarily unavailable, it should return ErrTemporarilyUnavailable,
// so raft instance could know that Storage needs some time to prepare
instance -> statemachine
LGTM
6b482af to 132fd81
etcd is going to support incremental snapshots, and the design lets it send at most one snapshot at a time in the first stage. So while one snapshot is in flight, a snapshot request will return an error. When raft fails to get a snapshot while sending MsgSnap, it logs the failure and aborts sending this message.
132fd81 to 4c82b48
raft: kill TODO about behavior when snapshot fails
etcd is going to support incremental snapshots, and the design lets it
send at most one snapshot at a time in the first stage. So while one
snapshot is in flight, a snapshot request will return an error.
When raft fails to get a snapshot while sending MsgSnap, it logs the
failure and aborts sending the message.
/cc @xiang90 @bdarnell
for #3549
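For illustration, a Storage-like type that allows only one snapshot in flight might behave as in the sketch below. Everything here is an assumption made for the example: `singleFlightStorage`, its `Done` method, and the simplified `Snapshot` struct are hypothetical and are not etcd's implementation or the real `raftpb.Snapshot` type.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrSnapshotTemporarilyUnavailable uses the name proposed in this review.
var ErrSnapshotTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")

// Snapshot is a hypothetical, simplified stand-in for raftpb.Snapshot.
type Snapshot struct {
	Index uint64
}

// singleFlightStorage is a hypothetical store that hands out at most one
// snapshot at a time.
type singleFlightStorage struct {
	mu       sync.Mutex
	inFlight bool
	latest   Snapshot
}

// Snapshot returns the latest snapshot, or the sentinel error while a
// previously returned snapshot is still being sent.
func (s *singleFlightStorage) Snapshot() (Snapshot, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inFlight {
		return Snapshot{}, ErrSnapshotTemporarilyUnavailable
	}
	s.inFlight = true
	return s.latest, nil
}

// Done marks the outstanding snapshot as finished, whether it was delivered
// or the send failed, so the next request can succeed.
func (s *singleFlightStorage) Done() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inFlight = false
}

func main() {
	st := &singleFlightStorage{latest: Snapshot{Index: 42}}

	first, err := st.Snapshot()
	fmt.Println("first request:", first.Index, err)

	_, err = st.Snapshot()
	fmt.Println("second request rejected:", err == ErrSnapshotTemporarilyUnavailable)

	st.Done()
	_, err = st.Snapshot()
	fmt.Println("after Done:", err)
}
```

With a store like this, raft would see the sentinel error on every retry until the in-flight snapshot completes, which is why the review asks for quiet logging in that case.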