raft: kill TODO about behavior when snapshot fails #3609

yichengq · 2015-09-29T07:13:16Z

etcd is going to support incremental snapshots, and in the first stage
the design lets it send at most one snapshot at a time. So while one
snapshot is in flight, a snapshot request will return an error.

If raft fails to get a snapshot when sending MsgSnap, it logs the
failure and aborts sending the message.

/cc @xiang90 @bdarnell

for #3549

xiang90 · 2015-09-29T14:56:16Z

raft/raft.go

@@ -263,7 +263,8 @@ func (r *raft) sendAppend(to uint64) {
 		m.Type = pb.MsgSnap
 		snapshot, err := r.raftLog.snapshot()
 		if err != nil {
-			panic(err) // TODO(bdarnell)
+			r.logger.Debugf("%x failed to send snapshot to %x because snapshot is unavailable (%v)", r.id, to, err)


Can we still check the error before logging? We can still panic in other cases. But if @bdarnell thinks it is OK, I think we can just do the logging. Just trying to avoid unnecessary logic changes.

bdarnell · 2015-09-29T17:10:29Z

LGTM, but I'd rather use Warningf instead of Debugf. Or, if Warningf would be too noisy for you, we could introduce a new special error like ErrUnavailable which snapshot() could return to skip the snapshot without logging.

yichengq · 2015-09-29T17:27:35Z

In probe state, the snapshot message may be sent every heartbeat interval. If the heartbeat interval is 0.1s, this would print 10 messages per second. So I don't want to print it when an unavailable snapshot is expected.

> introduce a new special error like ErrUnavailable which snapshot() could return to skip the snapshot without logging.

Sounds good. Will do.

As for the panic, I would keep the original panic in other cases, since I don't know how it will be used in the future.

yichengq · 2015-09-29T17:53:46Z

Updated. PTAL

bdarnell · 2015-09-29T17:55:38Z

raft/raft.go

@@ -31,6 +31,8 @@ const noLimit = math.MaxUint64

 var errNoLeader = errors.New("no leader")

+var ErrTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")


Either name the variable ErrSnapshotTemporarilyUnavailable or remove the word "snapshot" from the message.

ErrSnapshotTemporarilyUnavailable seems to be better.

bdarnell · 2015-09-29T17:55:44Z

LGTM

xiang90 · 2015-09-29T17:57:05Z

raft/storage.go

@@ -57,6 +57,9 @@ type Storage interface {
 	// first log entry is not available).
 	FirstIndex() (uint64, error)
 	// Snapshot returns the most recent snapshot.
+	// If snapshot is temporarily unavailable, it should return ErrTemporarilyUnavailable,
+	// so raft instance could know that Storage needs some time to prepare


instance -> statemachine

xiang90 · 2015-09-29T17:57:13Z

LGTM

xiang90 reviewed Sep 29, 2015

yichengq force-pushed the raft-snapshot branch from 9496e4b to 6b482af Compare September 29, 2015 17:52

bdarnell reviewed Sep 29, 2015

xiang90 reviewed Sep 29, 2015

yichengq force-pushed the raft-snapshot branch from 6b482af to 132fd81 Compare September 29, 2015 22:21

yichengq force-pushed the raft-snapshot branch from 132fd81 to 4c82b48 Compare September 30, 2015 02:15

yichengq added a commit that referenced this pull request Sep 30, 2015

Merge pull request #3609 from yichengq/raft-snapshot

533e728

raft: kill TODO about behavior when snapshot fails

yichengq merged commit 533e728 into etcd-io:master Sep 30, 2015

yichengq deleted the raft-snapshot branch September 30, 2015 02:32
