
[1/4] - protofsm: add new package for driving generic protocol FSMs #8337

Open

wants to merge 11 commits into master from protofsm

Conversation


@Roasbeef Roasbeef commented Jan 3, 2024

In this PR, we create a new package, protofsm, which is intended to
abstract away something we've done dozens of times in the daemon:
create a new event-driven protocol FSM. One example of this is the co-op
close state machine, and also the channel state machine itself.

This package picks out the common themes of:

  • clear states and transitions between them
  • calling out to special daemon adapters for I/O such as transaction
    broadcast or sending a message to a peer
  • cleaning up after state machine execution
  • notifying relevant callers of updates to the state machine

The goal of this PR is that devs can now implement a state machine
based off of this primary interface:

// State defines an abstract state along with its state transition function,
// which takes as input an event and an environment, and returns a state
// transition (the next state, and a set of events to emit). A state can also
// be terminal or not; a terminal state causes state machine execution to halt.
type State[Event any, Env Environment] interface {
	// ProcessEvent takes an event and an environment, and returns a new
	// state transition. This will be iteratively called until either a
	// terminal state is reached, or no further internal events are
	// emitted.
	ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)

	// IsTerminal returns true if this state is terminal, and false otherwise.
	IsTerminal() bool
}

Their focus is only on each state transition, rather than all the
boilerplate involved (processing new events, advancing to completion,
doing I/O, etc.).

Instead, they just make their states, then create the state machine
given the starting state and env. The only other custom component needed
is something capable of mapping wire messages or other events from the
"outside world" into the domain of the state machine.

The set of types is based on a pseudo sum type system wherein you
declare an interface, make the sole method private, then create other
instances based on that interface. This restricts the types accepted at
call sites (they must take that interface), and with some tooling,
exhaustive matching can also be enforced via a linter.
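
To make the sealed-interface pattern concrete, here's a minimal sketch (the
event names are made up for illustration, they aren't taken from this PR):

// CloseEvent is the "sum type": the single unexported method means only
// types declared in this package can satisfy the interface, so the set of
// variants is closed, and call sites must accept this interface type.
type CloseEvent interface {
	closeEvent()
}

// OfferReceived and OfferAccepted are two variants of the sum, each carrying
// only the data relevant to that case.
type OfferReceived struct {
	FeeSats int64
}

type OfferAccepted struct{}

func (*OfferReceived) closeEvent() {}
func (*OfferAccepted) closeEvent() {}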

The best way to get the hang of the pattern proposed here is to check out
the tests. They make a mock state machine, and then use the new executor
to drive it to completion. You'll also get a view of how the code will
actually look, with the focus being on the input event, current state,
and output transition (which can also emit events to drive the machine
forward).
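
As a rough sketch of the shape a state takes against this interface (the
event and environment types here are hypothetical and only meant to show the
structure; see the actual tests for the real mock, and assume closeEnv
implements Environment):

// awaitingAck is a non-terminal state: it advances to the closed state once
// it sees an OfferAccepted event, and otherwise stays put.
type awaitingAck struct{}

func (a *awaitingAck) ProcessEvent(event CloseEvent, env *closeEnv) (
	*StateTransition[CloseEvent, *closeEnv], error) {

	switch event.(type) {
	case *OfferAccepted:
		// Move to the terminal state. No internal or daemon events
		// are emitted, so the executor halts after this transition.
		return &StateTransition[CloseEvent, *closeEnv]{
			NextState: &closed{},
		}, nil

	default:
		// Any other event leaves us in the same state.
		return &StateTransition[CloseEvent, *closeEnv]{
			NextState: a,
		}, nil
	}
}

func (a *awaitingAck) IsTerminal() bool { return false }

// closed is the terminal state, so the executor stops driving the machine
// once it's reached.
type closed struct{}

func (c *closed) ProcessEvent(_ CloseEvent, _ *closeEnv) (
	*StateTransition[CloseEvent, *closeEnv], error) {

	return &StateTransition[CloseEvent, *closeEnv]{NextState: c}, nil
}

func (c *closed) IsTerminal() bool { return true }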


github-actions bot commented Jan 3, 2024

Pull reviewers stats

Stats of the last 30 days for lnd:

User             Total reviews   Time to review   Total comments
yyforyongyu      6               3d 19h 57m       3
Roasbeef         5               1d 18h 46m       18
guggero          4               1d 7h 51m        3
bhandras         2               3h 27m           0
calvinrzachman   1               6m               1
ziggie1984       1               20h 14m          0

@yyforyongyu yyforyongyu (Collaborator) left a comment

Really like the uniform StateMachine 🤩 I think the blockbeat from #7951 can even fit in this picture: there exists a set of universal events, such as a new block event, and we force every state machine to process it. My main question is whether we could stop generalizing at ProcessEvent, and leave the implementations of executeDaemonEvent to specific subsystems? This naturally leads to the question of whether we need this DaemonAdapters interface, as it seems it's not common functionality shared by all subsystems.

Still need to think it through, but a few ideas:

  • we could make StateMachine an interface, and maybe add something like BaseMachine that has the minimal methods such as driveMachine.
  • I like that State is an interface, which makes writing the tests much easier. It's just that the name is a bit confusing I guess, as it's sort of like a processor, and each state has its own processor.

My understanding of the design is: an event-driven machine that's pipelined with state processors. The machine doesn't care about the specifics of the event; instead, it's the state processor's responsibility to handle the event and produce a new state. I think we could stop here without distinguishing internal vs external events, and apply it to a few subsystems to see its effect.

// executeDaemonEvent executes a daemon event, which is a special type of event
// that can be emitted as part of the state transition function of the state
// machine. An error is returned if the type of event is unknown.
func (s *StateMachine[Event, Env]) executeDaemonEvent(event DaemonEvent) error {
Collaborator

feels like it's leaking the implementation details from other subsystems

Member Author

I think you'll have a better idea of the interaction once the new co-op close stuff is up, but the general idea is that:

  • All the state machine state transitions are pure functions
  • They emit events for the executor (prob should rename this struct slightly) to apply themselves
  • Something needs to be aware of the boundary between the pure state machine, and the daemon execution env it runs in
  • This thing handles that role of knowing all the global I/O or daemon actions to execute itself, and potentially emit an event back into the state machine (post execution hook)

Otherwise, what do you think should be handling the I/O between the daemon and the state machine?

Collaborator

Maybe it could be hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well act on the new transition, like broadcast or send a message.

Collaborator

I think putting more things behind ProcessEvent would negatively impact testability. With this construction we can test the state transitions themselves in a pure environment and then wire up the execution of the generated events separately.

Member Author

Maybe it could be hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well act on the new transition, like broadcast or send a message.

So the idea is that the actual state transitions never need to concern themselves with any of these details. They just emit the event, then wait for w/e new event to be sent in. There's no leakage of implementation details at this StateMachine level, as we'll pass in a concrete implementation based on lnd later; here's an idea of what that looks like: ce75ef8.

This is meant to be universal, just like the POSIX interface we all know and love today. In this case, our processes are these FSMs, and the syscalls are the ways to interact with the chain or daemon.


Roasbeef commented Jan 4, 2024

we could make StateMachine an interface,

Why do you think this should be an interface? The goal here is to provide a generic implementation that can drive any FSM, which is defined from that starting/initial state, and all the state transition functions. If you look at the test, it takes that mock state machine, and is able to drive that with the shared semantics of: terminal states, clean up functions, pure state transitions that emit any side effects as events, etc.

and leave the implementations of executeDaemonEvent to specific subsystems

The goal of those was to implement all the side effects we'd ever need in a single place. The daemon events added were just the ones I needed to implement the new co-op close state machine nearly from scratch. I think if we look at all the state machines we've written in the codebase, maybe there's ~10 daemon level adapters that are used continuously. One that's missing right now is requesting to be notified of something confirming.

@ProofOfKeags ProofOfKeags (Collaborator) left a comment

I did a no-nit high level review here. My biggest squint was around the SendWhen impure pseudo-predicate. Not gonna lie, I don't like it. However, I suspect that the reason you went this route is that making it pure would require hooks into the state changes of surrounding subsystems in ways that would require significant changes to the overall LND codebase before this could be inserted.

That said, there still may be no way around it. The main concern here is that the polling approach may miss the opportunities it needs to send the message out. The example here is OnCommit/OnFlush where we poll and still owe a commitment so we can't send, but then we do the commit and immediately follow up with another state change, thereby re-falsifying the SendWhen predicate before the next poll cycle.

In the case of shutdown and the coop close negotiations, this technically violates the spec. Idk what the practical consequences of that would be (they may be benign), but unless we can synchronize directly into the channel update lifecycle, we can't really be spec compliant.

On the other hand, you could make the argument that it isn't the state machine's responsibility to understand when a message should be synchronized into the message stream at all. Its job is simply to generate the response, and the caller would queue it for sending at the next possible opportunity. This is the approach I took with the coop close v1: the ChanCloser is completely unaware of how the messages are dispatched, it just knows what to send, not when or how.


// SendPredicate is a function that returns true if the target message should
// sent.
type SendPredicate = func() bool
Collaborator

I find myself very suspicious of this type of predicate construction. I would like for the function to take an argument to formally make it a predicate, and one that is ideally pure.

I haven't finished tackling the rest of the PR yet but I'm looking for opportunities to make this a reality in a way that cleans up the model.

Member Author

Sure, I guess we could call it something like a BoolCallback? Or just a ContinuationFunc?

I think this is useful for areas where we don't have the new hook concept. Eg: something waiting for a channel to be added to the graph before it acts.

In the RBF closer, I use this to hook into the case where no dangling updates exist (can send `shutdown`): https://github.com/lightningnetwork/lnd/blob/43386d5643f961f948bf95513933c7d5a72fc74e/peer/chan_observer.go#L46-L51

Alternatively, we can use the hooks to send an event into the state machine once the state has been achieved. Then we have a new transition to just handle that event (send the shutdown). I slightly prefer it as is though, as the code reads as more imperative. Otherwise, the state machine would need more knowledge of hooks, and the ability of the hook to do things like emit a daemon event, wherein the only way to emit that today is as a return value (the state transitions don't have a handle on the thing executing them, they're effectively sandboxed).

Collaborator

TrafficLight? Trigger?

We can leave as is. I just don't know about that poll-cycle issue.

Collaborator

I think the best way to handle this is to make the "true yielding" calls into events. This removes the need for time-based poll loops.

Comment on lines +61 to +73
type State[Event any, Env Environment] interface {
// ProcessEvent takes an event and an environment, and returns a new
// state transition. This will be iteratively called until either a
// terminal state is reached, or no further internal events are
// emitted.
ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)
Collaborator

One of the things I'm noticing about this type of construction is that it forces the Event type that the StateMachine consumes to be the same type as the Event type that it produces. I don't think this is necessary. Does go require all of the Type Variables in the ProcessEvent function to be scoped to the interface, or can you introduce new tyvars on the method itself?

Member Author

One of the things I'm noticing about this type of construction is that it forces the Event type that the StateMachine consumes to be the same type as the Event type that it produces. I don't think this is necessary

Good observation. I don't think it's necessary, but it felt natural in that if the state machine is defined by the type of events it accepts and the env, then most of the time, you want to also return something of that very same type.

I think in the future if we want the ability for one state machine to turn into another then, we can add in a new type and a code path to handle the switch over. I needed to do something similar to this, but I was able to just make a new composite state:

Does go require all of the Type Variables in the ProcessEvent function to be scoped to the interface, or can you introduce new tyvars on the method itself?

Current limitation is that you need to scope it all on the interface. You can't have new type params on methods.
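
For reference, this is the shape Go rejects today, along with the workaround of hoisting a second event type up to the interface (a hypothetical variant, not this PR's definition), which is also one way the consumed and produced event types could be decoupled:

// Not allowed: methods (including interface methods) can't declare their
// own type parameters, so this fails to compile:
//
//	type State[InEvent any, Env Environment] interface {
//		ProcessEvent[OutEvent any](event InEvent, env Env) (
//			*StateTransition[OutEvent, Env], error)
//	}
//
// Allowed: hoist the output event type up to the interface declaration.
type State[InEvent, OutEvent any, Env Environment] interface {
	ProcessEvent(event InEvent, env Env) (*StateTransition[OutEvent, Env], error)

	IsTerminal() bool
}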

Collaborator

I don't think it's necessary, but it felt natural in that if the state machine is defined by the type of events it accepts and the env, then most of the time, you want to also return something of that very same type.

It's natural but will limit composability of machines which can be useful.

Collaborator

Current limitation is that you need to scope it all on the interface. You can't have new type params on methods.

Boo. k.

Collaborator

I think for the longevity of this library that we need to decouple events that we consume from the events we produce.

Comment on lines 66 to 77
ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)

// IsTerminal returns true if this state is terminal, and false otherwise.
IsTerminal() bool
Collaborator

An opportunity for a "law" here is that IsTerminal = true ==> ProcessEvent = nop

Member Author

Totally. Here's an instance of that in the RBF coop PR: https://github.com/lightningnetwork/lnd/blob/43386d5643f961f948bf95513933c7d5a72fc74e/lnwallet/chancloser/rbf_coop_transitions.go#L917-L936

Tooling-wise, I think the best way for us to enforce this would be at the unit test level. I can think of some more involved mechanisms, like registering all types in a global map to then run a generic unit test against, but I'm not so sure we should reach for that at this stage.
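
A sketch of what such a test could look like (state and event names are the hypothetical ones from the description above, and require is the usual testify helper):

// TestTerminalStatesAreNoOps asserts the "law" above: any state that
// reports IsTerminal() == true must not transition elsewhere or emit new
// events.
func TestTerminalStatesAreNoOps(t *testing.T) {
	// Enumerate every state of the machine under test by hand (or via a
	// registry, if we ever add one).
	allStates := []State[CloseEvent, *closeEnv]{
		&awaitingAck{},
		&closed{},
	}

	for _, state := range allStates {
		if !state.IsTerminal() {
			continue
		}

		trans, err := state.ProcessEvent(&OfferAccepted{}, &closeEnv{})
		require.NoError(t, err)

		// A terminal state should stay put and emit nothing new.
		require.Equal(t, state, trans.NextState)
		require.False(t, trans.NewEvents.IsSome())
	}
}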

Collaborator

Yep. Unit tests would be the way to ensure this.


Comment on lines 276 to 448
// If this is a disable channel event, then we'll disable the channel.
// This is usually done for things like co-op closes.
case *DisableChannelEvent:
err := s.daemon.DisableChannel(daemonEvent.ChanPoint)
if err != nil {
return fmt.Errorf("unable to disable channel: %w", err)
}

return nil
Collaborator

Are all of these state machines guaranteed to be attached to a particular channel id? This Event feels less general than the other two, but I don't have a concrete argument for why that's the case.

Member Author

Are all of these state machines guaranteed to be attached to a particular channel id? This Event feels less general than the other two, but I don't have a concrete argument for why that's the case.

Yeah good point. In terms of usage, I guess this could become a new interface that we accept as part of the Environment. This one felt more syscall-y to me though, as the other interfaces don't affect/mutate the outside world, they just want to examine an attribute (no dangling updates), or do something pure like sign a signature.

Collaborator

Yeah maybe we just don't process these events if they are emitted by a state machine that doesn't have an association with an active channel. I agree it could be syscall-y but I am somewhat convinced that the security of said OS should include not being able to disable other channels...

Comment on lines +243 to +389
// Otherwise, this has a SendWhen predicate, so we'll need
// launch a goroutine to poll the SendWhen, then send only once
// the predicate is true.
Collaborator

Is there a way to avoid the polling approach? i.e. can we subscribe to the state changes we are actually interested in monitoring and make the predicate operate on a before/after pairing? Would that be too onerous?

I think the poll-an-opaque-boolean-returning-fn approach is potentially troublesome since it relies on impurity. If it's not prohibitively difficult I'd suggest we find a way to make SendWhen's predicate pure, and actually feed it the state changes it's monitoring rather than time based polling of state, but I do recognize that may not be an easy lift.

Member Author

Yeah, agreed: the polling (as you mentioned) has the downside of lagging the first instance where the new state/predicate is present, and in the case of something that can revert (flip on, then off) it may miss the opportunity altogether.

If it's not prohibitively difficult I'd suggest we find a way to make SendWhen's predicate pure, and actually feed it the state changes it's monitoring rather than time based polling of state, but I do recognize that may not be an easy lift.

Hmm, yeah I'm not sure what we would pass in here. With the code as is in the RBF closer PR, it could pass in the ChanStateObserver (or w/e it's called now), but then that would require the FSM executor to be able to extract attributes from the env, whereas rn it's based on an interface.

Collaborator

Yeah it's very unclear what you would want to send I agree. Ideally it's the state we are polling that says "coast is clear" and send in a new copy of it on each change. Generally I think of the relevant state here as channel state (the state of balances and commitment transactions) and link state (the state associated with where in the protocol trace we are with our peer). LightningChannel captures the channel state well but we don't really have a good tracker of link state since our link is busted up across multiple different data structures with the chan closers and funding manager. Making the desired change here may require us finishing the ChannelLifecycle refactor. Maybe we can try out a polling answer and just see what happens? It can mean untimely shutdown responses though.


// ExternalEvent is an optional external event that is to be sent to
// the daemon for dispatch. Usually, this is some form of I/O.
ExternalEvents fn.Option[DaemonEventSet]
Collaborator

Is there a material difference in semantics between None and Some({}) here?

Member Author

I think setting Some({}) would be considered a bug, or quirky logic. This should only be set to Some when the FSM has some I/O that it wants to perform. Similar to the whole nil vs []byte{} thing with slices.

Collaborator

Can we just make it a set then, since sets can be empty?

Collaborator

Still relevant.

Comment on lines 330 to 620
// With the event processed, we'll process any
// new daemon events that were emitted as part
// of this new state transition.
err := fn.MapOptionZero(events.ExternalEvents, func(dEvents DaemonEventSet) error {
for _, dEvent := range dEvents {
err := s.executeDaemonEvent(dEvent)
if err != nil {
return err
}
}

return nil
})
if err != nil {
return err
}

// Next, we'll add any new emitted events to
// our event queue.
events.InternalEvent.WhenSome(func(inEvent Event) {
eventQueue.Enqueue(inEvent)
})

return nil
})
if err != nil {
return err
}
Collaborator

If I'm reading this correctly it means that external events will always synchronize before internal ones. I believe this is OK (even preferred), but wanted to double check that this is what we want. Is there ever a situation in which we'd want to emit both but have them synchronize the other direction?

I think Internal events are a way to buy us "cut-through" where we don't rely on synthetic events from the surrounding environment to drive ourselves forward and we opportunistically drive ourselves as far forward as possible in any given moment. Under this interpretation I think this is indeed what we want.

Member Author

Is there ever a situation in which we'd want to emit both but have them synchronize the other direction?

Good q....my mental model here is that the executor context switches to execute all the syscalls, then resumes execution of the state machine with any emitted internal events. Re the opposite ordering, I guess this is sort of a system call interface the executor has with the FSM: what's the execution order of emitted events? One could likely devise state machines where if you flip the ordering you may end up with incorrect (?) behavior.

Member Author

I think Internal events are a way to buy us "cut-through" where we don't rely on synthetic events from the surrounding environment to drive ourselves forward and we opportunistically drive ourselves as far forward as possible in any given moment.

Yep, I think for the most part, you can elide emitting an internal event just by doing even more within a given state transition. I like them though from the PoV of minimal state transitions, and also domain modeling as well.

coderabbitai bot commented Jan 24, 2024

Review skipped: auto reviews are limited to specific labels (llm-review).


Roasbeef commented Feb 6, 2024

PTAL.

@Roasbeef Roasbeef force-pushed the protofsm branch 2 times, most recently from e0265c1 to 057c481 on February 7, 2024 00:16
@ProofOfKeags ProofOfKeags (Collaborator) left a comment

Main thing is I think we want to make state machines not able to "throw a disable" to another channel.



@Roasbeef (Member Author) commented

Pushed up a new set of commits with some bug fixes and some additional functionality that came in handy when starting to hook up the new RBF coop close state machine to the peer struct.


Roasbeef commented Mar 6, 2024

Updated the branch to remove DisableChannel as a syscall.

@ProofOfKeags ProofOfKeags (Collaborator) left a comment

I have a bunch of comments peppered throughout. However, the biggest squint I have is the fact that the StateMachines we define using this approach are always structured as their own CSP and so any time we try to compose them together we will have concurrency considerations. This makes it harder for us to fix the existing races we have between the chancloser machinery and the link. For that reason, I think we need to engineer into this approach a way to mark an Event as fully processed to the world outside of the StateMachine.

Concretely, SendEvent needs to return something approximating a sync primitive that will wake/unlock when the event is fully ack'ed.
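
A hedged sketch of what that could look like (the events/quit fields and the eventWithAck type are assumptions, not the current code): SendEvent, or a variant of it, hands back a channel that the executor closes once the event, and any internal events it spawned, have been fully processed.

// eventWithAck pairs an incoming event with a channel the executor closes
// once the event (and any internal events it produced) has been processed.
type eventWithAck[Event any] struct {
	event Event
	done  chan struct{}
}

// SendEventSync enqueues the event and returns a channel the caller can
// block on to know the event was fully ack'ed by the state machine.
func (s *StateMachine[Event, Env]) SendEventSync(event Event) <-chan struct{} {
	done := make(chan struct{})

	select {
	case s.events <- eventWithAck[Event]{event: event, done: done}:
	case <-s.quit:
		// The executor is shutting down; unblock the caller anyway.
		close(done)
	}

	return done
}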

Comment on lines +101 to +112
// RegisterSpendNtfn registers an intent to be notified once the target
// outpoint is successfully spent within a transaction. The script that
// the outpoint creates must also be specified. This allows this
// interface to be implemented by BIP 158-like filtering.
RegisterSpendNtfn(outpoint *wire.OutPoint, pkScript []byte,
heightHint uint32) (*chainntnfs.SpendEvent, error)
Collaborator

When you say "that the outpoint creates" are you referring to the script of the outpoint being spent? or referring to the script of ... one of (?) ... the outpoints that are created by the tx that spends the specified outpoint?

s.wg.Add(1)
go func() {
defer s.wg.Done()
for {
Collaborator

If I'm reading this right, this for loop is unnecessary

s.wg.Add(1)
go func() {
defer s.wg.Done()
for {
Collaborator

same here. I'm on a quest to make for loops an endangered species.

Comment on lines 478 to 665
// Before we start, if we have an init daemon event specified, then
// we'll handle that now.
err := fn.MapOptionZ(s.initEvent, func(event DaemonEvent) error {
return s.executeDaemonEvent(event)
})
if err != nil {
log.Errorf("unable to execute init event: %w", err)
return
}
Collaborator

As a matter of taste, I think these single run actions should be run in the main body of the start method rather than the prelude of the goroutine method. Thoughts?

initialState State[Event, Env],
env Env) StateMachine[Event, Env] {
initialState State[Event, Env], env Env,
initEvent fn.Option[DaemonEvent]) StateMachine[Event, Env] {
Collaborator

Maybe this is a dumb question, but why make the caller responsible for setting this. IIUC, the caller is saying "Hey can you tell me to do the thing immediately, then I'll do the thing". Why not just do the thing? Or alternatively, why have the caller tell the StateMachine "tell me to do the thing". The state machine could just know that it needs to do the thing.

Comment on lines 226 to 290
// SendMessage attempts to send a wire message to the state machine. If the
// message can be mapped using the default message mapper, then true is
// returned indicating that the message was processed. Otherwise, false is
// returned.
func (s *StateMachine[Event, Env]) SendMessage(msg lnwire.Message) bool {
// If we have no message mapper, then return false as we can't process
// this message.
if !s.cfg.MsgMapper.IsSome() {
return false
}

// Otherwise, try to map the message using the default message mapper.
// If we can't extract an event, then we'll return false to indicate
// that the message wasn't processed.
var processed bool
s.cfg.MsgMapper.WhenSome(func(mapper MsgMapper[Event]) {
event := mapper.MapMsg(msg)

event.WhenSome(func(event Event) {
s.SendEvent(event)

processed = true
})
})

return processed
}
Collaborator

This is also simplified by going for the const(None) approach.

Comment on lines +56 to +59
// Name returns the name of the environment. This is used to uniquely
// identify the environment of related state machines.
Name() string
Collaborator

I suggest naming this "Id" instead of "Name" since it will build intuition for posterity that it carries semantic meaning and must be unique for it to function as expected.

Comment on lines +40 to +45
// newLogClosure returns a new closure over a function that returns a string
// which itself provides a Stringer interface so that it can be used with the
// logging system.
func newLogClosure(c func() string) logClosure {
return logClosure(c)
}
Collaborator

Do we really need this? can't we just use the type conversion function generated by the typedef a few lines above?

Comment on lines +675 to +685
// An error occurred, so we'll tear down the
// entire state machine as we can't proceed.
go s.Stop()
Collaborator

Do we have plans to bring it back up? This seems like a pretty important thing to know both for implementers of this interface, as well as consumers.

Comment on lines +71 to +73
// SpendMapper is a function that's used to map a spend notification to a
// custom state machine event.
type SpendMapper[Event any] func(*chainntnfs.SpendDetail) Event
Collaborator

So unlike the MsgMapper, it seems that this will always be guaranteed to produce a valid event from a Spend. Can you elaborate on why this difference in choices makes sense. The incongruence bothers me. I also think we can easily fix it via the const approach as I specified earlier, but I am curious about why you made two different choices in the first place.

@Roasbeef Roasbeef changed the title protofsm: add new package for driving generic protocol FSMs [1/4] - protofsm: add new package for driving generic protocol FSMs Mar 8, 2024
@saubyk saubyk modified the milestones: v0.18.0, v0.18.1 Mar 21, 2024
@morehouse (Collaborator) commented

On initial look, I'm not excited about this change.

I find the event-driven pattern less readable, with code blocks like this:

switch event.(type) {
	case *IncomingStfu:
		stfu := lnwire.Stfu{
			ChanID:    env.cid,
			Initiator: false,
		}
		send := protofsm.SendMsgEvent[Events]{
			Msgs:       []lnwire.Message{&stfu},
			TargetPeer: env.key,
			SendWhen:   fn.Some(env.canSend),
			PostSendEvent: fn.Some(
				Events(&gotoQuiescent{}), // gross
			),
		}

		return &protofsm.StateTransition[Events, *Env]{
			NextState: &Live{},
			NewEvents: fn.Some(
				protofsm.EmittedEvent[Events]{
					ExternalEvents: fn.Some(
						protofsm.DaemonEventSet{&send},
					),
				},
			),
		}, nil
	case *Initiate:
		stfu := lnwire.Stfu{
			ChanID:    env.cid,
			Initiator: true,
		}
		send := protofsm.SendMsgEvent[Events]{
			Msgs:       []lnwire.Message{&stfu},
			TargetPeer: env.key,
			SendWhen:   fn.Some(env.canSend),
			PostSendEvent: fn.Some(
				Events(&gotoAwaitingStfu{}), // gross
			),
		}

		return &protofsm.StateTransition[Events, *Env]{
			NextState: &Live{},
			NewEvents: fn.Some(
				protofsm.EmittedEvent[Events]{
					ExternalEvents: fn.Some(
						protofsm.DaemonEventSet{&send},
					),
				},
			),
		}, nil
	case *gotoAwaitingStfu:
		return &protofsm.StateTransition[Events, *Env]{
			NextState: &AwaitingStfu{},
			NewEvents: fn.None[protofsm.EmittedEvent[Events]](),
		}, nil
	case *gotoQuiescent:
		return &protofsm.StateTransition[Events, *Env]{
			NextState: &Quiescent{},
			NewEvents: fn.None[protofsm.EmittedEvent[Events]](),
		}, nil
	default:
		panic("impossible: invalid QuiescerEvent")
	}
}

instead of readable, equivalent code, something like this:

func (q *quiescer) sendStfu() error {
  stfu := lnwire.Stfu{
    ChanID:    env.cid,
    Initiator: q.state == Initiate,
  }
  if err := sendMsg(stfu); err != nil {
    return err
  }

  switch q.state {
    case IncomingStfu: q.state = Quiescent
    case Initiate:     q.state = AwaitingStfu
    default:           return fmt.Errorf("Invalid state change")
  }

  return nil
} 

I also find it more difficult to trace the flow of a program written with protofsm, with states and events and transitions being passed all over the place. I fear that debugging code written in this style may be much more difficult.

I find it quite confusing to think about what is executing at any given time. It seems each protofsm gets its own goroutine and daemon events also get their own goroutines. And the concurrency behavior is hidden from the protofsm user, which seems a disaster just waiting to happen.

Maybe I'm slower than others, but I've been trying to grok protofsm for a day now and I'm still not confident I fully grasp the intricacies. If I had to write or modify code in this style, I would not be confident that my code was bug-free.


Roasbeef commented Apr 4, 2024

I also find it more difficult to trace the flow of a program written with protofsm, with states and events and transitions being passed all over the place. I fear that debugging code written in this style may be much more difficult.

I think the exact opposite is the case. With the framework as is, you have a standardized way of handling new state transitions, and you're forced to only maintain state within the protocol state definition, instead of a large struct with many variables that are only conditionally set if a certain state is present. You can examine a single state transition at a time, which clearly enumerates all its inputs and outputs.

You also don't need to re-write the very same executor loop (take in message, select on quit channel, apply state, loop again) that we've implemented several times over in the codebase. You just write your state transitions, and hand it off for handling.

Re debugging, my experience of debugging the rbf-coop state machine was pretty straight forward. The only state you need to wrangle with is the state in the protocol state. There's no concurrency within the state machine either, you're forced to implement everything with serial execution. You write unit tests for a given state transition, and can even employ property based testing to assert invariants re inputs/outputs.

I find it quite confusing to think about what is executing at any given time. It seems each protofsm gets its own goroutine and daemon events also get their own goroutines. And the concurrency behavior is hidden from the protofsm user, which seems a disaster just waiting to happen.

For a given state machine, everything is executed serially (we can also make it fully blocking, but nothing works like that today, since you don't want to block wire message ingestion). You define the transitions, then a generic executor handles mapping a wire message to a protocol state (just one example) to apply directly. The daemon events executed async are the very same ones that you'd normally spawn a goroutine to funnel a response into a channel (waiting for a spend/confirmation, etc). Transaction broadcast and wire message sending are synchronous.


Roasbeef commented Apr 4, 2024

Haven't dived deep into that PR yet, but looking at the example, the top two transitions to Live don't look necessary, and they can just go directly to AwaitingStfu. I think with that, you have a more accurate comparison:

  • s(Live, IncomingStfu) -> Quiescent.
  • s(Live, Initiate) -> AwaitingStfu.

So just two switch cases. There def is a bit more line noise going on there due to Go's lack of basic type inference, but you can make some helper funcs to handle the defs.

Even with that they don't look quite equivalent, as one wants to wait on a certain state to send the message, while the other would unconditionally send it. As mentioned above, to compare directly, you'd also need to implement the executor/event loop for the second version; IIRC that hadn't yet been done.
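
To make the comparison concrete, a hedged sketch of what those two cases could collapse to, with a couple of hypothetical helpers (sendAndGoto, newStfuSend) soaking up the StateTransition/SendMsgEvent boilerplate, and assuming these cases live in the Live state's ProcessEvent:

// sendAndGoto wraps a single SendMsgEvent into the StateTransition
// boilerplate from the snippet above.
func sendAndGoto(next protofsm.State[Events, *Env],
	send *protofsm.SendMsgEvent[Events]) *protofsm.StateTransition[Events, *Env] {

	return &protofsm.StateTransition[Events, *Env]{
		NextState: next,
		NewEvents: fn.Some(protofsm.EmittedEvent[Events]{
			ExternalEvents: fn.Some(protofsm.DaemonEventSet{send}),
		}),
	}
}

// With the intermediate goto* events dropped, the switch collapses to two
// cases; newStfuSend would build the SendMsgEvent for the given initiator
// flag.
func (l *Live) ProcessEvent(event Events, env *Env) (
	*protofsm.StateTransition[Events, *Env], error) {

	switch event.(type) {
	case *IncomingStfu:
		return sendAndGoto(&Quiescent{}, newStfuSend(env, false)), nil

	case *Initiate:
		return sendAndGoto(&AwaitingStfu{}, newStfuSend(env, true)), nil

	default:
		panic("impossible: invalid QuiescerEvent")
	}
}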

@ProofOfKeags (Collaborator) commented

I think all of the comments here that @Roasbeef makes about protofsm in general are correct. I also think that the quiescence protofsm implementation exaggerates its costs and understates its benefits. I made certain choices in the quiescence implementation in order to bind the state transition itself to when the message actually gets sent, as opposed to when it gets staged to send. This may not be necessary -- it may not even be good!

The main benefit that is understated here is that it is often the case that a state machine is best expressed as a sum of products. Product types are very easily expressed in Go via structs. Sums, on the other hand, are another story. The sealed interface pattern helps us model it better and makes it such that we can structurally guarantee the presence or absence of the associated state parameters with the state itself, rather than having a swiss cheese block of potentially valid or invalid pointers depending on the state selector row. It also allows us to explicitly enumerate the valid state transitions away from a particular state in a way that is very well organized and isolated. Could this be accomplished in another way? Yes. Is it better to do another way? I'm not so sure.

So while it was a very simple state machine to implement, quiescence is probably not an illustrative example of the leverage that protofsm can provide. The essential tradeoff being made is that protofsm adds a close to fixed overhead in terms of the naturality of expressing the state machine, and its benefits compound as the state machine itself gets bigger.

@saubyk saubyk added the P1 MUST be fixed or reviewed label Jun 25, 2024
@saubyk saubyk modified the milestones: v0.18.3, v0.19.0 Aug 1, 2024

Roasbeef commented Aug 2, 2024

Rebased to get a fresh CI run going.

@ProofOfKeags ProofOfKeags (Collaborator) left a comment

Biggest changes I'd like to see are terminology changes regarding the StateMachine and State. I think from reviewing the code it is clear that the currently named StateMachine is really an Executor and that the State is really a StateMachine.

This is important because I think the (current) State abstraction will be nice to use in the context of embedding one automata within another.

The second major change is to ensure that the executor only has one thread and we can predict the ordering of events. As it stands right now the SendWhen predicate has its own separate ticker that operates independently of the (current) StateMachine's event loop. I think instead we want to have the event loop call the sendWhen predicates predictably in the same cycle that it calls the main driver.

Comment on lines +31 to +45
// logClosure is used to provide a closure over expensive logging operations
// so they aren't performed when the logging level doesn't warrant it.
type logClosure func() string

// String invokes the underlying function and returns the result.
func (c logClosure) String() string {
return c()
}

// newLogClosure returns a new closure over a function that returns a string
// which itself provides a Stringer interface so that it can be used with the
// logging system.
func newLogClosure(c func() string) logClosure {
return logClosure(c)
}
Collaborator

Since this PR was put in, we have added an LND-wide version of this: https://github.com/lightningnetwork/lnd/blob/master/lnutils/log.go

Comment on lines +167 to +186
// MsgMapper is an optional message mapper that can be used to map
// normal wire messages into FSM events.
MsgMapper fn.Option[MsgMapper[Event]]
Collaborator

This is still relevant and we now have it in fn




Comment on lines +221 to +250
return fn.MapOptionZ(cfgMapper, func(mapper MsgMapper[Event]) bool {
return mapper.MapMsg(msg).IsSome()
})
Collaborator

Yeah again I think that creating a message mapper that returns None for all inputs handily solves this issue. Should be easy to do with the new Const function.
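
A sketch of that idea (the concrete type name is made up; only the MsgMapper shape is taken from this PR):

// noneMapper is a MsgMapper that maps every wire message to None. Using it
// as the default collapses the "no mapper" and "mapper that matches nothing"
// cases, so the config no longer needs to wrap the mapper in an Option.
type noneMapper[Event any] struct{}

func (noneMapper[Event]) MapMsg(_ lnwire.Message) fn.Option[Event] {
	return fn.None[Event]()
}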



protofsm/log.go

// The default amount of logging is none.
func init() {
UseLogger(build.NewSubLogger("PRCL", nil))
Collaborator

Do we want a logger at the library level?

In this commit, we add an optional daemon event that can be specified to
dispatch during init. This is useful for instances where before we
start, we want to make sure we have a registered spend/conf notification
before normal operation starts.

We also add new unit tests to cover this, and the prior spend/conf event
additions.

In this commit, we add the ability for the state machine to consume wire
messages. This'll allow the creation of a new generic message router
that takes the place of the current peer `readHandler` in an upcoming
commit.

This'll be used later to uniquely identify state machines for
routing/dispatch purposes.

We'll use this to be able to signal to a caller that a critical error
occurred during the state transition.

Adding this makes a state machine easier to unit test, as the caller can
specify a custom polling interval.

In this commit, we add the SpendMapper which allows callers to create
custom spent events. Before this commit, the caller would be able to
have an event sent to them in the case a spend happens, but that event
wouldn't have any of the relevant spend details.

With this new addition, the caller can specify how to take a generic
spend event, and transform it into the state machine specific spend
event.

In this commit, we update the execution logic to allow multiple internal
events to be emitted. This is useful to handle potential out of order
state transitions, as they can be cached, then emitted once the relevant
pre-conditions have been met.
@lightninglabs-deploy

@yyforyongyu: review reminder
@Crypt-iQ: review reminder
@morehouse: review reminder
@Roasbeef, remember to re-request review from reviewers when ready
