
WIP: Netpoller for noos #12

Merged: 4 commits into embeddedgo:master-embedded from clktmr:netpoller-noos on Apr 11, 2025

Conversation

@clktmr commented Dec 11, 2024

Makes the wakerq global so it can be consumed by a minimal netpoller implementation. The netpoller uses the same sync mechanics as the common netpoll.go implementation, with one additional state that protects against a false wakeup if a note is enqueued during a Clear.
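
For illustration, a hedged sketch of what such a state word might look like (pdNil/pdReady/pdWait follow the upstream netpoll.go protocol; pdClear is the additional state described here, and the exact values and names on the noos side are assumptions):

const (
  pdNil   uintptr = 0 // no pending wakeup, no waiter
  pdReady uintptr = 1 // a wakeup arrived; the next sleep returns immediately
  pdWait  uintptr = 2 // a goroutine is about to park on the note
  pdClear uintptr = 3 // a Clear is in progress; a note enqueued now must not
                      // cause a false wakeup
)
// Besides these sentinels, the state word can hold a *g, as in netpoll.go.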

This is a performance optimization. The current rtos.Note implementation goes into a blocking syscall on every Sleep call, causing an expensive handoff (~800µs of CPU time on the N64). Using the netpoller allows an ordinary goroutine sleep instead.

The need for this arose while implementing a driver for a DMA controller. Interrupts may trigger every few milliseconds, which is too long to busy-wait but too short to sleep on a Note if each sleep eats almost a whole millisecond of CPU time.

TODO

  • Move to netpoller_noos.go
  • Maybe avoid need for mutex?
  • Don't reuse the time.Sleep timer?
  • Testing

@clktmr (Author) commented Dec 13, 2024

Since an rtos.Note can be woken up multiple times, I also want to propose making rtos.Note.Clear() return the current state of the Note; see 1706d44. Since this would be considered an API-breaking change, maybe add something like TryClear() instead.

This would be helpful for interrupts that can trigger independently; otherwise there is no way to do this race-free.
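
For illustration, the proposed shape might look like this (a hedged sketch; see commit 1706d44 for the actual change):

// TryClear resets the note and reports whether it had been woken up since
// the last Sleep/Clear, so a wakeup racing with the clear is not lost.
func (n *Note) TryClear() bool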

@clktmr (Author) commented Apr 5, 2025

@michalderkacz As you suggested via email, I had a look at how we could allow a goroutine to sleep on multiple notes/pollfds. The difficulty I see is that it probably breaks the atomics-based sync mechanism that I adapted from netpoll_epoll.go: setting notel.g atomically would then have to be done multiple times. It might be possible, even trivially, but at the moment I don't know, because I'm not savvy enough to come up with my own sync mechanism in the runtime. But at the very least, the implementation will differ greatly from netpoll_epoll.go. At that point one might as well consider implementing an epoll syscall in noos and just using netpoll_epoll.go instead.

I propose to merge this after I've cleaned it up and you approve the changes, as a transparent change without API changes to rtos.Note. I have thought deeply about the current implementation and tested it to a fair extent. I currently have no real-world need to sleep on multiple pollfds, and as long as it's not performance-relevant, it can be implemented at higher levels. If the need arises, we can build upon this in a new PR.

@embeddedgo (Owner) commented

I prefer to leave the current Note implementation as it is for now, without any changes. I have some things that rely on it and want to move slowly to any new mechanism. Let it live for a while next to the new netpoll-based type (say, Event). We can then play with the new Event type for a while, allowing all (mostly my own) existing code to be compiled with the new version of Go and still use the old notes, or revert to them in case of any problem. The current Note implementation is also the only possible mechanism to communicate with the raw tasks, because they are outside the Go scheduler's scope, and I want them to remain available at least for a while.

> I currently have no real-world need to sleep on multiple pollfds, and as long as it's not performance-relevant, it can be implemented at higher levels.

The lack of such a select-like mechanism is bothersome. It turns the natural way of doing things around. If you want a goroutine to wait for two events at the same time, the higher-level approach requires two additional goroutines with two additional notes, or running your event loop using two goroutines with a mutex (this last approach may be problematic if your event loop is also a state machine). As this multi-goroutine approach is almost unacceptable (for me), you end up with a somewhat reversed interface in which the receiver must give its note to both senders (event sources), instead of simply waiting on two event-like variables published by the senders. If there are multiple entities interested in one event, the event source must maintain a collection of their notes. I haven't encountered this last case (multiple receivers of one event) in practice yet, but the case of needing to wait for two events is quite common.
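
To make this concrete, here is a hedged sketch of the multi-goroutine workaround criticized above (rtos.Note signatures assumed; forward, handle, and eventLoop are hypothetical names):

var noteA, noteB rtos.Note // each woken up by its own ISR

func forward(n *rtos.Note, id int, events chan<- int) {
  for {
    n.Sleep(-1) // assumed: negative timeout sleeps forever
    n.Clear()   // a wakeup between Sleep and Clear can be missed here,
                // the race discussed elsewhere in this thread
    events <- id
  }
}

func eventLoop() {
  events := make(chan int)
  go forward(&noteA, 0, events)
  go forward(&noteB, 1, events)
  for id := range events {
    handle(id) // a single goroutine handles both event sources
  }
}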

But let's give up this select-like behavior for now and start with your current implementation.

Short TODO list for this PR:

  1. Create netpoll_noos.go for the noos implementation and leave netpoll_stub.go as it is for plan9.
  2. Probably the only thing that should stay in the tasker_noos.go file is checking cpuctx.waker (consider changing its name to wakeNetpoller or something like that) and waking the netpoller task. All the other things related to the new note/event type in the runtime can probably be moved to the netpoll_noos.go file. This will make this PR a very clean change that touches only one existing file, adding only a few lines to it. All the new things will be in new files in the runtime and rtos packages.
  3. As this PR adds a new synchronization primitive, it won't affect existing code, so it can be merged without writing new tests/examples for all supported targets. But eventually we need such examples to prove it works, for example on the Pico with its two cores. The ISR/tasker-related things in this code are very simple, so I don't expect any multicore flaws, but only tests can confirm this.

@clktmr (Author) commented Apr 7, 2025

I now see that it's a bad idea to change the Note type. While the interface is the same, the new implementation differs in subtle ways. I will create a new type and check whether I still need the additional pdClear state, which was added to get the same semantics as the old Note type.

I haven't tried it, but given the Note type you might get select-like behaviour with an additional type, let's call it Event. Event.Register(*Note) adds a Note to the event; every registered Note gets woken up by Event.Signal(), which is called from the interrupt. This is just your second 'reverse interface' approach, hidden behind a new type.
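
A hedged sketch of that idea (hypothetical code; as the reply below notes, a real version would need an ISR-safe lockless list instead of a mutex):

type Event struct {
  mu    sync.Mutex
  notes []*rtos.Note
}

// Register adds a Note that Signal will wake up.
func (e *Event) Register(n *rtos.Note) {
  e.mu.Lock()
  e.notes = append(e.notes, n)
  e.mu.Unlock()
}

// Signal wakes all registered Notes. It is meant to be called from an
// interrupt handler, which is exactly why a mutex-protected slice is not
// usable in practice.
func (e *Event) Signal() {
  for _, n := range e.notes {
    n.Wakeup()
  }
}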

Implementing this behavior directly in the netpoller should, in my opinion, only be done if it's necessary for synchronization or performance reasons. I'm probably missing something, but if you think this approach might work for you, I can give it a try when cleaning up the PR.

@embeddedgo (Owner) commented

> I haven't tried it, but given the Note type you might get select-like behaviour with an additional type, let's call it Event. Event.Register(*Note)...

Your proposal is interesting. I see it more like a tree of events, so you can register other event variables with any event variable. But a correct implementation, if you also allow unregister operations, will be difficult: our event sources are mainly IRQ handlers (we have Go channels for non-IRQ stuff), which forces you to use lockless lists of events, and my experience with such types is that they are very difficult to implement and not as lightweight as you would expect for the IRQ use case. Instead, I have good experience with non-precise events (think of them as hash tables reduced to the hash only, compared to the precise event trees described above).

Let's give it up for now and simply implement a new synchronization mechanism similar to a note, in the sense that it remembers the fact of a Signal/Send/Wakeup/etc. call for an upcoming Recv/Wait/Sleep/etc. The detailed behavior and naming are in your hands, so it may indeed be more optimized and lightweight than any possible simulation of runtime.note.

@clktmr (Author) commented Apr 10, 2025

I rebased on master-embedded, moved the implementation to netpoll_noos.go, and left the rtos.Note implementation as is.

The new type is called rtos.Cond and provides only two methods, Signal and Wait, with Wait implicitly clearing the Cond. This basically represents a binary semaphore, which is also the main sync primitive for ISRs in FreeRTOS. It also falls naturally out of a minimal netpoller, which simplified the implementation a bit.

A clear can still be done by calling Wait(0). This also returns the Cond's state, which, as we discussed previously, might be needed in some cases.
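
A hedged usage sketch based on this description (dmaISR and startDMA are hypothetical, and the timeout semantics are assumptions: 0 means clear-and-report, a negative timeout means wait forever):

var dmaDone rtos.Cond

func dmaISR() {
  dmaDone.Signal() // remembered even if nobody is waiting yet
}

func transfer() {
  dmaDone.Wait(0)  // clear any stale signal before starting
  startDMA()
  dmaDone.Wait(-1) // park the goroutine until the ISR signals completion
}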

I will go over my comments, TODOs, and FIXMEs again. But if you want to have a look already, I would be interested in your comments about (1) the changes made to the tasker from an SMP perspective and (2) the new Cond type from an unprecise-events perspective. I had difficulty fully grasping what you said about unprecise events and couldn't really find online resources to improve my understanding.

This solution is copied from netpoll.go
@embeddedgo merged commit b914598 into embeddedgo:master-embedded on Apr 11, 2025
@embeddedgo (Owner) commented Apr 11, 2025

I've written a small example for the Pi Pico that uses rtos.Cond (in fact, it is for the WeAct board, because of its ready-to-use onboard button). It works fine. We need more serious tests with interrupts handled by both cores, goroutines locked (or not) to threads, and threads locked (or not) to different cores.

About the Clean operation, I imagine having a lot of this in my code:

d.dma.SetSrc(buf.Addr())
d.p.CleanIRQs(mask)

d.cond.Wait(0) // clear the cond before enabling interrupts and starting the DMA

d.dma.Start()
d.p.EnableIRQs(mask)

but maybe I'm wrong. I like minimal interfaces, so we can try it as is. The Clean method may be added in the future if we end up with too many comments like this explaining what Wait(0) does.

The Clean method is useful to clear all spurious signals that may occur before you have set everything up. This is especially likely if you have, for example, two ISRs that signal the same cond and you wait for both at the same time (my "reverse interface" case; you explicitly allowed it). It's also useful if Clean has "release" or "publication barrier" semantics, to order d.dma.Start after the setup data has been written to RAM.

A separate Clean method also makes it clear that some signals may be missed between the return from Wait and the Clean call. The same can be said about the current Wait semantics, which consume the signal, but there it isn't as obvious.

@embeddedgo (Owner) commented

> (1) the changes made to the tasker from an SMP perspective

Can you explain this question more? Do you mean the last changes I made when adding support for the Pi Pico?

> (2) the new Cond type from an unprecise-events perspective.

From "unprecise-events" perspective the Cond and Note types can be implemented using them as a low level signaling mechanism.

> I had difficulty fully grasping what you said about unprecise events and couldn't really find online resources to improve my understanding.

I didn't read about them anywhere. Maybe the hardware support for the WFE instruction on ARM was my inspiration to invent them (I don't really think I invented them; they probably exist under a different name somewhere).

So let me explain this concept starting from the extremely non-precise case. Let goroutines have an event bit in their G struct that records the occurrence of an event. The interface looks like this:

func RegisterEvent() // registers the goroutine to wait for events
func UnregisterEvent()
func SendEvent() // sets the G.event bit for all registered goroutines, wakes up those that wait for it
func WaitEvent(deadline time.Time) bool // waits for G.event != 0, clears it at exit

The simplest correct but inefficient implementation is a no-op for three of these functions and almost a no-op for WaitEvent (it must still check the deadline), as sketched below. Such an implementation reduces any algorithm that uses events to polling.
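
A sketch of this degenerate implementation (assuming a zero deadline means "no deadline"):

func RegisterEvent()   {}
func UnregisterEvent() {}
func SendEvent()       {}

// Almost a no-op: never blocks, reports failure only once the deadline
// has passed, so every Wait loop degenerates into busy polling.
func WaitEvent(deadline time.Time) bool {
  return deadline.IsZero() || time.Now().Before(deadline)
}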

Our new Cond type can be implemented like this:

type Cond struct { v atomic.Bool }

func (c *Cond) Signal() {
  c.v.Store(true)
  SendEvent()
}

func (c *Cond) Wait(timeout time.Duration) bool {
  var deadline time.Time
  if c.v.Load() {
    goto end // fast path
  }
  if timeout == 0 {
    return false
  }
  if timeout > 0 {
    deadline = time.Now().Add(timeout)
  }
  RegisterEvent()
  defer UnregisterEvent()
  for !c.v.Load() {
    if !WaitEvent(deadline) {
      return false
    }
  }
end:
  c.v.Store(false)
  return true
}

As you can see, it's extremely imprecise: any Signal call on any Cond variable wakes up all registered goroutines. But it can work reasonably well if the number of goroutines sleeping at the same time waiting for an event is small (say 3-4) and the Signal calls are infrequent.

What can be done to improve this situation? The one-bit G.event flag can be extended to 32 or 64 bits. Now the interface looks like this:

type Event uint

const genMask = unsafe.Sizeof(uint(0))*8 - 1

var eventGen atomic.Uintptr

func AllocEvent() Event { return 1 << (eventGen.Add(1) & genMask) }

func (e Event) Register() // Event(0).Register() can be used to unregister
func (e Event) Send()
func WaitEvent(deadline time.Time) bool // waits for registered events

As long as the number of allocated events is not greater than 32 (64 on a 64-bit machine), these events are precise. This seems fine for our interrupt use case, where we have a limited number of IRQs. We must accept some growing inefficiency if the number of allocated and actively used events grows beyond the number of event bits.

The Cond type will be:

type Cond struct {
  v atomic.Bool
  e Event
}

You can probably see the implementation of all its methods.
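
Presumably something like this (a hedged sketch; the comment above deliberately leaves it to the reader):

func (c *Cond) Signal() {
  c.v.Store(true)
  c.e.Send() // wakes only goroutines registered for this event's bit
}

// Wait is the same as before, except it calls c.e.Register() instead of
// RegisterEvent() and Event(0).Register() instead of UnregisterEvent().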

The hardest things to implement efficiently with such non-precise events seem to be the Register and Send methods. A simple implementation may maintain a single list of registered goroutines without paying attention to the Event they registered for. The netpoller can have a signalEventReg atomic.Uintptr variable, and the Event.Send method can simply call signalEventReg.Or(uintptr(e)) and then wake up the netpoller, which uses signalEventReg.Swap(0) to fetch the signaled events and check the goroutines on the list. But waking up the netpoller thread on every event is probably unacceptable. The Send method should probably itself check whether any goroutine is waiting for its event before waking the netpoller. This can probably be done efficiently by maintaining a global waitEventReg uintptr variable.
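
A sketch of that Send path (names taken from the paragraph above; wakeNetpoller is hypothetical, and atomic.Uintptr.Or requires Go 1.23 or newer):

var (
  signalEventReg atomic.Uintptr // events signaled since the last poll
  waitEventReg   atomic.Uintptr // events some goroutine currently waits for
)

func (e Event) Send() {
  signalEventReg.Or(uintptr(e))
  // Wake the netpoller only if somebody may actually be waiting for e.
  if waitEventReg.Load()&uintptr(e) != 0 {
    wakeNetpoller()
  }
}

// The netpoller side drains all pending bits at once with
// signalEventReg.Swap(0) and wakes every registered goroutine whose
// event mask intersects the result.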

I don't want to go deeper into possible implementation details (I've already gone too far). I think you already have the picture and can see how it fits (or not) into the netpoller concept.

@clktmr (Author) commented Apr 12, 2025

I currently have a lot of Wait(0) calls in my code, but I'll have to review whether they are really needed. If there are spurious interrupts after things are set up correctly, there is probably already a bug.

A problem with rtos.Note is that it's designed for one-time notifications. If we allow waking it multiple times, as you said, there is a chance of missing interrupts between Sleep and Clear. Making Sleep and Clear one atomic operation in Wait rules this possibility out. I think of a mailbox interrupt that notifies about available data. This also clarifies (your intuition was right) why Clear doesn't return the current value via an atomic CAS: that would encourage incorrect usage of Note.
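
A hedged sketch of that mailbox pattern (mailboxISR, reader, and readMailbox are hypothetical names; Wait semantics assumed as discussed above):

var mbox rtos.Cond

func mailboxISR() {
  mbox.Signal() // new data; remembered if the reader isn't waiting yet
}

func reader() {
  for {
    mbox.Wait(-1) // atomically consumes the signal on return
    for readMailbox() {
      // Data arriving while we drain re-signals mbox, so the next Wait
      // returns immediately and nothing is lost.
    }
  }
}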

I also think the minimal interface is in line with Go idioms. I see it more as a sync primitive, like a futex, which can be used to build more specific sync types that fit your needs better. These wouldn't necessarily need to live in embedded/rtos.

Your comment about unprecise events made things clearer for me, thanks for clarifying! What I specifically meant was whether you think it's possible to implement the API you described on top of the Cond type. But I see you think about it the other way round, and it's probably not important right now; let's rather get some hands-on experience with the new Cond type.

Regarding SMP, I'm interested in whether you see any issues with waking the netpoller from another CPU.

PS: A PR with a WIP prefix is considered not ready to merge. I will open another PR with some changes I was about to push.

@clktmr deleted the netpoller-noos branch on April 12, 2025, 17:44