L96: .NET: Load balancing, connectivity and wait for ready in client #240
@markdroth has agreed to review this from the LB perspective; I'll review from the C# perspective.
@markdroth, it looks like this is ready for an initial set of comments.
Thanks for adding the API definitions -- that's really helpful for me to tell where changes are needed.
Please let me know if you have any questions. Thanks!
Force-pushed from becebe5 to cf879d7.
I think I'm happy with the current version of the proposal. Overall it makes sense to me, but of course @markdroth is the expert on LB, so you'll definitely need his approval as well.
The details of the C# APIs will be figured out as part of reviewing the implementation.
@markdroth, can you confirm that all your concerns have been sufficiently addressed by approving the PR? If so, I'd like to go ahead, mark the proposal as approved, and merge it.
Quick ping, @markdroth: what's the current state? Our issue right now is that we still use Grpc.Core (since we need to support Windows + Android). We are looking into switching to grpc-dotnet once .NET 6 is released. We think most issues should be solvable; our need for xDS is currently the hottest topic. When support ends in May 2022, neither grpclb nor xDS would be officially supported anymore. So it is very important for us that this is finalized and usable by then :) This is basically the same topic that @pcwiese mentioned in #521.
Sorry for the delay here! Overall, this looks fairly solid, but there are a few remaining open issues to resolve.
Side note: Please avoid force-pushing to this branch, since that makes it harder to review. I think the reason GitHub isn't letting me reply to some of the existing comment threads is that they were started on a commit that no longer exists.
Please let me know if you have any questions. Thanks!
L80-csharp-load-balancing.md
Outdated
* `ConnectionManager.PickAsync` is called to get a ready subchannel along with the subchannel's connected address.
* A picker returns a result of one of four types:
  * Complete - Has a subchannel. If the subchannel is still in a Ready state then the current address is used for the RPC.
If the subchannel is not still in a Ready state, the channel should queue the pick. This is done to handle the race condition between the transport and the LB policy: by the time the picker returns the subchannel, the underlying transport may have become disconnected.
The expectation when this happens is that as soon as the LB policy gets notified of the subchannel state change, it will return a new picker, which will cause the queued pick to be retried.
Yes, it queues in a manner of speaking. .NET supports async/await, so the promise returned from `ConnectionManager.PickAsync` is pending until there is a ready subchannel.
I'll add that info to the proposal.
To be clear, I'm not talking about the case where the LB policy doesn't have a READY subchannel to hand out; I'm talking about the race condition where the LB policy thinks that the subchannel is in state READY but by the time the result gets back from the picker, the subchannel does not have an underlying connection to provide to the channel. In that case, the channel should basically discard the pick result and keep the request queued until the LB policy returns a new picker.
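To make the race concrete, here is a minimal C# sketch of the behavior described above: a "complete" pick whose subchannel is no longer READY is discarded, and the call stays pending until the LB policy publishes a new picker. Every type here (`PickQueue`, `Picker`, `Subchannel`, `ConnectivityState`) is hypothetical and invented for illustration; none of it is the grpc-dotnet API or the proposal's actual design.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical types for illustration only; not the grpc-dotnet API.
public enum ConnectivityState { Idle, Connecting, Ready, TransientFailure }

public sealed class Subchannel
{
    public Subchannel(string address) { Address = address; }
    public string Address { get; }
    // In a real channel the transport would update this from another thread.
    public ConnectivityState State { get; set; }
}

public sealed class Picker
{
    private readonly Subchannel _subchannel;
    public Picker(Subchannel subchannel) { _subchannel = subchannel; }
    // null means "queue"; non-null is a "complete" pick, which may already be stale.
    public Subchannel Pick() { return _subchannel; }
}

public sealed class PickQueue
{
    private readonly object _lock = new object();
    private Picker _picker;
    private TaskCompletionSource<bool> _nextPicker = NewSignal();

    private static TaskCompletionSource<bool> NewSignal()
    {
        return new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    }

    // Called by the (hypothetical) LB policy each time it produces a new picker.
    public void UpdatePicker(Picker picker)
    {
        TaskCompletionSource<bool> waiters;
        lock (_lock)
        {
            _picker = picker;
            waiters = _nextPicker;
            _nextPicker = NewSignal();
        }
        waiters.SetResult(true); // wake every queued pick so it retries
    }

    public async Task<Subchannel> PickAsync()
    {
        while (true)
        {
            Picker picker;
            Task next;
            // Read the picker and its "successor" signal together so no update is missed.
            lock (_lock) { picker = _picker; next = _nextPicker.Task; }

            Subchannel subchannel = picker?.Pick();
            if (subchannel != null && subchannel.State == ConnectivityState.Ready)
            {
                return subchannel;
            }
            // No subchannel, or the pick raced with a disconnect:
            // discard the result and stay queued until a new picker arrives.
            await next;
        }
    }
}

public static class Demo
{
    public static async Task Main()
    {
        var queue = new PickQueue();
        // The LB policy handed this out as READY, but the transport has since dropped.
        var stale = new Subchannel("10.0.0.1:50051") { State = ConnectivityState.Idle };
        var healthy = new Subchannel("10.0.0.2:50051") { State = ConnectivityState.Ready };

        queue.UpdatePicker(new Picker(stale));
        Task<Subchannel> pick = queue.PickAsync();
        Console.WriteLine(pick.IsCompleted);      // False: the stale pick was discarded

        queue.UpdatePicker(new Picker(healthy));  // LB policy reacts to the state change
        Console.WriteLine((await pick).Address);  // 10.0.0.2:50051
    }
}
```

The channel re-checks the subchannel state itself rather than trusting the picker's view, which is what closes the window between the LB policy's decision and the transport's actual state.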
In keeping with the `ISubchannelCallTracker` change, this needs to change to say that if the subchannel is not in READY state, the call will be queued instead.
Minor update to API to reflect bug fix in recent version
Sorry for the delay; I've been incredibly busy lately.
The resolver changes look a lot better. I have a couple of remaining suggestions to refine it, but I'll leave them up to you.
The remaining issue is the other open discussion about the `onComplete()` callback. I think the way you currently have that structured will not work for the case where the subchannel no longer has a transport by the time the LB pick gets back to the channel.
Please let me know if you have any questions. Thanks!
This looks really good!
Force-pushed from c4236ac to 5f94c51.
Thanks, this looks great! I'll let @jtattermusch merge this when he's happy with it. (I think he's out of the office this week.)
Thanks @markdroth for the detailed review!
Preview link: https://github.com/JamesNK/proposal/blob/jamesnk/grpc-dotnet-loadbalancing/L80-csharp-load-balancing.md