L96: .NET: Load balancing, connectivity and wait for ready in client #240

Merged (7 commits) on Mar 8, 2022

@JamesNK (Member) commented May 18, 2021

@jtattermusch (Contributor)

@markdroth has agreed to review this from the LB perspective; I'll review from the C# perspective.

@markdroth, it looks like this is ready for an initial set of comments.

@markdroth (Member) left a comment

I hope this info is helpful. Please let me know if you have any questions.

@ejona86 and @dfawley may want to weigh in on this as well.

@markdroth (Member) left a comment

Thanks for adding the API definitions -- that's really helpful for me to tell where changes are needed.

Please let me know if you have any questions. Thanks!

@JamesNK force-pushed the jamesnk/grpc-dotnet-loadbalancing branch from becebe5 to cf879d7 on May 31, 2021 05:15
@JamesNK changed the title from "L80: Load balancing in grpc-dotnet" to "L80: .NET: Load balancing, connectivity and wait for ready in client" on Jun 5, 2021

@jtattermusch (Contributor) left a comment

I think I'm happy with the current version of the proposal. Overall it makes sense to me, but of course @markdroth is the expert on LB, so you'll definitely need his approval as well.

The details of the C# APIs will be figured out as part of reviewing the implementation.

@jtattermusch (Contributor)

@markdroth, can you confirm that all your concerns have been sufficiently addressed by approving the PR? If so, I'd like to go ahead, mark the proposal as approved, and merge it.

@Falco20019 (Contributor)

Quick ping, @markdroth.

What's the current state? Right now our problem is that we still use Grpc.Core (since we need to support Windows and Android). We're looking into switching to grpc-dotnet once .NET 6 is released. We think most issues should be solvable; our need for xDS is currently the hottest topic. When Grpc.Core support ends in May 2022, neither grpclb nor xDS will be officially supported anymore. So it's very important for us that this is finalized and usable by then :) This is basically the same topic that @pcwiese mentioned in #521.

@markdroth (Member) left a comment

Sorry for the delay here! Overall, this looks fairly solid, but there are a few remaining open issues to resolve.

Side note: Please avoid force-pushing to this branch, since that makes it harder to review. I think the reason why GitHub isn't letting me reply to some of the existing comment threads is that they were started on a commit that no longer exists.

Please let me know if you have any questions. Thanks!


* `ConnectionManager.PickAsync` is called to get a ready subchannel along with the subchannel's connected address.
* A picker returns a result of one of four types:
  * Complete - Has a subchannel. If the subchannel is still in a Ready state, then the current address is used for the RPC.

Member

If the subchannel is not still in a Ready state, the channel should queue the pick. This is done to handle the race condition between the transport and the LB policy: by the time the picker returns the subchannel, the underlying transport may have become disconnected.

The expectation when this happens is that as soon as the LB policy gets notified of the subchannel state change, it will return a new picker, which will cause the queued pick to be retried.
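A minimal Python sketch of the rule described in this thread, assuming simplified stand-in types. `ConnectivityState`, `PickResultType`, `Subchannel`, `PickResult` and `address_for_rpc` are hypothetical names for illustration, not the grpc-dotnet API, and only the complete and queue result kinds are modelled. A complete pick is used only if its subchannel is still READY; otherwise the pick stays queued.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectivityState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    READY = "ready"
    TRANSIENT_FAILURE = "transient_failure"
    SHUTDOWN = "shutdown"


class PickResultType(Enum):
    # Hypothetical: only the two kinds discussed here; the other result
    # types from the proposal are deliberately omitted.
    COMPLETE = "complete"
    QUEUE = "queue"


@dataclass
class Subchannel:
    # Hypothetical stand-in: the address the subchannel is connected to, if any.
    current_address: Optional[str]
    state: ConnectivityState


@dataclass
class PickResult:
    type: PickResultType
    subchannel: Optional[Subchannel] = None


def address_for_rpc(result: PickResult) -> Optional[str]:
    """Return the address to use for the RPC, or None if the pick must stay queued."""
    if result.type is PickResultType.COMPLETE:
        sub = result.subchannel
        if sub is not None and sub.state is ConnectivityState.READY:
            return sub.current_address
    # A queue result, or a complete result whose subchannel lost its
    # connection after the picker was built: keep waiting for a new picker.
    return None


ready = Subchannel("10.0.0.1:443", ConnectivityState.READY)
stale = Subchannel(None, ConnectivityState.IDLE)
print(address_for_rpc(PickResult(PickResultType.COMPLETE, ready)))  # 10.0.0.1:443
print(address_for_rpc(PickResult(PickResultType.COMPLETE, stale)))  # None
```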

@JamesNK (Member, Author), Nov 7, 2021

Yes, it queues in a manner of speaking. .NET supports async/await, so the promise returned from ConnectionManager.PickAsync is pending until there is a ready subchannel.

I'll add that info to the proposal.

Member

To be clear, I'm not talking about the case where the LB policy doesn't have a READY subchannel to hand out; I'm talking about the race condition where the LB policy thinks that the subchannel is in state READY but by the time the result gets back from the picker, the subchannel does not have an underlying connection to provide to the channel. In that case, the channel should basically discard the pick result and keep the request queued until the LB policy returns a new picker.
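A rough asyncio sketch of this behaviour, assuming a hypothetical `ConnectionManager` whose `pick_async` coroutine plays the role of the pending `PickAsync` task, and whose `update_picker` method is called whenever the LB policy publishes a new picker. All names here are illustrative, not the grpc-dotnet implementation. A queued pick, or a complete pick whose subchannel is no longer READY, is discarded and the caller waits for the next picker rather than failing.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ConnectivityState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    READY = "ready"
    TRANSIENT_FAILURE = "transient_failure"


@dataclass
class Subchannel:
    # Hypothetical stand-in for a subchannel and its connected address.
    address: str
    state: ConnectivityState


@dataclass
class PickResult:
    # subchannel=None models a "queue" result; other result kinds are omitted.
    subchannel: Optional[Subchannel]


Picker = Callable[[], PickResult]


class ConnectionManager:
    """Hypothetical channel-side pick loop; not the grpc-dotnet implementation."""

    def __init__(self, picker: Picker) -> None:
        self._picker = picker
        self._picker_changed = asyncio.Condition()

    async def update_picker(self, picker: Picker) -> None:
        # Called by the LB policy whenever it publishes a new picker.
        async with self._picker_changed:
            self._picker = picker
            self._picker_changed.notify_all()

    async def pick_async(self) -> str:
        async with self._picker_changed:
            while True:
                picker = self._picker
                sub = picker().subchannel
                if sub is not None and sub.state is ConnectivityState.READY:
                    return sub.address
                # Queued pick, or the race where the picker returned a subchannel
                # that lost its connection: discard the result and wait until the
                # LB policy publishes a different picker, then pick again.
                await self._picker_changed.wait_for(lambda: self._picker is not picker)


async def demo() -> str:
    stale = Subchannel("10.0.0.1:80", ConnectivityState.TRANSIENT_FAILURE)
    fresh = Subchannel("10.0.0.2:80", ConnectivityState.READY)
    manager = ConnectionManager(lambda: PickResult(stale))
    task = asyncio.create_task(manager.pick_async())
    await asyncio.sleep(0)
    assert not task.done()  # the pick is still queued behind the stale picker
    await manager.update_picker(lambda: PickResult(fresh))
    return await task


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints 10.0.0.2:80
```

Waiting on a new picker rather than polling subchannel state relies on the LB policy publishing a new picker whenever a subchannel's state changes, as described earlier in this thread.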

Member

In keeping with the ISubchannelCallTracker change, this needs to change to say that if the subchannel is not in READY state, the call will be queued instead.

@JamesNK (Member, Author) left a comment

Minor update to the API to reflect a bug fix in a recent version.

@markdroth (Member) left a comment

Sorry for the delay; I've been incredibly busy lately.

The resolver changes look a lot better. I have a couple of remaining suggestions to refine it, but I'll leave them up to you.

The remaining issue is the other open discussion about the onComplete() callback. I think the way you currently have that structured will not work for the case where the subchannel no longer has a transport by the time the LB pick gets back to the channel.

Please let me know if you have any questions. Thanks!

@markdroth (Member) left a comment

This looks really good!

@JamesNK force-pushed the jamesnk/grpc-dotnet-loadbalancing branch from c4236ac to 5f94c51 on February 25, 2022 00:51

@markdroth (Member)

Thanks, this looks great!

I'll let @jtattermusch merge this when he's happy with it. (I think he's out of the office this week.)

@jtattermusch changed the title from "L80: .NET: Load balancing, connectivity and wait for ready in client" to "L96: .NET: Load balancing, connectivity and wait for ready in client" on Mar 8, 2022

@jtattermusch (Contributor)

Thanks @markdroth for the detailed review!
