Enable HTTP proxy support for the client used by REST Catalog #12406
@flyrain Could you please take a look at this PR?
@adutra Could you please take a look at this PR?
Hi @akhilputhiry, while I understand the problem, I think there are a few concerns with this PR. First off, there is already some proxy support in [...]
Can you confirm that adding [...]? Secondly, the introduction of proxy support in [...] This may not be desirable in all cases. I would like to make it possible to select different proxy configurations depending on the request URL. Do you think that would be possible with [...]?
Thanks for the feedback @adutra, please find my thoughts below:
Yes, it works; I had tested with [...]
Thinking of making it explicit to RESTCatalog by moving to [...]
For the IdP scenario, I believe a similar approach of having a proxy setting and applying it via the builder should address the problem. I have updated the PR with the new approach. Let me know your thoughts. Thanks
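Reading a single proxy from the catalog properties and applying it through the builder, as described above, starts with a small parsing step. The sketch below is JDK-only and hypothetical: `ProxyProperties` and `proxyAddress` are illustrative names, and the `proxy.hostname` / `proxy.port` keys are taken from an earlier revision of this PR (the final revision moves the constants into `HTTPClient`, whose string values are not shown here). The actual change hands the result to the Apache HttpClient builder rather than returning it.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolves an optional proxy address from catalog properties. */
final class ProxyProperties {
  // Key names assumed from an earlier revision of the PR.
  static final String PROXY_HOSTNAME = "proxy.hostname";
  static final String PROXY_PORT = "proxy.port";

  private ProxyProperties() {}

  // Local stand-in for Guava's Strings.isNullOrEmpty.
  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  static Optional<InetSocketAddress> proxyAddress(Map<String, String> props) {
    String host = props.get(PROXY_HOSTNAME);
    String port = props.get(PROXY_PORT);
    if (isNullOrEmpty(host) || isNullOrEmpty(port)) {
      return Optional.empty(); // no proxy configured: connect directly
    }
    // NumberFormatException (an IllegalArgumentException) on non-numeric input.
    int parsedPort = Integer.parseInt(port.trim());
    if (parsedPort < 1 || parsedPort > 65535) {
      throw new IllegalArgumentException("Invalid proxy port: " + port);
    }
    // createUnresolved avoids a DNS lookup at configuration time.
    return Optional.of(InetSocketAddress.createUnresolved(host, parsedPort));
  }
}
```

Returning an `Optional` keeps the "no proxy configured" case explicit, so a caller only touches the builder when both values are present, and a bad port fails fast instead of being silently ignored.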
Force-pushed from 94d9363 to 7735cc1.
    Integer.parseInt(config.get(CatalogProperties.PROXY_PORT)));
    }

    return builder.build();
So, you ended up giving up on ProxySupport eventually?
The current code looks OK to me, although it still doesn't solve the case where two different proxies are required for contacting the catalog server and the authorization server.
It also doesn't address proxy credentials, but this could be done as a follow-up task.
@adutra Thanks for the feedback.
For point 3, I have added support for simple auth for now; we can add other mechanisms in follow-up PRs.
For point 2, I shall try the following:
We could use the ProxyRoutePlanner to use different proxies for different domains.
The configs would look something like the following; we can adjust the domains parameter so that the proxy is selected accordingly:
conf.set("spark.sql.catalog.demo.proxy.myproxy1.hostname", "127.0.0.1")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.port", "8080")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.requires-credentials", "true")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.username", "ac")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.password", "dc")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.domains", "*")
conf.set("spark.sql.catalog.demo.proxy.myproxy1.priority", "1")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.hostname", "127.0.0.1")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.port", "9090")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.requires-credentials", "true")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.username", "ac")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.password", "dc")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.domains", "*")
conf.set("spark.sql.catalog.demo.proxy.myproxy2.priority", "2")
BTW, I am using mitmproxy for my testing.
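The named-proxy proposal above could be modeled roughly as follows. Apache HttpClient's `ProxyRoutePlanner` is not used here; as a swapped-in, JDK-only analog, the sketch implements `java.net.ProxySelector`, which chooses the proxy for each request URI. `NamedProxy`, `DomainProxySelector`, the domain-pattern syntax (`*`, `*.example.com`, exact host), and the rule that a lower `priority` number wins are all assumptions for illustration, not part of the PR.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical entry mirroring the proposed proxy.<name>.* keys. */
record NamedProxy(String name, String host, int port, List<String> domains, int priority) {
  boolean matches(String requestHost) {
    String host = requestHost.toLowerCase(Locale.ROOT);
    for (String pattern : domains) {
      String p = pattern.toLowerCase(Locale.ROOT);
      if (p.equals("*") || p.equals(host)) {
        return true; // wildcard or exact host match
      }
      if (p.startsWith("*.") && host.endsWith(p.substring(1))) {
        return true; // "*.example.com" matches any subdomain of example.com
      }
    }
    return false;
  }
}

/** Picks the first matching proxy by ascending priority, or DIRECT if none match. */
final class DomainProxySelector extends ProxySelector {
  private final List<NamedProxy> proxies;

  DomainProxySelector(List<NamedProxy> proxies) {
    List<NamedProxy> sorted = new ArrayList<>(proxies);
    sorted.sort(Comparator.comparingInt(NamedProxy::priority)); // lower number wins (assumed)
    this.proxies = List.copyOf(sorted);
  }

  @Override
  public List<Proxy> select(URI uri) {
    String host = uri.getHost();
    if (host != null) {
      for (NamedProxy p : proxies) {
        if (p.matches(host)) {
          return List.of(
              new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(p.host(), p.port())));
        }
      }
    }
    return List.of(Proxy.NO_PROXY);
  }

  @Override
  public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
    // A production selector might demote the failing proxy; this sketch ignores failures.
  }
}
```

With `myproxy1` restricted to `*.auth.example.com` and `myproxy2` left as `*`, token requests to an authorization server would use the first proxy while catalog requests fall through to the second. If both entries keep `domains = "*"`, as in the configs above, priority alone decides.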
Also, would it be okay to implement multiple-proxy support in a separate PR?
we could use the ProxyRoutePlanner to use different proxies for different domains
Sounds very promising! Your example is pretty much what I had in mind.
Also would it be okay to implement multiple proxy support in a different PR ?
Sure, that's fine with me. Btw, I can approve the PR, but I am not a committer, so you will need to get another review from someone else.
Thanks @adutra, I have made changes as per your suggestion.
Can you point me to some committers who would be able to take a look?
    public static final String PROXY_HOSTNAME = "proxy.hostname";
    public static final String PROXY_PORT = "proxy.port";

    public static final String PROXY_REQUIRES_CREDENTIALS = "proxy.requires-credentials";
This property could perhaps be inferred from the presence (or absence) of PROXY_USERNAME and PROXY_PASSWORD.
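Inferring the flag from the credentials themselves could look like the sketch below. `ProxyCredentialResolver`, `ProxyCredentials`, and the `proxy.username` / `proxy.password` key strings are hypothetical (the PR defines `PROXY_USERNAME` and `PROXY_PASSWORD` constants, but their values are not shown here). Rejecting a half-configured pair is a design choice of this sketch; the PR's own check simply skips auth unless both values are non-empty.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical proxy credential holder. */
record ProxyCredentials(String username, char[] password) {}

final class ProxyCredentialResolver {
  // Illustrative key strings following the PR's proxy.* naming.
  static final String PROXY_USERNAME = "proxy.username";
  static final String PROXY_PASSWORD = "proxy.password";

  private ProxyCredentialResolver() {}

  /** Credentials are "required" exactly when both username and password are present. */
  static Optional<ProxyCredentials> resolve(Map<String, String> props) {
    String user = props.get(PROXY_USERNAME);
    String pass = props.get(PROXY_PASSWORD);
    boolean hasUser = user != null && !user.isEmpty();
    boolean hasPass = pass != null && !pass.isEmpty();
    if (hasUser != hasPass) {
      // Fail fast on a half-configured proxy instead of silently skipping auth.
      throw new IllegalArgumentException(
          PROXY_USERNAME + " and " + PROXY_PASSWORD + " must be set together");
    }
    return hasUser
        ? Optional.of(new ProxyCredentials(user, pass.toCharArray()))
        : Optional.empty();
  }
}
```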
Force-pushed from 0e897dd to 015f3df.
@amogh-jahagirdar @rdblue @nastra Could you folks please take a look at this?
flyrain
left a comment
Thanks @akhilputhiry for working on it. LGTM with minor comments.
    /** http proxy configuration for rest catalog */
    public static final String PROXY_HOSTNAME = "proxy.hostname";
Nit: remove the empty line?
The blank line is added automatically when I run 'gradle spotlessJavaApply'.
    SessionCatalog.SessionContext.createEmpty(),
    config -> HTTPClient.builder(config).uri(config.get(CatalogProperties.URI)).build());
    config -> {
      HTTPClient.Builder builder =
So one possible issue I can currently think of is that this approach only works for the REST catalog itself. Things like refreshing vended credentials with S3/GCS (VendedCredentialsProvider/OAuth2RefreshCredentialsHandler) or S3 signing (S3V4RestSignerClient) won't go through the proxy, since those places instantiate their own HTTP client that wouldn't configure it.
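One way to address this concern is to resolve the proxy setting once and route every component that builds its own client through a single factory. The sketch below uses the JDK's `java.net.http.HttpClient` in place of Iceberg's Apache HttpClient-based `HTTPClient`; `SharedProxyConfig` and `ClientFactory` are hypothetical names, and wiring them into `VendedCredentialsProvider`, `OAuth2RefreshCredentialsHandler`, or `S3V4RestSignerClient` is not shown.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.util.Optional;

/**
 * Hypothetical immutable proxy setting, resolved once from catalog properties and
 * passed to every component that builds its own HTTP client.
 */
record SharedProxyConfig(Optional<InetSocketAddress> address) {
  static final SharedProxyConfig NONE = new SharedProxyConfig(Optional.empty());

  HttpClient.Builder applyTo(HttpClient.Builder builder) {
    // Only configure a proxy when one was resolved; otherwise leave the builder untouched.
    address.ifPresent(a -> builder.proxy(ProxySelector.of(a)));
    return builder;
  }
}

final class ClientFactory {
  private ClientFactory() {}

  // Every client construction site calls this instead of HttpClient.newBuilder() directly,
  // so no component can forget to apply the proxy.
  static HttpClient newClient(SharedProxyConfig proxy) {
    return proxy.applyTo(HttpClient.newBuilder()).build();
  }
}
```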
Force-pushed from 015f3df to 16636c3.
Thanks for the feedback, @nastra.
    Integer proxyPort =
        PropertyUtil.propertyAsNullableInt(properties, HTTPClient.REST_PROXY_PORT);

    if (proxyHostname != null && !proxyHostname.isEmpty() && proxyPort != null) {
You can use !Strings.isNullOrEmpty(proxyHostname). Same further below with username/password.
nastra
left a comment
Please add some tests to TestHTTPClient that pass different properties and verify that the proxy has been configured with and without auth.
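The PR's tests use mockserver against Iceberg's own client. As a rough, JDK-only illustration of the requested with/without-auth checks, the sketch below builds `java.net.http.HttpClient` instances and inspects `proxy()` and `authenticator()`. `ProxyClients.build` is a hypothetical helper, and its `Authenticator` deliberately answers only `PROXY` challenges so proxy credentials are never offered to the origin server.

```java
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.http.HttpClient;

final class ProxyClients {
  private ProxyClients() {}

  /** Builds a client that routes through host:port and, if given, answers proxy auth challenges. */
  static HttpClient build(String host, int port, String user, String password) {
    HttpClient.Builder builder =
        HttpClient.newBuilder()
            .proxy(ProxySelector.of(InetSocketAddress.createUnresolved(host, port)));
    if (user != null && password != null) {
      builder.authenticator(
          new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
              // Only hand out credentials to the proxy, never to the target server.
              if (getRequestorType() == RequestorType.PROXY) {
                return new PasswordAuthentication(user, password.toCharArray());
              }
              return null;
            }
          });
    }
    return builder.build();
  }
}
```

A test can call `Authenticator.requestPasswordAuthenticationInstance(...)` with `RequestorType.PROXY` to confirm the credentials are returned, without opening any network connection.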
Force-pushed from ec0c4d7 to 7f49607.
Thanks, @nastra.
Force-pushed from d363790 to 3cece07.
        PropertyUtil.propertyAsString(properties, HTTPClient.REST_PROXY_PASSWORD, null);

    if (!Strings.isNullOrEmpty(proxyUsername) && !Strings.isNullOrEmpty(proxyPassword)) {
    import org.mockserver.model.HttpResponse;
    import org.mockserver.verify.VerificationTimes;

    import static org.assertj.core.api.Assertions.assertThat;
You might need to double-check your IDE settings, but static imports need to be at the top.
@akhilputhiry can you please fix the ordering of the static imports and also update the PR title to reflect the latest changes?
Force-pushed from 3cece07 to 79f4035.
Force-pushed from 79f4035 to 570cf2a.
@nastra I recreated the IDEA project files using the following [...]. The imports are good now. Thanks
nastra
left a comment
LGTM, thanks @akhilputhiry. @amogh-jahagirdar / @danielcweeks can you also please take a look?
I have a few concerns here, and they may overlap a little with what @adutra was getting at. This appears to tunnel a separate config/auth path to the HTTP client, as opposed to extending and using the AuthManager. Since this is primarily setting host/port and auth, why wouldn't we configure it via the basic auth manager or a proxy auth manager? I'm just concerned we're creating two alternate paths to configure auth.
@danielcweeks I am fine with proxy authentication being handled by the HTTP client's own machinery rather than by an auth manager, for a few reasons: [...]
However, I expressed a different concern: we are moving towards a world where the REST client needs to talk to TWO servers instead of one (the catalog server and the authorization server). We should therefore make it possible for the client to use different proxy settings for each server. @akhilputhiry proposed a solution for that using "named" proxy configs [...]. The proposal is interesting, but I don't think it has been implemented in this PR, so we'd need to address it as a follow-up task. To summarize my POV: I'm +1 on this PR, provided that we introduce multi-host proxy settings later on.
I'm not convinced we want to go down this path until there are real-world examples where we would need both. We're adding a lot of complexity to the configuration, and I don't want to do that speculatively. As for the AuthManager vs. the native client, I thought it would be possible (I think it might be for HTTP), but for HTTPS it's a little more complicated because of how the client communicates with the proxy server.
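The HTTP/HTTPS difference mentioned above comes from how credentials reach the proxy. With a plain `http://` target, each request is sent to the proxy itself, so a per-request `Proxy-Authorization` header (the kind of thing an auth-manager-style hook could add) reaches it. With `https://`, the client first opens a tunnel with a `CONNECT` request, and the proxy credentials must accompany that `CONNECT`, which is typically issued internally by the HTTP client's connection layer rather than through the per-request path. The sketch below only computes the Basic credential value (RFC 7617); `ProxyBasicAuth` is a hypothetical helper, not an Iceberg or HttpClient API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: builds a Proxy-Authorization header value for the Basic scheme. */
final class ProxyBasicAuth {
  private ProxyBasicAuth() {}

  static String headerValue(String username, String password) {
    if (username.indexOf(':') >= 0) {
      // RFC 7617: ':' separates user-id and password, so it cannot appear in the user-id.
      throw new IllegalArgumentException("username must not contain ':'");
    }
    String token = username + ":" + password;
    return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
  }
}
```

For plain HTTP, this value could be attached to every outgoing request; for HTTPS, only the component that sends `CONNECT` can use it, which is one argument for keeping proxy auth in the transport layer.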
@danielcweeks @adutra
@akhilputhiry @danielcweeks @adutra Any plans to merge this soon? This is blocking some things I'm working on, and I'm trying to get a sense of the timing.
@sfc-gh-mbaron I am also eagerly waiting for this to be merged. @nastra @adutra @danielcweeks @amogh-jahagirdar, could you folks please help move this forward? Thanks
Wanted to follow up on this.
Thanks for the discussion. I'm leaning toward keeping proxy configuration separate from [...]. The tiny bit we're adding (an HTTP proxy) is transport-level wiring for the HTTP client. Settings like proxies, TLS, and timeouts normally belong to the HTTP transport layer, since they control how the client sends HTTP requests.
@flyrain @adutra @nastra @danielcweeks @amogh-jahagirdar Any update here?
danielcweeks
left a comment
I'm ok with exposing proxy support. +1
This PR fixes the following issues:
#12059
#9174
Notes: