-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update webhdfs to support BasicAuth and apply proxy on every call #1409
Update webhdfs to support BasicAuth and apply proxy on every call #1409
Conversation
I see that the build of my PR is having a lot of failed checks. However, the changes I made should not affect these by any means as far as I can see ?!? |
Tests are being hit by pytest-dev/pytest-asyncio#655 - we'll have to wait for a fix upstream |
Some more testing today and had problems if I used the apply_proxy with dictionary for data_proxy on the location headers that were returned because it would start applying the proxy conversion as duplicates |
In case the user is not None and no password has been provided, we will default back to the original behavior
This reverts commit 90683e1.
Later in the code, we use self.password and compare that to None, so we must always store the argument as attribute
This looks good, but tests won't pass until pytest-asyncio resolve their current import issue, or I decide to stop waiting and pin to a previous version. |
@martindurant Thanks for merging this! I only have one quick question about when you are planning to create a new release that will include this modification. I am working downstream on the components in DVC that use this new feature and there it can only be merged if this modification is available in a new release. And only after DVC after that releases new versions can I use it directly in my code at work (instead of relying on some patched local branches where i hardcode some updated versions also) |
I try to do one every month. I may be a bit behind... |
Hi @martindurant, I was looking at the the code of the webhdfs implementation: as it stands I could also use a release version with the changes/fixes for basic authentication from Guido. (I'm trying to write files via Kedro to a hdfs cloudera cluster, which is fronted with a Knox server that takes care of authentication via username/password. Kedro uses fsspec behind the scenes to know how to intereact with WebHDFS.) Thx in advance. br, |
Hi @martindurant, thanks for the release! br, |
I am using DVC and I am trying to store my dvc cache on a webHDFS remote.
Unfortunately, I cannot directly use kerberos authentication and I must use Basic Authentication on a proxy server (which introduces some additional part in the path component of the URL (e.g. instead of /webhdfs/v1 it now becomes /gateway/cluster_name/webhdfs/v1)
I found that the current version of filesystem_spec does not support basic authentication, so this PR will add the support for that.
At the same time, I am using the proxy dictionary/callable argument on every outbound call that is made in the
_call
function.Would be happy to add additional unit tests, but unfortunately as far as I know the cluster created with the htcluster application does not support basic authentication.
The only thing I could add as test is check whether the right ValueError is raised whenever password is provided, but user is not.
I have tested this code with a modified version of DVC (that allows for the extra parameters) and using this updated version I was able to connect to our WebHDFS server without a problem