-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Added optional choice for hdfs clients #2487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… KerberosClient or TokenClient.
…ng it with an empty string
…e webhdfs config documentation section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Maybe add tests if you have time and don't want this to break in the future.
Once you’ve addressed the issues and used this in for a few days we can merge.
By the way do your company use both kerberos and token? Otherwise I would only add the one your company uses (so we don't have unused code paths in the codebase)
luigi/contrib/hdfs/webhdfs_client.py
Outdated
@@ -43,6 +43,9 @@ class webhdfs(luigi.Config): | |||
description='Port for webhdfs') | |||
user = luigi.Parameter(default='', description='Defaults to $USER envvar', | |||
config_path=dict(section='hdfs', name='user')) | |||
client_type = luigi.Parameter(default='insecure', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use ChoiceParameter?
luigi/contrib/hdfs/webhdfs_client.py
Outdated
import hdfs | ||
return hdfs.TokenClient(url=self.url, token=self.token) | ||
else: | ||
raise ValueError("Error: Unknown client type specified in webhdfs client_type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could skip this if you use the ChoiceParameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Tarrasch on tests and only adding the client_types which your company uses for now.
Thanks for the submission!
luigi/contrib/hdfs/webhdfs_client.py
Outdated
@@ -43,6 +43,9 @@ class webhdfs(luigi.Config): | |||
description='Port for webhdfs') | |||
user = luigi.Parameter(default='', description='Defaults to $USER envvar', | |||
config_path=dict(section='hdfs', name='user')) | |||
client_type = luigi.Parameter(default='insecure', | |||
description='Type of client to use. One of insecure, kerberos or token') | |||
token = luigi.Parameter(default='', description='Hadoop delegation token, only used when client_type="token"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daveFNbuck Added an OptionalParameter
. Would that be applicable here?
Only adding KerberosClient now. And I've added some really basic test. Don't know what your standards are, so if you have any specific test in mind let me know. Also changed Parameter to ChoiceParameter. This removed the need for the ValueError is the else clause, so now I made InsecureClient the default in the else clause. In the next couple of days I will test this code in a couple of projects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Report back with a 👍 or 👎 related to your production runs with this code!
Thanks
👍 👍 (it took a while to get it on production, but it runs there :) ) |
Thanks for reporting back @pietermarsman ! Glad to hear things functioning as expected |
* upstream-master: Fix S3Client.copy return value consistency (spotify#2488) s3client check for deprecated host keyword and raise error with the details (spotify#2493) Fix exception when toml lib is not installed (spotify#2506) Add Okko to companies that use luigi (spotify#2512) Added optional choice for hdfs clients (spotify#2487) Version 2.7.8 revert tornado upgrade Version 2.7.7 added a new event 'progress' (spotify#2498) Add Uppsala University / pharmb.io as user (spotify#2496)
Description
Added optional choice for hdfs clients. Previously only InsecureClient was supported. This PR adds KerberosClient and TokenClient.
Motivation and Context
Some hdfs work with kerberos authentication. Mine does... This adds support for those clusters.
This fixes #2481.
Have you tested this? If so, how?
I have tested the KerberosClient in practice and it works! (Thousand times faster than hadoopcli :) )
Not tested the TokenClient though, because I do not have access to a Hadoop cluster with token authentication.