8000 Added optional choice for hdfs clients by pietermarsman · Pull Request #2487 · spotify/luigi · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Added optional choice for hdfs clients #2487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 27, 2018
Merged

Conversation

pietermarsman
Copy link
Contributor
@pietermarsman pietermarsman commented Aug 10, 2018

Description

Added optional choice for hdfs clients. Previously only InsecureClient was supported. This PR adds KerberosClient and TokenClient.

Motivation and Context

Some hdfs work with kerberos authentication. Mine does... This adds support for those clusters.

This fixes #2481.

Have you tested this? If so, how?

I have tested the KerberosClient in practice and it works! (Thousand times faster than hadoopcli :) )

Not tested the TokenClient though, because I do not have access to a Hadoop cluster with token authentication.

Copy link
Contributor
@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Maybe add tests if you have time and don't want this to break in the future.

Once you’ve addressed the issues and used this in for a few days we can merge.

By the way do your company use both kerberos and token? Otherwise I would only add the one your company uses (so we don't have unused code paths in the codebase)

@@ -43,6 +43,9 @@ class webhdfs(luigi.Config):
description='Port for webhdfs')
user = luigi.Parameter(default='', description='Defaults to $USER envvar',
config_path=dict(section='hdfs', name='user'))
client_type = luigi.Parameter(default='insecure',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use ChoiceParameter?

import hdfs
return hdfs.TokenClient(url=self.url, token=self.token)
else:
raise ValueError("Error: Unknown client type specified in webhdfs client_type"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could skip this if you use the ChoiceParameter.

Copy link
Collaborator
@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Tarrasch on tests and only adding the client_types which your company uses for now.

Thanks for the submission!

@@ -43,6 +43,9 @@ class webhdfs(luigi.Config):
description='Port for webhdfs')
user = luigi.Parameter(default='', description='Defaults to $USER envvar',
config_path=dict(section='hdfs', name='user'))
client_type = luigi.Parameter(default='insecure',
description='Type of client to use. One of insecure, kerberos or token')
token = luigi.Parameter(default='', description='Hadoop delegation token, only used when client_type="token"')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daveFNbuck Added an OptionalParameter. Would that be applicable here?

@pietermarsman
Copy link
Contributor Author

Only adding KerberosClient now.

And I've added some really basic test. Don't know what your standards are, so if you have any specific test in mind let me know.

Also changed Parameter to ChoiceParameter. This removed the need for the ValueError is the else clause, so now I made InsecureClient the default in the else clause.

In the next couple of days I will test this code in a couple of projects.

Copy link
Collaborator
@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Report back with a 👍 or 👎 related to your production runs with this code!

Thanks

@pietermarsman
Copy link
Contributor Author
pietermarsman commented Aug 27, 2018

👍 👍

(it took a while to get it on production, but it runs there :) )

@dlstadther
Copy link
Collaborator

Thanks for reporting back @pietermarsman ! Glad to hear things functioning as expected

@dlstadther dlstadther merged commit 151ca04 into spotify:master Aug 27, 2018
dlstadther added a commit to dlstadther/luigi that referenced this pull request Sep 1, 2018
* upstream-master:
  Fix S3Client.copy return value consistency (spotify#2488)
  s3client check for deprecated host keyword and raise error with the details (spotify#2493)
  Fix exception when toml lib is not installed (spotify#2506)
  Add Okko to companies that use luigi (spotify#2512)
  Added optional choice for hdfs clients  (spotify#2487)
  Version 2.7.8
  revert tornado upgrade
  Version 2.7.7
  added a new event 'progress' (spotify#2498)
  Add Uppsala University / pharmb.io as user (spotify#2496)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of KerberosClient with webhdfs
3 participants
0