Kerberos support by colinmarc · Pull Request #133 · colinmarc/hdfs · GitHub

Kerberos support #133

Merged: 5 commits, Aug 1, 2018
3 changes: 2 additions & 1 deletion .travis.yml
@@ -4,6 +4,7 @@ go_import_path: github.com/colinmarc/hdfs
go: 1.x
env:
- PLATFORM=cdh5
- PLATFORM=cdh5 KERBEROS=true
- PLATFORM=hdp2

Contributor (review comment):

Why are we not testing KERBEROS=true for hdp2?

Owner, Author (reply):

Because it's redundant, and because I didn't factor out the krb setup stuff (although I could probably). I don't actually think it's really necessary to have hdp2 in there right now, but I do want to have hdp3 shortly (I think I need xenial travis support for that?) and cdh6 eventually.

install:
- git clone https://github.com/sstephenson/bats $HOME/bats
@@ -27,6 +28,6 @@ deploy:
repo: colinmarc/hdfs
tags: true
all_branches: true
condition: $PLATFORM = cdh5
condition: $PLATFORM = hdp2
cache:
- "$HOME/bats"
77 changes: 76 additions & 1 deletion Gopkg.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions Gopkg.toml
@@ -32,3 +32,7 @@
[prune]
go-tests = true
unused-packages = true

[[constraint]]
name = "gopkg.in/jcmturner/gokrb5.v5"
version = "5.2.0"
28 changes: 22 additions & 6 deletions README.md
@@ -84,19 +84,35 @@ Installing the commandline client
Grab a tarball from the [releases page](https://github.com/colinmarc/hdfs/releases)
and unzip it wherever you like.

You'll want to add the following line to your `.bashrc` or `.profile`:
To configure the client, make sure one or both of these environment variables
point to your Hadoop configuration (`core-site.xml` and `hdfs-site.xml`). On
systems with Hadoop installed, they should already be set.

export HADOOP_NAMENODE="namenode:8020"
$ export HADOOP_HOME="/etc/hadoop"
$ export HADOOP_CONF_DIR="/etc/hadoop/conf"

To install tab completion globally on linux, copy or link the `bash_completion`
file which comes with the tarball into the right place:

ln -sT bash_completion /etc/bash_completion.d/gohdfs
$ ln -sT bash_completion /etc/bash_completion.d/gohdfs

By default, the HDFS user is set to the currently-logged-in user. You can
override this in your `.bashrc` or `.profile`:
By default on non-kerberized clusters, the HDFS user is set to the
currently-logged-in user. You can override this with another environment
variable:

export HADOOP_USER_NAME=username
$ export HADOOP_USER_NAME=username

Using the commandline client with Kerberos authentication
---------------------------------------------------------

Like `hadoop fs`, the commandline client expects a `ccache` file in the default
location: `/tmp/krb5cc_<uid>`. That means it should 'just work' to use `kinit`:

$ kinit bob@EXAMPLE.com
$ hdfs ls /

If that doesn't work, try setting the `KRB5CCNAME` environment variable to
wherever you have the `ccache` saved.
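
The lookup described above (use `KRB5CCNAME` if set, otherwise fall back to `/tmp/krb5cc_<uid>`) could be sketched roughly as follows. This is an illustration only, not the client's actual code: `ccachePath` is a hypothetical helper, and the handling of a `FILE:` prefix is an assumption based on how MIT Kerberos allows the variable to be written.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strings"
)

// ccachePath is a hypothetical helper (not part of this PR). It returns the
// file a client would read credentials from: the value of KRB5CCNAME if it is
// non-empty, otherwise the default /tmp/krb5cc_<uid>. Both inputs are passed
// in as arguments so the function can be tested without touching the
// environment.
func ccachePath(krb5ccname, uid string) string {
	if krb5ccname == "" {
		return "/tmp/krb5cc_" + uid
	}
	// KRB5CCNAME may be written as "FILE:/path/to/ccache". Strip that prefix.
	// Other cache types (for example KEYRING:) are not file paths and are not
	// handled by this sketch.
	return strings.TrimPrefix(krb5ccname, "FILE:")
}

func main() {
	u, err := user.Current()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine current user:", err)
		os.Exit(1)
	}
	fmt.Println(ccachePath(os.Getenv("KRB5CCNAME"), u.Uid))
}
```

Running it before and after `kinit` is a quick way to see which file the tools are expected to agree on.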

Compatibility
-------------
79 changes: 73 additions & 6 deletions client.go
@@ -2,6 +2,7 @@ package hdfs

import (
"context"
"errors"
"io"
"io/ioutil"
"net"
@@ -11,6 +12,7 @@ import (

hdfs "github.com/colinmarc/hdfs/protocol/hadoop_hdfs"
"github.com/colinmarc/hdfs/rpc"
krb "gopkg.in/jcmturner/gokrb5.v5/client"
)

// A Client represents a connection to an HDFS cluster
@@ -38,7 +40,9 @@ type Client struct {
type ClientOptions struct {
// Addresses specifies the namenode(s) to connect to.
Addresses []string
// User specifies which HDFS user the client will act as.
// User specifies which HDFS user the client will act as. It is required
// unless kerberos authentication is enabled, in which case it will be
// determined from the provided credentials if empty.
User string
// UseDatanodeHostname specifies whether the client should connect to the
// datanodes via hostname (which is useful in multi-homed setups) or IP
@@ -50,6 +54,17 @@ type ClientOptions struct {
// DatanodeDialFunc is used to connect to the datanodes. If nil, then
// (&net.Dialer{}).DialContext is used.
DatanodeDialFunc func(ctx context.Context, network, addr string) (net.Conn, error)
// KerberosClient is used to connect to kerberized HDFS clusters. If provided,
// the client will always mutually authenticate when connecting to the
// namenode(s).
KerberosClient *krb.Client
// KerberosServicePrincipleName specifies the Service Principal Name
// (<SERVICE>/<FQDN>) for the namenode(s). Like in the
// dfs.namenode.kerberos.principal property of core-site.xml, the special
// string '_HOST' can be substituted for the address of the namenode in a
// multi-namenode setup (for example: 'nn/_HOST'). It is required if
// KerberosClient is provided.
KerberosServicePrincipleName string
// Namenode optionally specifies an existing NamenodeConnection to wrap. This
// is useful if you needed to create the namenode net.Conn manually for
// whatever reason.
@@ -69,27 +84,77 @@ type ClientOptions struct {
//
// // Determined by dfs.client.use.datanode.hostname.
// UseDatanodeHostname bool
//
// // Set to a non-nil but empty client (without credentials) if the value of
// // hadoop.security.authentication is 'kerberos'. It must then be replaced
// // with a credentialed Kerberos client.
// KerberosClient *krb.Client
//
// // Determined by dfs.namenode.kerberos.principal, with the realm
// // (everything after the first '@') chopped off.
// KerberosServicePrincipleName string
//
// Because of the way Kerberos can be forced by the Hadoop configuration but not
// actually configured, you should check for whether KerberosClient is set in
// the resulting ClientOptions before proceeding:
//
// options, _ := ClientOptionsFromConf(conf)
// if options.KerberosClient != nil {
// // Replace with a valid credentialed client.
// options.KerberosClient = getKerberosClient()
// }
func ClientOptionsFromConf(conf HadoopConf) (ClientOptions, error) {
namenodes, err := conf.Namenodes()
options := ClientOptions{Addresses: namenodes}

options.UseDatanodeHostname = (conf["dfs.client.use.datanode.hostname"] == "true")

if strings.ToLower(conf["hadoop.security.authentication"]) == "kerberos" {
// Set an empty KerberosClient here so that the user is forced to either
// unset it (disabling kerberos altogether) or replace it with a valid
// client. If the user does neither, NewClient will return an error.
options.KerberosClient = &krb.Client{}
}

if conf["dfs.namenode.kerberos.principal"] != "" {
options.KerberosServicePrincipleName = strings.Split(conf["dfs.namenode.kerberos.principal"], "@")[0]
}

return options, err
}
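
To make the SPN handling concrete, here is a small standalone sketch using only the standard library. `stripRealm` mirrors the `strings.Split(..., "@")[0]` expression in `ClientOptionsFromConf`. `substituteHost` is a hypothetical helper showing one way the `_HOST` placeholder mentioned in the `KerberosServicePrincipleName` comment might be filled in per namenode; how the rpc package actually performs that substitution is not shown in this diff, so treat it as an assumption.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// stripRealm drops the realm (everything from the first '@' onward), the same
// way ClientOptionsFromConf treats dfs.namenode.kerberos.principal.
func stripRealm(principal string) string {
	return strings.Split(principal, "@")[0]
}

// substituteHost is a hypothetical helper: it replaces the "_HOST" placeholder
// in an SPN with the host part of a namenode address. If the address has no
// port, it is used as-is.
func substituteHost(spn, addr string) string {
	host := addr
	if h, _, err := net.SplitHostPort(addr); err == nil {
		host = h
	}
	return strings.Replace(spn, "_HOST", host, 1)
}

func main() {
	spn := stripRealm("nn/_HOST@EXAMPLE.COM")
	fmt.Println(spn)
	fmt.Println(substituteHost(spn, "namenode1.example.com:8020"))
}
```

The principal, realm, and hostname above are placeholders chosen for illustration.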

// NewClient returns a connected Client for the given options, or an error if
// the client could not be created.
func NewClient(options ClientOptions) (*Client, error) {
var err error

if options.Namenode == nil {
if options.KerberosClient != nil && options.KerberosClient.Credentials == nil {
return nil, errors.New("kerberos enabled, but kerberos client is missing credentials")
}

if options.KerberosClient != nil && options.KerberosServicePrincipleName == "" {
return nil, errors.New("kerberos enabled, but kerberos namenode SPN is not provided")
}

if options.User == "" {
if options.KerberosClient != nil {
creds := options.KerberosClient.Credentials
options.User = creds.Username + "@" + creds.Realm
} else {
return nil, errors.New("user not specified")
}
}

options.Namenode, err = rpc.NewNamenodeConnectionWithOptions(
rpc.NamenodeConnectionOptions{
Addresses: options.Addresses,
User: options.User,
DialFunc: options.NamenodeDialFunc,
Addresses: options.Addresses,
User: options.User,
DialFunc: options.NamenodeDialFunc,
KerberosClient: options.KerberosClient,
KerberosServicePrincipleName: options.KerberosServicePrincipleName,
},
)

if err != nil {
return nil, err
}
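
The user-defaulting rule in this hunk (an explicit `User` wins; otherwise the user is built from the Kerberos credentials as `Username@Realm`; otherwise it is an error) can be tried out in isolation. The sketch below uses a stand-in `credentials` struct holding only the two fields the diff reads, rather than the real gokrb5 type, and `resolveUser` is a hypothetical name. It also folds the "no Kerberos client" and "no credentials" cases together, which `NewClient` reports separately.

```go
package main

import (
	"errors"
	"fmt"
)

// credentials is a stand-in for the Kerberos credentials type. Only the two
// fields that NewClient reads in this diff are modeled.
type credentials struct {
	Username string
	Realm    string
}

// resolveUser is a hypothetical helper mirroring the defaulting logic in
// NewClient: an explicit user is used unchanged, and an empty user falls back
// to "<Username>@<Realm>" when credentials are available.
func resolveUser(user string, creds *credentials) (string, error) {
	if user != "" {
		return user, nil
	}
	if creds == nil {
		return "", errors.New("user not specified")
	}
	return creds.Username + "@" + creds.Realm, nil
}

func main() {
	u, err := resolveUser("", &credentials{Username: "bob", Realm: "EXAMPLE.COM"})
	fmt.Println(u, err)

	_, err = resolveUser("", nil)
	fmt.Println(err)
}
```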
@@ -101,7 +166,9 @@ func NewClient(options ClientOptions) (*Client, error) {
// New returns a connected Client, or an error if it can't connect. The user
// will be the current system user. Any relevant options (including the
// address(es) of the namenode(s), if an empty string is passed) will be loaded
// from the Hadoop configuration present at HADOOP_CONF_DIR.
// from the Hadoop configuration present at HADOOP_CONF_DIR. Note, however,
// that New will not attempt any Kerberos authentication; use NewClient if you
// need that.
func New(address string) (*Client, error) {
conf := LoadHadoopConf("")
options, err := ClientOptionsFromConf(conf)
35 changes: 33 additions & 2 deletions client_test.go
@@ -1,6 +1,7 @@
package hdfs

import (
"fmt"
"io/ioutil"
"os"
"os/user"
@@ -9,6 +10,9 @@ import (

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
krb "gopkg.in/jcmturner/gokrb5.v5/client"
"gopkg.in/jcmturner/gokrb5.v5/config"
"gopkg.in/jcmturner/gokrb5.v5/credentials"
)

var cachedClients = make(map[string]*Client)
@@ -37,7 +41,12 @@ func getClientForUser(t *testing.T, username string) *Client {
t.Fatal("No hadoop configuration found at HADOOP_CONF_DIR")
}

options.User = username
if options.KerberosClient != nil {
options.KerberosClient = getKerberosClient(t, username)
} else {
options.User = username
}

client, err := NewClient(options)
if err != nil {
t.Fatal(err)
@@ -47,6 +56,28 @@ func getClientForUser(t *testing.T, username string) *Client {
return client
}

// getKerberosClient expects a ccache file for each user mentioned in the tests
// to live at /tmp/krb5cc_gohdfs_<username>, and krb5.conf to live at
// /etc/krb5.conf
func getKerberosClient(t *testing.T, username string) *krb.Client {
cfg, err := config.Load("/etc/krb5.conf")
if err != nil {
t.Skip("Couldn't load krb config:", err)
}

ccache, err := credentials.LoadCCache(fmt.Sprintf("/tmp/krb5cc_gohdfs_%s", username))
if err != nil {
t.Skipf("Couldn't load ccache for user %s: %s", username, err)
}

client, err := krb.NewClientFromCCache(ccache)
if err != nil {
t.Fatal("Couldn't initialize krb client:", err)
}

return client.WithConfig(cfg)
}

func touch(t *testing.T, path string) {
touchMask(t, path, 0)
}
@@ -115,7 +146,7 @@ func TestNewWithMultipleNodes(t *testing.T) {
}

nns = append([]string{"localhost:100"}, nns...)
_, err = NewClient(ClientOptions{Addresses: nns})
_, err = NewClient(ClientOptions{Addresses: nns, User: "gohdfs1"})
assert.Nil(t, err)
}
