-
Notifications
You must be signed in to change notification settings - Fork 2.6k
S3 driver: Attempt HeadObject on Stat first, fail over to List #4401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 driver: Attempt HeadObject on Stat first, fail over to List #4401
Conversation
17f0e1e
to
7617627
Compare
registry/storage/driver/s3-aws/s3.go
Outdated
Bucket: aws.String(d.Bucket), | ||
Key: aws.String(s3Path), | ||
}) | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://go.dev/wiki/CodeReviewComments#indent-error-flow
I would suggest refactoring this into three functions:
func (d *driver) Stat(ctx context.Context, path string) (storagedriver.FileInfo, error) {
s3Path := d.s3Path(path)
fi, err := d.statHead(ctx, s3Path)
if err != nil {
if _, ok := err.(awserr.Error); ok {
var err error
fi, err = d.statList(ctx, s3Path)
if err != nil {
return nil, err
}
}
}
fi.Path = path
return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil
}
PTAL @thaJeztah @Jamstah @squizzi |
registry/storage/driver/s3-aws/s3.go
Outdated
fi, err := d.statList(ctx, s3Path) | ||
if err != nil { | ||
if errors.Is(err, storagedriver.PathNotFoundError{}) { | ||
return nil, storagedriver.PathNotFoundError{Path: path} | ||
} | ||
return nil, parseError(path, err) | ||
} | ||
fi.Path = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of mutating the returned type and setting the path here.
Could we instead make statList
and statHead
accept the unprocessed path
argument, and make them set the Path
in the FileInfo they return? (internalising the d.s3Path
into those functions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we're not doing that (see Corey's suggestion) is there is no need to do strings.Trim
ming [in worst case] d.S3Path
does, multiple times. We do it once in Stat
and pass it down.
registry/storage/driver/s3-aws/s3.go
Outdated
// are slightly outdated, the HeadObject actually returns NotFound error | ||
// if querying a key which doesn't exist or a key which has nested keys | ||
// and Forbidden if IAM/ACL permissions do not allow Head but allow List. | ||
if _, ok := err.(awserr.Error); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be an errors.Is()
/ errors.As
?
registry/storage/driver/s3-aws/s3.go
Outdated
var fi storagedriver.FileInfoFields | ||
if len(resp.Contents) == 1 { | ||
if *resp.Contents[0].Key != s3Path { | ||
fi.IsDir = true | ||
return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil | ||
return &fi, nil | ||
} | ||
|
||
return nil, storagedriver.PathNotFoundError{Path: path} | ||
fi.IsDir = false | ||
fi.Size = *resp.Contents[0].Size | ||
fi.ModTime = *resp.Contents[0].LastModified | ||
return &fi, nil | ||
} | ||
if len(resp.CommonPrefixes) == 1 { | ||
fi.IsDir = true | ||
return &fi, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the var fi storagedriver.FileInfoFields
, and just inline constructing the value; it reduces cognitive load (did fi
already have fields set elsewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It (inline constructed values) also unnecessarily bloats the code in this specific func. The original code I wrote did exactly that. Not sure if it's worth it, tbh.
@milosgajdos opened a quick PR against your PR branch with my suggestions; milosgajdos#1 (if you agree with the changes, you can merge them, and the'll get into this PR, but it probably needs some squashing 😄) |
I saw that and mmm, I'm unconvinced most of those changes are needed 🤔 But also, at the same time, whatever gets anything going in this repo. |
PTAL @thaJeztah |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're going to squash down to one commit, I assume?
I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing
Oh, definitely squash 'em and cleanup the commit message accordingly
You can make Cory and I partner in crime with Co-authored-by:
comments (GitHub takes those into account), e.g.;
Co-authored-by: Cory Snider <corhere@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>
1370015
to
68169e9
Compare
Stat always calls ListObjects when stat-ing S3 key. Unfortauntely ListObjects is not a free call - both in terms of egress and actual AWS costs (likely because of the egress). This changes the behaviour of Stat such that we always attempt the HeadObject call first and only ever fall through to ListObjects if the HeadObject returns an AWS API error. Note, that the official docs mention that the only error returned by HEAD is NoSuchKey; experiments show that this is demonstrably wrong and the AWS docs are simply outdated at the time of this commit. HeadObject actually returns the following errors: * NotFound: if the queried key does not exist * NotFound: if the queried key contains subkeys i.e. it's a prefix * BucketRegionError: if the bucket does not exist * Forbidden: if Head operation is not allows via IAM/ACLs Co-authored-by: Cory Snider <corhere@gmail.com> Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
68169e9
to
a18cc8a
Compare
S3.StorageDriver.Stat
always callsListObjects
when stat-ing an S3 key to determine if it's a "directory" or not.Unfortunately, the
ListObjects
is not a free call: both in terms of egress and actual AWS costs (likely because of the egress).This changes the behaviour of
Stat
such that we always attempt theHeadObject
call first and only ever fall through toListObjects
if theHeadObject
call returns an AWS API error.Note, that the official docs mention that the only error returned by HEAD is
NoSuchKey
(https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_Errors); experiments show that this is demonstrably wrong and the AWS docs are simply outdated at the time of this commit.HeadObject actually returns the following errors:
NotFound
: if the queried key does not exist or if the queried key contains subkeys i.e. it's a prefixBucketRegionError
: if the bucket does not existForbidden
: if Head operation is not allowed via IAM/ACLsCloses: #4326 (yes I know that's a PR, but the OG author doesn't respond and the PR is in a half-broken state) 8000 p>