10000 S3 driver: Attempt HeadObject on Stat first, fail over to List by milosgajdos · Pull Request #4401 · distribution/distribution · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

S3 driver: Attempt HeadObject on Stat first, fail over to List #4401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

milosgajdos
Copy link
Member
@milosgajdos milosgajdos commented Jul 13, 2024

S3.StorageDriver.Stat always calls ListObjects when stat-ing an S3 key to determine if it's a "directory" or not.

Unfortunately, the ListObjects is not a free call: both in terms of egress and actual AWS costs (likely because of the egress).

This changes the behaviour of Stat such that we always attempt the HeadObject call first and only ever fall through to ListObjects if the HeadObject call returns an AWS API error.

Note, that the official docs mention that the only error returned by HEAD is NoSuchKey (https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_Errors); experiments show that this is demonstrably wrong and the AWS docs are simply outdated at the time of this commit.

HeadObject actually returns the following errors:

  • NotFound: if the queried key does not exist or if the queried key contains subkeys i.e. it's a prefix
  • BucketRegionError: if the bucket does not exist
  • Forbidden: if Head operation is not allowed via IAM/ACLs

Closes: #4326 (yes I know that's a PR, but the OG author doesn't respond and the PR is in a half-broken state)

Bucket: aws.String(d.Bucket),
Key: aws.String(s3Path),
})
if err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://go.dev/wiki/CodeReviewComments#indent-error-flow

I would suggest refactoring this into three functions:

func (d *driver) Stat(ctx context.Context, path string) (storagedriver.FileInfo, error) {
	s3Path := d.s3Path(path)
	fi, err := d.statHead(ctx, s3Path)
	if err != nil {
		if _, ok := err.(awserr.Error); ok {
			var err error
			fi, err = d.statList(ctx, s3Path)
			if err != nil {
				return nil, err
			}
		}
	}
	fi.Path = path
	return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil
}

@milosgajdos milosgajdos requested a review from corhere July 16, 2024 08:02
@milosgajdos
Copy link
Member Author

PTAL @thaJeztah @Jamstah @squizzi

Comment on lines 820 to 827
fi, err := d.statList(ctx, s3Path)
if err != nil {
if errors.Is(err, storagedriver.PathNotFoundError{}) {
return nil, storagedriver.PathNotFoundError{Path: path}
}
return nil, parseError(path, err)
}
fi.Path = path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of mutating the returned type and setting the path here.

Could we instead make statList and statHead accept the unprocessed path argument, and make them set the Path in the FileInfo they return? (internalising the d.s3Path into those functions)

Copy link
Member Author
@milosgajdos milosgajdos Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we're not doing that (see Corey's suggestion) is there is no need to do strings.Trimming [in worst case] d.S3Path does, multiple times. We do it once in Stat and pass it down.

// are slightly outdated, the HeadObject actually returns NotFound error
// if querying a key which doesn't exist or a key which has nested keys
// and Forbidden if IAM/ACL permissions do not allow Head but allow List.
if _, ok := err.(awserr.Error); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be an errors.Is() / errors.As ?

Comment on lines 790 to 810
var fi storagedriver.FileInfoFields
if len(resp.Contents) == 1 {
if *resp.Contents[0].Key != s3Path {
fi.IsDir = true
return storagedriver.FileInfoInternal{FileInfoFields: fi}, nil
return &fi, nil
}

return nil, storagedriver.PathNotFoundError{Path: path}
fi.IsDir = false
fi.Size = *resp.Contents[0].Size
fi.ModTime = *resp.Contents[0].LastModified
return &fi, nil
}
if len(resp.CommonPrefixes) == 1 {
fi.IsDir = true
return &fi, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the var fi storagedriver.FileInfoFields, and just inline constructing the value; it reduces cognitive load (did fi already have fields set elsewhere?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It (inline constructed values) also unnecessarily bloats the code in this specific func. The original code I wrote did exactly that. Not sure if it's worth it, tbh.

@thaJeztah
Copy link
Member

@milosgajdos opened a quick PR against your PR branch with my suggestions; milosgajdos#1 (if you agree with the changes, you can merge them, and the'll get into this PR, but it probably needs some squashing 😄)

@milosgajdos
Copy link
Member Author
milosgajdos commented Jul 16, 2024

I saw that and mmm, I'm unconvinced most of those changes are needed 🤔 But also, at the same time, whatever gets anything going in this repo.

@milosgajdos milosgajdos requested a review from thaJeztah July 16, 2024 13:43
@milosgajdos
Copy link
Member Author

PTAL @thaJeztah

Copy link
Collaborator
@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to squash down to one commit, I assume?

@milosgajdos
Copy link
Member Author

You're going to squash down to one commit, I assume?

I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing

Copy link
Member
@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I am waiting for Seb to see if he wants his commits to be in this PR or if he doesn't mind my squashing

Oh, definitely squash 'em and cleanup the commit message accordingly

You can make Cory and I partner in crime with Co-authored-by: comments (GitHub takes those into account), e.g.;

Co-authored-by: Cory Snider <corhere@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Milos Gajdos <milosgajdos83@gmail.com>

@milosgajdos milosgajdos force-pushed the stat-head-object-first branch from 1370015 to 68169e9 Compare July 17, 2024 09:11
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Jul 17, 2024
Stat always calls ListObjects when stat-ing S3 key.
Unfortauntely ListObjects is not a free call - both in terms of egress
and actual AWS costs (likely because of the egress).

This changes the behaviour of Stat such that we always attempt the
HeadObject call first and only ever fall through to ListObjects if the
HeadObject returns an AWS API error.

Note, that the official docs mention that the only error returned by
HEAD is NoSuchKey; experiments show that this is demonstrably wrong and
the AWS docs are simply outdated at the time of this commit.

HeadObject actually returns the following errors:
* NotFound: if the queried key does not exist
* NotFound: if the queried key contains subkeys i.e. it's a prefix
* BucketRegionError: if the bucket does not exist
* Forbidden: if Head operation is not allows via IAM/ACLs

Co-authored-by: Cory Snider <corhere@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos force-pushed the stat-head-object-first branch from 68169e9 to a18cc8a Compare July 17, 2024 09:17
@milosgajdos milosgajdos merged commit 753d64b into distribution:main Jul 17, 2024
16 checks passed
@milosgajdos milosgajdos deleted the stat-head-object-first branch July 17, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage/s3 area/storage dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0