8000 Make lambda hook deterministic by phobologic · Pull Request #372 · cloudtools/stacker · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Make lambda hook deterministic #372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 8, 2017
Merged

Conversation

phobologic
Copy link
Member
@phobologic phobologic commented Apr 30, 2017

The old method would cause issues if the mtime of the files was
different, since it was using the hash of the zipfile (and the zipfile
includes headers for mtime for each file).

This relies on generating a hash of all the files to be uploaded, then
hashes all those hashes. As long as the content hasn't changed, it
should return the same hash each time.

This also stops relying on the ETAGs from s3, and instead just uses the
calculated hash to create the key name, and doesn't re-upload if the key
already exists.

Fixes #370

The old method would cause issues if the mtime of the files was
different, since it was using the hash of the zipfile (and the zipfile
includes headers for mtime for each file).

This relies on generating a hash of all the files to be uploaded, then
hashes all those hashes. As long as the content hasn't changed, it
should return the same hash each time.

This also stops relying on the ETAGs from s3, and instead just uses the
calculated hash to create the key name, and doesn't re-upload if the key
already exists.
@codecov
Copy link
codecov bot commented Apr 30, 2017

Codecov Report

Merging #372 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   87.62%   87.71%   +0.08%     
==========================================
  Files          86       86              
  Lines        4315     4346      +31     
==========================================
+ Hits         3781     3812      +31     
  Misses        534      534
Impacted Files Coverage Δ
stacker/hooks/aws_lambda.py 94.82% <100%> (+0.33%) ⬆️
stacker/tests/hooks/test_lambda.py 98.34% <100%> (+0.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d87c9b...0c2133b. Read the comment docs.

@phobologic
Copy link
Member Author

@danielkza for some reason it's not letting me add you as a reviewer, maybe after mentioning you it will?

for fname in files:
f = os.path.join(root, fname)
with open(f) as fd:
file_hashes.append(hashlib.md5(fd.read()).hexdigest())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) It is probably a good idea to open the file with rb mode to avoid newline shenanigans.
b) This might not catch file renames if the files are still produced in the same order.

It might be necessary to sort files by their full path in the ZIP file, and prepend the path to the content before hashing.

with open(os.path.join(root, ALL_FILES[0]), "w") as fd:
fd.write("modified file data")
hash3 = _calculate_hash(ALL_FILES, root)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a test verifying that the same set of files with the same content, but different file names generate a different hash.

@phobologic
Copy link
Member Author

Thanks @danielkza - updated to respond to your comments!

@phobologic phobologic force-pushed the fix_lambda_hook_determinism branch from 9ef7766 to a7bf254 Compare April 30, 2017 23:52
file_hashes.append(hashlib.md5(fd.read()).hexdigest())
with open(f, "rb") as fd:
file_hashes.append("%s:%s" % (fname,
hashlib.md5(fd.read()).hexdigest()))
Copy link
Contributor
@danielkza danielkza May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing (sorry that I forgot to mention earlier): it's probably a bit more efficient to use a single hash object and incrementally append to it. That makes it easy to use \0 to separate path and content, which is better as it is almost always disallowed in filenames, and not read the whole file to memory.

files_hash = hashlib.md5()
for fname in files:
    f = os.path.join(root, fname)
    with open(f, "rb") as fd:
        files_hash.update(fname + "\0")
        for chunk in iter(lambda: fd.read(4096), ''):
            files_hash.update(chunk)

return files_hash.hexdigest()

@danielkza
Copy link
Contributor

LGTM, thanks for bearing with my nitpicks :)

Copy link
Contributor
@ejholmes ejholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than sorting, lgtm

str: A hash of the hashes of the given files.
"""
file_hash = hashlib.md5()
for fname in files:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if _find_files returns a sorted list? Should probably perform a sort on the files before iterating through them, since non-deterministic ordering will screw up the hash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe add a test for this as well.

@phobologic phobologic force-pushed the fix_lambda_hook_determinism branch from 1ed5c09 to 1 8000 6c9247 Compare May 3, 2017 17:21
@phobologic
Copy link
Member Author

@ejholmes updated to sort the file lists before calculating them and wrote a test to verify that if calculate gets a file list that is in different order the hashes still end up the same.

Copy link
Contributor
@ejholmes ejholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@phobologic phobologic merged commit bf83280 into master May 8, 2017
@phobologic phobologic deleted the fix_lambda_hook_determinism branch May 8, 2017 22:32
phrohdoh pushed a commit to phrohdoh/stacker that referenced this pull request Dec 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0