Make lambda hook deterministic #372
Conversation
The old method would cause issues if the mtime of the files was different, since it was using the hash of the zipfile (and the zipfile includes headers for the mtime of each file).

This relies on generating a hash of each of the files to be uploaded, then hashing all those hashes. As long as the content hasn't changed, it should return the same hash each time.

This also stops relying on the ETags from S3, and instead uses the calculated hash to create the key name, and doesn't re-upload if the key already exists.
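The existence check on the upload side can be sketched roughly like this (a minimal illustration, not the hook's actual code; upload_if_missing and the key format are made up here, assuming a boto3 S3 client):

    import boto3
    from botocore.exceptions import ClientError


    def upload_if_missing(s3, bucket, key, zip_path):
        """Upload zip_path to bucket/key unless that key already exists."""
        try:
            # The key is derived from the content hash, so an existing
            # object means an identical bundle was already uploaded.
            s3.head_object(Bucket=bucket, Key=key)
            return False  # skip the re-upload
        except ClientError:
            s3.upload_file(zip_path, bucket, key)
            return True


    # Illustrative usage; the key format and bucket name are hypothetical:
    # key = "lambda-%s.zip" % _calculate_hash(files, root)
    # upload_if_missing(boto3.client("s3"), "my-bucket", key, "payload.zip")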
Codecov Report
@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
+ Coverage   87.62%   87.71%   +0.08%
==========================================
  Files          86       86
  Lines        4315     4346      +31
==========================================
+ Hits         3781     3812      +31
  Misses        534      534
Continue to review full report at Codecov.
@danielkza for some reason it's not letting me add you as a reviewer, maybe after mentioning you it will?
stacker/hooks/aws_lambda.py
Outdated
    for fname in files:
        f = os.path.join(root, fname)
        with open(f) as fd:
            file_hashes.append(hashlib.md5(fd.read()).hexdigest())
a) It is probably a good idea to open the file in "rb" mode to avoid newline shenanigans.
b) This might not catch file renames if the files are still produced in the same order. It might be necessary to sort files by their full path in the ZIP file, and prepend the path to the content before hashing.
    with open(os.path.join(root, ALL_FILES[0]), "w") as fd:
        fd.write("modified file data")
    hash3 = _calculate_hash(ALL_FILES, root)
It would be nice to add a test verifying that the same set of files with the same content, but different file names, generates a different hash.
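A minimal sketch of such a test, assuming the _calculate_hash(files, root) helper used above (the test name and _populate helper are made up for illustration):

    import os
    import tempfile

    from stacker.hooks.aws_lambda import _calculate_hash


    def _populate(root, names, data="same content"):
        # Write identical content under each of the given file names.
        for name in names:
            with open(os.path.join(root, name), "w") as fd:
                fd.write(data)


    def test_different_filenames_change_hash():
        root1, root2 = tempfile.mkdtemp(), tempfile.mkdtemp()
        _populate(root1, ["a.txt", "b.txt"])
        _populate(root2, ["c.txt", "d.txt"])
        # Same content, different names: the digests should differ
        # because the file name is mixed into the hash.
        assert _calculate_hash(["a.txt", "b.txt"], root1) != \
            _calculate_hash(["c.txt", "d.txt"], root2)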
Thanks @danielkza - updated to respond to your comments!
Force-pushed from 9ef7766 to a7bf254
stacker/hooks/aws_lambda.py
Outdated
    with open(f, "rb") as fd:
        file_hashes.append("%s:%s" % (fname,
                                      hashlib.md5(fd.read()).hexdigest()))
One last thing (sorry that I forgot to mention earlier): it's probably a bit more efficient to use a single hash object and incrementally append to it. That makes it easy to use "\0" to separate path and content, which is better since it is almost always disallowed in filenames, and it avoids reading the whole file into memory.
    files_hash = hashlib.md5()
    for fname in files:
        f = os.path.join(root, fname)
        with open(f, "rb") as fd:
            # NUL separates path from content; it can't appear in a file
            # name, so "a" + "bc" can't collide with "ab" + "c".
            files_hash.update(fname + "\0")
            # Stream the file in chunks instead of reading it all into memory.
            for chunk in iter(lambda: fd.read(4096), ''):
                files_hash.update(chunk)
    return files_hash.hexdigest()
LGTM, thanks for bearing with my nitpicks :)
other than sorting, lgtm
stacker/hooks/aws_lambda.py
Outdated
            str: A hash of the hashes of the given files.
        """
        file_hash = hashlib.md5()
        for fname in files:
Do you know if _find_files returns a sorted list? Should probably perform a sort on the files before iterating through them, since non-deterministic ordering will screw up the hash.
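Roughly, combining the streaming suggestion above with a sort over the file list would give something like this (a sketch of the shape, not necessarily the exact merged code; the encode call is only there to keep the snippet version-agnostic):

    import hashlib
    import os


    def _calculate_hash(files, root):
        file_hash = hashlib.md5()
        # Sort so the digest doesn't depend on whatever order _find_files
        # happens to return.
        for fname in sorted(files):
            with open(os.path.join(root, fname), "rb") as fd:
                file_hash.update(fname.encode("utf-8") + b"\0")
                # Stream in chunks instead of reading the file into memory.
                for chunk in iter(lambda: fd.read(4096), b""):
                    file_hash.update(chunk)
        return file_hash.hexdigest()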
And maybe add a test for this as well.
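Such an ordering test might look roughly like this (illustrative names, assuming the same _calculate_hash import as before):

    import os
    import tempfile

    from stacker.hooks.aws_lambda import _calculate_hash


    def test_file_order_does_not_change_hash():
        root = tempfile.mkdtemp()
        for i, name in enumerate(["a.txt", "b.txt", "c.txt"]):
            with open(os.path.join(root, name), "w") as fd:
                fd.write("content %d" % i)
        # The same files passed in a different order must yield the same hash.
        assert _calculate_hash(["a.txt", "b.txt", "c.txt"], root) == \
            _calculate_hash(["c.txt", "a.txt", "b.txt"], root)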
Force-pushed from 1ed5c09 to 16c9247
@ejholmes updated to sort the file lists before calculating the hash, and wrote a test to verify that if _calculate_hash gets a file list in a different order, the hashes still end up the same.
Looks great!
Make lambda hook deterministic
Fixes #370