Add option to skip unpacking the same layer in parallel by dcantah · Pull Request #33 · kevpar/containerd

Add option to skip unpacking the same layer in parallel #33


Merged · 1 commit · Mar 8, 2022

Conversation

@dcantah (Collaborator) commented Jan 5, 2022

With the way things work right now, there's nothing stopping a parallel unpack of the exact
same layer to a snapshot. The first one to get committed lives on while the other(s) get garbage collected,
so in the end things work out, but it's still wasted work. The real issue is that while unpack
should be fairly cheap on Linux, the opposite is true for the Windows and LCOW formats. Kicking off
10 parallel pulls of the same image brings my 6-core machine to a halt and pushes CPU utilization to 100%.
The end result is drastically slower parallel pull times for images that share layers, or when
pulling the same image multiple times.

I'm not sure if this is a "sound" way to approach this, or if there's a much easier way to go about it. I tried
to model it in a way that wouldn't disrupt things from a client's perspective, so the logic lives in the metadata snapshotter
layer. The gist of the change is that if a new RemoteContext option is specified, the snapshotter keeps track of which active
snapshots are "in progress". Any other caller that invokes Prepare with the same key as a snapshot that is already in progress
will now simply wait for one of two things to occur (a rough sketch of this flow follows the list):

1. The first active snapshot being waited on gets removed via Remove (so it was never committed). In this case there
   was likely an error during setup for the first snapshot/unpack, so the waiters continue as normal and create a new snapshot themselves.
2. The first active snapshot gets committed. Commit notifies any waiting snapshots that it succeeded, and they can simply exit
   (the layer now exists, so there's no need to create a new snapshot and unpack again). ErrAlreadyExists is returned from
   all of the waiting Prepare calls to let the client know that a snapshot with this content already exists.
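
To make the flow above concrete, here's a minimal, self-contained sketch of the idea, assuming a simple map of in-progress keys plus a broadcast channel is enough. It is not the actual containerd code, and all of the names below (`tracker`, `attempt`, `errAlreadyExists`) are illustrative:

```go
// Illustrative sketch only, not the actual containerd metadata snapshotter code.
package dedup

import (
	"errors"
	"sync"
)

// errAlreadyExists stands in for containerd's errdefs.ErrAlreadyExists.
var errAlreadyExists = errors.New("snapshot already exists")

type outcome int

const (
	outcomeCommitted outcome = iota
	outcomeRemoved
)

// attempt tracks a single in-progress snapshot for a given key.
type attempt struct {
	done chan struct{} // closed once the attempt is committed or removed
	res  outcome       // valid only after done is closed
}

// tracker deduplicates concurrent Prepare calls for the same key.
type tracker struct {
	mu       sync.Mutex
	inflight map[string]*attempt
}

func newTracker() *tracker {
	return &tracker{inflight: make(map[string]*attempt)}
}

// Prepare returns (true, nil) when the caller is the first for this key and
// should perform the snapshot + unpack itself. If another caller commits the
// same key first, it returns (false, errAlreadyExists). If the first attempt
// is removed (e.g. it failed), the waiter loops and may become the new owner.
func (t *tracker) Prepare(key string) (doUnpack bool, err error) {
	for {
		t.mu.Lock()
		a, ok := t.inflight[key]
		if !ok {
			t.inflight[key] = &attempt{done: make(chan struct{})}
			t.mu.Unlock()
			return true, nil // we own the unpack for this key
		}
		t.mu.Unlock()

		<-a.done // block until the owner commits or removes
		if a.res == outcomeCommitted {
			return false, errAlreadyExists // layer already exists, skip the work
		}
		// Owner's attempt was removed; retry and possibly take ownership.
	}
}

// Commit marks the in-progress attempt for key as successful and wakes waiters.
func (t *tracker) Commit(key string) { t.finish(key, outcomeCommitted) }

// Remove marks the in-progress attempt for key as abandoned and wakes waiters.
func (t *tracker) Remove(key string) { t.finish(key, outcomeRemoved) }

func (t *tracker) finish(key string, res outcome) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if a, ok := t.inflight[key]; ok {
		a.res = res
		close(a.done)
		delete(t.inflight, key)
	}
}
```

In the real change the bookkeeping lives in the metadata snapshotter and is only active when the new RemoteContext option is set; the sketch only shows the synchronization shape (the first caller unpacks, and waiters either see ErrAlreadyExists after a commit or retry after a remove).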

Below are some numbers comparing this fix vs. containerd built from main:

Single Pull With This Commit
PS C:\Users\dcanter\Desktop\ctrd> $a=1..1 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
23.2647532
PS C:\Users\dcanter\Desktop\ctrd> .\crictl.exe rmi cplatpublic.azurecr.io/nanoserver_many_layers:latest
Deleted: cplatpublic.azurecr.io/nanoserver_many_layers:latest

10 Parallel Pulls With This Commit
PS C:\Users\dcanter\Desktop\ctrd> $a=1..10 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
25.3406896
25.1486873
25.0266887
24.8586885
24.7116929
24.5726943
24.4816912
24.375693
24.250694
24.1176895
PS C:\Users\dcanter\Desktop\ctrd> .\crictl.exe rmi cplatpublic.azurecr.io/nanoserver_many_layers:latest
Deleted: cplatpublic.azurecr.io/nanoserver_many_layers:latest

10 Parallel Pulls With Containerd Built Off Main
PS C:\Users\dcanter\Desktop\ctrd> $a=1..10 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
134.416318
134.2893209
134.1643238
134.027323
133.8813211
133.7333303
133.5593271
133.4253269
133.2963282
133.1643288

Signed-off-by: Daniel Canter dcanter@microsoft.com

@dcantah (Collaborator, Author) commented Jan 6, 2022

@msscotb @anmaxvl @ambarve @kevpar @katiewasnothere @helsaawy

As our fork doesn't contain the CRI code, I put the code that actually makes use of this (and the config setting to flip it on) over in CRI here: kevpar/cri#16

@helsaawy (Collaborator) left a comment

Does this have to be enabled by an annotation, or can this be changed by a containerd-wide setting, so clients have no choice?

@dcantah (Collaborator, Author) commented Jan 7, 2022

> Does this have to be enabled by an annotation, or can this be changed by a containerd-wide setting, so clients have no choice?

A containerd-wide setting was what I was thinking would be the only way to turn this on.

@kevpar (Owner) commented Jan 11, 2022

What's the rationale for putting this in our fork vs upstream?

@anmaxvl (Collaborator) commented Jan 26, 2022

Other than my last unresolved comment, everything else looks good.

@ambarve (Collaborator) left a comment

LGTM

@dcantah (Collaborator, Author) commented Mar 8, 2022

Squashin' shortly
