Add option to skip unpacking the same layer in parallel #33
Conversation
@msscotb @anmaxvl @ambarve @kevpar @katiewasnothere @helsaawy As our fork doesn't contain the CRI code, I needed to put the code where we actually make use of this (and the setting to flip in the config) in CRI, here: kevpar/cri#16
Does this have to be enabled by an annotation, or can this be changed by a containerd-wide setting, so clients have no choice?
A containerd-wide setting is what I was thinking would be the only way to turn this on.
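For illustration only, a containerd-wide toggle could be surfaced as a boolean on the CRI plugin config that, when set, adds the new RemoteContext option on every pull. The field and tag names below are made up for the sketch and are not taken from kevpar/cri#16:

```go
package criconfig

// Hypothetical sketch of a containerd-wide toggle in the CRI plugin config
// (invented field/tag names; see kevpar/cri#16 for the real change).
type PluginConfig struct {
	// ...existing CRI plugin config fields...

	// SkipSameLayerUnpack, when true, makes the CRI plugin pass the new
	// RemoteContext option on every image pull so concurrent unpacks of the
	// same layer are coalesced, with no per-request opt-in by clients.
	SkipSameLayerUnpack bool `toml:"skip_same_layer_unpack" json:"skipSameLayerUnpack"`
}
```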
What's the rationale for putting this in our fork vs. upstream?
Force-pushed from cd213a1 to 72d16c5
Other than my last unresolved comment, everything else looks good.
LGTM
Squashin' shortly
With the way things work right now, there's nothing stopping a parallel unpack of the exact same layer to a snapshot. The first one to get committed lives on while the other(s) get garbage collected, so in the end things work out, but it's still wasted work. The real issue is that while unpack should be pretty cheap on Linux, the opposite is true for the Windows and LCOW formats. Kicking off 10 parallel pulls of the same image brings my 6-core machine to a halt and pushes CPU utilization to 100%. What all of this ends up causing is dramatically slower parallel pull times for images that share layers, or when pulling the same image concurrently.
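For reference, the parallel-pull scenario above can also be reproduced with the containerd Go client roughly like this. This is a minimal sketch: the socket address is a placeholder (on Windows it would be the containerd named pipe), and the image is the same test image used in the numbers below.

```go
package main

import (
	"context"
	"log"
	"sync"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	// Placeholder address; adjust for your platform.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "default")

	// Kick off ten pulls of the same image. Without this change, each pull
	// unpacks every layer on its own even though the layers are identical.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := client.Pull(ctx,
				"cplatpublic.azurecr.io/nanoserver_many_layers:latest",
				containerd.WithPullUnpack); err != nil {
				log.Println("pull:", err)
			}
		}()
	}
	wg.Wait()
}
```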
I'm not sure if this is a "sound" way to approach this, or if there's possibly a much easier way to go about this change. I tried to model it in a way that wouldn't disrupt things from a client's perspective, so the logic lives in the metadata snapshotter layer. The gist of this change is that if a new RemoteContext option is specified, the snapshotter now keeps track of which active snapshots are "in progress". Any other snapshot that calls Prepare with the same key as a snapshot that is already in progress will now simply wait for one of two things to occur (a rough sketch of this logic follows the list below):
1. The first active snapshot it's waiting on gets removed via `Remove` (so it was never committed). In this case there was likely an error during setup for the first snapshot/unpack, so any waiters continue as normal and create a new snapshot.
2. The first active snapshot gets committed, which notifies any snapshots currently waiting that the commit succeeded and that they can simply exit (the layer now exists, so there's no need to create a new snapshot and unpack it again). ErrAlreadyExists is returned from all of the waiting snapshots to let the client know that a snapshot with this content already exists.
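The sketch below is not the actual change, just an illustration of the bookkeeping described above, with invented names: the metadata snapshotter records the first in-progress Prepare per key, later callers wait on it, and whether it ends in Remove or Commit decides if they retry or return ErrAlreadyExists.

```go
package metadata

import (
	"context"
	"sync"

	"github.com/containerd/containerd/errdefs"
	"github.com/containerd/containerd/mount"
	"github.com/containerd/containerd/snapshots"
)

// inflight tracks the first "in progress" snapshot for a given key.
type inflight struct {
	done      chan struct{} // closed once the first snapshot is resolved
	committed bool          // true if it was committed, false if it was removed
}

// coalescingSnapshotter wraps a snapshotter and coalesces concurrent Prepare
// calls that use the same key (illustrative names only).
type coalescingSnapshotter struct {
	snapshots.Snapshotter

	mu     sync.Mutex
	active map[string]*inflight
}

func (s *coalescingSnapshotter) Prepare(ctx context.Context, key, parent string, opts ...snapshots.Opt) ([]mount.Mount, error) {
	s.mu.Lock()
	if in, ok := s.active[key]; ok {
		s.mu.Unlock()
		// Another pull is already unpacking this layer; wait for it to resolve.
		select {
		case <-in.done:
		case <-ctx.Done():
			return nil, ctx.Err()
		}
		if in.committed {
			// Case 2: the layer now exists, so tell the client and do no work.
			return nil, errdefs.ErrAlreadyExists
		}
		// Case 1: the first snapshot was removed (its unpack likely failed),
		// so fall through and prepare a fresh snapshot ourselves.
	} else {
		s.active[key] = &inflight{done: make(chan struct{})}
		s.mu.Unlock()
	}
	return s.Snapshotter.Prepare(ctx, key, parent, opts...)
}

// resolve wakes any waiters once the first snapshot is committed or removed.
func (s *coalescingSnapshotter) resolve(key string, committed bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if in, ok := s.active[key]; ok {
		in.committed = committed
		close(in.done)
		delete(s.active, key)
	}
}

func (s *coalescingSnapshotter) Commit(ctx context.Context, name, key string, opts ...snapshots.Opt) error {
	err := s.Snapshotter.Commit(ctx, name, key, opts...)
	s.resolve(key, err == nil) // waiters exit with ErrAlreadyExists on success
	return err
}

func (s *coalescingSnapshotter) Remove(ctx context.Context, key string) error {
	err := s.Snapshotter.Remove(ctx, key)
	s.resolve(key, false) // waiters retry with a fresh Prepare
	return err
}
```

In the real change this coalescing would only kick in when the new RemoteContext option is set; the sketch omits that plumbing.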
Below are some numbers from testing this fix vs. containerd built from main:
Single Pull With This Commit:
PS C:\Users\dcanter\Desktop\ctrd> $a=1..1 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
23.2647532
PS C:\Users\dcanter\Desktop\ctrd> .\crictl.exe rmi cplatpublic.azurecr.io/nanoserver_many_layers:latest
Deleted: cplatpublic.azurecr.io/nanoserver_many_layers:latest
10 Parallel Pulls With This Commit:
PS C:\Users\dcanter\Desktop\ctrd> $a=1..10 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
25.3406896
25.1486873
25.0266887
24.8586885
24.7116929
24.5726943
24.4816912
24.375693
24.250694
24.1176895
PS C:\Users\dcanter\Desktop\ctrd> .\crictl.exe rmi cplatpublic.azurecr.io/nanoserver_many_layers:latest
Deleted: cplatpublic.azurecr.io/nanoserver_many_layers:latest
10 Parallel Pulls With Containerd Built Off Main:
PS C:\Users\dcanter\Desktop\ctrd> $a=1..10 | %{start-job {C:\Users\dcanter\Desktop\ctrd\crictl.exe pull cplatpublic.azurecr.io/nanoserver_many_layers:latest}}; $a | wait-job | receive-job; $a | %{$_.psendtime-$_.psbegintime} | % totalseconds
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
Image is up to date for sha256:f86afbf779d83cf5aa2cc8377ea0b9f1cea140de22e1189270634d3a8ca4cf0e
134.416318
134.2893209
134.1643238
134.027323
133.8813211
133.7333303
133.5593271
133.4253269
133.2963282
133.1643288
Signed-off-by: Daniel Canter <dcanter@microsoft.com>