Proposal: caching the context · Issue #9553 · moby/moby · GitHub
Closed
@a-ba

Description

The purpose here is to speed up build iterations, especially when working with large contexts ('large' means hundreds of MB, possibly a GB).

There are some ways to mitigate the problem, such as #6579 (.dockerignore, to prevent uploading unnecessary files) and #5369 (to avoid copying source/intermediate files into the resulting image), but the next bottleneck is the transfer of the useful context itself. The current implementation requires uploading and processing the whole context for every build, even if only a few files have changed since the previous build.

There have been several suggestions to mount an external volume at build time (#8394, #1191, #8757), but they raise security and reproducibility issues.

Proposal

The change consists of updating the POST /build operation to allow sending the context in two steps. The first step would include only the file metadata and the sha256 sum of each file's content. The second step would provide the actual content of the files (if requested by the server).

A build could take place as follows:

  1. The client computes the sha256 sum of every regular file present in the context

  2. The client sends its /build request. The query includes an option to enable server-side caching (e.g. cache_context=true). The body is a 'light' tarball of the context in which the content of each regular file is replaced with its sha256 sum, except for the Dockerfile, which is transferred as is.

    POST /build?cache_context=true HTTP/1.1
    Content-Type: application/tar
    
    [...context tar archive, but with the content of regular files replaced
     with their sha256 sum...]
    
  3. In the response, the first chunk sent by the server includes a "miss" item listing all files that are missing from the cache. In case of 100% cache hits, the "miss" item must still be present, with an empty list. Then the build proceeds normally.

    HTTP/1.1 200 OK
    Content-Type: application/json
    
    {"miss": [ ...list_of_missing_sha256_keys... ]}
    {"stream":"Step 1...
     ...}
    
  4. The client parses the response.

    • If the first chunk does not include a "miss" item, then the client must assume that caching is not supported. It must abort the build and start a new one with caching disabled.

    • If the "miss" item is empty, nothing particular happens.

    • If the "miss" item is not empty, then the client creates a tarball of the missing files, indexed by their sha256 sum, and opens a second HTTP connection to upload it. In the tarball, the files should be stored in the same order as in the list sent by the server.

      POST /build/upload HTTP/1.1
      Content-Type: application/tar
      
      [...tarball of the missing files indexed by their sha256 sum...]
      
  5. The server must verify the sha256 sums of the files before adding them to the cache; it then performs the build.

    HTTP/1.1 200 OK
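
To make the client side of steps 1, 2 and 4 more concrete, here is a rough Python sketch. It is only an illustration under assumptions, not the Docker client: the function names (sha256_of, build_light_tarball, build_upload_tarball) are made up for this example, and the choice to store the hex digest as the file body follows the "content replaced with sha256 sum" wording above. Note that this loses the real file size in the tar header, which the final protocol would probably need to carry some other way.

```python
import hashlib
import io
import os
import tarfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex sha256 sum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_light_tarball(context_dir):
    """Build the step-2 body: (tar_bytes, sums), sums maps sha256 sum -> path.

    Regular files carry their hex sum as content; the top-level Dockerfile
    is sent as is. Directories, symlinks, etc. keep their metadata only.
    """
    sums = {}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for root, dirs, files in os.walk(context_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, context_dir)
                info = tar.gettarinfo(path, arcname)
                if info is None:  # unsupported file type (e.g. socket)
                    continue
                if info.isreg() and arcname != "Dockerfile":
                    key = sha256_of(path)
                    sums[key] = path
                    payload = key.encode("ascii")
                    info.size = len(payload)  # real size is not preserved here
                    tar.addfile(info, io.BytesIO(payload))
                elif info.isreg():
                    with open(path, "rb") as f:
                        tar.addfile(info, f)
                else:
                    tar.addfile(info)
    return buf.getvalue(), sums


def build_upload_tarball(miss, sums):
    """Build the step-4 body: missing files named by their sha256 sum,
    stored in the same order as the server's "miss" list."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key in miss:
            path = sums[key]
            info = tarfile.TarInfo(name=key)
            info.size = os.path.getsize(path)
            with open(path, "rb") as f:
                tar.addfile(info, f)
    return buf.getvalue()
```

The client would send the first tarball as the body of the /build request, read the "miss" list from the first response chunk, and pass that list to build_upload_tarball only when it is not empty.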
    

This change would allow several optimisations:

  • once a file has its sha256 sum validated by the server, it can be cached for subsequent builds.
  • when launching a new build with a slightly modified context, only the updated files need to be uploaded
  • ADD/COPY of cached files will not require re-hashing on the server side (the server can trust its internal cache)
  • if the storage driver is 'btrfs', all server-side copies (CACHE->CONTEXT and CONTEXT->CONTAINER) can rely on the copy-on-write abilities of the filesystem (cp --reflink)
  • in case of a rebuild, hashing files should be faster on the client side since they are likely still cached in memory by the OS (if needed, the client can even cache the sums and re-hash a file only if its mtime has changed)
  • the server can start the build eagerly, before the context is fully uploaded
  • the server can parse the Dockerfile eagerly and build the list of cache misses accordingly:
    • so that files needed first are uploaded first
    • so that unused files are not uploaded at all
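
The client-side sum caching mentioned in the list above could look roughly like the following Python sketch. The class name SumCache and the rehashed counter are invented for this illustration, and comparing the file size in addition to the mtime is an extra assumption that is not part of the proposal.

```python
import hashlib
import os


class SumCache:
    """Remember sha256 sums; re-hash a file only if its mtime or size changed."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, size, hex sum)
        self.rehashed = 0   # number of times a file was actually hashed

    def sha256(self, path):
        st = os.stat(path)
        entry = self._entries.get(path)
        if entry and entry[0] == st.st_mtime_ns and entry[1] == st.st_size:
            return entry[2]  # unchanged since last build: reuse the sum
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        self.rehashed += 1
        self._entries[path] = (st.st_mtime_ns, st.st_size, digest.hexdigest())
        return self._entries[path][2]
```

A cache like this would have to be persisted between client invocations to be useful; this sketch keeps it in memory only.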

Implementation considerations:

  • before adding a file into the cache, the server MUST verify the sha256 sum (otherwise this would be a security hole)
  • the proposal does not address how the cache is managed. For now we can just assume it is a basic LRU cache hosted by the docker engine. Sharing the cache across a cluster of daemons or with an external storage facility is out of scope.
  • the size of the cache should be configurable at runtime (e.g. 'docker -d --context-cache=1G'); the default should be rather small
  • instead of replacing the content of files with their sha256 sum in the tarball, it could be better to include the list of cached files as a file inside the archive, with a special name (e.g. .docker-cached-files). This question is still open.
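
As a rough illustration of the first three points, here is a minimal Python sketch of a size-bounded LRU cache that verifies the sha256 sum before accepting a file. This is not daemon code: the class ContextCache and its methods are hypothetical, and it keeps blobs in memory for brevity, whereas a real cache would presumably store them on disk.

```python
import hashlib
from collections import OrderedDict


class ContextCache:
    """Content-addressed LRU cache bounded by total size in bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self._blobs = OrderedDict()  # hex sum -> bytes, oldest first

    def missing(self, keys):
        """Return the keys absent from the cache (request order, no duplicates).

        Keys that are present are marked as recently used.
        """
        miss, seen = [], set()
        for key in keys:
            if key in self._blobs:
                self._blobs.move_to_end(key)
            elif key not in seen:
                seen.add(key)
                miss.append(key)
        return miss

    def add(self, claimed_sum, data):
        """Verify the claimed sum, store the blob, then evict the oldest entries."""
        if hashlib.sha256(data).hexdigest() != claimed_sum:
            raise ValueError("sha256 mismatch, refusing to cache")
        if claimed_sum in self._blobs:
            self._blobs.move_to_end(claimed_sum)
            return
        self._blobs[claimed_sum] = data
        self.used += len(data)
        while self.used > self.max_bytes:
            _, evicted = self._blobs.popitem(last=False)
            self.used -= len(evicted)
```

The list returned by missing() is what the server would send back as the "miss" item, and add() would be called for each entry of the uploaded tarball.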
