The cache limit is too small · Issue #6 · actions/cache · GitHub

The cache limit is too small #6

Closed

smorimoto opened this issue Oct 31, 2019 · 115 comments
Labels
enhancement New feature or request

Comments

@smorimoto
Contributor
smorimoto commented Oct 31, 2019

I understand that this issue is not related to the code of this repository, but I would like to discuss it with many people, so I'm opening it here. (I know the community forums exist, but many people probably don't know about them yet.)

First, I really appreciate the GitHub team adding the cache feature to GitHub Actions. That's great for us! But in recent years node_modules has become too large for 200MB to cover, and it's the same in other languages: for example, using esy to install opam packages can easily exceed 800MB. Is there a way to increase the cache limit? Or, if the individual cache limit were removed, the per-repository limit would become relatively realistic. I know that save/restore may slow down dramatically if the file size is too large, but that shouldn't be limited on the cache action side.

@cemo
cemo commented Oct 31, 2019

Individual caches are limited to 200MB and a repository can have up to 2GB of caches. Once the 2GB limit is reached, older caches will be evicted based on when the cache was last accessed.

Yes, please remove the individual cache limit. 2GB per repo is reasonable.

@dentuzhik

For our medium-sized react-native project, the archived node_modules directory is 260 MB and the CocoaPods directory is 202 MB.

@sidmitra

Typical Python virtualenvs are also >500MB these days. These are cached to avoid re-compiling modules like lxml, numpy, etc. each time.
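
For illustration, a rough sketch of that caching pattern as GitHub Actions steps (the .venv path, requirements.txt, and the action version pin are illustrative assumptions, not the one true setup):

      # Restore the virtualenv, keyed on the pinned requirements so it
      # is rebuilt whenever the dependency set changes.
      - name: Cache the virtualenv.
        id: venv-cache
        uses: actions/cache@v2
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
      # Only pay the install/compile cost when the cache missed.
      - name: Install dependencies on a cache miss.
        if: ${{ steps.venv-cache.outputs.cache-hit != 'true' }}
        run: |
          python -m venv .venv
          .venv/bin/pip install -r requirements.txt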

@smorimoto
Contributor Author

In my opinion, 2GB per repository is still small for medium to large projects.

@smorimoto
Contributor Author

The file size limit is defined here, so I think we can easily remove the individual limit, but we can't do anything about the limit per repository. 2GB is probably enough for a JavaScript developer, but not so good for a native developer like me.

@hazcod
hazcod commented Oct 31, 2019

I would appreciate a higher limit (e.g. 5GB) for enterprise/paying customers as well.

@smorimoto
Contributor Author

Well, my guess is that you will be able to increase the cache limit by paying for it, like with Git LFS.

@chrispat
Member

We are working on the long-term plan for how we will enable larger limits. Charging for it like we do for packages or Actions artifacts is something we are considering.

@lpil
lpil commented Oct 31, 2019

The compiled deps for my small Rust project are 500 MB; the cache will need to be considerably larger to support Rust.

@andrewhampton

For what it's worth, a project I'm on has a 766MB node_modules folder, but the caching works fine. It compresses the folder before caching, so I assume the 200MB limit is on the compressed asset.

@smorimoto
Contributor Author

Well, I understand that such cases can fit once compressed. But other existing CI providers do not limit individual cache size (although they recommend keeping it under 500MB). I think there is no need to limit it.

@joshmgross
Member

so I assume the 200MB limit is on the compressed asset.

That's correct, the limit applies after we tar and gzip the directory specified by path. I'll update the README to make that clearer.
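
Since the limit applies to the compressed archive, a quick way to check whether a directory will fit is to measure it the same way; a minimal sketch as a workflow step (node_modules is just an example path):

      # Tar and gzip the directory to stdout and count the bytes, which
      # approximates the size the cache service will see.
      - name: Report the compressed size of node_modules.
        run: tar -czf - node_modules | wc -c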

@joshmgross joshmgross added the enhancement New feature or request label Oct 31, 2019
@mrmckeb
mrmckeb commented Nov 1, 2019

This is a great start, but we've also hit the limit before trying.

We have a monorepo with four apps; the total is around 600 MB.

We also need the ability to cache the node_modules for each of the packages. CircleCI took this approach - https://www.benpickles.com/articles/77-caching-yarn-workspaces-on-circleci
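
A rough sketch of that per-package idea using one cache step per workspace, so each entry stays under the individual limit (the workspace paths, keys, and action version are illustrative):

      # One cache entry per workspace keeps each archive small and lets
      # unchanged workspaces keep their caches independently.
      - name: Cache node_modules for the web app workspace.
        uses: actions/cache@v2
        with:
          path: packages/web/node_modules
          key: web-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
      - name: Cache node_modules for the api workspace.
        uses: actions/cache@v2
        with:
          path: packages/api/node_modules
          key: api-${{ runner.os }}-${{ hashFiles('yarn.lock') }}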

@lovasoa
lovasoa commented Nov 1, 2019

Couldn't GitHub implement a cross-repository deduplication system for cached assets, if storage costs are a problem?

@OvermindDL1

Couldn't GitHub implement a cross-repository deduplication system for cached assets, if storage costs are a problem?

Especially since many patterns, like the specific paths of compiled objects for programming languages, node_modules subdirectories, etc., are all ripe for very efficient de-duplication if the system is built with those patterns in mind, and that could then be made generic.

@ad-m
Contributor
ad-m commented Nov 1, 2019

It's easy to talk about deduplication, but it is harder to build an effective system in this area that works efficiently for many small files. If deduplication is performed at the data-block level, it is ineffective once the data is compressed. If it is performed at the file level, it is easy to incur a large communication overhead. In this way, the GitHub team would be opening up a huge problem, which should rather be the task of the team responsible for the Azure Blob Storage service.

@tuler
tuler commented Nov 1, 2019

On my first attempt to use this to cache Docker layers, I hit the file limit.

Cache size of 945984259 bytes is over the 200MB limit, not saving cache.
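
For Docker layers specifically, one alternative that avoids pushing image tarballs through this action is Buildx's GitHub Actions cache backend (added to the Docker actions later than this comment); a minimal sketch, with the image tag purely illustrative and the same service-side storage limits still applying:

      # Let BuildKit store and restore individual image layers in the
      # GitHub Actions cache service instead of tarring them manually.
      - uses: docker/setup-buildx-action@v2
      - name: Build the image with layer caching.
        uses: docker/build-push-action@v4
        with:
          context: .
          tags: my-app:ci
          cache-from: type=gha
          cache-to: type=gha,mode=max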

@tuler tuler mentioned this issue Nov 1, 2019
@ad-m
Contributor
ad-m commented Nov 1, 2019

@tuler, have you tried forking the action? Is the limit also verified on the GitHub side?

@tuler
tuler commented Nov 1, 2019

@tuler, have you tried forking the action? Is the limit also verified on the GitHub side?

The per-file limit is here.
The repo limit is on GitHub's side.

Not sure that forking will help.

@ad-m
Contributor
ad-m commented Nov 1, 2019

945984259 bytes is under the 2 GB per-repo limit.

I saw this code for the per-file limit, hence I wonder whether the limit is also verified on the server side.

@joshmgross
Member

The per-file and repo limits are verified server-side. The per-file limit in the action is there to avoid an unnecessary upload that the server would reject.

@smorimoto
Contributor Author
smorimoto commented Nov 2, 2019

When do you think you can remove the individual limit (if you're going to)?

@shouze
shouze commented Nov 4, 2019

On a react-native project, the .tgz file for the yarn cache is 666 MiB, so it would be worth having a limit of ≈ 1 GiB.

@chrispat chrispat mentioned this issue Nov 4, 2019
@ilyakooo0

For reference: installing the latest Haskell compiler, runtime, and standard libraries (which is necessary for every compilation) takes up 1.59 GB.

So even the 2 GB repo-wide limit would hardly fit an actual project cache (which probably has dozens of dependencies).

@OvermindDL1

@imbsky tar is not slow; it does no compression or compaction, and its output is essentially the files stuffed end to end with headers. Usually gzip or similar is then used to compress that single archive (often directly on the stream that tar outputs, so it becomes basically no-cost), and the compression is the slow part.
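
For illustration, that pipeline as a single workflow step (directory and output names are arbitrary): the archive is streamed straight into the compressor, so tar itself adds essentially no cost.

      # tar only concatenates the files; gzip on the pipe does the
      # (comparatively slow) compression work.
      - name: Archive and compress in one streaming pass.
        run: tar -cf - node_modules | gzip > node_modules.tar.gz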

kcgen pushed a commit to dosbox-staging/dosbox-staging that referenced this issue Jan 7, 2020
GitHub's ongoing issue of limiting the cache size has recently been fixed (actions/cache#6), so this PR creates a combined Clang+GCC cache for separate 32-bit and 64-bit architectures under Windows.
@smorimoto
Contributor Author

Yeah, you're right. I meant "the current implementation calling the tar command is slow".

@joshmgross joshmgross unpinned this issue Jan 7, 2020
dreamer pushed a commit to dosbox-staging/dosbox-staging that referenced this issue Jan 7, 2020
@smorimoto
Contributor Author
smorimoto commented Jan 8, 2020

2GB seems roughly enough when running tests on only one operating system, but when running tests on three operating systems it is not enough, and it feels like a lot of cache is wasted each time. What do other people think?

@smorimoto
Contributor Author
smorimoto commented Jan 8, 2020

Also, this issue is about the cache limit not being enough, and I feel it should not be closed just because the cache action can handle huge sizes, because the two problems are completely different.

@dhadka
Contributor
dhadka commented Jan 8, 2020

@chrispat Can you please comment on @imbsky's questions above regarding cache limits (whether we have any plans to increase beyond the new 2 GB limit or to offer a paid tier)?

@kcgen
kcgen commented Jan 8, 2020

2GB seems roughly enough when running tests on only one operating system, but when running tests on three operating systems it is not enough, and it feels like a lot of cache is wasted each time. What do other people think?

Yes. We cannot cache the large and time-consuming Clang installation on macOS using MacPorts, because it would evict our even heavier and more time-consuming Clang caches under 32-bit and 64-bit MSYS2 (a native install is ~15 min vs 2 min to extract from cache).
Also, all three of the above /would/ fit if @joshmgross had access to zstd on the VMs (+1 hoping the VM team adds it across the board!); but until then, or a higher limit, we can only cache two items.
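
As a stopgap wherever zstd is already present on the runner, an entry can be pre-compressed before it is handed to the cache action; a rough sketch, where the toolchain directory, the install script, and the key are all hypothetical:

      # Cache the zstd-compressed archive rather than the raw directory.
      - name: Cache the compressed toolchain archive.
        id: toolchain-cache
        uses: actions/cache@v2
        with:
          path: toolchain.tar.zst
          key: toolchain-${{ runner.os }}-v1
      - name: Unpack the toolchain on a cache hit.
        if: ${{ steps.toolchain-cache.outputs.cache-hit == 'true' }}
        run: zstd -d --stdout toolchain.tar.zst | tar -xf -
      - name: Install and compress the toolchain on a cache miss.
        if: ${{ steps.toolchain-cache.outputs.cache-hit != 'true' }}
        run: |
          ./install-toolchain.sh   # hypothetical install step
          tar -cf - toolchain | zstd -19 -o toolchain.tar.zst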

@chrispat
Member
chrispat commented Jan 9, 2020

@imbsky we are collecting data on cache usage across the service and evaluating it to determine if we can raise the individual repo limits. As far as paid options go, we already have paid storage of artifacts and we are looking at including cache storage as part of that overall offer.

@smorimoto
Contributor Author

I see! That sounds good. 2GB is definitely better than before, so I will wait a little longer.

@thomaseizinger

As far as paid options go, we already have paid storage of artifacts and we are looking at including cache storage as part of that overall offer.

Is there any news on that?

We are building a Rust project across 3 operating systems, and each one would need a cache of 1.7GB. This means the caches constantly invalidate each other, resulting in them not being useful.

@smorimoto
Contributor Author
smorimoto commented Jan 4, 2021

I just opened this as a new discussion. It may change if there is enough demand. #497

@ahdbilal ahdbilal reopened this Jul 27, 2021
@simonsan

As far as paid options go, we already have paid storage of artifacts and we are looking at including cache storage as part of that overall offer.

Is there any news on that?

We are building a Rust project across 3 operating systems, and each one would need a cache of 1.7GB. This means the caches constantly invalidate each other, resulting in them not being useful.

I think what might be useful here is actually doing a Docker build and caching the intermediate Docker image with cargo chef.
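
The Docker/cargo-chef route is one option; for comparison, a rough sketch of the plain actions/cache pattern for Rust, caching Cargo's registry and the target directory keyed on Cargo.lock (paths, keys, and the version pin are illustrative, and a 1.7GB target directory may still press against the limits):

      # A single multi-path cache entry covering downloaded crates and
      # compiled artifacts; restore-keys allows partial reuse when
      # Cargo.lock changes.
      - name: Cache Cargo registry and build output.
        uses: actions/cache@v2
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            cargo-${{ runner.os }}-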

@Kurt-von-Laven
Kurt-von-Laven commented Sep 7, 2021

For Node.js projects wishing to reduce the size of what is cached, I suggest looking into Yarn Zero-Installs. Yarn allows you to transparently check zipped versions of your dependencies into your repository directly. That empowers you to cache an empty file signifying that you have checked the cache against the canonical registry, which can dramatically improve the performance of your CI pipeline. Here is an example of the GitHub Actions steps required:

      - name: Cache the fact that we have checked the yarn cache.
        id: yarn-cache
        uses: actions/cache@v2.1.6
        with:
          path: .cacheChecked
          key: yarn-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
          restore-keys: |
            yarn-${{ runner.os }}-
      - name: Install dependencies without refetching on cache hit.
        if: ${{ steps.yarn-cache.outputs.cache-hit == 'true' }}
        run: yarn install --immutable --immutable-cache
      - name: Install dependencies, refetching on cache miss for added security.
        if: ${{ steps.yarn-cache.outputs.cache-hit != 'true' }}
        run: |
          # See https://yarnpkg.com/features/zero-installs#does-it-have-security-implications
          yarn install --immutable --immutable-cache --check-cache
          touch .cacheChecked

@N-Usha
N-Usha commented Nov 23, 2021

Happy to announce that today we shipped a cache size increase from 5GB per repo to 10GB. 🚀 🚀

Hope you can now unlock many more scenarios to run GitHub Actions workflows faster by caching even bigger dependencies and other commonly reused files from previous jobs :)

@vsvipul
Contributor
vsvipul commented Dec 15, 2021

Closing this issue now, as the size has been increased to 10 GB.

@vsvipul vsvipul closed this as completed Dec 15, 2021
@simonsan

Recap

I joined this discussion back in November 2019, so we have come a long way.

  • From discussions, the cache size increased to 400 MiB, 1 GiB, 2 GiB, 5 GiB, and now 10 GiB.
  • Discussions about reasonable defaults for the runner environments (which added zstd for better compression).
  • And also tips and tricks on how to get reasonable caching for C++ projects within a 200-400 MiB cache size.

I think it's reasonable to thank a few people at this stage:

  • Thanks to @kcgen for developing high-quality scripts to achieve good compression for that caching.
  • Thanks to @smorimoto for sticking around and constantly asking about and pushing this topic to make it heard.
  • And of course thanks to the GitHub team for implementing this amount of cache size for users, and for not getting mad at us for constantly getting on your nerves by always asking for more.

GHA has come a long way, and two years later I think it has become one of the tools people don't want to miss on this platform. Cheers.

@smorimoto
Contributor Author

Thank you for your kind words. The GitHub team also discussed this topic with me frequently outside of this thread in order to resolve this matter. They are always committed to improving the platform, and they are fantastic people who don't hesitate to tackle the hardest part: explaining things to management. Kudos to you all!

@gstokkink
gstokkink commented Jan 15, 2024

Is there any chance of increasing the cache limit beyond 10GB? For larger monorepos with many concurrent builds this is far too small, and they're effectively being penalised for using one repository instead of multiple.
