The cache limit is too small #6
Yes, please remove the individual cache limit. 2GB per repo is reasonable.
For our medium-sized React Native project, the archived node_modules directory is 260 MB and the CocoaPods directory is 202 MB.
Typically, Python virtualenvs are also >500MB these days. These are cached to avoid re-compiling modules such as lxml and numpy on every run.
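For illustration, a minimal sketch of that pattern, assuming the virtualenv lives in `.venv` and dependencies are pinned in `requirements.txt` (both paths are illustrative choices, not taken from this thread):

```yaml
- name: Cache the virtualenv
  id: venv-cache
  uses: actions/cache@v2
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}

# Only rebuild (and recompile lxml, numpy, etc.) when the cache missed.
- name: Rebuild the virtualenv on cache miss
  if: steps.venv-cache.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    .venv/bin/pip install -r requirements.txt
```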
In my opinion, 2GB per repository is still small for medium-to-large projects.
The file size limit is defined here, so I think the individual limit can easily be removed; however, we can't do anything about the per-repository limit. 2GB is probably enough for a JavaScript developer, but not so good for a native developer like me.
I would appreciate a higher limit (e.g. 5GB) for enterprise/paying customers as well.
Well, my guess is that you will be able to increase the cache limit by paying for it, like Git LFS.
We are working on the long-term plan for how we will enable larger limits. Charging for it, as we do for packages or Actions artifacts, is something we are considering.
The compiled deps for my small Rust project are 500 MB; the cache will need to be considerably larger to support Rust.
For what it's worth, a project I'm on has a 766MB
Well, I understand that such sizes exist even when compressed. But other existing CI providers do not limit individual cache size (although they recommend keeping it under 500MB). I think there is no need to limit it.
That's correct, the limit is after we
This is a great start, but we've also hit the limit before trying. We have a monorepo with four apps; the total is around 600 MB. We also need the ability to cache the
Couldn't GitHub implement a cross-repository deduplication system for cached assets, if storage costs are a problem?
A lot of patterns, such as specific paths in compiled-language objects, node_modules subdirectories, etc., are ripe for very efficient de-duplication if the patterns are known, and that could then be made generic.
Deduplication is easy to talk about, but it is much harder to build a system that works efficiently for many small files. If deduplication is performed at the data-block level, it is ineffective once the data is compressed; if it is performed at the file level, the communication overhead quickly becomes large. The GitHub team would be opening up a huge problem here, which should rather be the job of the team responsible for the Azure Blob Storage service.
On my first attempt to use this to cache Docker layers, I hit the file limit.
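For context, the pattern that tends to run into the per-file limit looks roughly like the sketch below: the whole image is exported to a single tarball and cached. The image name and paths here are illustrative, not taken from this thread.

```yaml
- name: Cache the exported Docker image
  id: docker-cache
  uses: actions/cache@v2
  with:
    path: /tmp/docker-cache
    key: docker-${{ runner.os }}-${{ hashFiles('Dockerfile') }}

- name: Load the cached image if present
  if: steps.docker-cache.outputs.cache-hit == 'true'
  run: docker load -i /tmp/docker-cache/image.tar

# The single tarball produced here is what can exceed a per-file size limit.
- name: Build and export the image on cache miss
  if: steps.docker-cache.outputs.cache-hit != 'true'
  run: |
    docker build -t myapp:ci .
    mkdir -p /tmp/docker-cache
    docker save myapp:ci -o /tmp/docker-cache/image.tar
```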
@tuler, have you tried forking the action? Is the limit also verified on the GitHub side?
945984259 bytes is under the 2 GB per-repo limit. I saw this code for the per-file limit, hence I wonder whether the limit is also verified on the server side.
The per-file and per-repo limits are verified server-side. The per-file limit in the action is to avoid an unnecessary upload that the server would reject.
When do you think you can remove the individual limit? (if you plan to)
On a React Native project, the .tgz cache file for the yarn cache is
For reference: installing the latest Haskell compiler, runtime, and standard libraries (which is necessary for every compilation) takes up 1.59 GB. So even the 2 GB repo-wide limit would hardly fit an actual project cache (which probably has dozens of dependencies).
@imbsky
GitHub's ongoing issue of limiting the cache size has recently been fixed (actions/cache#6), so this PR creates a combined Clang+GCC cache for separate 32-bit and 64-bit architectures under Windows.
Yeah, you're right. I meant "the current implementation calling the tar command is slow".
2GB seems roughly enough when running tests on only one operating system, but when running tests on three operating systems it is not enough, and it feels like a lot of cache is wasted each time. What do other people think?
Also, this issue is about the cache limit not being large enough, and I feel it should not be closed just because the cache action can handle huge sizes; the two problems are completely different.
Yes. We cannot cache the large and time-consuming Clang installation on macOS using MacPorts, because it would evict our even heavier and more time-consuming Clang caches for 32-bit and 64-bit MSYS2 (a native install takes ~15 min vs 2 min to extract from the cache).
@imbsky we are collecting data on cache usage across the service and evaluating it to determine whether we can raise the individual repo limits. As far as paid options go, we already have paid storage for artifacts and we are looking at including cache storage as part of that overall offering.
I see! That sounds good. 2GB is definitely better than before, so I will wait a little longer.
Is there any news on that? We are building a Rust project across three operating systems, and each one would need a cache of 1.7GB. That means the caches constantly evict each other and end up not being useful.
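For illustration, a per-OS Rust cache typically looks like the sketch below (the cached paths follow the usual cargo conventions; the key names are illustrative, not taken from this thread). The per-OS keys keep the three jobs from overwriting each other's entries, but all three caches still count against the same repository quota, which is why they evict each other when the quota is too small.

```yaml
- name: Cache cargo registry and build output
  uses: actions/cache@v2
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    # One key per OS and lockfile; three OSes means three caches sharing one quota.
    key: cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      cargo-${{ runner.os }}-
```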
I just opened this as a new discussion; it may change if there is enough demand. #497
I think what might be useful here is actually doing a Docker build and caching the intermediate Docker image with cargo-chef.
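As a rough sketch of that idea, assuming a Dockerfile whose planner/builder stages use cargo-chef, and using Docker Buildx's GitHub Actions cache backend (the action versions and image tag below are illustrative, not prescribed by this thread):

```yaml
- uses: docker/setup-buildx-action@v1

- name: Build with cached dependency layers
  uses: docker/build-push-action@v2
  with:
    context: .
    push: false
    load: true
    tags: myapp:ci
    # Reuse the cargo-chef dependency layers from the GitHub Actions cache.
    cache-from: type=gha
    cache-to: type=gha,mode=max
```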
For Node.js projects wishing to reduce the size of what is cached, I suggest looking into Yarn Zero-Installs. Yarn allows you to transparently check zipped versions of your dependencies into your repository directly. That empowers you to cache an empty file signifying that you have checked the cache against the canonical registry, which can dramatically improve the performance of your CI pipeline. Here is an example of the GitHub Actions steps required:

```yaml
- name: Cache the fact that we have checked the yarn cache.
  id: yarn-cache
  uses: actions/cache@v2.1.6
  with:
    path: .cacheChecked
    key: yarn-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
    restore-keys: |
      yarn-${{ runner.os }}-

- name: Install dependencies without refetching on cache hit.
  if: ${{ steps.yarn-cache.outputs.cache-hit == 'true' }}
  run: yarn install --immutable --immutable-cache

- name: Install dependencies, refetching on cache miss for added security.
  if: ${{ steps.yarn-cache.outputs.cache-hit != 'true' }}
  run: |
    # See https://yarnpkg.com/features/zero-installs#does-it-have-security-implications
    yarn install --immutable --immutable-cache --check-cache
    touch .cacheChecked
```
Happy to announce that today we shipped a cache size increase from 5GB per repo to 10GB. 🚀 🚀 Hope you can now unlock many more scenarios to run GitHub Actions workflows faster by caching even bigger dependencies and other commonly reused files from previous jobs. :)
Closing this issue now, as the size has been increased to 10 GB.
Recap: I took part in this discussion in November 2019, so we have come a long way.
I think it's reasonable to thank a few people at this stage:
GHA has come a long way, and I think that two years later it has become one of the tools people don't want to miss on this platform. Cheers.
Thank you for your kind words. The GitHub team also discussed this topic with me frequently outside of this thread to resolve the matter. They are always committed to improving the platform, and they are fantastic people who don't hesitate to tackle the hardest part: explaining things to management. Kudos to you all!
Is there any chance of increasing the cache limit beyond 10GB? For larger monorepos with many concurrent builds this is far too small, and they're effectively being penalised for using one repository instead of several.
I understand that this issue is not related to the code of this repository, but I would like to discuss it with many people, so I am opening the issue here. (I know the community forums exist, but many people probably don't know about them yet.)

First, I really appreciate the GitHub team adding the cache feature to GitHub Actions. That's great for us! But these days node_modules is too large; 200MB can't cover it. It's the same in other languages. For example, using esy to install opam packages can easily exceed 800MB. Is there a way to increase the cache limit? Or, if the individual cache limits were removed, the per-repository limit would become relatively realistic. I know that save/restore may slow down dramatically if the file size is too large, but that shouldn't be limited on the cache action's side.