Fix Copy-On-Write causing memory waste #608
The Sentry implements COW in the pma layer, but not at 4K page granularity: the current COW process in pma copies the faulted range at HugePage granularity, which wastes a lot of memory. Revise the COW process in pma to work on 4K pages, so that Sentry COW consumes the same amount of memory as the host kernel.
The memory waste can be reproduced with the following test case:

```c
#include <sys/types.h>

#define TEST_SIZE (1024*4096)

int main()
{
    ptr = mmap(NULL, TEST_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
    for (idx = 0; idx < 50; idx++) {
        printf("[parent] pid %d sleep start\n", getpid());
```

Inside the runsc container, all the processes use about 123M of memory; inside the runc container, only about 7M is used.

Execute `free` inside the runsc container:

```
# docker exec -it test free
```

Execute `free` inside the runc container:

```
# docker exec -it test free
```
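The C snippet above is truncated. A self-contained version of the reproducer might look like the following; it assumes that each loop iteration forks a child which dirties a single page of the private mapping and then sleeps (the fork and the child's write are not visible in the truncated snippet, so they are an assumption here).

```c
/* Hypothetical reconstruction of the reproducer above; the fork() and the
 * child's single-page write are assumptions, not part of the original
 * (truncated) snippet. */
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TEST_SIZE (1024*4096)

int main(void)
{
    char *ptr;
    int idx;

    ptr = mmap(NULL, TEST_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(ptr, 0, TEST_SIZE);          /* populate the mapping in the parent */

    for (idx = 0; idx < 50; idx++) {
        printf("[parent] pid %d sleep start\n", getpid());
        if (fork() == 0) {
            ptr[idx * 4096] = 1;        /* dirty one page: triggers a COW break */
            sleep(1000);                /* keep the copied memory resident */
            exit(0);
        }
    }
    sleep(1000);
    return 0;
}
```

With a 2MB COW-break unit, each child's single-page write copies 512 pages, so ~50 children copy roughly 100MB on top of the parent's 4MB, which is consistent with the ~123M vs ~7M figures reported above.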
Breaking copy-on-write at a granularity greater than a single page is intentional. Sentry-handled page faults can be quite expensive; expanding COW-break significantly reduces their frequency in many cases. In fact, we previously switched from per-page COW-break to 2MB COW-break to fix a user-observed performance regression (caused by switching from whole-pma COW-break to per-page COW-break). Can you give more details about the workload you have that is affected by this?
@nixprime I just ran two of our containers in a test environment (not all processes have started yet), one under runc and one under runsc. After a while, once the containers were running stably (about 17 processes running inside each container), I collected memory information about them from cgroup and /proc.

1. Drop all caches: `echo 3 > /proc/sys/vm/drop_caches`
2. cgroup `memory.usage_in_bytes` and `memory.stat` for runc
4. cgroup `memory.usage_in_bytes` and `memory.stat` for runsc
5. Status of the runsc sandbox and gofer processes
I also tried another way.

runc centos container:

```
[root@6870c697c6d5 /]# free
```

runsc centos container:

```
[root@058437aebf07 /]# free
```
I think we may be stalled here. There are lots of reasons to avoid doing per-page copy-on-write (excessive performance overhead), but also good reasons to avoid wasting memory by breaking large regions.

Here is my proposal: what if each PMA tracked the total number of COW faults, and used `(1 << max(16, p.cowFaults+12))` as the amount to fault? This would turn the first 64k region into ~4 faults, and subsequent faults would use the MapUnit size. I think this should capture the simple bash use cases (not wasting such large regions) while avoiding a big performance cost.

If there's still a lot of waste, there are pretty easy tweaks to experiment with here, e.g. `(1 << max(16, min(12, p.cowFaults)))`, which provides 12 single-page faults before growing.

I think we could probably come up with some good compromises here that will avoid high overheads due to faulting but also avoid wasting memory.
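As a rough sketch of this kind of heuristic (following the prose rather than the exact expressions above, and using hypothetical names such as `cow_break_size` and `MAP_UNIT_SHIFT`), the break size could simply grow with the pma's COW-fault count until it reaches the MapUnit:

```c
#include <stdio.h>

#define PAGE_SHIFT     12   /* 4K pages */
#define MAP_UNIT_SHIFT 21   /* 2MB MapUnit */

/* Hypothetical adaptive COW-break size: the first faults on a pma break
 * small ranges, later faults grow toward the MapUnit size. */
static unsigned long cow_break_size(unsigned long cow_faults)
{
    unsigned long shift = PAGE_SHIFT + cow_faults;  /* 4K, 8K, 16K, 32K, ... */
    if (shift > MAP_UNIT_SHIFT)
        shift = MAP_UNIT_SHIFT;                     /* cap at 2MB */
    return 1UL << shift;
}

int main(void)
{
    /* The first ~4 faults cover roughly the first 64K; after a handful of
     * faults the break size reaches the 2MB MapUnit. */
    for (unsigned long f = 0; f < 10; f++)
        printf("fault %2lu -> break %7lu KB\n", f, cow_break_size(f) / 1024);
    return 0;
}
```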
Checking in on this. Is the proposal of interest? @nixprime |
It is not clear that linking COW-break granularity to PMAs (or VMAs) would be sufficient to avoid regressing workloads that are sensitive to this (the example we saw was a particular application's startup time); it would be the responsibility of someone proposing such a change to at least prove that it does not affect any of our benchmarks. |
Can we construct appropriate definitions for the benchmarks we care about here? |
Is this still active?

Alternative proposal #2: we could just make COWUnitSize a parameter of the platform. Some platforms can handle faults much more cheaply than others (e.g. KVM). The KVM platform could use 4k as its COWUnitSize, and the others could keep using MapUnitSize as currently defined.
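Purely to illustrate the shape of this proposal (gVisor's platforms are implemented in Go; the struct and field names below are illustrative, not the real platform API), the COW-break path would just consult a per-platform unit size:

```c
/* Illustrative sketch only: a per-platform COW-break unit, as proposed.
 * These names do not correspond to gVisor's actual platform API. */
struct platform_params {
    const char   *name;
    unsigned long cow_unit_size;   /* granularity of COW-break copies */
};

#define PAGE_SIZE     (4UL << 10)  /* 4K */
#define MAP_UNIT_SIZE (2UL << 20)  /* 2MB, the current COW-break unit */

static const struct platform_params platforms[] = {
    { "kvm",    PAGE_SIZE },       /* cheap faults: per-page COW break */
    { "ptrace", MAP_UNIT_SIZE },   /* expensive faults: keep 2MB breaks */
};
```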
This pull request is stale because it has been open 90 days with no activity. Remove the stale label or comment or this will be closed in 30 days. |