-
Notifications
You must be signed in to change notification settings - Fork 449
[REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) #2233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) #2233
Conversation
@NDStrahilevitz enrichment is still failing (I missed something but likely easy to fix) but I would like your opinion in this change (about the refactoring), considering comments from @yanivagman saying we should mount cgroupv2 (discussion at: #2200 (comment)). For your opinion: consider this is a draft, needs fixes and cleanup. I would like to hear from the overall idea mostly, and then I can have PR: #2200 merged and clean this one up so we can merge this and have cgroupv2 mounting done as requested. The reasoning for the refactor (including probes refactor) is that cgroup probes now will also need to have cgroup directory mounted (cgroupv2 in that case) and the mounting/unmounting should be re-used (and not only used by containers package). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome refactor Rafael, I really like the new cgroup package split.
Left some questions and notes, specifically the cgroupv2 requirement concerns me since this complete drop of support may leave some customers in a problem (this is relevant to the networking PR as well since we should fail the probes on unsupported systems). (Ok clarified this with @yanivagman, I was missing relevant knowledge on cgroupv2 support).
I do think it would be a good idea to just merge this before the networking PR, cherry-pick them on top of main, and then rebase the network PR, because while I tried going through it commit by commit, it would be easier if I had it clean from the netowrk PR diffs.
Yes, I know it got polluted, as the network code is a bit big. Will work in merging this one first (right on it). |
The test is failing in RHEL8 because I believe we currently have a bug at the capabilities logic. So we are keeping CAP_SYS_ADMIN in NEW kernels (shouldn't it be dropped ?): AND for OLD kernels (that seems correct): and dropping it for RHEL8 only (this is correct, but by accident): I believe the capabilities feature is currently broken. |
This PR is held until #2243 is merged. |
This PR is built on top of #2243 (which needs to be merged first). That PR takes care of the "new capabilities" feature (privilege rings) and this one uses it for mounting and unmounting cgroupv1 and cgroupv2 during tracee startup/shutdown time. TODO: need to fix integration tests for calling Tracee Clean properly (to umount temporary cgroupfs directories). |
This is a temporary mitigation so cgroupv1 directory can be mounted during docker container initialization. The full code, mounting cgroupv1, cgroupv2 and unmounting them at the end of the execution is being worked at PR: #2233.
This should be reviewed before #2200 because that PR needs this one first (and contains the same commits for the time being). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any obvious issues here, only substantial comments I have are about some naming and perhaps a different "construction" structure for the cgroup module, up to you if you find it better, but LGTM for merging, not going to block on style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will finish reviewing this one later, but wanted to post some comments I already have
@NDStrahilevitz could you check the changes I made ? I believe I have addressed what you suggested. I preferred to use a "Cgroups" object to contain both "Cgroup" objects (cgroupv1 and cgroupv2). This way the initialization from the caller perspective is easier (and the cleanup as well). Let me know if you have any other comments pls. Note: I believe I address all Yaniv's comments in this version as well (mostly were about code duplication, which was removed in this new commit). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rafaeldtinoco Added a question about an if that seems would never work, and a couple of nits, feel free to ignore them if you don't agree.
@josedonizetti thanks for the review Jose, I have addressed most of your nits and suggestions and pushed a new commit. I'm using new commits because I'm rebasing the new-network PR on top of this one. |
Okay @yanivagman, I guess it is on you now. Let me know if you have any suggestions/concerns, please. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I finished reviewing this one.
I like the refactor you've done, but there are a few points that might cause tracee to work on less environments. Please see my comments below.
Also, I think it would have been easier if you would have made one refactor PR (with no logic changes at all), and another one with your new suggested changes. We can merge the refactor quickly, and then discuss the required changes for the cgroup stuff. Your call if you want to do this split or not.
pkg/cgroup/cgroup.go
Outdated
|
||
DefaultCgroupController = "cpuset" | ||
|
||
TmpCgroup = "/tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: TmpCgroupPath might be a better name
Also, won't it be better if we use traceeInstallPath instead (set to /tmp/tracee by default)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this brings to a good topic... we should use default OS temp directory for temporary files (like these and PID files) AND, perhaps, rename "tmpdir" to something else (it is not a temporary directory, it is an artifacts output directory). No ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we want to use a "global tmpdir", then we should have global variable to it (instead of having it as part of Tracee object). Or else we will have to pass it as an argument for all packages using it.
// discover the default cgroup being used | ||
defaultVersion, err := GetCgroupDefaultVersion() | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail tracee from running. But is that what we really want?
We can keep running with reduced feature set (e.g. container enrichment might not work), assume cgroup v2 and log the error
This comment is true for most of the errors returned by this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe for not to fail in the inexistence of a cgroupfs we should review containers.go as it might depend on its existence (the change would have to be coordinated among the two). I have opened #2371 to be checked after this PR is merged (it is blocking the new network code for sometime now).
- pkg/mount: MountOnce will make sure a given source and filesystem type are mounted just once: it will check if given source and fs type are already mounted and, if not, it will mount it (in a temporary directory) and manage it (umounting at its destruction). If already mounted, the filesystem is left untouched at object's destruction. - pkg/cgroup: 1. Identifiers (and mount if needed) cgroupv1 2. Identifiers (and mount if needed) cgroupv2 3. Identifies the default cgroup version being used 4. Discovers the default cgroup version hierarchy ID
- pkg/cgroup: abstract cgroupv1 and cgroupv2 - pkg/mount: mount cgroup dirs if needed - pkg/capabilities: raise privs when mount/umount are needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the comments @rafaeldtinoco
LGTM now
Description (git log)
This is a refactor triggered by comment #2200 (comment). Tracee needs to always initialize cgroupv1 and/or groupv2 when needed. Cgroupv2 is always needed for cgroup ebpf programs.
Type of change
How Has This Been Tested?
Note: Set bypass to false so capabilities are applied: