8000 [REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) by rafaeldtinoco · Pull Request #2233 · aquasecurity/tracee · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) #2233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 22, 2022
Merged

[REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) #2233

merged 4 commits into from
Nov 22, 2022

Conversation

rafaeldtinoco
Copy link
Contributor
@rafaeldtinoco rafaeldtinoco commented Oct 13, 2022

Description (git log)

This is a refactor triggered by comment #2200 (comment). Tracee needs to always initialize cgroupv1 and/or groupv2 when needed. Cgroupv2 is always needed for cgroup ebpf programs.

Type of change

  • Bug fix (non-breaking change fixing an issue, preferable).
  • Quick fix (minor non-breaking change requiring no issue, use with care)
  • Code refactor (code improvement and/or code removal)
  • New feature (non-breaking change adding functionality).
  • Breaking change (cause existing functionality not to work as expected).

How Has This Been Tested?

sudo ./dist/tracee-ebpf --install-path /tmp/tracee --cache cache-type=mem --cache mem-cache-size=512 --output format:json --output option:parse-arguments --output option:detect-syscall --containers --trace container --trace comm=bash --trace follow --capabilities bypass=false | jq -c

Note: Set bypass to false so capabilities are applied:

# cat /proc/$(pidof tracee-ebpf)/status | grep Cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000

@rafaeldtinoco rafaeldtinoco changed the title Cgroup initialization REFACTOR: Cgroup Interface (cgroupv1 and cgroupv2 initialization) Oct 13, 2022
@rafaeldtinoco
Copy link
Contributor Author
rafaeldtinoco commented Oct 13, 2022

@NDStrahilevitz enrichment is still failing (I missed something but likely easy to fix) but I would like your opinion in this change (about the refactoring), considering comments from @yanivagman saying we should mount cgroupv2 (discussion at: #2200 (comment)).

For your opinion: consider this is a draft, needs fixes and cleanup. I would like to hear from the overall idea mostly, and then I can have PR: #2200 merged and clean this one up so we can merge this and have cgroupv2 mounting done as requested.

The reasoning for the refactor (including probes refactor) is that cgroup probes now will also need to have cgroup directory mounted (cgroupv2 in that case) and the mounting/unmounting should be re-used (and not only used by containers package).

Copy link
Collaborator
@NDStrahilevitz NDStrahilevitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome refactor Rafael, I really like the new cgroup package split.
Left some questions and notes, specifically the cgroupv2 requirement concerns me since this complete drop of support may leave some customers in a problem (this is relevant to the networking PR as well since we should fail the probes on unsupported systems). (Ok clarified this with @yanivagman, I was missing relevant knowledge on cgroupv2 support).

I do think it would be a good idea to just merge this before the networking PR, cherry-pick them on top of main, and then rebase the network PR, because while I tried going through it commit by commit, it would be easier if I had it clean from the netowrk PR diffs.

@rafaeldtinoco
Copy link
Contributor Author

I do think it would be a good idea to just merge this before the networking PR, cherry-pick them on top of main, and then rebase the network PR, because while I tried going through it commit by commit, it would be easier if I had it clean from the netowrk PR diffs.

Yes, I know it got polluted, as the network code is a bit big. Will work in merging this one first (right on it).

@rafaeldtinoco
Copy link
Contributor Author

The test is failing in RHEL8 because I believe we currently have a bug at the capabilities logic.

So we are keeping CAP_SYS_ADMIN in NEW kernels (shouldn't it be dropped ?):

image

AND for OLD kernels (that seems correct):

image

and dropping it for RHEL8 only (this is correct, but by accident):

image

I believe the capabilities feature is currently broken.

@rafaeldtinoco
Copy link
Contributor Author

This PR is held until #2243 is merged.

@rafaeldtinoco rafaeldtinoco changed the title REFACTOR: Cgroup Interface (cgroupv1 and cgroupv2 initialization) [REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) Oct 18, 2022
@rafaeldtinoco rafaeldtinoco marked this pull request as ready for review October 18, 2022 04:12
@rafaeldtinoco rafaeldtinoco self-assigned this Oct 18, 2022
@rafaeldtinoco rafaeldtinoco added this to the v0.9.0-rc1 milestone Oct 18, 2022
@rafaeldtinoco
Copy link
Contributor Author
rafaeldtinoco commented Oct 19, 2022

This PR is built on top of #2243 (which needs to be merged first). That PR takes care of the "new capabilities" feature (privilege rings) and this one uses it for mounting and unmounting cgroupv1 and cgroupv2 during tracee startup/shutdown time.

TODO: need to fix integration tests for calling Tracee Clean properly (to umount temporary cgroupfs directories).

rafaeldtinoco added a commit that referenced this pull request Oct 24, 2022
This is a temporary mitigation so cgroupv1 directory can be mounted
during docker container initialization. The full code, mounting
cgroupv1, cgroupv2 and unmounting them at the end of the execution is
being worked at PR: #2233.
@yanivagman yanivagman modified the milestones: v0.9.0-rc1, v.0.10.0 Oct 26, 2022
@rafaeldtinoco
Copy link
Contributor Author

This should be reviewed before #2200 because that PR needs this one first (and contains the same commits for the time being).

Copy link
Collaborator
@NDStrahilevitz NDStrahilevitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious issues here, only substantial comments I have are about some naming and perhaps a different "construction" structure for the cgroup module, up to you if you find it better, but LGTM for merging, not going to block on style.

Copy link
Collaborator
@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will finish reviewing this one later, but wanted to post some comments I already have

< 67F4 include-fragment data-nonce="v2:de7ca814-4848-e098-534d-f6d700f46cfa" data-view-component="true">
@rafaeldtinoco
Copy link
Contributor Author

@NDStrahilevitz could you check the changes I made ? I believe I have addressed what you suggested. I preferred to use a "Cgroups" object to contain both "Cgroup" objects (cgroupv1 and cgroupv2). This way the initialization from the caller perspective is easier (and the cleanup as well).

Let me know if you have any other comments pls.

Note: I believe I address all Yaniv's comments in this version as well (mostly were about code duplication, which was removed in this new commit).

Copy link
Contributor
@josedonizetti josedonizetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaeldtinoco Added a question about an if that seems would never work, and a couple of nits, feel free to ignore them if you don't agree.

@rafaeldtinoco
Copy link
Contributor Author

@josedonizetti thanks for the review Jose, I have addressed most of your nits and suggestions and pushed a new commit. I'm using new commits because I'm rebasing the new-network PR on top of this one.

@rafaeldtinoco
Copy link
Contributor Author

Okay @yanivagman, I guess it is on you now. Let me know if you have any suggestions/concerns, please. Thanks.

Copy link
Collaborator
@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I finished reviewing this one.
I like the refactor you've done, but there are a few points that might cause tracee to work on less environments. Please see my comments below.

Also, I think it would have been easier if you would have made one refactor PR (with no logic changes at all), and another one with your new suggested changes. We can merge the refactor quickly, and then discuss the required changes for the cgroup stuff. Your call if you want to do this split or not.


DefaultCgroupController = "cpuset"

TmpCgroup = "/tmp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TmpCgroupPath might be a better name

Also, won't it be better if we use traceeInstallPath instead (set to /tmp/tracee by default)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this brings to a good topic... we should use default OS temp directory for temporary files (like these and PID files) AND, perhaps, rename "tmpdir" to something else (it is not a temporary directory, it is an artifacts output directory). No ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we want to use a "global tmpdir", then we should have global variable to it (instead of having it as part of Tracee object). Or else we will have to pass it as an argument for all packages using it.

C273
// discover the default cgroup being used
defaultVersion, err := GetCgroupDefaultVersion()
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail tracee from running. But is that what we really want?
We can keep running with reduced feature set (e.g. container enrichment might not work), assume cgroup v2 and log the error

This comment is true for most of the errors returned by this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe for not to fail in the inexistence of a cgroupfs we should review containers.go as it might depend on its existence (the change would have to be coordinated among the two). I have opened #2371 to be checked after this PR is merged (it is blocking the new network code for sometime now).

- pkg/mount:

  MountOnce will make sure a given source and filesystem type are
  mounted just once: it will check if given source and fs type are
  already mounted and, if not, it will mount it (in a temporary
  directory) and manage it (umounting at its destruction). If
  already mounted, the filesystem is left untouched at object's
  destruction.

- pkg/cgroup:

  1. Identifiers (and mount if needed) cgroupv1
  2. Identifiers (and mount if needed) cgroupv2
  3. Identifies the default cgroup version being used
  4. Discovers the default cgroup version hierarchy ID
- pkg/cgroup: abstract cgroupv1 and cgroupv2
- pkg/mount: mount cgroup dirs if needed
- pkg/capabilities: raise privs when mount/umount are needed
Copy link
Collaborator
@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all the comments @rafaeldtinoco
LGTM now

@rafaeldtinoco rafaeldtinoco merged commit 1ae9b1c into aquasecurity:main Nov 22, 2022
@rafaeldtinoco rafaeldtinoco deleted the cgroup-initialization branch November 22, 2022 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0