8000 Add support for kernel module split BTF by roblabla · Pull Request #1248 · aya-rs/aya · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add support for kernel module split BTF #1248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

roblabla
Copy link
@roblabla roblabla commented Apr 10, 2025

The kernel splits kernel modules into dedicated BTF files, using "split BTF mode". In split BTF mode, each split BTF may reference the "base BTF" (AKA the vmlinux BTF), but may not reference each-other. The way it works is fairly simple: you can combine a base BTF and split BTF by concatenating their strings and types together, creating a working merged BTF.

Where things get dicey is when merging a base BTF with multiple split BTFs. Because every split BTF will start from the same offset, their string offsets and type IDs will need to be "rebased" to a new location when merging it into the output BTF.

This commit adds support for merging one base BTF and multiple split BTF into a single merged BTF, allowing eBPFs to manipulate structs that come from kernel modules. It also reworks Btf::from_sys_fs to make use of this capacity.

Fixes #300


This change is Reviewable

Copy link
netlify bot commented Apr 10, 2025

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 169153c
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/67f7fc8c99243b0008b54cd4
😎 Deploy Preview https://deploy-preview-1248--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mergify mergify bot added the aya-obj Relating to the aya-obj crate label Apr 10, 2025
@roblabla roblabla force-pushed the split-btf branch 2 times, most recently from 5402835 to 1d71a20 Compare April 10, 2025 10:38
@tamird
Copy link
Member
tamird commented Apr 10, 2025

Thanks for the PR! I notice there aren't any new tests. Have you thought about how we might test this?

@roblabla
Copy link
Author

Have you thought about how we might test this?

From a unit testing POV, we could have some simple checks making sure the logic to merge BTFs is correct. We could generate a base BTF and X split BTFs, merge them, and make sure that the merged btf is fully consistent.

From an integration test perspective, we could try loading an eBPF that depends on a kernel module. That's what my original use-case was. We could probably load an eBPF program that does something like:

return bpf_core_type_exists(struct btrfs_inode);

to make sure that the eBPF program actually has access to kernel module structs.


Would that be an acceptable testing strategy? I can work on implementing them.

@mergify mergify bot added the test A PR that improves test cases or CI label Apr 10, 2025
@roblabla roblabla force-pushed the split-btf branch 2 times, most recently from fb9813f to 86f6d4b Compare April 10, 2025 14:29
Copy link
Member
@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 14 unresolved discussions


aya-obj/src/btf/btf.rs line 274 at r2 (raw file):

    pub fn merge_split_btfs(base_btf: &Btf, split_btfs: &[Btf]) -> Btf {
        // Create a Btf from all the btfs.
        let mut out_btf = base_btf.clone();

nit: pass base_btf by value, let the caller decide how to manage this cloning


aya-obj/src/btf/btf.rs line 276 at r2 (raw file):

        let mut out_btf = base_btf.clone();

        let mut rebase_info = BtfRebaseInfo {

instead of making this mut, consider creating rebase_info in the loop.


aya-obj/src/btf/btf.rs line 282 at r2 (raw file):

            types_new_offset: out_btf.types.types.len() as u32,
        };
        for btf in split_btfs {

can you destructure this? it's quite hard to see that all the proper fields are being used

Code quote:

btf 

aya-obj/src/btf/btf.rs line 327 at r2 (raw file):

    }

    /// Loads BTF metadata from `/sys/kernel/btf`.

please add back the trailing /


aya-obj/src/btf/btf.rs line 337 at r2 (raw file):

            .flatten()
            .filter_map(|v| v.ok())
            .filter(|v| v.file_name() != "vmlinux");

this is discarding errors. why is that OK?

Code quote:

        let dir_iter = std::fs::read_dir("/sys/kernel/btf")
            .ok()
            .into_iter()
            .flatten()
            .filter_map(|v| v.ok())
            .filter(|v| v.file_name() != "vmlinux");

aya-obj/src/btf/btf.rs line 341 at r2 (raw file):

            match entry.file_type() {
                Ok(v) if !v.is_file() => continue,
                Err(_err) => continue,

why?


aya-obj/src/btf/btf.rs line 346 at r2 (raw file):

            match Btf::parse_file(entry.path(), Endianness::default()) {
                Ok(v) => split_btfs.push(v),
                // Ignore errors - the goal is to enhance the base BTF with as

This isn't explaining why ignoring errors is ok.


aya-obj/src/btf/btf.rs line 352 at r2 (raw file):

        }

        if split_btfs.len() > 0 {

do you need this conditional? why?


aya-obj/src/btf/types.rs line 16 at r2 (raw file):

    /// Offsets below this value are considered to be part of the "base BTF",
    /// and as such should not be relocated.
    pub(crate) str_rebase_from: u32,

can we use usize for all of these? it's an internal type, i think usize would be more convenient


aya-obj/src/btf/types.rs line 21 at r2 (raw file):

    /// Indices below this value are considered to be part of the "base BTF",
    /// and as such should not be relocated.
    pub(crate) types_rebase_from: u32,

these can probably be pub (rather than pub(crate))


aya-obj/src/btf/types.rs line 23 at r2 (raw file):

    pub(crate) types_rebase_from: u32,
    /// The new starting offset for strings.
    pub(crate) str_new_offset: u32,

consider extracting an inner type and storing 2 of them: one for strings, one for types. This would reduce the duplication in the doc comments.


aya-obj/src/btf/types.rs line 30 at r2 (raw file):

impl BtfRebaseInfo {
    fn rebase_str(&self, str_offset: u32) -> u32 {
        if str_offset < self.str_rebase_from {

this can be a bit pithier:

match str_offset.checked_sub(self.str_rebase_from) {
None => str_offset,
Some(str_offset) => self.str_new_offset + str_offset
}

this should be a method on the type I suggested extracting above. Then you can also write let Self { rebase_from, new_offset } = self at the top instead of writing self.xxx.


aya-obj/src/btf/types.rs line 91 at r2 (raw file):

    }

    pub(crate) fn rebase(&self, rebase_info: &BtfRebaseInfo) -> Self {

in all these methods: could you destructure self?


aya-obj/src/btf/types.rs line 92 at r2 (raw file):

    pub(crate) fn rebase(&self, rebase_info: &BtfRebaseInfo) -> Self {
        Fwd {

can this be Self { instead of Fwd? here and everywhere.

@roblabla roblabla force-pushed the split-btf branch 2 times, most recently from 640e446 to aeea582 Compare April 10, 2025 15:15
Copy link
Author
@roblabla roblabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 14 unresolved discussions (waiting on @tamird)


aya-obj/src/btf/btf.rs line 337 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this is discarding errors. why is that OK?

So, if we successfully read /sys/kernel/btf/vmlinux and fail on some other module BTF, it'd kinda suck to be completely locked out of using the working BTFs just because one module doesn't work. Especially since the vast majority of eBPFs out there won't need those split BTF to load properly. So I made this function ignore the broken modules/skip the BTF it cannot read.

I can definitely change it to be stricter.


aya-obj/src/btf/btf.rs line 341 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why?

Same as above.


aya-obj/src/btf/btf.rs line 346 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This isn't explaining why ignoring errors is ok.

Same as above, I can put the reasoning in the comment.


aya-obj/src/btf/btf.rs line 352 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

do you need this conditional? why?

This is to avoid the cloning done in merge_split_btfs. It's a simple (if perhaps premature) optimization.


aya-obj/src/btf/types.rs line 16 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can we use usize for all of these? it's an internal type, i think usize would be more convenient

u32 is the datatype used all over the place in BtfType, so using a u32 makes a whole lot more sense here. If I use usize, then I'll end up needing to do casting all over the place, which would definitely be less convenient.

Copy link
Member
@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 13 unresolved discussions (waiting on @roblabla)


aya-obj/src/btf/btf.rs line 337 at r2 (raw file):

Previously, roblabla (Robin Lambertz) wrote…

So, if we successfully read /sys/kernel/btf/vmlinux and fail on some other module BTF, it'd kinda suck to be completely locked out of using the working BTFs just because one module doesn't work. Especially since the vast majority of eBPFs out there won't need those split BTF to load properly. So I made this function ignore the broken modules/skip the BTF it cannot read.

I can definitely change it to be stricter.

I think we should be stricter. There may be certain kinds of errors we'd want to allow and silence, but the default position is that we don't discard errors.


aya-obj/src/btf/btf.rs line 352 at r2 (raw file):

Previou 10000 sly, roblabla (Robin Lambertz) wrote…

This is to avoid the cloning done in merge_split_btfs. It's a simple (if perhaps premature) optimization.

See my other comment; if you make that function take its argument by value you'd avoid needing this special case.


aya-obj/src/btf/types.rs line 16 at r2 (raw file):

Previously, roblabla (Robin Lambertz) wrote…

u32 is the datatype used all over the place in BtfType, so using a u32 makes a whole lot more sense here. If I use usize, then I'll end up needing to do casting all over the place, which would definitely be less convenient.

OK.

@roblabla roblabla force-pushed the split-btf branch 2 times, most recently from 89513cc to ef3d529 Compare April 10, 2025 16:53
Copy link
Author
@roblabla roblabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 12 unresolved discussions (waiting on @tamird)


aya-obj/src/btf/btf.rs line 276 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

instead of making this mut, consider creating rebase_info in the loop.

Done.


aya-obj/src/btf/btf.rs line 282 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you destructure this? it's quite hard to see that all the proper fields are being used

Done.


aya-obj/src/btf/btf.rs line 327 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

please add back the trailing /

Done.


aya-obj/src/btf/btf.rs line 337 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think we should be stricter. There may be certain kinds of errors we'd want to allow and silence, but the default position is that we don't discard errors.

Done.


aya-obj/src/btf/btf.rs line 341 at r2 (raw file):

Previously, roblabla (Robin Lambertz) wrote…

Same as above.

Done.


aya-obj/src/btf/btf.rs line 346 at r2 (raw file):

Previously, roblabla (Robin Lambertz) wrote…

Same as above, I can put the reasoning in the comment.

Done, doesn't ignore errors anymore.


aya-obj/src/btf/btf.rs line 352 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

See my other comment; if you make that function take its argument by value you'd avoid needing this special case.

Done.


aya-obj/src/btf/types.rs line 21 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

these can probably be pub (rather than pub(crate))

Done.


aya-obj/src/btf/types.rs line 23 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider extracting an inner type and storing 2 of them: one for strings, one for types. This would reduce the duplication in the doc comments.

Done.


aya-obj/src/btf/types.rs line 30 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this can be a bit pithier:

match str_offset.checked_sub(self.str_rebase_from) {
None => str_offset,
Some(str_offset) => self.str_new_offset + str_offset
}

this should be a method on the type I suggested extracting above. Then you can also write let Self { rebase_from, new_offset } = self at the top instead of writing self.xxx.

Done.


aya-obj/src/btf/types.rs line 91 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

in all these methods: could you destructure self?

Done.


aya-obj/src/btf/types.rs line 92 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can this be Self { instead of Fwd? here and everywhere.

Done.

The kernel splits kernel modules into dedicated BTF files, using "split
BTF mode". In split BTF mode, each split BTF may reference the "base
BTF" (AKA the vmlinux BTF), but may not reference each-other. The way
it works is fairly simple: you can combine a base BTF and split BTF
by concatenating their strings and types together, creating a working
merged BTF.

Where things get dicey is when merging a base BTF with multiple split
BTFs. Because every split BTF will start from the same offset, their
string offsets and type IDs will need to be "rebased" to a new location
when merging it into the output BTF.

This commit adds support for merging one base BTF and multiple split BTF
into a single merged BTF, allowing eBPFs to manipulate structs that come
from kernel modules. It also reworks Btf::from_sys_fs to make use of
this capacity.
@dave-tucker
Copy link
Member

I've added myself to review this too - not sure I'll have time this week but 🤞
Believe it or not I also started working on this before KubeCon so I have plenty of context to add here.

At cursory glance:

  1. We shouldn't be loading all the module BTFs in from_sys_fs() since that's default. There's a recent issue about Aya memory usage and this will get out of hand pretty quickly. Instead we should be loading either as opt-in for specific kmods (via BpfLoader) or by detecting that we need to via checking which modules we need through kallsyms when attaching probes.
  2. We need relocation tests to ensure that relocations are working correctly for structs etc.. that are in module BTF
  3. I'm also not convinced that mutating the Base BTF is the best approach either. In my branch I opted to instead create a composite type CompositeBTF which comprised of a base BTF + module BTF. That's closer to how it's implemented in libbpf, but also makes it a little easier to test things out - for example, you can easily write unit tests for type resolution to show that you check module btf types before searching in the base btf.

@roblabla
Copy link
Author

We shouldn't be loading all the module BTFs in from_sys_fs() since that's default. There's a recent issue about Aya memory usage and this will get out of hand pretty quickly. Instead we should be loading either as opt-in for specific kmods (via BpfLoader)

I feel pretty strongly that since from_sys_fs is the default, it should be optimized for making things "just work", and so it should instead be opt-out if the user wants to reduce memory usage. For context, what prompted me to work on this is that I found out the following code failed silently:

if (bpf_core_type_exists(struct btrfs_inode)) {
    // do something
} else {
    // do something else
}

Ergo this would always take the else branch, due to Aya not loading the kernel module BTFs. This is a silent failure that is pretty unintuitive and hard to debug.

Also, your mileage may vary here, but from what I can see, vmlinux is usually the biggest BTF by multiple orders of magnitude, with the module BTF being much, much smaller. On my current kernel, the /sys/kernel/btf folder is 5.8MiB, where vmlinux takes 4.9MiB of those. So the additional module BTF is on the order of 10%. This of course will vary from distro to distro, but I don't think an extra MiB or two of memory in exchange for making sure all eBPFs program work without fuss is a bad default ^^.

or by detecting that we need to via checking which modules we need through kallsyms when attaching probes.

That would probably be the best of both worlds here, yes.

We need relocation tests to ensure that relocations are working correctly for structs etc.. that are in module BTF

Yes, I'm working on those.

I'm also not convinced that mutating the Base BTF is the best approach either. In my branch I opted to instead create a composite type CompositeBTF which comprised of a base BTF + module BTF. That's closer to how it's implemented in libbpf

So I tried something along those lines first, but quickly gave up as I realized it required changing quite a few abstraction in the relocation code. In particular, it means that type_id is no longer a unique identifier of a type. And similarly, getting the strings associated with a type requires knowing where th 8000 at type originated from. None of those are necessarily hard problems, but the changes ended up being more involved than I wanted, so I figured merging the BTFs back into a traditional single-file BTF was just easier.

, but also makes it a little easier to test things out - for example, you can easily write unit tests for type resolution to show that you check module btf types before searching in the base btf.

I'm not sure I understand what this means. For starters, I don't understand why you'd want to check module BTF types before searching the base BTF? As far as I understand, you shouldn't have any conflicts here (if you did, that would cause ambiguity - what if you had conflicts betwen different module BTFs?). Generally speaking, I think testing this approach is rather straightforward: you can just create a base BTF object and multiple "split" BTFs, merge them, and make sure everything still resolves properly.

@dave-tucker
Copy link
Member

We shouldn't be loading all the module BTFs in from_sys_fs() since that's default. There's a recent issue about Aya memory usage and this will get out of hand pretty quickly. Instead we should be loading either as opt-in for specific kmods (via BpfLoader)

I feel pretty strongly that since from_sys_fs is the default, it should be optimized for making things "just work", and so it should instead be opt-out if the user wants to reduce memory usage. For context, what prompted me to work on this is that I found out the following code failed silently:

if (bpf_core_type_exists(struct btrfs_inode)) {
    // do something
} else {
    // do something else
}

Ergo this would always take the else branch, due to Aya not loading the kernel module BTFs. This is a silent failure that is pretty unintuitive and hard to debug.

I agree that the silent failure there in unfortunate - in most cases I've seen related to missing split-BTF we poison the instruction and then the program fails to load, but obviously here that's not possible.

The fact remains though with the API you propose (changing from_sys_fs to load all the module BTF) there is no option to opt-out of that at all. This is what I was proposing, which also describes the changes to the default behaviour:
dave-tucker@88b98ec#diff-7705012b274301237c1a099d2fa7eb4691aa4d262cc909a344fc9c9eb7d18d07R351-R373

Also, your mileage may vary here, but from what I can see, vmlinux is usually the biggest BTF by multiple orders of magnitude, with the module BTF being much, much smaller. On my current kernel, the /sys/kernel/btf folder is 5.8MiB, where vmlinux takes 4.9MiB of those. So the additional module BTF is on the order of 10%. This of course will vary from distro to distro, but I don't think an extra MiB or two of memory in exchange for making sure all eBPFs program work without fuss is a bad default ^^.

or by detecting that we need to via checking which modules we need through kallsyms when attaching probes.

That would probably be the best of both worlds here, yes.

Feel free to pull in the first commit from https://github.com/dave-tucker/aya/tree/split-btf which adds kallsysms parsing.

We need relocation tests to ensure that relocations are working correctly for structs etc.. that are in module BTF

Yes, I'm working on those.

I'm also not convinced that mutating the Base BTF is the best approach either. In my branch I opted to instead create a composite type CompositeBTF which comprised of a base BTF + module BTF. That's closer to how it's implemented in libbpf

So I tried something along those lines first, but quickly gave up as I realized it required changing quite a few abstraction in the relocation code. In particular, it means that type_id is no longer a unique identifier of a type. And similarly, getting the strings associated with a type requires knowing where that type originated from. None of those are necessarily hard problems, but the changes ended up being more involved than I wanted, so I figured merging the BTFs back into a traditional single-file BTF was just easier.

type_id's not being unique across modules is a "feature" of module BTF unfortunately.
either way, the type_id is still unique across any composite btf (base + module).

, but also makes it a little easier to test things out - for example, you can easily write unit tests for type resolution to show that you check module btf types before searching in the base btf.

I'm not sure I understand what this means. For starters, I don't understand why you'd want to check module BTF types before searching the base BTF? As far as I understand, you shouldn't have any conflicts here (if you did, that would cause ambiguity - what if you had conflicts betwen different module BTFs?). Generally speaking, I think testing this approach is rather straightforward: you can just create a base BTF object and multiple "split" BTFs, merge them, and make sure everything still resolves properly.

AIUI you need this for forward declarations. The base BTF could declare FWD foo where STRUCT foo resides in a module BTF. When resolving type by name you'd want to find STRUCT foo from the module BTF first since the main BTF doesn't know what type it's supposed to be. It may be totally valid for 2 modules to have different declarations of the same FWD decl - which comes back to why having single CompositeBTF might be better than merging all the BTFs together in to one.

Basically, I'd rather we stick as close to what libbpf does as possible.

@roblabla
Copy link
Author

AIUI you need this for forward declarations. The base BTF could declare FWD foo where STRUCT foo resides in a module BTF. When resolving type by name you'd want to find STRUCT foo from the module BTF first since the main BTF doesn't know what type it's supposed to be. It may be totally valid for 2 modules to have different declarations of the same FWD decl - which comes back to why having single CompositeBTF might be better than merging all the BTFs together in to one.

This still feels off to me. Rather than this, wouldn't it make more sense to keep looking if we hit a FWD to see if we can find a STRUCT? This way, the behavior is consistent whether we are in a split BTF scenario or not.

@dave-tucker
Copy link
Member

AIUI you need this for forward declarations. The base BTF could declare FWD foo where STRUCT foo resides in a module BTF. When resolving type by name you'd want to find STRUCT foo from the module BTF first since the main BTF doesn't know what type it's supposed to be. It may be totally valid for 2 modules to have different declarations of the same FWD decl - which comes back to why having single CompositeBTF might be better than merging all the BTFs together in to one.

This still feels off to me. Rather than this, wouldn't it make more sense to keep looking if we hit a FWD to see if we can find a STRUCT? This way, the behavior is consistent whether we are in a split BTF scenario or not.

🤷 ultimately it comes back to 👇

I'd rather we stick as close to what libbpf does as possible

Or follow prior-art... the same pattern is used in cilium/ebpf too: https://github.com/cilium/ebpf/pull/1300/files#diff-82332939f914df35856d29498006ce845e5341b42f3b550179a4fa1c8a7dddcf

It seems having Merge() for BTF was considered and then discounted in cilium/ebpf#705
The discussion calls out a few cases that we need to also consider so it's worth a read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aya-obj Relating to the aya-obj crate test A PR that improves test cases or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for kernel module "split" BTFs
3 participants
0