This repository was archived by the owner on Mar 12, 2025. It is now read-only.

rs3lab/KFlex


UNMAINTAINED

Code in this repo and the kernel patches are unmaintained. Please use the upstream equivalents listed below for your use case, and use the upstream kernel for comparison purposes:

  • Heaps: BPF arenas. Arenas support a maximum size of 4 GB; larger heaps would need a different sandboxing scheme, but 4 GB should be enough for most users. Arenas rely on LLVM 19 or newer to emit proper addr_space_cast instructions when compiling BPF programs that use them.
  • Loop termination: the cond_break macro and a timed cond_break. The timed variant (on x86 and arm64) samples the local CPU's timestamp counter (e.g., rdtsc on x86) instead of repeatedly reading from an address, as described in the "Faster extension stall recovery" paragraph of the paper's "Discussion" section. This is planned to be integrated with cancellations so that loops stuck for a long period can be terminated.
  • Spin locks: spin locks for BPF arenas. The kflex_spin_lock in the paper was an MCS-lock variant; the upstream version is based on the qspinlock algorithm.
  • IRQ save/restore: bpf_local_irq_{save,restore} kfuncs. Necessary for spin locks and for implementing an interrupt-safe memory allocator.
  • Preemption disable/enable: bpf_preempt_{disable,enable}. Necessary for spin locks and for a preemption-safe memory allocator, and should typically be combined with IRQ save/restore. By default, BPF programs are only protected against CPU migration, so both preemption and IRQ protection are necessary.
  • Cancellations: WIP patches can be applied on top of bpf-next. An alternative design under discussion avoids unwinding and instead uses a "fast-execute" approach that speeds up execution until the program reaches its end. Either way, once implemented, this primitive will allow terminating programs stuck inside the kernel.

Fast, Flexible, and Practical Kernel Extensions

The ability to safely extend OS kernel functionality is a long-standing goal in OS design, with the widespread use of the eBPF framework in Linux and Windows demonstrating the benefits of such extensibility. However, existing solutions for kernel extensibility (including eBPF) are limited and constrain users either in the extent of functionality that they can offload to the kernel or the performance overheads incurred by their extensions.

We present KFlex: a new approach to kernel extensibility that strikes an improved balance between the expressivity and performance of kernel extensions. To do so, KFlex separates the safety of kernel-owned resources (e.g., kernel memory) from the safety of extension-specific resources (e.g., extension memory). This separation enables KFlex to use distinct, bespoke mechanisms to enforce each safety property—automated verification and lightweight runtime checks, respectively—which enables the offload of diverse functionality while incurring low runtime overheads.

We realize KFlex in the context of Linux. We demonstrate that KFlex enables users to offload functionality that cannot be offloaded today and provides significant end-to-end performance benefits for applications. Several of KFlex’s proposed mechanisms have been upstreamed into the Linux kernel mainline, with efforts ongoing for full integration.

The paper is publicly available at this link.

Build Instructions

Dependencies

For Ubuntu, install the following dependencies:

$ sudo apt install build-essential libgtest-dev libgcc-13-dev \
    libstdc++-13-dev libelf-dev zlib1g-dev gcc clang cmake ninja-build \
    bear libbenchmark-dev pkg-config

Latest LLVM

To build KFlex, a recent LLVM (newer than 18.0) is needed. Instructions for building it from source are provided below.

LLVM requires ninja, cmake, and a C++ compiler (gcc-c++) to build. Once these are set up, build the latest LLVM and clang from the Git repository:

$ git clone https://github.com/llvm/llvm-project.git
$ mkdir -p llvm-project/llvm/build
$ cd llvm-project/llvm/build
$ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
        -DLLVM_ENABLE_PROJECTS="clang"    \
        -DCMAKE_BUILD_TYPE=Release        \
        -DLLVM_BUILD_RUNTIME=OFF
$ ninja

Building KFlex

$ git submodule update --init
$ BPFTOOL=../bpftool CLANG=/path/to/llvm/clone/llvm-project/llvm/build/bin/clang ./build.sh

Building the kernel

Running applications with KFlex requires a custom kernel, which is included as a submodule. Copy your distribution's .config into the kernel source tree and run make olddefconfig to adapt it to the v6.9 kernel. Then build and install it:

$ make -j$(nproc)
$ sudo make modules_install
$ sudo make install

Running Applications

The default port used is 6969. The ifindex is the index of the network interface to which programs will be attached.

Memcached offload for GETS/SETS

$ ./ffkx --kmemcached --ifindex <NR>

A message is printed once the offload is initialized. Then use memtier_benchmark, memcaslap, or any other memcache-protocol-aware client to send requests. An example client invocation, where config is a memcaslap configuration file specifying the SETS:GETS ratio, is:

memcaslap -s <hostname>:6969 -F config -U -T 128 -c 128 -S 1s -t 30s

Redis offload for GETS/SETS

$ ./ffkx --kmemcache --ifindex <NR>

A message is printed once the offload is initialized. Then use memtier_benchmark, redis-benchmark, or any other Redis-protocol-aware client to send requests. An example client invocation with a 10:90 SETS:GETS ratio is:

memtier_benchmark -s <hostname> -p 6969 --protocol=redis -d 64 -n 10000 -t 64 --ratio 10:90

Redis offload for ZADD

$ ./ffkx --kredis --ifindex <NR>

A message is printed once the offload is initialized. Then use redis-benchmark or any other Redis-protocol-aware client to send ZADD requests. An example client invocation is:

redis-benchmark --threads 64 -h <hostname> -p 6969 -r 1000000 -n 2000000 zadd f__rand_int__f __rand_int__ ele:rand__rand_int__:__rand_int__

Data Structures

Run the integrated benchmark suite built with the source; it automates every step and prints the results to stdout.

$ ./ffkx-bench

Results are reported through Google Benchmark; use --benchmark_out=<file> together with --benchmark_out_format (json, csv, or console) to write them to a file for post-processing.

Guard Emissions

After the data structure benchmarks run, the kernel log (dmesg) contains guard statistics for each program. Program names are truncated in the messages, so the example output below is annotated (after #) with the benchmark each line belongs to, in order of appearance.

$ sudo dmesg | grep ffkx_
...
[709104.842956] prog=bench_ffkx_link range_analysis_call=4 elided=4 # Linked List Update
[709104.843231] prog=bench_ffkx_link range_analysis_call=1 elided=1 # Linked List Lookup
[709104.843443] prog=bench_ffkx_link range_analysis_call=2 elided=2 # Linked List Delete
[709104.848522] prog=bench_ffkx_rbtr range_analysis_call=15 elided=15 # RBTree Update
[709104.849048] prog=bench_ffkx_rbtr range_analysis_call=2 elided=2 # RBTree Lookup
[709104.851461] prog=bench_ffkx_rbtr range_analysis_call=29 elided=25 # RBTree Delete
[709104.852080] prog=bench_ffkx_hash range_analysis_call=2 elided=0 # Hashmap Init
[709104.901533] prog=bench_ffkx_hash range_analysis_call=10 elided=8 # Hashmap Update
[709104.902205] prog=bench_ffkx_hash range_analysis_call=4 elided=3 # Hashmap Lookup
[709104.902805] prog=bench_ffkx_hash range_analysis_call=3 elided=2 # Hashmap Delete
[709104.904134] prog=bench_ffkx_skip range_analysis_call=15 elided=10 # Skiplist Update
[709104.904470] prog=bench_ffkx_skip range_analysis_call=3 elided=2 # Skiplist Lookup
[709104.904660] prog=bench_ffkx_skip range_analysis_call=9 elided=4 # Skiplist Delete
[709104.905863] prog=bench_ffkx_coun range_analysis_call=0 elided=0 # Countminsketch
[709104.905926] prog=bench_ffkx_coun range_analysis_call=0 elided=0 # Countsketch
...

For more details, see the webpage.
