Thoughts on NUMA #5
Open

@emmericp

NUMA is really important for performance. There are two things to consider: thread pinning and memory pinning. Thread pinning is trivial and can be done with the usual affinity mask. The best way to pin memory is by linking against libnuma.
A dependency, eeww. But it's a simple dependency (just a wrapper around a few syscalls) that I'd put on the same level as libpthread; a necessary evil.
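
For the thread part, this really is just a few lines with the standard Linux affinity API; a minimal sketch (picking the core id from the target node's CPU list is not shown here):

    #define _GNU_SOURCE
    #include <sched.h>

    // pin the calling thread to a single core; returns 0 on success
    static int pin_thread_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        // pid 0 means the calling thread
        return sched_setaffinity(0, sizeof(set), &set);
    }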

Let's look at a forwarding application on a NUMA system with NICs connected to both CPUs.
It will typically have at least one thread per NIC that handles incoming packets and forwards them somewhere, which might require crossing a NUMA boundary.
In our experience, it's most efficient to pin both the thread and the packet memory to the CPU node that the receiving NIC is connected to. Sending from the wrong node is not as bad as receiving into the wrong node. Also, we (usually) can't know where a packet will be sent at the time we receive it, so we can't pin the memory correctly for the transmit side anyway.

How to implement this?

  • read numa_node in the NIC's sysfs directory to figure out where it's connected
  • use libnuma to set a memory policy before allocating memory for it
  • pin the thread accordingly (sketched below)
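
A rough sketch of these three steps, assuming libnuma (link with -lnuma) and the PCI addresses ixy already uses; the helper names and error handling are just illustrative:

    #include <numa.h>
    #include <stdio.h>

    // step 1: read /sys/bus/pci/devices/<pci_addr>/numa_node (-1 = unknown)
    static int nic_numa_node(const char* pci_addr) {
        char path[128];
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", pci_addr);
        FILE* f = fopen(path, "r");
        if (!f) return -1;
        int node = -1;
        if (fscanf(f, "%d", &node) != 1) node = -1;
        fclose(f);
        return node;
    }

    // steps 2 and 3: set the memory policy and pin the calling thread
    static void pin_to_nic_node(const char* pci_addr) {
        if (numa_available() < 0) return; // not a NUMA system, nothing to do
        int node = nic_numa_node(pci_addr);
        if (node < 0) return; // kernel doesn't know, keep the defaults
        numa_set_preferred(node); // prefer allocations on this node from now on
        numa_run_on_node(node);   // restrict the thread to this node's CPUs
    }

numa_set_membind() would enforce a strict binding instead of a preference, which is probably what you'd want for the DMA buffers.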

Sounds easy, right?
But is it worth implementing? What do we gain besides added complexity?
Sure, this is obviously a must-have feature for a real-world high-performance driver.

But we've decided against implementing it for now.
Almost everyone will just read the code, and the NUMA handling isn't particularly interesting compared to the rest; it would only add noise.

That doesn't mean you can't use ixy on a NUMA system.
We obviously want to run some benchmarks and performance tests in different NUMA scenarios, and we are simply going to use the numactl command for that:

 numactl --strict --membind=0 --cpunodebind=0 ./ixy-pktgen <id> <id>

That works just fine with the current memory allocator and allows us to benchmark all relevant scenarios on a NUMA system with NICs attached to both nodes.
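
The node numbers to pass to numactl can be looked up via the sysfs entry mentioned above, e.g. (the PCI address here is just an example):

 cat /sys/bus/pci/devices/0000:03:00.0/numa_node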
