Preliminary support for NVIDIA Jetson boards by dmitriy-philimonov · Pull Request #1692 · htop-dev/htop · GitHub

Preliminary support for NVIDIA Jetson boards #1692


Open
wants to merge 1 commit into
base: main

Conversation

dmitriy-philimonov
@dmitriy-philimonov dmitriy-philimonov commented May 3, 2025

NVIDIA Jetson is an industrial, Linux-based embedded aarch64 platform with a powerful built-in GPU, used for AI tasks, mostly computer vision (CV).

The support is provided via --enable-nvidia-jetson switch in the configure script.

All the source code related to the NVIDIA Jetson is placed in the linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON' C preprocessor define. So, for x86_64 platforms the source code stays unchanged.
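
The guarding pattern described here can be sketched as follows. Only the `NVIDIA_JETSON` define comes from the PR; the helper name `JetsonSketch_isCompiledIn` is hypothetical and exists purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the PR): reports whether the
 * Jetson-specific code path was compiled in. NVIDIA_JETSON is the
 * define named in the PR description. */
static bool JetsonSketch_isCompiledIn(void) {
#ifdef NVIDIA_JETSON
   /* Jetson-only code would live inside this branch. */
   return true;
#else
   /* Default builds (e.g. x86_64) never see the Jetson code. */
   return false;
#endif
}
```

In a build where the define is absent, the Jetson branch is removed by the preprocessor, so the default code base is not affected.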

Additional functionality added by this commit:

  1. Fix for the CPU temperature reading. The Jetson device is not supported by libsensors. The CPU has 8 cores but only one CPU temperature sensor for all of them, exposed through a thermal zone file. libsensors may be compiled in or turned off; extra care was taken so that the build succeeds with or without libsensors.
  2. The Jetson GPU Meter was added: current load, frequency and temperature.

== Technical details ==

The code tries to locate the correct sensors during application startup. As an example, the sensor locations on NVIDIA Jetson Orin are the following:

  • CPU temperature: /sys/devices/virtual/thermal/thermal_zone0/type
  • GPU temperature: /sys/devices/virtual/thermal/thermal_zone1/type
  • GPU frequency: /sys/class/devfreq/17000000.gpu/cur_freq
  • GPU curr load: /sys/class/devfreq/17000000.gpu/device/load
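
As a rough illustration of how a single numeric value could be read from files like these, here is a minimal sketch. The function name `readLongFromFile` is hypothetical, and the PR's actual reading code is not shown on this page, so this should not be taken as its implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the first line of a sysfs-style file and
 * parses it as a decimal integer. Returns true and stores the value
 * in *out on success, false on any I/O or parse error. */
static bool readLongFromFile(const char* path, long* out) {
   FILE* fp = fopen(path, "r");
   if (!fp)
      return false;

   char line[64];
   bool gotLine = fgets(line, sizeof(line), fp) != NULL;
   fclose(fp);
   if (!gotLine)
      return false;

   errno = 0;
   char* end = NULL;
   long value = strtol(line, &end, 10);
   if (errno != 0 || end == line)
      return false; /* out of range, or no digits found */

   *out = value;
   return true;
}
```

A caller would pass a path such as the `cur_freq` file listed above and treat a `false` result as "sensor not available".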

Units:

  • The GPU frequency is provided in Hz, shown in MHz.
  • The CPU/GPU temperatures are provided in Celsius multiplied by 1000 (millidegrees Celsius), shown in Celsius.
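
The two unit conversions can be sketched as below. The function names are illustrative and not taken from the PR.

```c
/* Converts a frequency in Hz (the unit stated for cur_freq) to MHz. */
static double hzToMHz(unsigned long long hz) {
   return (double)hz / 1000000.0;
}

/* Converts millidegrees Celsius (the raw thermal zone unit) to Celsius. */
static double milliCelsiusToCelsius(long milliCelsius) {
   return (double)milliCelsius / 1000.0;
}
```

For example, a raw temperature reading of 45500 corresponds to 45.5 degrees Celsius, and 1300000000 Hz corresponds to 1300 MHz.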

P.S. The GUI shows all temperatures for NVIDIA Jetson with additional precision compared to the default x86_64 platform.

== NVIDIA Jetson models ==

Tested for NVIDIA Jetson Orin and Xavier boards.

@Explorer09
Contributor

I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.

@Explorer09
Contributor

Another problem is the conflict with #1620, which is an attempt to unify the GPU meter structure to one interface.

RichString_appendAscii(out, CRT_colors[METER_VALUE], buffer);

RichString_appendAscii(out, CRT_colors[METER_TEXT], " temp:");
xSnprintf(buffer, sizeof(buffer), "%.1f°C", this->values[JETSON_GPU_TEMP]);
Contributor

Use CRT_degreeSign rather than hard-code a degree sign here.

Author
@dmitriy-philimonov dmitriy-philimonov May 3, 2025

Fixed. Additionally supported Fahrenheit.
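
A minimal sketch of what formatting a temperature with a caller-supplied degree sign and optional Fahrenheit output might look like. The names `celsiusToFahrenheit` and `formatTemperature` are hypothetical; the revised PR code is not shown on this page. In htop the caller would presumably pass `CRT_degreeSign`, as the review comment suggests.

```c
#include <stdbool.h>
#include <stdio.h>

/* Standard conversion: F = C * 9/5 + 32. */
static double celsiusToFahrenheit(double celsius) {
   return celsius * 9.0 / 5.0 + 32.0;
}

/* Hypothetical formatter: writes e.g. "45.5<degreeSign>C" into buf.
 * The degree sign is passed in rather than hard-coded, so the caller
 * decides how it is rendered on the current terminal. */
static int formatTemperature(char* buf, size_t size, double celsius,
                             bool useFahrenheit, const char* degreeSign) {
   double value = useFahrenheit ? celsiusToFahrenheit(celsius) : celsius;
   char unit = useFahrenheit ? 'F' : 'C';
   return snprintf(buf, size, "%.1f%s%c", value, degreeSign, unit);
}
```

Passing the sign as a parameter keeps the formatting code independent of the terminal's character set.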

content[0] = tolower(content[0]);
content[1] = tolower(content[1]);
content[2] = tolower(content[2]);
content[3] = tolower(content[3]);
Contributor

Why case conversion? Is there any reason for the letter case to vary?

Author

Yeah, even Jetson Xavier and Jetson Orin have different sensor names. NVIDIA breaks backward compatibility here.
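
One way to handle sensor names whose letter case differs between board generations is a case-insensitive prefix comparison, sketched below. The helper name `startsWithIgnoreCase` is hypothetical, and the sensor names in the usage note are illustrative assumptions rather than values confirmed by this thread.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if text begins with prefix,
 * ignoring ASCII letter case. The casts to unsigned char avoid
 * undefined behavior in tolower() for negative char values. */
static bool startsWithIgnoreCase(const char* text, const char* prefix) {
   for (size_t i = 0; prefix[i] != '\0'; i++) {
      if (text[i] == '\0')
         return false; /* text is shorter than prefix */
      if (tolower((unsigned char)text[i]) != tolower((unsigned char)prefix[i]))
         return false;
   }
   return true;
}
```

With this helper, both an assumed `CPU-therm` and an assumed `cpu-thermal` zone type would match the prefix `cpu` without modifying the buffer that was read.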

@dmitriy-philimonov
Author
dmitriy-philimonov commented May 3, 2025

> Another problem is the conflict with #1620, which is an attempt to unify the GPU meter structure to one interface.

I looked through the 'main' branch implementation of the GpuMeter. If I understand correctly, it collects information about GPU usage from each running process.

NVIDIA Jetson takes a different approach: it provides separate GPU statistics via sysfs / the custom nvgpu driver.

Since all the NVIDIA Jetson specific code is hidden under the C define 'NVIDIA_JETSON', there should be no code collisions. Semantically, the 'NVIDIA_JETSON' switch could turn off all the future GPU code from #1620 and turn on the Jetson-specific GPU code (anyway, all the data is already collected by the nvgpu driver).

You could merge the final version of #1620 first, then I'll figure out how to reuse it correctly, on the next big holidays :)

@dmitriy-philimonov
Author

> I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.

The main purpose of the commit was to minimize the interference with the major code base for the default x86_64 platform. Honestly, I do not want to compile in the nvidia jetson board specific code anywhere else.

What approach would you recommend here?

@BenBE BenBE added the Linux 🐧 Linux related issues label May 3, 2025
@dmitriy-philimonov dmitriy-philimonov force-pushed the nvidia-jetson branch 2 times, most recently from 53914a0 to 68ddb34 Compare May 3, 2025 15:41
@Explorer09
Contributor
Explorer09 commented May 3, 2025

> > I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.
>
> The main purpose of the commit was to minimize the interference with the major code base for the default x86_64 platform. Honestly, I do not want to compile in the nvidia jetson board specific code anywhere else.
>
> What approach would you recommend here?

There are two ideas that came to mind.

  1. The more ideal one: Make the board identifier part of the machine type, so we can have --host=aarch64-nvidiajetson-linux-gnu. But that requires your toolchain to be configured with the same machine type identifier, which is sometimes not feasible.
  2. The less ideal, but easier approach: name the configure option --with-board=nvidia_jetson. This assumes that htop would accept patches for additional board customizations, and I don't know the maintainers' attitude on this.

Update: Oh no. Nvidia didn't use a unique machine type for their GCC cross-toolchain. Reference

@dmitriy-philimonov
Author

> > > I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.
> >
> > The main purpose of the commit was to minimize the interference with the major code base for the default x86_64 platform. Honestly, I do not want to compile in the nvidia jetson board specific code anywhere else.
> > What approach would you recommend here?
>
> There are two ideas that came to mind.
>
>   1. The more ideal one: Make the board identifier part of the machine type, so we can have --host=aarch64-nvidiajetson-linux-gnu. But that requires your toolchain to be configured with the same machine type identifier, which is sometimes not feasible.
>   2. The less ideal, but easier approach: name the configure option --with-board=nvidia_jetson. This assumes that htop would accept patches for additional board customizations, and I don't know the maintainers' attitude on this.
>
> Update: Oh no. Nvidia didn't use a unique machine type for their GCC cross-toolchain. Reference

@BenBE, as a maintainer, do you agree? If so, I will rework the patch according to idea 2.

@BenBE
Member
BenBE commented May 4, 2025

There's some internal discussion still going on. We're still discussing which direction we'd like to move forward in.

@dmitriy-philimonov dmitriy-philimonov force-pushed the nvidia-jetson branch 2 times, most recently from f836498 to 0f0f553 Compare June 13, 2025 20:32
NVIDIA Jetson is an industrial, Linux-based embedded aarch64
platform with a powerful built-in GPU, used for AI tasks,
mostly computer vision (CV).

The support is provided via --enable-nvidia-jetson switch in the
configure script.

All the source code related to the NVIDIA Jetson is placed in the
linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON'
C preprocessor define. So, for x86_64 platforms the
source code stays unchanged.

Additional functionality added by this commit:
1. Fix for the CPU temperature reading. The Jetson device is not
supported by libsensors. The CPU has 8 cores but only one CPU
temperature sensor for all of them, exposed through a thermal zone file.
libsensors may be compiled in or turned off. Extra care was taken so
that the build succeeds with or without libsensors.
2. The Jetson GPU Meter was added: current load, frequency and
temperature.
3. The exact GPU memory allocated by each process is loaded from the
nvgpu kernel driver via sysfs and merged into the LinuxProcess data
(field LinuxProcess::gpu_mem). The "GPU_MEM" column displays this
field. For the root user only.
4. An additional filter for processes which use the GPU right now,
toggled with the 'g' hot key; the help is updated. For the root user only.

== Technical details ==

The code tries to locate the correct sensors during application
startup. As an example, the sensor locations on NVIDIA Jetson Orin
are the following:
- CPU temperature: /sys/devices/virtual/thermal/thermal_zone0/type
- GPU temperature: /sys/devices/virtual/thermal/thermal_zone1/type
- GPU frequency: /sys/class/devfreq/17000000.gpu/cur_freq
- GPU curr load: /sys/class/devfreq/17000000.gpu/device/load

Measure:
- The GPU frequency is provided in Hz, shown in MHz.
- The CPU/GPU temperatures are provided in Celsius multiplied by 1000
  (millidegrees Celsius), shown in Celsius

P.S. The GUI shows all temperatures for NVIDIA Jetson with additional
precision compared to the default x86_64 platform.

If htop starts with root privileges (effective user id is 0), the
experimental code activates. It reads the fixed sysfs file
/sys/kernel/debug/nvmap/iovmm/clients with the following content, e.g.:
```
CLIENT                        PROCESS      PID        SIZE
user                         gpu_burn     7979   23525644K
user                      gnome_shell     8119       5800K
user                             Xorg     2651      17876K
total                                            23549320K
```
Unfortunately, the /sys/kernel/debug/* files are readable only by the
root user, which is why the restriction applies.
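
As a hedged illustration of how one row of the table above could be parsed, here is a small sketch. The names `GpuClientSketch` and `parseClientLine` are hypothetical; the PR's real parser lives in NvidiaJetson_LoadGpuProcessTable and is not reproduced on this page. The sketch assumes process names contain no whitespace.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record for one parsed row: pid and size in kilobytes. */
typedef struct {
   int pid;
   unsigned long long sizeKiB;
} GpuClientSketch;

/* Parses a row such as "user  gpu_burn  7979  23525644K".
 * Returns false for the header row and for the "total" row, because
 * neither has a numeric PID in the third column. */
static bool parseClientLine(const char* line, GpuClientSketch* out) {
   char client[32];
   char process[64];
   int pid;
   unsigned long long size;
   char unit;

   int matched = sscanf(line, "%31s %63s %d %llu%c",
                        client, process, &pid, &size, &unit);
   if (matched != 5 || unit != 'K' || pid <= 0)
      return false;

   out->pid = pid;
   out->sizeKiB = size;
   return true;
}
```

Applied to the sample content, the three `user` rows parse successfully, while the `CLIENT ... SIZE` header and the `total` row are rejected.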

The patch also adds a separate field 'GPU_MEM', which reads data from
the added LinuxProcess::gpu_mem field. The field stores memory allocated for GPU
in kilobytes. It is populated by the function NvidiaJetson_LoadGpuProcessTable
(the implementation is located in NvidiaJetson.c), which is called at the end of
the function Machine_scanTables.

Additionally, a new Action is added: actionToggleGpuFilter, which is activated by the
'g' hot key (the help is updated accordingly). The GpuFilter shows only the
processes which currently use the GPU (i.e. a highly extended nvmap/iovmm/clients table).
This is achieved with the filtering machinery associated with ProcessTable::pidMatchList.
The code below constructs the GPU_PID_MATCH_LIST hash table, then actionToggleGpuFilter
either stores it in ProcessTable::pidMatchList or restores the old value of
ProcessTable::pidMatchList.
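
The store-or-restore behavior described above can be sketched with plain pointers. All type and function names below (`ProcessTableSketch`, `GpuFilterSketch`, `toggleGpuFilter`) are hypothetical stand-ins, not htop's real ProcessTable or Hashtable types, and the sketch only models the swap, not the filtering itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the table that owns the active PID filter. */
typedef struct {
   void* pidMatchList;
} ProcessTableSketch;

/* Stand-in for the toggle state kept between key presses. */
typedef struct {
   void* gpuPidSet;      /* set of PIDs currently using the GPU */
   void* savedMatchList; /* filter that was active before toggling on */
   bool active;
} GpuFilterSketch;

/* First call installs the GPU PID set and remembers the old filter;
 * the next call restores the remembered filter. */
static void toggleGpuFilter(ProcessTableSketch* table, GpuFilterSketch* filter) {
   if (!filter->active) {
      filter->savedMatchList = table->pidMatchList;
      table->pidMatchList = filter->gpuPidSet;
   } else {
      table->pidMatchList = filter->savedMatchList;
      filter->savedMatchList = NULL;
   }
   filter->active = !filter->active;
}
```

Saving the previous pointer means that a filter the user had already set is not lost when the GPU filter is switched off again.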

A separate LinuxProcess flag, PROCESS_FLAG_LINUX_GPU_JETSON (or something similar), isn't
added for GPU_MEM, because currently the code that populates LinuxProcess::gpu_mem
is shared with the construction of the GPU consumers filter.
So, even if the GPU_MEM field is not activated, the filter showing GPU consumers should work.
This architecture is chosen intentionally since it saves memory for the hash table
GPU_PID_MATCH_LIST (which is now actually a set), and therefore increases performance.
All other approaches would turn GPU_PID_MATCH_LIST into a true key/value storage (key = pid,
value = GPU memory allocated) with additional merge code.

== NVIDIA Jetson models ==

Tested for NVIDIA Jetson Orin and Xavier boards.
@dmitriy-philimonov
Author

Changes:

  1. Rebased. Honestly, I've left all GPU-related code unchanged. It uses a different kernel API and might work with NVIDIA Jetson one day, who knows?
  2. Additionally pushed the per-process GPU memory allocation functionality right into the LinuxProcess class / GPU_MEM field on the main screen. Marked it as experimental, because it works with root privileges only. In short, it reads a special sysfs file inside the kernel/debug directory, published by the nvgpu NVIDIA driver, which contains the dictionary {pid -> gpu_memory}.

Added an Action for this functionality. After pressing the 'g' hot key, the main screen shows only the processes which use the GPU right now. With the GPU_MEM field, you see the current GPU load per process. Useful, I guess. I hope you'll use the same approach in your future development.

@dmitriy-philimonov
Author
dmitriy-philimonov commented Jun 13, 2025

I've put all the in-depth details in both the commit message and the NvidiaJetson.c file. Have a look, please. @BenBE

Finally, with the "Jetson GPU" Meter, the 'g' hot key applied, and the GPU_MEM field, htop looks like this:
image

Labels
Linux 🐧 Linux related issues

3 participants