-
-
Notifications
You must be signed in to change notification settings - Fork 487
Preliminary support for NVIDIA Jetson boards #1692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I fear the option of |
Another problem is the conflict with #1620, which is an attempt to unify the GPU meter structure to one interface. |
linux/NvidiaJetson.c
Outdated
RichString_appendAscii(out, CRT_colors[METER_VALUE], buffer); | ||
|
||
RichString_appendAscii(out, CRT_colors[METER_TEXT], " temp:"); | ||
xSnprintf(buffer, sizeof(buffer), "%.1f°C", this->values[JETSON_GPU_TEMP]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use CRT_degreeSign
rather than hard-code a degree sign here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Additionally supported Fahrenheit.
content[0] = tolower(content[0]); | ||
content[1] = tolower(content[1]); | ||
content[2] = tolower(content[2]); | ||
content[3] = tolower(content[3]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why case conversion? Is there any reason for the letter case to vary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, even Jetson Xavier and Jetson Orin has different sensor names. NVIDIA breaks backward compatibility here.
I looked through 'main' branch implementation of the GpuMeter. If I understand correctly, it collects information about the GPU usage from each running process. NVIDIA Jetson has a different approach - it provides a separate GPU statistics via sysfs / custom nvgpu driver. Since all the NVIDIA Jetson specific code is hidden under the C define 'NVIDIA_JETSON', there should be no code collisions. Semantically, the switch 'NVIDIA_JETSON' for GPU might turn off all the future code in #1620 and turning on the Jetson specific GPU code (anyway, all the data is already collected by the nvgpu driver). You could merge the final version of the #1620 first, then I'll figure out how to reuse it correctly, on the next big holidays :) |
The main purpose of the commit was to minimize the interference with the major code base for the default x86_64 platform. Honestly, I do not want to compile in the nvidia jetson board specific code anywhere else. What approach would you recommend here? |
53914a0
to
68ddb34
Compare
There are two ideas that came in my mind.
Update: Oh no. Nvidia didn't use a unique machine type for their GCC cross-toolchain. Reference |
@BenBE , as a maintainer, are you agree? If so, I will fix according to the idea №2. |
There's some internal discussion still going on. We're still discussing which direction we'd like to move forward in. |
f836498
to
0f0f553
Compare
NVIDIA Jetson device is an insdustrial Linux based embedded aarch64 platfrom with powerful builtin GPU, which is used for AI tasks, mostly for CV purposes. The support is provided via --enable-nvidia-jetson switch in the configure script. All the source code related to the NVIDIA Jetson is placed in the linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON' C preprocessor define. So, for x86_64 platforms the source code stays unchanged. Additional functionality added by this commit: 1. Fix for the CPU temperature reading. The Jetson device is not supported by libsensors. The CPU has 8 cores with only one CPU temperature sensor for all of them located in the thermal zone file. libsensor might be compiled in or turned off. The additional care was taken to provide successfull build with/without libsensors. 2. The Jetson GPU Meter was added: current load, frequency and temperature. 3. The exact GPU memory allocated by each process is loaded from the nvgpu kernel driver via sysfs and merged to the LinuxProcess data (field LinuxProcess::gpu_mem). The field "GPU_MEM" visualizes this field. For root user only. 4. Additional filter for processes which use GPU right now via hot key 'g', the help is supplied. For root user only. == Technical details == The code tries to find out the correct sensors during the application startup. As an example, the sensors location for NVIDIA Jetson Orin are the following: - CPU temperature: /sys/devices/virtual/thermal/thermal_zone0/type - GPU temperature: /sys/devices/virtual/thermal/thermal_zone1/type - GPU frequency: /sys/class/devfreq/17000000.gpu/cur_freq - GPU curr load: /sys/class/devfreq/17000000.gpu/device/load Measure: - The GPU frequency is provided in Hz, shown in MHz. - The CPU/GPU temperatures are provided in Celsius multipled by 1000 (milli Celsius), shown in Cesius P.S. The GUI shows all temperatures for NVIDIA Jetson with additional precision comparing to the default x86_64 platform. If htop starts with root privileges (effective user id is 0), the experimental code activates. It reads the fixed sysfs file /sys/kernel/debug/nvmap/iovmm/clients with the following content, e.g.: ``` CLIENT PROCESS PID SIZE user gpu_burn 7979 23525644K user gnome_shell 8119 5800K user Xorg 2651 17876K total 23549320K ``` Unfortunately, the /sys/kernel/debug/* files are allowed to read only for the root user, that's why the restriction applies. The patch also adds a separate field 'GPU_MEM', which reads data from the added LinuxProcess::gpu_mem field. The field stores memory allocated for GPU in kilobytes. It is populated by the function NvidiaJetson_LoadGpuProcessTable (the implementation is located in NvidiaJetson.c), which is called at the end of the function Machine_scanTables. Additionally, the new Action is added: actionToggleGpuFilter, which is activated by 'g' hot key (the help is updated appropriately). The GpuFilter shows only the processes which currently utilize GPU (i.e. highly extended nvmap/iovmm/clients table). It is achieved by the filtering machinery associated with ProcessTable::pidMatchList. The code below constructs GPU_PID_MATCH_LIST hash table, then actionToggleGpuFilter either stores it to the ProcessTable::pidMatchList or restores old value of ProcessTable::pidMatchList. The separate LinuxProcess's PROCESS_FLAG_LINUX_GPU_JETSON (or something ...) flag isn't added for GPU_MEM, because currently the functionality of population LinuxProcess::gpu_mem is shared with the GPU consumers filter construction. So, even if GPU_MEM field is not activated, the filter showing GPU consumers should work. This kind of architecture is chosen intentially since it saves memory for the hash table GPU_PID_MATCH_LIST (which is now actually a set), and therefore increases performance. All other approaches convert GPU_PID_MATCH_LIST to a true key/value storage (key = pid, value = gpu memory allocated) with further merge code. == NVIDIA Jetson models == Tested for NVIDIA Jetson Orin and Xavier boards.
0f0f553
to
109046f
Compare
Changes:
Added the Action for this functionality. Pressing 'g' hot key the main screen shows only the processes which uses GPU right now. Having the GPU_MEM field, you see the current GPU load per process. Useful, I guess. Hope, you'll utilize the same approach in your future development. |
I've left all the deep details in both: the commit message and the NvidiaJetson.c file. Have a look, please. @BenBE Finally, with "Jetson GPU" Meter and "g" hot key applied, with GPU_MEM field, the "htop" looks like this: |
NVIDIA Jetson device is an insdustrial Linux based embedded aarch64 platfrom with powerful builtin GPU, which is used for AI tasks, mostly for CV purposes.
The support is provided via --enable-nvidia-jetson switch in the configure script.
All the source code related to the NVIDIA Jetson is placed in the linux/NvidiaJetson.{h,c} source files and hidden by 'NVIDIA_JETSON' C preprocessor define. So, for x86_64 platforms the source code stays unchanged.
Additional functionality added by this commit:
== Technical details ==
The code tries to find out the correct sensors during the application startup. As an example, the sensors location for NVIDIA Jetson Orin are the following:
Measure:
P.S. The GUI shows all temperatures for NVIDIA Jetson with additional precision comparing to the default x86_64 platform.
== NVIDIA Jetson models ==
Tested for NVIDIA Jetson Orin and Xavier boards.