fix(mm): reintroduce explicit virtual to physical address translation for device memory by mkroening · Pull Request #1815 · hermit-os/kernel · GitHub

fix(mm): reintroduce explicit virtual to physical address translation for device memory #1815


Merged: 9 commits into main from virt_to_phys on Jul 9, 2025

Conversation

mkroening
Member
@mkroening mkroening commented Jul 4, 2025

Before #1609, #1669, and #1670, we mapped frames and flushed the TLB on each device memory allocation, which was expensive.
Those PRs changed the initial memory mappings to ensure an identity mapping of all physical memory.
That made device memory allocation (basically) a no-op while reducing TLB pressure.

Before #1712, we walked the page table on every virtual-to-physical address translation for device communication, which was not costly, but still slower than necessary.
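To make the cost of a walk concrete, here is a toy sketch of a software page table walk, assuming x86-64-style 4-level paging with 4 KiB pages. `ToyPageTables` and `table_indices` are hypothetical names used only for illustration and are not Hermit's API. A real walk would also check present bits and handle huge pages.

```rust
use std::collections::HashMap;

/// Splits a virtual address into the four 9-bit table indices used by
/// x86-64 4-level paging (assumption: 4 KiB pages, no huge pages).
fn table_indices(virt: u64) -> [usize; 4] {
    let mask = 0x1ff;
    [
        ((virt >> 39) & mask) as usize,
        ((virt >> 30) & mask) as usize,
        ((virt >> 21) & mask) as usize,
        ((virt >> 12) & mask) as usize,
    ]
}

/// Hypothetical toy model: `(table id, index)` maps to the next table id,
/// or to the frame's physical address at the last level.
struct ToyPageTables {
    root: u64,
    entries: HashMap<(u64, usize), u64>,
}

impl ToyPageTables {
    /// Walks all four levels. Each lookup depends on the previous one,
    /// which is what makes a walk slower than plain address arithmetic.
    fn translate(&self, virt: u64) -> Option<u64> {
        let mut current = self.root;
        for index in table_indices(virt) {
            current = *self.entries.get(&(current, index))?;
        }
        // `current` now holds the frame address; add the page offset.
        Some(current | (virt & 0xfff))
    }
}
```

Every translation in this model costs four dependent lookups, whereas a fixed mapping only needs a subtraction.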

This PR reintroduces explicit virtual-to-physical address translation when handling device memory while avoiding the performance pitfalls of the past.
We now have the option to map the complete physical memory a second time at an offset.
This is currently done under `cfg!(careful)` to ensure that all devices use the device allocator not only for memory management but also for address translation.
Eventually, this will allow us to mark one of the mappings as private and the other as public, making flexible device communication as cheap as possible.
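As a rough illustration of translation through a mapping at an offset, the following minimal sketch models a linear mapping of physical memory. `LinearMapping`, its fields, and both methods are hypothetical names chosen for this example and do not reflect the actual code in this PR.

```rust
/// Hypothetical model of a linear mapping of all physical memory.
/// Physical address `p` is reachable at virtual address `offset + p`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LinearMapping {
    /// Virtual address at which physical address 0 is mapped
    /// (0 means identity mapping).
    pub offset: u64,
    /// Number of bytes of physical memory covered by the mapping.
    pub len: u64,
}

impl LinearMapping {
    /// Translates a physical address into this mapping's virtual address.
    pub fn phys_to_virt(&self, phys: u64) -> Option<u64> {
        if phys < self.len {
            self.offset.checked_add(phys)
        } else {
            None
        }
    }

    /// Translates a virtual address inside this mapping back to a
    /// physical address. This is one subtraction and one bounds check,
    /// with no page table walk.
    pub fn virt_to_phys(&self, virt: u64) -> Option<u64> {
        let phys = virt.checked_sub(self.offset)?;
        if phys < self.len {
            Some(phys)
        } else {
            None
        }
    }
}
```

With `offset: 0`, both functions return their input, so a caller that skips the translation goes unnoticed. With a non-zero offset, an untranslated virtual address no longer equals the physical address, which is presumably why a second mapping at an offset helps catch missing translations.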

Contributor
@github-actions github-actions bot left a comment


Benchmark Results

This comment was automatically generated by github-action-benchmark.

Misc

| Benchmark | Current: 1f191b0 | Previous: b5b5c19 | Performance Ratio |
|---|---|---|---|
| micro_benchmarks Build Time | 74.81 s | 74.72 s | 1.00 |
| micro_benchmarks File Size | 0.97 MB | 0.97 MB | 1.00 |
| Scheduling time - 1 thread | 66.33 ticks (±2.44 ticks) | 66.83 ticks (±3.10 ticks) | 0.99 |
| Scheduling time - 2 threads | 34.48 ticks (±1.50 ticks) | 35.64 ticks (±3.40 ticks) | 0.97 |
| Micro - Time for syscall (getpid) | 16.14 ticks (±1.18 ticks) | 15.71 ticks (±1.36 ticks) | 1.03 |
| Memcpy speed - (built_in) block size 4096 | 74027.79 MByte/s (±51036.26 MByte/s) | 73338.41 MByte/s (±50542.56 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 41178.80 MByte/s (±28611.10 MByte/s) | 41237.34 MByte/s (±28631.07 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 25742.90 MByte/s (±20751.74 MByte/s) | 26159.73 MByte/s (±21200.42 MByte/s) | 0.98 |
| Memset speed - (built_in) block size 4096 | 74199.08 MByte/s (±51151.48 MByte/s) | 73371.89 MByte/s (±50563.99 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 1048576 | 41460.06 MByte/s (±28810.35 MByte/s) | 41509.80 MByte/s (±28815.09 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 26386.40 MByte/s (±21135.66 MByte/s) | 26803.32 MByte/s (±21575.74 MByte/s) | 0.98 |
| Memcpy speed - (rust) block size 4096 | 62001.44 MByte/s (±43372.14 MByte/s) | 66444.83 MByte/s (±46279.47 MByte/s) | 0.93 |
| Memcpy speed - (rust) block size 1048576 | 40883.33 MByte/s (±28399.62 MByte/s) | 41301.42 MByte/s (±28653.77 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 16777216 | 25689.12 MByte/s (±20708.36 MByte/s) | 26198.19 MByte/s (±21210.46 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 62710.83 MByte/s (±43829.52 MByte/s) | 66823.77 MByte/s (±46541.34 MByte/s) | 0.94 |
| Memset speed - (rust) block size 1048576 | 41121.21 MByte/s (±28560.50 MByte/s) | 41550.11 MByte/s (±28821.87 MByte/s) | 0.99 |
| Memset speed - (rust) block size 16777216 | 26318.36 MByte/s (±21079.12 MByte/s) | 26858.59 MByte/s (±21598.51 MByte/s) | 0.98 |
| alloc_benchmarks Build Time | 74.58 s | 72.59 s | 1.03 |
| alloc_benchmarks File Size | 0.92 MB | 0.92 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 70.03 % (±0.26 %) | 70.01 % (±0.26 %) | 1.00 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 11030.05 Ticks (±188.57 Ticks) | 11052.09 Ticks (±196.08 Ticks) | 1.00 |
| Allocations - Average Allocation time (no fail) | 11030.05 Ticks (±188.57 Ticks) | 11052.09 Ticks (±196.08 Ticks) | 1.00 |
| Allocations - Average Deallocation time | 818.63 Ticks (±15.74 Ticks) | 833.82 Ticks (±17.80 Ticks) | 0.98 |
| mutex_benchmark Build Time | 73.99 s | 73.82 s | 1.00 |
| mutex_benchmark File Size | 0.97 MB | 0.97 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 14.06 ns (±0.54 ns) | 14.18 ns (±0.65 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 16.56 ns (±1.60 ns) | 16.92 ns (±1.06 ns) | 0.98 |
Misc
| Benchmark | Current: 1f191b0 | Previous: 885734c | Performance Ratio |
|---|---|---|---|
| micro_benchmarks Build Time | 75.17 s | 92.18 s | 0.82 |
| micro_benchmarks File Size | 0.97 MB | 0.97 MB | 1.00 |
| Scheduling time - 1 thread | 67.77 ticks (±2.86 ticks) | 67.38 ticks (±3.72 ticks) | 1.01 |
| Scheduling time - 2 threads | 36.34 ticks (±2.05 ticks) | 34.93 ticks (±1.69 ticks) | 1.04 |
| Micro - Time for syscall (getpid) | 16.09 ticks (±1.56 ticks) | 15.86 ticks (±1.09 ticks) | 1.01 |
| Memcpy speed - (built_in) block size 4096 | 73392.17 MByte/s (±50748.98 MByte/s) | 72968.84 MByte/s (±50564.82 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 40972.89 MByte/s (±28537.54 MByte/s) | 41555.93 MByte/s (±28865.47 MByte/s) | 0.99 |
| Memcpy speed - (built_in) block size 16777216 | 26388.97 MByte/s (±21907.17 MByte/s) | 26104.96 MByte/s (±21970.37 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 4096 | 73439.26 MByte/s (±50782.33 MByte/s) | 72990.31 MByte/s (±50579.80 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 1048576 | 41213.39 MByte/s (±28699.74 MByte/s) | 41811.90 MByte/s (±29042.09 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 16777216 | 27055.54 MByte/s (±22283.63 MByte/s) | 26890.08 MByte/s (±22397.84 MByte/s) | 1.01 |
| Memcpy speed - (rust) block size 4096 | 61712.80 MByte/s (±42993.06 MByte/s) | 64378.81 MByte/s (±44953.84 MByte/s) | 0.96 |
| Memcpy speed - (rust) block size 1048576 | 41212.17 MByte/s (±28649.36 MByte/s) | 41277.27 MByte/s (±28662.33 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 25937.96 MByte/s (±21466.33 MByte/s) | 26660.93 MByte/s (±22238.57 MByte/s) | 0.97 |
| Memset speed - (rust) block size 4096 | 61924.57 MByte/s (±43141.43 MByte/s) | 64487.90 MByte/s (±45017.04 MByte/s) | 0.96 |
| Memset speed - (rust) block size 1048576 | 41472.32 MByte/s (±28826.55 MByte/s) | 41532.86 MByte/s (±28836.12 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 26582.99 MByte/s (±21833.10 MByte/s) | 27388.51 MByte/s (±22664.91 MByte/s) | 0.97 |
| alloc_benchmarks Build Time | 74.80 s | 88.27 s | 0.85 |
| alloc_benchmarks File Size | 0.92 MB | 0.92 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 69.99 % (±0.30 %) | 69.97 % (±0.35 %) | 1.00 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 14591.74 Ticks (±292.45 Ticks) | 13449.07 Ticks (±258.41 Ticks) | 1.08 |
| Allocations - Average Allocation time (no fail) | 14591.74 Ticks (±292.45 Ticks) | 13449.07 Ticks (±258.41 Ticks) | 1.08 |
| Allocations - Average Deallocation time | 1119.40 Ticks (±252.37 Ticks) | 852.92 Ticks (±69.83 Ticks) | 1.31 |
| mutex_benchmark Build Time | 75.34 s | 91.20 s | 0.83 |
| mutex_benchmark File Size | 0.97 MB | 0.97 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 14.08 ns (±1.07 ns) | 14.26 ns (±0.48 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 24.14 ns (±15.01 ns) | 20.98 ns (±14.98 ns) | 1.15 |
General
| Benchmark | Current: 1f191b0 | Previous: 885734c | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 72.41 s | 69.30 s | 1.04 |
| startup_benchmark File Size | 0.85 MB | 0.86 MB | 1.00 |
| Startup Time - 1 core | 0.99 s (±0.06 s) | 0.93 s (±0.04 s) | 1.07 |
| Startup Time - 2 cores | 0.99 s (±0.03 s) | 0.93 s (±0.04 s) | 1.06 |
| Startup Time - 4 cores | 0.99 s (±0.03 s) | 0.93 s (±0.04 s) | 1.06 |
| multithreaded_benchmark Build Time | 75.05 s | 68.22 s | 1.10 |
| multithreaded_benchmark File Size | 0.96 MB | 0.96 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 89.71 % (±9.87 %) | 86.87 % (±8.71 %) | 1.03 |
| Multithreaded Pi Efficiency - 4 Threads | 61.14 % (±6.88 %) | 61.17 % (±6.32 %) | 1.00 |
| Multithreaded Pi Efficiency - 8 Threads | 43.44 % (±2.88 %) | 41.25 % (±5.92 %) | 1.05 |

mkroening marked this pull request as ready for review on July 4, 2025 17:31
mkroening force-pushed the virt_to_phys branch 3 times, most recently from c88bd46 to 6bcb80e, on July 6, 2025 12:45
mkroening self-assigned this on Jul 6, 2025
mkroening force-pushed the virt_to_phys branch 2 times, most recently from 1bdcaf7 to 6fd2fe2, on July 7, 2025 13:03
mkroening force-pushed the virt_to_phys branch 3 times, most recently from 24d2cf1 to 48186db, on July 7, 2025 13:38
mkroening force-pushed the virt_to_phys branch 2 times, most recently from adc369a to 707d6f5, on July 7, 2025 15:45
mkroening force-pushed the virt_to_phys branch 2 times, most recently from 1f191b0 to 6bd1362, on July 9, 2025 09:02
mkroening added this pull request to the merge queue on Jul 9, 2025
Merged via the queue into main with commit 946922a on Jul 9, 2025
30 checks passed
mkroening deleted the virt_to_phys branch on July 9, 2025 12:47