arch/rv32i: separate kernel and app trap handlers #4009

lschuermann · 2024-05-28T22:28:29Z

Pull Request Overview

This commit re-architects Tock's RV32I trap handler, to cleanly separate the kernel- and process-handlers. It defines an "interface" for custom trap handlers of custom process implementations.

This is a third attempt at separating out the kernel- and app-trap handlers on RV32I, and a follow-up to #3847 and #3864. The motivation is much the same, however this attempt is subtly different in concept, and not so subtly in implementation, and as such warrants its own PR.

Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process
implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation.

By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs.

Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler.

The precise interface is documented on the _start_trap symbol in arch/rv32i/src/lib.rs.

Testing Strategy

This pull request was tested by using it with Encapsulated Functions, running regular Tock processes, and the LiteX Sim CI.

TODO or Help Wanted

N/A

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.

lschuermann · 2024-05-28T22:48:16Z

My apologies for opening yet another PR, with yet another approach for this. At long last, this approach (which was in fact my original design 🙃) seems to work reliably, across all platforms. A quick summary of the differences between this, #3864 and #3847:

arch/rv32i: separate kernel and app trap handlers to simply assembly flow and allow foreign process implementations #3847 tried to introduce the ability to "hook" custom trap handlers, for instance when running process code, simply by overwriting mtvec. This is great, as it's as efficient as it gets, and frees up mscratch to retain any arbitrary context a process implementation may want to retain.

However, thanks to the fragmented RISC-V ecosystem, there are platforms which do not support non-vectored interrupt handlers. Yet other platforms do not support vectored interrupt handlers. This means that it is impossible for any code that writes to mtvec to be portable between chips, making this approach a non-starter.
arch/rv32i: separate kernel and app trap handlers; specify trap handler "interface" #3864 instead uses mscratch as a selector for which trap handler to execute. If it contains 0, it executes the kernel trap handler. Otherwise, it interprets the value of mscratch as an address, and continues trap handler execution there.

Unfortunately, this approach does not permit passing any additional "state" to the subsequent trap handler. Consider a system with two instances of the kernel running on two CPU cores. When a trap fires, it jumps to the address passed in mscratch. The process-implementation's trap handler code then needs to store process state, but it does not know where that state is stored. Using a global symbol could work, but to support multi-core environments, that symbol would then need to be indexed by the hart ID, making this code very complex and convoluted.

Finally, this approach is much closer to the previous implementation. mscratch does not directly pass the trap handler address to be executed, but is a pointer to a datastructure with a well-defined layout:

tock/arch/rv32i/src/lib.rs

Lines 208 to 213 in da704eb

    
               /// mscratch           0               1               2               3 
        
               ///  \->|--------------------------------------------------------------| 
        
               ///     | scratch word, overwritten with s1 register contents          | 
        
               ///     |--------------------------------------------------------------| 
        
               ///     | trap handler address, continue trap handler execution here   | 
        
               ///     |--------------------------------------------------------------|

Notably, this "datastructure" can simply be placed at the current stack pointer, as is done with the current implementation.

The behavior of the global trap handler is then well-defined:

tock/arch/rv32i/src/lib.rs

Lines 220 to 228 in da704eb

    
               /// 1. atomically swap s0 and the mscratch CSR, 
        
               /// 
        
               /// 2. execute the default kernel trap handler if s0 now contains `0` 
        
               /// (meaning that the mscratch CSR contained `0` before entering this trap 
        
               /// handler), 
        
               /// 
        
               /// 3. otherwise, save s1 to `0*4(s0)`, and finally 
        
               /// 
        
               /// 4. load the address at `1*4(s0)` into s1, and jump to it.

Thus, once it jumps to the process-implementation specific trap handler, the machine state is as follows:

pc: set to pre-trap 4*4(mscratch)
mscratch: set to pre-trap s0
s0: set to pre-trap mscratch
0*4(mscratch): set to pre-trap s1
s1: set to pre-trap 4*4(mscratch) (jump-target)

Because both s0 and s1 are stored, no state is lost. And because s0 contains the original value of mscratch, this can be used as a hart-specific context pointer, atomically set while hooking the trap handler. This subsequent handler simply has to do some cleanup to revert the machine to its original state before returning. For processes, this typically involves saving this state. For the kernel we circumvent this by special-casing it with mscratch = 0 and a fixed branch, retaining its current efficiency.

lschuermann · 2024-05-28T22:50:36Z

chips/earlgrey/src/chip.rs

+            j {start_trap}
+            j {start_trap}
+    ",
+    start_trap = sym rv32i::_start_trap,


These changes are tangential, but we should avoid referencing symbol names directly across crates. Instead, we should use proper Rust function exports and re-introduce them as locally-scoped symbols in Rust's asm! blocks.

lschuermann · 2024-05-28T22:51:37Z

arch/rv32i/src/syscall.rs

-                        state.pc += 4;
+                        state.pc = state.pc.wrapping_add(4);


Also tangential. This should be functionally equivalent on release builds, but would have panicked in debug builds. Overflowing seems like the right thing to do here.

Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.

This will be nightmare to debug if it hits us. Should we just fix is now.

This C code is enough to get the instruction length

static inline int insn_len(uint16_t opcode) { return (opcode & 3) == 3 ? 4 : 2; }

So it should be pretty easy to check

I tried to implement this, and while the logic you propose is indeed very simple, it requires us to know the opcode that our platform is current executing.

However, that's not trivial. RISC-V provides us with the mtinst CSR, but this CSR (where hardware implements it) will contain the transformed, expanded instruction, in case we're executing a compressed instruction. Thus we cannot use this.

Alternatively, we could try and deference the PC. However, I'm having a hard time reasoning about the safety of that -- when the application is able to do an ecall, presumably it has access to this instruction. But maybe there's other edge cases where its not always safe for the kerrnel to dereference the PC in this particular location, such as with Kernel MPU. And in either case, this would be an additional load & compare on the already expensive system call critical path, for a potential future instruction that does not yet exist.

Based on this, I'd rather postpone making these changes until we have an implementation where this is a problem. Notably, this will never break the kernel, it just might jump to an improper instruction offset or skip a compressed instruction in userspace.

Interesting points. I didn't think about mtinst containing the expanded construction.

I suspect de-referencing the PC should be fine, but I agree about the overhead. I agree with postponing, let's see if it is ever an issue

bradjc

High level looks good.

arch/rv32i/src/lib.rs

bradjc · 2024-06-05T15:15:23Z

This looks good to me, and despite the nature of the change (i.e. assembly) I think this is actually a pretty minor patch. In effect this only moves the app-state saving assembly to syscall.rs.

Also, adding a data structure for handling traps feels very risc-v to me, as the architecture provides only minimal hw support for context switches, and building your own abstractions on top is expected.

arch/rv32i/src/syscall.rs

lschuermann · 2024-06-05T16:23:26Z

arch/rv32i/src/syscall.rs

-                        state.pc += 4;
+                        state.pc = state.pc.wrapping_add(4);


Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.

arch/rv32i/src/lib.rs

lschuermann · 2024-06-06T20:59:15Z

I'll address @alistair23's comments and reflow the text, then I think this is good to go.

alevy · 2024-06-14T02:08:02Z

@lschuermann ping

This commit re-architects Tock's RV32I trap handler, to cleanly separate the kernel- and process-handlers. It defines an "interface" for custom trap handlers of custom process implementations. Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation. By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs. Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler. The precise interface is documented on the `_start_trap` symbol in `arch/rv32i/src/lib.rs`.

lschuermann · 2024-06-14T16:40:00Z

@alevy Sorry, I struggled with implementing @alistair23's proposed change for the reasons outlined here: #4009 (comment)

In the end I just rebased and reflowed the comments, so nothing materially changed. I think this is good to go now.

github-actions bot added risc-v RISC-V architecture WG-OpenTitan In the purview of the OpenTitan working group. labels May 28, 2024

lschuermann mentioned this pull request May 28, 2024

arch/rv32i: separate kernel and app trap handlers; specify trap handler "interface" #3864

Closed

2 tasks

lschuermann commented May 28, 2024

View reviewed changes

github-actions bot assigned bradjc May 29, 2024

bradjc reviewed May 30, 2024

View reviewed changes

bradjc previously approved these changes Jun 5, 2024

View reviewed changes

arch/rv32i/src/lib.rs Outdated Show resolved Hide resolved

arch/rv32i/src/lib.rs Outdated Show resolved Hide resolved

arch/rv32i/src/lib.rs Outdated Show resolved Hide resolved

arch/rv32i/src/lib.rs Outdated Show resolved Hide resolved

lschuermann commented Jun 5, 2024

View reviewed changes

lschuermann dismissed bradjc’s stale review via d9a1b81 June 5, 2024 16:31

bradjc previously approved these changes Jun 5, 2024

View reviewed changes

alevy added the waiting-on-author label Jun 7, 2024

lschuermann force-pushed the dev/riscv-separate-kernel-app-trap-handlers branch from d9a1b81 to f58cb7e Compare June 14, 2024 16:27

lschuermann dismissed bradjc’s stale review via c1f2ce6 June 14, 2024 16:29

lschuermann force-pushed the dev/riscv-separate-kernel-app-trap-handlers branch from f58cb7e to c1f2ce6 Compare June 14, 2024 16:29

bradjc approved these changes Jun 14, 2024

View reviewed changes

lschuermann removed the waiting-on-author label Jun 14, 2024

alevy added this pull request to the merge queue Jun 14, 2024

Merged via the queue into tock:master with commit 0222998 Jun 14, 2024
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

arch/rv32i: separate kernel and app trap handlers #4009

arch/rv32i: separate kernel and app trap handlers #4009

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arch/rv32i: separate kernel and app trap handlers #4009

arch/rv32i: separate kernel and app trap handlers #4009

Uh oh!

Conversation

Pull Request Overview

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!