-
Notifications
You must be signed in to change notification settings - Fork 747
arch/rv32i: separate kernel and app trap handlers #4009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arch/rv32i: separate kernel and app trap handlers #4009
Conversation
My apologies for opening yet another PR, with yet another approach for this. At long last, this approach (which was in fact my original design 🙃) seems to work reliably, across all platforms. A quick summary of the differences between this, #3864 and #3847:
Finally, this approach is much closer to the previous implementation. Lines 208 to 213 in da704eb
Notably, this "datastructure" can simply be placed at the current stack pointer, as is done with the current implementation. The behavior of the global trap handler is then well-defined: Lines 220 to 228 in da704eb
Thus, once it jumps to the process-implementation specific trap handler, the machine state is as follows:
Because both |
j {start_trap} | ||
j {start_trap} | ||
", | ||
start_trap = sym rv32i::_start_trap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are tangential, but we should avoid referencing symbol names directly across crates. Instead, we should use proper Rust function exports and re-introduce them as locally-scoped symbols in Rust's asm!
blocks.
state.pc += 4; | ||
state.pc = state.pc.wrapping_add(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also tangential. This should be functionally equivalent on release builds, but would have panicked in debug builds. Overflowing seems like the right thing to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be nightmare to debug if it hits us. Should we just fix is now.
This C code is enough to get the instruction length
static inline int insn_len(uint16_t opcode)
{
return (opcode & 3) == 3 ? 4 : 2;
}
So it should be pretty easy to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement this, and while the logic you propose is indeed very simple, it requires us to know the opcode that our platform is current executing.
However, that's not trivial. RISC-V provides us with the mtinst
CSR, but this CSR (where hardware implements it) will contain the transformed, expanded instruction, in case we're executing a compressed instruction. Thus we cannot use this.
Alternatively, we could try and deference the PC. However, I'm having a hard time reasoning about the safety of that -- when the application is able to do an ecall
, presumably it has access to this instruction. But maybe there's other edge cases where its not always safe for the kerrnel to dereference the PC in this particular location, such as with Kernel MPU. And in either case, this would be an additional load & compare on the already expensive system call critical path, for a potential future instruction that does not yet exist.
Based on this, I'd rather postpone making these changes until we have an implementation where this is a problem. Notably, this will never break the kernel, it just might jump to an improper instruction offset or skip a compressed instruction in userspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting points. I didn't think about mtinst
containing the expanded construction.
I suspect de-referencing the PC should be fine, but I agree about the overhead. I agree with postponing, let's see if it is ever an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level looks good.
This looks good to me, and despite the nature of the change (i.e. assembly) I think this is actually a pretty minor patch. In effect this only moves the app-state saving assembly to syscall.rs. Also, adding a data structure for handling traps feels very risc-v to me, as the architecture provides only minimal hw support for context switches, and building your own abstractions on top is expected. |
state.pc += 4; | ||
state.pc = state.pc.wrapping_add(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.
I'll address @alistair23's comments and reflow the text, then I think this is good to go. |
@lschuermann ping |
d9a1b81
to
f58cb7e
Compare
This commit re-architects Tock's RV32I trap handler, to cleanly separate the kernel- and process-handlers. It defines an "interface" for custom trap handlers of custom process implementations. Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation. By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs. Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler. The precise interface is documented on the `_start_trap` symbol in `arch/rv32i/src/lib.rs`.
f58cb7e
to
c1f2ce6
Compare
@alevy Sorry, I struggled with implementing @alistair23's proposed change for the reasons outlined here: #4009 (comment) In the end I just rebased and reflowed the comments, so nothing materially changed. I think this is good to go now. |
Pull Request Overview
This commit re-architects Tock's RV32I trap handler, to cleanly separate the kernel- and process-handlers. It defines an "interface" for custom trap handlers of custom process implementations.
This is a third attempt at separating out the kernel- and app-trap handlers on RV32I, and a follow-up to #3847 and #3864. The motivation is much the same, however this attempt is subtly different in concept, and not so subtly in implementation, and as such warrants its own PR.
Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process
implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation.
By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs.
Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler.
The precise interface is documented on the
_start_trap
symbol inarch/rv32i/src/lib.rs
.Testing Strategy
This pull request was tested by using it with Encapsulated Functions, running regular Tock processes, and the LiteX Sim CI.
TODO or Help Wanted
N/A
Documentation Updated
/docs
, or no updates are required.Formatting
make prepush
.