8000 arch/rv32i: separate kernel and app trap handlers by lschuermann · Pull Request #4009 · tock/tock · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

arch/rv32i: separate kernel and app trap handlers #4009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lschuermann
Copy link
Member

Pull Request Overview

This commit re-architects Tock's RV32I trap handler, to cleanly separate the kernel- and process-handlers. It defines an "interface" for custom trap handlers of custom process implementations.

This is a third attempt at separating out the kernel- and app-trap handlers on RV32I, and a follow-up to #3847 and #3864. The motivation is much the same, however this attempt is subtly different in concept, and not so subtly in implementation, and as such warrants its own PR.

Tock's core kernel design permits process implementations with system call interfaces other than the ones shipped upstream in the arch/cortex-m and arch/rv32i crates. However, in practice, the UserspaceKernelBoundary implementation for RV32I has always been tightly coupled to the generic kernel RISC-V trap handler. This has made it difficult to build an alternative process
implementation (e.g., as part of Encapsulated Functions), while reusing much of Tock's RISC-V implementation.

By defining an "interface" that the global trap handler exposes (i.e., specifying how a custom trap handler can be registered, and which registers get clobbered), we allow foreign process implementations to hook into the rv32i arch crate's assembly, and can move all process-specific assembly into SysCall::switch_to_process. All process-specific logic is now contained in one assembly block that can be read top-to-bottom. The global trap handler no longer relies on the stack layout of that function, or other data structures in syscall.rs.

Because the kernel trap handler no longer contains any application logic and expects the app to set its own trap handler, implementing a new syscall ABI becomes as simple as temporarily swapping out the trap handler.

The precise interface is documented on the _start_trap symbol in arch/rv32i/src/lib.rs.

Testing Strategy

This pull request was tested by using it with Encapsulated Functions, running regular Tock processes, and the LiteX Sim CI.

TODO or Help Wanted

N/A

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

@github-actions github-actions bot added risc-v RISC-V architecture WG-OpenTitan In the purview of the OpenTitan working group. labels May 28, 2024
@lschuermann
Copy link
Member Author

My apologies for opening yet another PR, with yet another approach for this. At long last, this approach (which was in fact my original design 🙃) seems to work reliably, across all platforms. A quick summary of the differences between this, #3864 and #3847:

  • arch/rv32i: separate kernel and app trap handlers to simply assembly flow and allow foreign process implementations #3847 tried to introduce the ability to "hook" custom trap handlers, for instance when running process code, simply by overwriting mtvec. This is great, as it's as efficient as it gets, and frees up mscratch to retain any arbitrary context a process implementation may want to retain.

    However, thanks to the fragmented RISC-V ecosystem, there are platforms which do not support non-vectored interrupt handlers. Yet other platforms do not support vectored interrupt handlers. This means that it is impossible for any code that writes to mtvec to be portable between chips, making this approach a non-starter.

  • arch/rv32i: separate kernel and app trap handlers; specify trap handler "interface" #3864 instead uses mscratch as a selector for which trap handler to execute. If it contains 0, it executes the kernel trap handler. Otherwise, it interprets the value of mscratch as an address, and continues trap handler execution there.

    Unfortunately, this approach does not permit passing any additional "state" to the subsequent trap handler. Consider a system with two instances of the kernel running on two CPU cores. When a trap fires, it jumps to the address passed in mscratch. The process-implementation's trap handler code then needs to store process state, but it does not know where that state is stored. Using a global symbol could work, but to support multi-core environments, that symbol would then need to be indexed by the hart ID, making this code very complex and convoluted.

Finally, this approach is much closer to the previous implementation. mscratch does not directly pass the trap handler address to be executed, but is a pointer to a datastructure with a well-defined layout:

tock/arch/rv32i/src/lib.rs

Lines 208 to 213 in da704eb

/// mscratch 0 1 2 3
/// \->|--------------------------------------------------------------|
/// | scratch word, overwritten with s1 register contents |
/// |--------------------------------------------------------------|
/// | trap handler address, continue trap handler execution here |
/// |--------------------------------------------------------------|

Notably, this "datastructure" can simply be placed at the current stack pointer, as is done with the current implementation.

The behavior of the global trap handler is then well-defined:

tock/arch/rv32i/src/lib.rs

Lines 220 to 228 in da704eb

/// 1. atomically swap s0 and the mscratch CSR,
///
/// 2. execute the default kernel trap handler if s0 now contains `0`
/// (meaning that the mscratch CSR contained `0` before entering this trap
/// handler),
///
/// 3. otherwise, save s1 to `0*4(s0)`, and finally
///
/// 4. load the address at `1*4(s0)` into s1, and jump to it.

Thus, once it jumps to the process-implementation specific trap handler, the machine state is as follows:

  • pc: set to pre-trap 4*4(mscratch)
  • mscratch: set to pre-trap s0
  • s0: set to pre-trap mscratch
  • 0*4(mscratch): set to pre-trap s1
  • s1: set to pre-trap 4*4(mscratch) (jump-target)

Because both s0 and s1 are stored, no state is lost. And because s0 contains the original value of mscratch, this can be used as a hart-specific context pointer, atomically set while hooking the trap handler. This subsequent handler simply has to do some cleanup to revert the machine to its original state before returning. For processes, this typically involves saving this state. For the kernel we circumvent this by special-casing it with mscratch = 0 and a fixed branch, retaining its current efficiency.

j {start_trap}
j {start_trap}
",
start_trap = sym rv32i::_start_trap,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are tangential, but we should avoid referencing symbol names directly across crates. Instead, we should use proper Rust function exports and re-introduce them as locally-scoped symbols in Rust's asm! blocks.

Comment on lines -424 to +629
state.pc += 4;
state.pc = state.pc.wrapping_add(4);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tangential. This should be functionally equivalent on release builds, but would have panicked in debug builds. Overflowing seems like the right thing to do here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be nightmare to debug if it hits us. Should we just fix is now.

This C code is enough to get the instruction length

static inline int insn_len(uint16_t opcode)
{
    return (opcode & 3) == 3 ? 4 : 2;
}

So it should be pretty easy to check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this, and while the logic you propose is indeed very simple, it requires us to know the opcode that our platform is current executing.

However, that's not trivial. RISC-V provides us with the mtinst CSR, but this CSR (where hardware implements it) will contain the transformed, expanded instruction, in case we're executing a compressed instruction. Thus we cannot use this.

Alternatively, we could try and deference the PC. However, I'm having a hard time reasoning about the safety of that -- when the application is able to do an ecall, presumably it has access to this instruction. But maybe there's other edge cases where its not always safe for the kerrnel to dereference the PC in this particular location, such as with Kernel MPU. And in either case, this would be an additional load & compare on the already expensive system call critical path, for a potential future instruction that does not yet exist.

Based on this, I'd rather postpone making these changes until we have an implementation where this is a problem. Notably, this will never break the kernel, it just might jump to an improper instruction offset or skip a compressed instruction in userspace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting points. I didn't think about mtinst containing the expanded construction.

I suspect de-referencing the PC should be fine, but I agree about the overhead. I agree with postponing, let's see if it is ever an issue

Copy link
Contributor
@bradjc bradjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level looks good.

bradjc
bradjc previously approved these changes Jun 5, 2024
@bradjc
Copy link
Contributor
bradjc commented Jun 5, 2024

This looks good to me, and despite the nature of the change (i.e. assembly) I think this is actually a pretty minor patch. In effect this only moves the app-state saving assembly to syscall.rs.

Also, adding a data structure for handling traps feels very risc-v to me, as the architecture provides only minimal hw support for context switches, and building your own abstractions on top is expected.

Comment on lines -424 to +629
state.pc += 4;
state.pc = state.pc.wrapping_add(4);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, this seems to have an explicit assumption that there is not a compressed environment call instruction. If that existed (potentially as part of a future extension), we'd need to increment by 2 bytes here instead. We should probably document this.

bradjc
bradjc previously approved these changes Jun 5, 2024
@lschuermann
Copy link
Member Author

I'll address @alistair23's comments and reflow the text, then I think this is good to go.

@alevy
Copy link
Member
alevy commented Jun 14, 2024

@lschuermann ping

@lschuermann lschuermann force-pushed the dev/riscv-separate-kernel-app-trap-handlers branch from d9a1b81 to f58cb7e Compare June 14, 2024 16:27
This commit re-architects Tock's RV32I trap handler, to cleanly
separate the kernel- and process-handlers. It defines an "interface"
for custom trap handlers of custom process implementations.

Tock's core kernel design permits process implementations with system
call interfaces other than the ones shipped upstream in the
arch/cortex-m and arch/rv32i crates. However, in practice, the
UserspaceKernelBoundary implementation for RV32I has always been
tightly coupled to the generic kernel RISC-V trap handler. This has
made it difficult to build an alternative process
implementation (e.g., as part of Encapsulated Functions), while
reusing much of Tock's RISC-V implementation.

By defining an "interface" that the global trap handler exposes (i.e.,
specifying how a custom trap handler can be registered, and which
registers get clobbered), we allow foreign process implementations to
hook into the rv32i arch crate's assembly, and can move all
process-specific assembly into SysCall::switch_to_process. All
process-specific logic is now contained in one assembly block that can
be read top-to-bottom. The global trap handler no longer relies on the
stack layout of that function, or other data structures in syscall.rs.

Because the kernel trap handler no longer contains any application
logic and expects the app to set its own trap handler, implementing a
new syscall ABI becomes as simple as temporarily swapping out the trap
handler.

The precise interface is documented on the `_start_trap` symbol in
`arch/rv32i/src/lib.rs`.
@lschuermann lschuermann force-pushed the dev/riscv-separate-kernel-app-trap-handlers branch from f58cb7e to c1f2ce6 Compare June 14, 2024 16:29
@lschuermann
Copy link
Member Author
lschuermann commented Jun 14, 2024

@alevy Sorry, I struggled with implementing @alistair23's proposed change for the reasons outlined here: #4009 (comment)

In the end I just rebased and reflowed the comments, so nothing materially changed. I think this is good to go now.

@alevy alevy added this pull request to the merge queue Jun 14, 2024
Merged via the queue into tock:master with commit 0222998 Jun 14, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
risc-v RISC-V architecture WG-OpenTitan In the purview of the OpenTitan working group.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0