-
Notifications
You must be signed in to change notification settings - Fork 698
Riscv, kernel trapping, and "double" faults #1374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yes, that is correct. There should be no faults at all generated while in kernel mode or in the idle thread. There are some delayed faults on ARM that cannot be masked. IIRC,
Yes
We should put a big comment somewhere that says that. Currently there isn't because that knowledge was usually passed along from the current kernel engineers. Using the global variable |
Is there a more up to date version of https://sel4.systems/Info/Docs/seL4-spec.pdf anywhere? (even just a single platform is useful, because there is arch-independent things). It makes it a bit easier to ctrl+f through if it's in a single document.
I'm not sure whether you're saying RISC-V shouldn't have an equivalent, or if RISC-V doesn't have an equivalent because it has no delayed faults like on ARM. If it's the first, I would argue that it should, because it was useful for catching a kernel bug in the past. At the very least in debug mode would make sense to me, as it would have me time debugging (and probably another person a few months ago with the same code). |
True, it keeps IRQs disabled while in kernel mode.
True, but like Gerwin said, ARM has asynchronous faults (SError) which theoretically can arrive much later. Though I don't think it can arrive late enough for a task switch to complete in practice. Also, we do have an SMP bug somewhere which can cause faults for the idle thread, at least with Clang.
It's still good to know if it happens while the idle task is running, as that shouldn't happen.
It could be the same. It's simply that unicore code came first and was kept unmodified when SMP was added, to avoid invalidating any proofs. |
There currently isn't but there probably should be. I've made a TODO item for myself to make it at least a CI artefact, so that it should become available for all proof configs that are built regularly. Then it would also be relatively easy to update during releases. This is the abstract spec, though, not the Haskell code. The Haskell has comments, the formal spec not so much (it does have some explanations, but the narrative was already there in Haskell so it wasn't duplicated).
I was just trying to explain why RISC-V doesn't currently have it, but AARCH64 does. It's not an oversight or bug, but it does sound like it would be useful for debugging to have it. If it's in assembly and basically a new entry point, it's not going to be covered by the proofs, so we should be extra careful to make sure it either only exists in debug mode or that it cannot interfere with normal operation. |
The Haskell is literate haskell, so it's LaTeX. Is there not an equivalent PDF for that? (I might have confused sel4-spec with the haskell spec).
Yep, that's what I was going for :) |
Uh oh!
There was an error while loading. Please reload this page.
Whilst modifying a RISC-V build of seL4, I was getting messages that looked like this:
The kernel would then continue as if nothing had happened.
This was eventually traced down to the modifications corrupting the stack, eventually resulting in a memory access fault within the kernel memory, which got attributed to the idle thread. 1
I am under the impression that seL4 should not have nested exceptions (in normal operation).
It appears that the AArch64 traps.S code calls a
kernelDataAbort()
function if there is a trap taken whilst within the kernel, as perseL4/src/arch/arm/64/traps.S
Lines 103 to 130 in 1732c05
In contrast, the RISC-V traps.S assumes that any trap is from u-mode. I'm not quite sure how traps within s-mode end up back here (perhaps openSBI forwards them back?), but this is what I was observing. It looks like the linux kernel stores a 0 in
sscratch
whilst in the kernel, otherwise it stores it stores the tp (~-> sp), of which we could do a very similar thing: 2. Unless of course there's an architectural way to determine this (mcause
? do we have access to that in s-mode?)We should never get "faults" from the idle thread, since it's a pseudo-thread managed by the kernel?
(This is more of a query) The fact that there is a function called
handleDoubleFault
that gets invoked byhandleFault()
whensendFaultIPC
fails:seL4/src/kernel/faulthandler.c
Lines 58 to 60 in 1732c05
handleNoFaultHandler
makes more sense. Maybe it would make sense to rename this function, or, at the very least, add a doc comment explaining what this function is doing.handleFault()
setcurrent_fault
global, which is then immediately read back byhandleFault()
? Most cases seem to be immediately settingcurrent_fault
before calling. There are a few cases where this does not do so immediately beforehand, but would it not be clearer for this to be two separate codepaths (if it's even necessary?)In conclusion (unless there is something wrong in the above), what I think should be adjusted (and can probably help with):
kernelDataAbort()
as necessary (please advise how), orsscratch
that we can check upon a fault to see if we came from the kernel, and then aborthandleDoubleFault
such that its meaning / origin is more clear.Footnotes
It was in particular when switching to the idle thread, so I think there was nothing that would stop an exception inside s-mode from being attributed to an arbitrary thread. ↩
We seem to store the current thread registers on the core-local stack (SMP) otherwise in
ksCurThread
's registers (not-SMP). I'm not particularly sure why this couldn't be the same in SMP and non-SMP cases? But irregardless... ↩The text was updated successfully, but these errors were encountered: