8000 feature(smp): improve hart booting, TLS setup and per-core stack initialization by mrm-develop · Pull Request #1678 · hermit-os/kernel · GitHub

More Web Proxy on the site http://driver.im/

feature(smp): improve hart booting, TLS setup and per-core stack initialization #1678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

mrm-develop wants to merge 13 commits into hermit-os:main from mrm-develop:feature/smp_tls

Contributor

mrm-develop commented

This pull request improves the RISC-V boot process, multi-core startup, and stack/TLS setup for better stability, performance, and future extensibility.

Changes:

In mod.rs, the function boot_next_processor now includes detailed logging to show the current hart mask and which hart will be started next. It also checks the return value of sbi_rt::hart_start to log any startup errors. The memory ordering for atomic variables like CURRENT_STACK_ADDRESS and CPU_ONLINE was adjusted for safer multi-core synchronization.

In start.rs, a check was added to skip invalid secondary harts early. A new PerCpuData struct was introduced, aligned to 64 bytes to match cache lines. This structure allows each core to track its state individually. The boot hart now correctly sets up its TLS pointer (tp) using inline assembly if TLS info is available. The hart mask is calculated using FDT parsing, with each CPU's status checked to ensure it is enabled before adding it to the mask.

In scheduler.rs, the TLS setup now uses the alignment field provided in tls_info. Memory is allocated with proper alignment and zero-initialized. Stacks are allocated with extra pages to separate kernel, user, and interrupt stacks. The kernel stack is marked with 0xdeadbeef at the top to help debugging stack overflows. The TaskStacks type now properly handles cloning and deallocation. This also ensures safe memory reuse and cleanup when tasks end.

Marco Ruppert-Maas added 2 commits

April 8, 2025 22:39


          feature(smp): improve hart booting, TLS setup and per-core stack init…

a2dfe1b

…ialization


          fixed clippy issues

c067c15

mkroening self-assigned this

mkroening self-requested a review

April 9, 2025 07:47

Marco Ruppert-Maas added 3 commits

April 14, 2025 21:28


          fixed more clippy issues

ee94477


          fixed another clippy issue

dc8e3c1


          another clippy issue

c2a4a71

mkroening requested changes

View reviewed changes

Member

mkroening left a comment •

Thanks for the PR! Sorry for the late review!

I have left a few comments.

Generally, I think using Orderin::Relaxed in the cases where you changed it, would be fine. Why do you think, we need to change it? We don't synchronize other memory accesses with it.

If you need assistance with git or GitHub regarding interactive rebasing, feel free to reach out! :)

src/arch/riscv64/kernel/mod.rs Outdated Show resolved Hide resolved

src/arch/riscv64/kernel/mod.rs Outdated Show resolved Hide resolved

src/arch/riscv64/kernel/mod.rs Outdated Show resolved Hide resolved

src/arch/riscv64/kernel/mod.rs Outdated Show resolved Hide resolved

src/arch/riscv64/kernel/start.rs Outdated Show resolved Hide resolved

src/arch/riscv64/kernel/scheduler.rs Outdated

Member

mkroening

Can you split the TaskTLS changes into another PR that we can merge seperately?

src/arch/riscv64/kernel/start.rs Outdated

Comment on lines 93 to 102

+              		// Optimized HART_MASK calculation
+              		let mut hart_mask = 0u64;
+              		for cpu in fdt.cpus() {
+              			if let Some(cpu_id) = cpu.property("reg").and_then(|p| p.as_usize()) {
+              				if cpu
+              					.property("status")
+              					.and_then(|p| p.as_str())
+              					.is_some_and(|s| s != "disabled\u{0}")
+              				{
+              					hart_mask |= 1 << cpu_id;

Member

mkroening

How is this optimized? Isn't this the same code as before but in a more functional style? I honestly found the code more readable before.

Contributor Author

mrm-develop

You're right, "optimized" is misleading here, as this is just a more functional style without any performance improvement. I'll update the comment to avoid giving the wrong impression.
If you want the old code back, just send me an answer, I will change it.

Member

mkroening

I think changing it back would be better. Your functional style avoids a few unwraps, but unwrapping here is not a bad thing, I think. If there is an issue with the FDT in your code, we would just silently skip initializing the hart mask.

src/arch/riscv64/kernel/start.rs Outdated

Comment on lines 75 to 82

+              	// Optimized Hart-ID validation
+              	if CPU_ONLINE.load(Ordering::Acquire) > 0 {
+              		// Faster check for Secondary-HARTs
+              		if (HART_MASK.load(Ordering::Relaxed) & (1 << hart_id)) == 0 {
+              			error!("Invalid hart ID: {hart_id}");
+              			processor::halt();
+              		}
+              	}

Member

mkroening

What does optimized mean here? And what do we check for? And why would we want to halt the processor in that case?

Contributor Author

mrm-develop

Right again, this isn’t really an optimization. The purpose here is to validate that the current HART is part of the expected HART mask (from the device tree). If not, we halt to prevent undefined behavior from unexpected or spurious CPUs starting. I’ll update the comment to outline it is sanity check, not an optimization.

Member

mkroening

Hmm, but nothing bad would happen if an unexpected hart starts, right? Are there real cases where harts start unexpectedly? What does “faster check for Secondary-HARTS” compare to? We don't check the main hart, right? This also just checks if we are one of the harts in the hart mask, not the specific hart that was requested in sbi_rt::hart_start.

src/arch/riscv64/kernel/start.rs Show resolved Hide resolved

mrm-develop and others added 4 commits

May 31, 2025 20:23


          Merge branch 'hermit-os:main' into feature/smp_tls

a324844


          feature(smp): implemented feedback from pr review

0ecdac0


          feature(smp): made comments clearer.

a3b7441


          clippy issues

42a69a3

Contributor Author

mrm-develop commented

You can check again :)

mkroening reviewed

View reviewed changes

src/arch/riscv64/kernel/mod.rs Outdated Show resolved Hide resolved

mkroening reviewed

View reviewed changes

src/arch/riscv64/kernel/start.rs Outdated Show resolved Hide resolved

mkroening reviewed

View reviewed changes

src/arch/riscv64/kernel/start.rs Outdated Show resolved Hide resolved

mkroening reviewed

View reviewed changes

src/arch/riscv64/kernel/start.rs Outdated Show resolved Hide resolved

mrm-develop and others added 3 commits

June 14, 2025 22:22


          Merge branch 'hermit-os:main' into feature/smp_tls

e263dbc


          feature(smp): implemented feedback from second pr review

dbcf948


          clippy issue resolved

d8c0723

mrm-develop requested a review from mkroening

June 16, 2025 19:57


          Merge branch 'hermit-os:main' into feature/smp_tls

53129bd

mkroening requested changes

View reviewed changes

src/arch/riscv64/kernel/scheduler.rs

    
            @@ -281,7 +281,7 @@ impl TaskTLS {
          
              		let tls_start = VirtAddr::new(tls_info.start);

              		// Yes, it does, so we have to allocate TLS memory.

              		// Allocate enough space for the given size and one more variable of type usize, which holds the tls_pointer.

              		let tls_allocation_size = tls_size.align_up(32usize); // + mem::size_of::<usize>();

              		let tls_allocation_size = tls_size.align_up(tls_info.align as usize);

Member

mkroening

I am not convinced that this is correct. According to the comment above, this increases the size of the allocation size to hold a pointer. Aligning up by 32 bytes is a bad way to do this, but aligning the size up by the required alignment is more wrong than before.

The memory is overaligned by 128 bytes anyway further down.

What do you think?

This was referenced Jun 23, 2025

refactor(riscv64): miscellaneous improvements #1783

Merged

timespec: use i64 for tv_nsec as in POSIX #1778

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

0