`cpu_relax` primitive by TheNumbat · Pull Request #4226 · oxcaml/oxcaml · GitHub

cpu_relax primitive #4226

Open
wants to merge 8 commits into
base: domain-rt4

TheNumbat
Contributor

Makes Domain.cpu_relax a primitive instead of a runtime call.

  • The lambda and flambda primitives imply an architecture-specific pause instruction and, if poll insertion is disabled, also a polling point.
  • In cmm, the primitive is translated to either Crelax or a sequence of Crelax and Cpoll, depending on Config.poll_insertion.
  • In the rest of the backend, Crelax is treated like a Copaque that returns void.
  • In bytecode, the primitive is translated directly to a call to caml_ml_domain_cpu_relax, which now also checks pending actions if poll insertion is disabled.
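
As a rough illustration of the cmm step in the list above, here is a minimal sketch. It does not reproduce the compiler's real types: `cmm_op` and `translate_cpu_relax` are hypothetical names, and the `poll_insertion` argument stands in for `Config.poll_insertion`.

```ocaml
(* Toy model only: these are not the compiler's actual Cmm definitions. *)
type cmm_op =
  | Crelax (* architecture-specific pause hint *)
  | Cpoll  (* explicit polling point *)

(* When the poll-insertion pass is enabled, it is assumed to add any polls
   the enclosing loop needs, so only the pause is emitted here. When it is
   disabled, the primitive carries its own polling point. *)
let translate_cpu_relax ~poll_insertion : cmm_op list =
  if poll_insertion then [ Crelax ] else [ Crelax; Cpoll ]
```

Under these assumptions, `translate_cpu_relax ~poll_insertion:false` yields a pause followed by a poll, which matches the `pause` plus `young_limit` comparison visible in the assembly listings of this description.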

This changes the behavior of cpu_relax, as caml_ml_domain_cpu_relax was previously implemented as a pause instruction plus an interrupt_pending check, ignoring pending actions or external interrupts. As far as I can tell, the only way to set interrupt_pending also calls interrupt_domain, which will trigger the target domain's young_limit check upon the next poll point, so polling should be sufficient.

I've added tests checking that cpu_relax can be used for spin-blocking between systhreads in the initial domain and on another domain.
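
For readers unfamiliar with the pattern, a spin-blocking check across two domains might look roughly like the sketch below. This is not the test added in the PR: `spin_wait_demo` is a hypothetical name, and the code assumes the OCaml 5 standard library (`Domain.spawn`, `Domain.join`, `Domain.cpu_relax`, `Atomic`).

```ocaml
(* Hypothetical sketch, not the PR's test: one domain spins on a flag while
   the spawning domain clears it. *)
let spin_wait_demo () =
  let flag = Atomic.make true in
  let waiter =
    Domain.spawn (fun () ->
        (* Spin until the other domain clears the flag, hinting the CPU on
           every iteration. *)
        while Atomic.get flag do
          Domain.cpu_relax ()
        done)
  in
  Atomic.set flag false;
  (* join only returns once the spinning domain has observed the store. *)
  Domain.join waiter;
  Atomic.get flag
```

The call returns `false` once the waiter has left its loop. If the spinning loop never reached a polling point, a domain waiting on it for a stop-the-world section could stall, which is why the primitive's polling behavior matters.
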
The assembly output for the following function is the same for runtime4, runtime5 without poll insertion, and runtime5 with poll insertion:

fun () ->
  Atomic.set flag true;
  while Atomic.get flag do
    Domain.cpu_relax ()
  done

Runtime4

000000000007f160 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code>:
   7f160:	48 83 ec 08          	sub    $0x8,%rsp
   7f164:	48 8d 05 fd bd 0e 00 	lea    0xebdfd(%rip),%rax        # 16af68 <camlCpu_relax+0x8>
   7f16b:	48 8b 58 10          	mov    0x10(%rax),%rbx
   7f16f:	b8 03 00 00 00       	mov    $0x3,%eax
   7f174:	48 87 03             	xchg   %rax,(%rbx)
   7f177:	48 8d 05 ea bd 0e 00 	lea    0xebdea(%rip),%rax        # 16af68 <camlCpu_relax+0x8>
   7f17e:	48 8b 40 10          	mov    0x10(%rax),%rax
   7f182:	48 8b 00             	mov    (%rax),%rax
   7f185:	48 83 f8 01          	cmp    $0x1,%rax
   7f189:	75 0d                	jne    7f198 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x38>
   7f18b:	b8 01 00 00 00       	mov    $0x1,%eax
   7f190:	48 83 c4 08          	add    $0x8,%rsp
   7f194:	c3                   	ret
   7f195:	0f 1f 00             	nopl   (%rax)
   7f198:	f3 90                	pause
   7f19a:	4d 3b 3e             	cmp    (%r14),%r15
   7f19d:	76 02                	jbe    7f1a1 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x41>
   7f19f:	eb d6                	jmp    7f177 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x17>
   7f1a1:	e8 aa ff ff ff       	call   7f150 <camlCpu_relax__code_begin>
   7f1a6:	eb f7                	jmp    7f19f <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x3f>
   7f1a8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
   7f1af:	00

Runtime5 (without poll insertion)

0000000000083060 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code>:
   83060:	48 83 ec 08          	sub    $0x8,%rsp
   83064:	48 8d 05 fd 6e 10 00 	lea    0x106efd(%rip),%rax        # 189f68 <camlCpu_relax+0x8>
   8306b:	48 8b 58 10          	mov    0x10(%rax),%rbx
   8306f:	b8 03 00 00 00       	mov    $0x3,%eax
   83074:	48 87 03             	xchg   %rax,(%rbx)
   83077:	48 8d 05 ea 6e 10 00 	lea    0x106eea(%rip),%rax        # 189f68 <camlCpu_relax+0x8>
   8307e:	48 8b 40 10          	mov    0x10(%rax),%rax
   83082:	48 8b 00             	mov    (%rax),%rax
   83085:	48 83 f8 01          	cmp    $0x1,%rax
   83089:	75 0d                	jne    83098 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x38>
   8308b:	b8 01 00 00 00       	mov    $0x1,%eax
   83090:	48 83 c4 08          	add    $0x8,%rsp
   83094:	c3                   	ret
   83095:	0f 1f 00             	nopl   (%rax)
   83098:	f3 90                	pause
   8309a:	4d 3b 3e             	cmp    (%r14),%r15
   8309d:	76 02                	jbe    830a1 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x41>
   8309f:	eb d6                	jmp    83077 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x17>
   830a1:	e8 aa ff ff ff       	call   83050 <camlCpu_relax__code_begin>
   830a6:	eb f7                	jmp    8309f <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x3f>
   830a8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
   830af:	00

Runtime5 (with poll insertion)

0000000000083080 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code>:
   83080:	48 83 ec 08          	sub    $0x8,%rsp
   83084:	48 8d 05 dd 9e 10 00 	lea    0x109edd(%rip),%rax        # 18cf68 <camlCpu_relax+0x8>
   8308b:	48 8b 58 10          	mov    0x10(%rax),%rbx
   8308f:	b8 03 00 00 00       	mov    $0x3,%eax
   83094:	48 87 03             	xchg   %rax,(%rbx)
   83097:	48 8d 05 ca 9e 10 00 	lea    0x109eca(%rip),%rax        # 18cf68 <camlCpu_relax+0x8>
   8309e:	48 8b 40 10          	mov    0x10(%rax),%rax
   830a2:	48 8b 00             	mov    (%rax),%rax
   830a5:	48 83 f8 01          	cmp    $0x1,%rax
   830a9:	75 0d                	jne    830b8 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x38>
   830ab:	b8 01 00 00 00       	mov    $0x1,%eax
   830b0:	48 83 c4 08          	add    $0x8,%rsp
   830b4:	c3                   	ret
   830b5:	0f 1f 00             	nopl   (%rax)
   830b8:	f3 90                	pause
   830ba:	4d 3b 3e             	cmp    (%r14),%r15
   830bd:	76 02                	jbe    830c1 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x41>
   830bf:	eb d6                	jmp    83097 <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x17>
   830c1:	e8 aa ff ff ff       	call   83070 <camlCpu_relax__code_begin>
   830c6:	eb f7                	jmp    830bf <camlCpu_relax__fn$5bcpu_relax.ml$3a10$2c30$2d$2d140$5d_0_1_code+0x3f>
   830c8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
   830cf:	00

TheNumbat added the flambda2, backend, runtime, lambda, and stdlib labels on Jun 28, 2025
TheNumbat requested review from xclerc and stedolan on June 28, 2025 02:15
stedolan (Contributor) left a comment

I'm not a huge fan of the naming in the backend (Relax rather than Cpu_relax makes me think of branch relaxation before it makes me think of pause/yield/etc), but lgtm generally.

- | Op Opaque ->
-   (* Assume arbitrary side effects from Iopaque *)
+ | Op (Opaque | Relax) ->
+   (* Assume arbitrary side effects from Opaque / Relax *)
Contributor

Is this right? (I'm honestly not sure either way)

Contributor Author

Not entirely sure either. I went with the most conservative option, since we don't want loads/stores reordered across it. But maybe atomic loads/stores will already not be reordered? cc @xclerc

Contributor Author

Changed this to only kill loads
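
To make "only kill loads" concrete, here is a toy model of a CSE-style pass. It is not the OxCaml CSE implementation: the `instr` type and the `reusable` function are hypothetical, and the only point is that a `Relax` forgets remembered loads while leaving pure results available.

```ocaml
(* Toy model, not the real CSE pass: all names here are hypothetical. *)
type instr =
  | Load of string         (* read the named memory location *)
  | Add of string * string (* pure arithmetic on two named registers *)
  | Relax                  (* pause hint, modelled as killing loads only *)

(* For each instruction, report whether an earlier equivalent result is
   still available and could be reused instead of recomputing. *)
let reusable (code : instr list) : bool list =
  let step (loads, pures, acc) i =
    match i with
    | Load _ ->
      let hit = List.mem i loads in
      ((if hit then loads else i :: loads), pures, hit :: acc)
    | Add _ ->
      let hit = List.mem i pures in
      (loads, (if hit then pures else i :: pures), hit :: acc)
    | Relax ->
      (* Forget every remembered load; keep the pure results. *)
      ([], pures, false :: acc)
  in
  let _, _, acc = List.fold_left step ([], [], []) code in
  List.rev acc
```

In this model, a load of `flag` repeated after a `Relax` is not reusable, so a spin loop rereads the flag on every iteration, while a pure computation repeated across the `Relax` can still be shared.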

- | Ppoll ->
-   Some alloc_heap
+ | Ppoll -> Some alloc_heap
+ | Pcpu_relax -> if Config.poll_insertion then None else Some alloc_heap
Contributor

Certainly Pcpu_relax should match Ppoll in the non-poll-insertion case, but I don't understand why either of them is marked as Some alloc_heap. What's this function used for?

Contributor Author

It appears to only be used for checking whether a primitive stack-allocates, so the heap cases don't matter. There's also a CR @mshinwell saying to check this case, so it could probably be removed (in another PR).

@stedolan
Contributor

Do you know why the compiler seems not to have managed to CSE the load of the immutable field flag from the module block in your code sample?

@TheNumbat
Contributor Author

I didn't call it Cpu_relax in the backend since the semantics differ from the frontend, but maybe Pause would be better?

@TheNumbat
Contributor Author

I checked: treating Pause as pure for CSE doesn't affect the flag load. That may be because flag is actually coming from a static closure environment here, not a module.
