8000 [LA64_DYNAREC]Add basic avx support for la64. by phorcys · Pull Request #2745 · ptitSeb/box64 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[LA64_DYNAREC]Add basic avx support for la64. #2745

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

phorcys
Copy link
Contributor
@phorcys phorcys commented Jun 14, 2025
 basic infra for la64 avx
 some basic ops for avx 
   VMOVDQU/VMOVDQA/VMOVUPS/VMOVAPS/VMOVUPD/VMOVAPD 
   VZEROUPPER/VZEROALL 
   VMOVD/VMOVSD/VMOVSS 
   VINSERTF128/VINSERTI128/VEXTRACTF128/VEXTRACTI128 
   VBROADCASTSS/VBROADCASTSD/VBROADCASTF128

This pullrequest add basic AVX support for la64.

LoongArch LASX is 256bits SIMD, this patch use native 256-width reg/inst to imp avx insts.
Box64 interpreter has split YMM storage(x64emu_t.ymm[] only keep upper 128bits of YMM).
In this patch, 256bits AVX reg push/pop can't used xvld/xvst,
but use vld,vld+xvpermi.q or vst,xvpermi.q+vst instead.

unlike arm64's ymm zero trace method, this patch use avxcache_t to store avx reg width and upper 128bits zero-fill states.

reg conv:

conv additional transform
SSE->AVX clear ssecache
AVX->SSE purge upper 128bits, clear avxcache
AVX128 -> AVX256 land upper 128bits zero-fill if needed
AVX256 -> AVX128 no special transform.

Relay on x64 compiler's auto vzeroup/vzeroall generate. No AVX-SSE Transition Penalties imp yet.

@ksco
Copy link
Collaborator
ksco commented Jun 14, 2025

👍 Thank you for working on this. At first glance, it looks promising. But it's still a big PR, so I need to find a couple of hours to review, which will be early next week.

@ksco ksco self-requested a review June 14, 2025 09:18
@ksco
Copy link
Collaborator
ksco commented Jun 14, 2025

I should've started the AVX support on RV64 too...

@phorcys
Copy link
Contributor Author
phorcys commented Jun 14, 2025

feel free for review this .
the state sync in this patch is a little complicated,I'm not sure that I wrote everything right.
especially when BOX64_DYNAREC_TEST=1.

@ptitSeb
Copy link
Owner
ptitSeb commented Jun 14, 2025

That's an impressive PR! Thank you for your involvement in box64 developpement.

Copy link
Collaborator
@ksco ksco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. I left some comments. Please do smaller PRs next time!

@phorcys phorcys requested a review from ksco June 17, 2025 15:01
Copy link
Contributor Author
@phorcys phorcys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated code.

} else {
VINSGR2VR_W(q0, ed, 0);
}
YMM0(gd);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I means YMM0 vs XVXOR_V both set high 128bits to zero. both causing a pending store(store high 128) in current code. So they are same now.

XVXOR_V would be better when reg is reused in next in-block insts as 256-width.
And to use XVXOR_V , you need to mark reg as 256-width, or cleared high bits will be ignored in current code.
Maybe we can do this opt later when most avx code imp landed.

VEXTRINS_D(q0, q2, 0);
}
} else {
VXOR_V(q0, q0, q0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VMOVD's comment.

@ksco
Copy link
Collaborator
ksco commented Jun 20, 2025

If this is ready for another round, please use the re-request review button.

@phorcys
Copy link
Contributor Author
phorcys commented Jun 20, 2025

If this is ready for another round, please use the re-request review button.
图片

You didn't push "request change", so there is no "request review" button

Copy link
Collaborator
@ksco ksco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round.

} else {
VINSGR2VR_W(q0, ed, 0);
}
YMM0(gd);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand. My suggestion here is to change line 153 to XVXOR_V and change line 159 to zero_upper = 0, so we don't have to clear the upper 128 bits again later.

VEXTRINS_D(q0, q2, 0);
}
} else {
VXOR_V(q0, q0, q0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just repeat the comment? I can't find that easily via the web UI...

@ksco
Copy link
Collaborator
ksco commented Jun 20, 2025

You didn't push "request change", so there is no "request review" button

Sorry, I must have selected "Comment" instead of "Request changes".

@phorcys phorcys requested a review from ksco June 20, 2025 08:56
@ksco
Copy link
Collaborator
ksco commented Jun 20, 2025

So there are only a few pending comments left. After we resolved all that, I'll do a full review again. @phorcys

    * basic infra for avx
    * some basic ops for avx
	VMOVDQU/VMOVDQA/VMOVUPS/VMOVAPS/VMOVUPD/VMOVAPD
	VZEROUPPER/VZEROALL
	VMOVD/VMOVSD/VMOVSS
	VINSERTF128/VINSERTI128/VEXTRACTF128/VEXTRACTI128
	VBROADCASTSS/VBROADCASTSD/VBROADCASTF128
@phorcys
Copy link
Contributor Author
phorcys commented Jun 20, 2025

I've fixed all the fix required, please review them ag when you have free time. Thanks. @ksco

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0