[LA64_DYNAREC]Add basic avx support for la64. #2745
Conversation
👍 Thank you for working on this. At first glance, it looks promising. But it's still a big PR, so I need to find a couple of hours to review, which will be early next week.
I should've started the AVX support on RV64 too...
Feel free to review this.
That's an impressive PR! Thank you for your involvement in box64 development.
Overall, this looks good. I left some comments. Please do smaller PRs next time!
Updated the code.
} else {
    VINSGR2VR_W(q0, ed, 0);
}
YMM0(gd);
I mean that YMM0 and XVXOR_V both set the high 128 bits to zero, and both cause a pending store (of the high 128 bits) in the current code, so they are equivalent for now.
XVXOR_V would be better when the register is reused at 256-bit width by later instructions in the same block.
But to use XVXOR_V, you need to mark the register as 256-bit wide, otherwise the cleared high bits will be ignored by the current code.
Maybe we can do this optimization later, once most of the AVX implementation has landed.
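A minimal sketch of that idea, assuming a hypothetical avxcache_set_width() helper (not the actual box64 API):

```c
// Clearing with XVXOR_V zeroes all 256 bits of the native register,
// but that only pays off if the width tracker is told about it.
XVXOR_V(q0, q0, q0);               // q0[255:0] = 0
avxcache_set_width(dyn, gd, 256);  // hypothetical helper: mark gd as live
                                   // at 256-bit width so the known-zero
                                   // upper 128 bits aren't discarded
```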
        VEXTRINS_D(q0, q2, 0);
    }
} else {
    VXOR_V(q0, q0, q0);
See the comment on VMOVD.
If this is ready for another round, please use the re-request review button.
Another round.
} else {
    VINSGR2VR_W(q0, ed, 0);
}
YMM0(gd);
Sorry, I don't understand. My suggestion here is to change line 153 to XVXOR_V and change line 159 to zero_upper = 0, so we don't have to clear the upper 128 bits again later.
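A hedged sketch of the suggested shape, reconstructed from the diff context above (the surrounding code is assumed, not verbatim from the PR; zero_upper is the variable named in the comment):

```c
} else {
    XVXOR_V(q0, q0, q0);     // line 153: zero all 256 bits up front
    VINSGR2VR_W(q0, ed, 0);  // then insert the 32-bit value into lane 0
}
zero_upper = 0;              // line 159: replaces YMM0(gd); the upper
                             // 128 bits are already known to be zero
```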
        VEXTRINS_D(q0, q2, 0);
    }
} else {
    VXOR_V(q0, q0, q0);
Could you just repeat the comment? I can't find that easily via the web UI...
Sorry, I must have selected "Comment" instead of "Request changes".
So there are only a few pending comments left. After we've resolved all of those, I'll do a full review again. @phorcys
* basic infra for avx
* some basic ops for avx: VMOVDQU/VMOVDQA/VMOVUPS/VMOVAPS/VMOVUPD/VMOVAPD, VZEROUPPER/VZEROALL, VMOVD/VMOVSD/VMOVSS, VINSERTF128/VINSERTI128/VEXTRACTF128/VEXTRACTI128, VBROADCASTSS/VBROADCASTSD/VBROADCASTF128
I've applied all the requested fixes, please review them again when you have free time. Thanks. @ksco
This pull request adds basic AVX support for la64.
LoongArch LASX is a 256-bit SIMD extension, so this patch uses native 256-bit registers and instructions to implement AVX instructions.
The Box64 interpreter has split YMM storage (x64emu_t.ymm[] only keeps the upper 128 bits of each YMM register). Because of that, in this patch a 256-bit AVX register push/pop can't use xvld/xvst; it uses vld,vld+xvpermi.q (load) and vst,xvpermi.q+vst (store) instead, as sketched below.
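For illustration, a hedged sketch of the 256-bit load path under that split layout (macro names mirror the LASX instructions; register names and offsets are assumptions, not the PR's actual code):

```c
// Assemble one guest YMM value into a single LASX register when the
// emulator keeps the halves apart: xmm[idx] = low 128, ymm[idx] = high 128.
VLD(q_lo, xEmu, offsetof(x64emu_t, xmm[idx]));  // low 128 bits
VLD(q_hi, xEmu, offsetof(x64emu_t, ymm[idx]));  // high 128 bits
XVPERMI_Q(q_lo, q_hi, 0x02);  // keep q_lo[127:0], set
                              // q_lo[255:128] = q_hi[127:0]
```

The store path mirrors this: vst the low half, xvpermi.q the high lane down, then vst the high half.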
Unlike arm64's YMM zero-tracking method, this patch uses an avxcache_t to store each AVX register's width and whether its upper 128 bits are zero-filled.
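As a rough illustration, an entry in such a cache might carry just two facts per register (the field and type names below are illustrative, not the actual avxcache_t definition):

```c
#include <stdint.h>

// Hypothetical per-YMM-register tracking entry: the width at which the
// native LASX register is currently live, and whether its upper 128 bits
// are known to be zero, so redundant clears can be skipped.
typedef struct {
    uint8_t width;       // 128 or 256
    uint8_t zero_upper;  // 1 = upper 128 bits known zero
} avxcache_entry_t;
```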
Register convention:
It relies on the x64 compiler's automatic vzeroupper/vzeroall generation; no handling for AVX-SSE transition penalties is implemented yet.