[LA64_DYNAREC] Added more 660F opcodes #2127
Sorry for the regression I introduced: https://github.com/ptitSeb/box64/actions/runs/12229956103/job/34110518062?pr=2127 I am checking... Thanks,
Change title to
Note that I won't be able to review this PR today at least, so take your time fixing the bugs ;)
206dfc5 to 55ec355
src/dynarec/la64/dynarec_la64_660f.c
Outdated
GETEX(q1, 0, 0);
v0 = fpu_get_scratch(dyn);
v1 = fpu_get_scratch(dyn);
VMULWEV_W_H(v0, q0, q1);
To be honest, this seems like too many instructions to implement this; maybe xvexth.w.h would be helpful here?
LASX is there, so it's a waste to have the upper 128 bits do nothing :)
EV in VMULWEV_W_H means the even-indexed elements and OD means the odd-indexed elements, so there might not be a chance to move v1 into v0's upper 128 bits:
VMULWEV_W_H(v0, q0, q1);
VMULWOD_W_H(v1, q0, q1);
XVEXTH_W_H(v0, v1);
XVSRAI_W(v0, v0, 14);
XVADDI_WU(v0, v0, 1);
XVSRANI_H_W(q0, v0, 1);
Please point out my mistake.
Thanks,
Leslie Zhai
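For readers following the EV/OD point, here is a minimal scalar sketch of the even/odd split, assuming VMULWEV_W_H and VMULWOD_W_H perform signed 16-bit to 32-bit widening multiplies on the even-indexed and odd-indexed lanes respectively. The helper names mulw_even_w_h and mulw_odd_w_h are hypothetical and are not part of box64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (assumption): widening signed multiply of the
 * even-indexed 16-bit lanes 0, 2, 4, 6 into four 32-bit products. */
static void mulw_even_w_h(int32_t out[4], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * (int32_t)b[2 * i];
}

/* Same model for the odd-indexed lanes 1, 3, 5, 7. */
static void mulw_odd_w_h(int32_t out[4], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
}

/* The two result vectors hold interleaved lanes of the full product,
 * so they are not simply the "low half" and "high half" of the input. */
void mulw_selfcheck(void)
{
    const int16_t a[8] = { -2, 3, 4, -5, 6, 7, -8, 9 };
    const int16_t b[8] = { 10, 10, 10, 10, 10, 10, 10, 10 };
    int32_t ev[4], od[4];

    mulw_even_w_h(ev, a, b);
    mulw_odd_w_h(od, a, b);
    assert(ev[0] == -20 && ev[1] == 40 && ev[2] == 60 && ev[3] == -80);
    assert(od[0] == 30 && od[1] == -50 && od[2] == 70 && od[3] == 90);
}
```

Under this model, product lane n lives in ev[n / 2] when n is even and in od[n / 2] when n is odd, which is why recombining the two vectors needs an interleave rather than a plain high-half move.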
VEXT2XV_W_H(v0, q0);
VEXT2XV_W_H(v1, q1);
XVMUL_W(v0, v0, v1);
XVSRLI_W(v0, v0, 14);
XVADDI_WU(v0, v0, 1);
XVSRLNI_H_W(v0, v0, 0);
XVPICKOD_D(v0, v0, v0);
Not sure if this is correct, but the idea here is to use the high 128 bits of LASX.
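To make the target behavior concrete, here is a scalar reference sketch of what any candidate sequence has to reproduce per lane, based on the documented x86 PMULHRSW semantics: ((a * b >> 14) + 1) >> 1, truncated to 16 bits. The widening step models the idea above of sign-extending all eight 16-bit lanes to 32 bits before multiplying. The function names are hypothetical, and the code assumes an arithmetic right shift and wrapping narrowing conversion for signed values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Reference for one PMULHRSW lane: ((a * b >> 14) + 1) >> 1, low 16 bits.
 * Assumption: >> on a negative int32_t is an arithmetic shift. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b; /* widened signed product */
    int32_t t = (prod >> 14) + 1;           /* keep one extra bit, then round */
    return (int16_t)(t >> 1);               /* final shift, then narrow */
}

/* Hypothetical model of the "widen all eight lanes, multiply, round,
 * narrow" shape discussed above, applied to a full 128-bit operand. */
static void pmulhrsw_ref(int16_t dst[8], const int16_t x[8], const int16_t y[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = pmulhrsw_lane(x[i], y[i]);
}

void pmulhrsw_selfcheck(void)
{
    const int16_t x[8] = { 16384, 32767, -32768, 1, 0, -16384, 16384, 2 };
    const int16_t y[8] = { 16384, 32767, 32767, 1, 5, 16384, -32768, 8192 };
    int16_t d[8];

    pmulhrsw_ref(d, x, y);
    assert(d[0] == 8192);   /* 0.5 * 0.5 = 0.25 in Q15 */
    assert(d[1] == 32766);
    assert(d[2] == -32767);
    assert(d[3] == 0);
    assert(d[4] == 0);
    assert(d[5] == -8192);
    assert(d[6] == -16384);
    assert(d[7] == 1);      /* 16384 >> 14 = 1, +1 = 2, >> 1 = 1 */
}
```

A reference like this is handy for checking a candidate instruction sequence on edge cases such as 32767 * 32767 and -32768 * 32767, where the rounding add and the final shift by 1 both matter.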
Or maybe this to save 1 more instruction? Like I said, I'm not sure, but I do think it's doable.
VEXT2XV_W_H(v0, q0);
VEXT2XV_W_H(v1, q1);
XVMUL_W(v0, v0, v1);
XVSRLNI_H_W(v0, v0, 14);
XVADDI_HU(v0, v0, 1);
XVPICKOD_D(v0, v0, v0);
typo?
-XVPICKOD_D(v0, v0, v0);
+XVPICKOD_D(q0, v0, v0);
sse_intrinsics pmulhrsw failed.
src/dynarec/la64/dynarec_la64_660f.c
Outdated
GETEX(q1, 0, 0);
GETGX_empty(q0);
v0 = fpu_get_scratch(dyn);
VREPLGR2VR_D(v0, xZR);
Use VXOR_V(v0, v0, v0); instead, which has well-defined latency and IPC and is already used everywhere.
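As a small illustration of why the XOR form is a safe replacement, here is a scalar sketch of the zeroing idiom: XORing a register with itself yields zero regardless of what the scratch register held before, and it needs no general-purpose register as a source. The vec128_t type and the vxor_v helper are hypothetical stand-ins for a 128-bit vector register and VXOR_V, not box64 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct {
    uint64_t d[2];
} vec128_t;

/* Assumed model of a bitwise vector XOR over the whole register. */
static vec128_t vxor_v(vec128_t a, vec128_t b)
{
    vec128_t r;
    r.d[0] = a.d[0] ^ b.d[0];
    r.d[1] = a.d[1] ^ b.d[1];
    return r;
}

/* Zero a register by XORing it with itself; prior contents do not matter. */
static vec128_t vec_zero_by_xor(vec128_t v)
{
    return vxor_v(v, v);
}

void vxor_selfcheck(void)
{
    vec128_t garbage = { { 0xdeadbeefcafef00dULL, 0x0123456789abcdefULL } };
    vec128_t z = vec_zero_by_xor(garbage);

    assert(z.d[0] == 0 && z.d[1] == 0);
}
```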
src/dynarec/la64/dynarec_la64_660f.c
Outdated
GETEX(q1, 0, 0);
GETGX_empty(q0);
v0 = fpu_get_scratch(dyn);
VREPLGR2VR_D(v0, xZR);
Ditto.
Co-authored-by: Yang Liu <liuyang22@iscas.ac.cn>
Thanks 🍻
Hi,
dav1d benchmark:
before:
after:
Please review my patch.
Thanks,
Leslie Zhai