[LA64_DYNAREC] Added more 660F opcodes #2127

xiangzhai · 2024-12-09T06:08:20Z

Hi,

./dav1d -i Chimera-AV1-8bit-480x270-552kbps.ivf --muxer null

before:

Decoded 8929/8929 frames (100.0%) - 3.67/23.98 fps (0.15x)

real	40m35.251s
user	40m20.151s

after:

Decoded 8929/8929 frames (100.0%) - 89.06/23.98 fps (3.71x)

real	1m40.299s
user	1m39.578s

Please review my patch.

Thanks,
Leslie Zhai

xiangzhai · 2024-12-09T06:12:07Z

Sorry for the regression I introduced: https://github.com/ptitSeb/box64/actions/runs/12229956103/job/34110518062?pr=2127

I am checking...

Thanks,
Leslie Zhai

ksco · 2024-12-09T06:15:21Z

Change the title to "Added more 660F opcodes" or just "Added more opcodes".

ksco · 2024-12-09T06:16:19Z

Note that I won't be able to review this PR today at least, so take your time fixing the bugs ;)

ksco · 2024-12-09T15:26:03Z

src/dynarec/la64/dynarec_la64_660f.c

+                    GETEX(q1, 0, 0);
+                    v0 = fpu_get_scratch(dyn);
+                    v1 = fpu_get_scratch(dyn);
+                    VMULWEV_W_H(v0, q0, q1);


To be honest, this seems like too many instructions for this operation; maybe xvexth.w.h would help here?

LASX is available, so it's a waste to leave the upper 128 bits doing nothing :)

EV in VMULWEV_W_H means the even-indexed elements and OD means the odd-indexed ones, so there might not be a way to move v1 into v0's upper 128 bits:

VMULWEV_W_H(v0, q0, q1);
VMULWOD_W_H(v1, q0, q1);
XVEXTH_W_H(v0, v1);
XVSRAI_W(v0, v0, 14);
XVADDI_WU(v0, v0, 1);
XVSRANI_H_W(q0, v0, 1);

Please point out any mistake.

Thanks,
Leslie Zhai
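
To make the even/odd point concrete, here is a minimal scalar sketch of how the two widening multiplies are understood to pick their lanes. The function names (mulwev_w_h, mulwod_w_h, mulw_selfcheck) are hypothetical models written for this illustration; they are not box64 macros or LoongArch intrinsics, and the lane selection shown is an assumption based on the EV/OD naming discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: multiply the even-indexed signed 16-bit lanes
 * (0, 2, 4, 6) and widen each product to 32 bits. */
void mulwev_w_h(int32_t dst[4], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int32_t)a[2 * i] * (int32_t)b[2 * i];
}

/* Hypothetical scalar model: same, but for the odd-indexed lanes (1, 3, 5, 7). */
void mulwod_w_h(int32_t dst[4], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
}

/* Small check showing that the two results hold interleaved source lanes. */
void mulw_selfcheck(void)
{
    const int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[8] = {10, 10, 10, 10, 10, 10, 10, 10};
    int32_t ev[4], od[4];

    mulwev_w_h(ev, a, b);
    mulwod_w_h(od, a, b);
    assert(ev[0] == 10 && ev[1] == 30 && ev[2] == 50 && ev[3] == 70);
    assert(od[0] == 20 && od[1] == 40 && od[2] == 60 && od[3] == 80);
}
```

Under this model the products for source lanes 0..7 end up split across two vectors in interleaved order, so simply placing one result in the upper half of the other would not restore the original lane order.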

VEXT2XV_W_H(v0, q0);
VEXT2XV_W_H(v1, q1);
XVMUL_W(v0, v0, v1);
XVSRLI_W(v0, v0, 14);
XVADDI_WU(v0, v0, 1);
XVSRLNI_H_W(v0, v0, 0);
XVPICKOD_D(v0, v0, v0);

Not sure if this is correct, but the idea here is to use the high 128 bits of LASX.

Or maybe this, to save one more instruction? Like I said, I'm not sure, but I do think it's doable.

VEXT2XV_W_H(v0, q0);
VEXT2XV_W_H(v1, q1);
XVMUL_W(v0, v0, v1);
XVSRLNI_H_W(v0, v0, 14);
XVADDI_HU(v0, v0, 1);
XVPICKOD_D(v0, v0, v0);

typo?

-XVPICKOD_D(v0, v0, v0);
+XVPICKOD_D(q0, v0, v0);

sse_intrinsics pmulhrsw failed.
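
When comparing candidate sequences like the ones above, a scalar reference can help. The following is a minimal sketch of PMULHRSW as the x86 definition is commonly stated: per signed 16-bit lane, multiply to a 32-bit product, shift right by 14, add 1, shift right by 1, and keep the low 16 bits. The function names (pmulhrsw_lane, pmulhrsw_x8, pmulhrsw_selfcheck) are hypothetical and not part of box64. The sketch assumes that right-shifting a negative signed value is an arithmetic shift, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one PMULHRSW lane (hypothetical helper, not box64 code).
 * Assumes >> on a negative int32_t is an arithmetic shift. */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b; /* full 32-bit signed product */
    int32_t tmp = (prod >> 14) + 1;         /* keep one extra bit, then round */
    return (int16_t)(tmp >> 1);             /* drop the rounding bit */
}

/* Apply the lane model to all eight 16-bit lanes of a 128-bit operand. */
void pmulhrsw_x8(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = pmulhrsw_lane(a[i], b[i]);
}

/* A few hand-computed values; read as Q15, 0x4000 is 0.5 and 8192 is 0.25. */
void pmulhrsw_selfcheck(void)
{
    assert(pmulhrsw_lane(0x4000, 0x4000) == 8192);
    assert(pmulhrsw_lane(32767, 32767) == 32766);
    assert(pmulhrsw_lane(-32768, 32767) == -32767);
    assert(pmulhrsw_lane(1, 1) == 0);
}
```

Note that the rounding add happens before the final one-bit shift, so a sequence that narrows to 16 bits before adding, or that omits the last shift, would not match this model.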

ksco · 2024-12-10T04:10:33Z

src/dynarec/la64/dynarec_la64_660f.c

+                    GETEX(q1, 0, 0);
+                    GETGX_empty(q0);
+                    v0 = fpu_get_scratch(dyn);
+                    VREPLGR2VR_D(v0, xZR);


Use VXOR_V(v0, v0, v0); instead: it has known latency and IPC, and it's already used everywhere.

ksco · 2024-12-10T04:10:44Z

src/dynarec/la64/dynarec_la64_660f.c

+                    GETEX(q1, 0, 0);
+                    GETGX_empty(q0);
+                    v0 = fpu_get_scratch(dyn);
+                    VREPLGR2VR_D(v0, xZR);


src/dynarec/la64/dynarec_la64_660f.c

Co-authored-by: Yang Liu <liuyang22@iscas.ac.cn>

xiangzhai · 2024-12-10T09:50:23Z

Thanks 🍻

xiangzhai marked this pull request as draft December 9, 2024 06:11

xiangzhai changed the title from "[LA64_DYNAREC] Added PMULHRSW, PABSB, PABSW, PACKUSDW, PMINUW, PMAXSD, PMULLD, PBLENDW, PSRLW, PSUBUSB, PMINUB, PADDUSW, PMAXUB, PSRAD, PSUBSB, PSUBSW, PMINSW, PADDSB, PADDSW, PMAXSW and PMADDWD opcodes" to "[LA64_DYNAREC] Added more 660F opcodes" on Dec 9, 2024

[LA64_DYNAREC] Added more 660F opcodes

55ec355

xiangzhai force-pushed the la64_dynarec_660f branch from 206dfc5 to 55ec355 Compare December 9, 2024 06:52

xiangzhai marked this pull request as ready for review December 9, 2024 06:58

ksco reviewed Dec 9, 2024


ksco reviewed Dec 10, 2024


xiangzhai and others added 2 commits December 10, 2024 14:00

[LA64_DYNAREC] Change VREPLGR2VR_D to VXOR_V

ad59514

[LA64_DYNAREC] Optimize PMULHRSW

676aa8c

Co-authored-by: Yang Liu <liuyang22@iscas.ac.cn>

xiangzhai requested a review from ksco December 10, 2024 09:16

ksco approved these changes Dec 10, 2024


ksco requested a review from ptitSeb December 10, 2024 09:34

ptitSeb approved these changes Dec 10, 2024


ptitSeb merged commit f1addb8 into ptitSeb:main Dec 10, 2024
27 checks passed

xiangzhai deleted the la64_dynarec_660f branch December 12, 2024 01:04
