Add libxmp_crc32c function and fast module hashing mode. #651
base: master
Some notes:
Thanks, I will look into this, though I don't know if I'll get back to it right away. This branch is a tumor that grew off of MOD VBlank detection and my old CRC-32C testing; I had to get it out of my working tree :-)
For calling cpuid, and also the PIC issue, see SDL's code:
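For the x86 side, here is a minimal sketch of runtime detection plus the SSE4.2 crc32 instruction, which computes CRC-32C. Instead of copying SDL's inline assembly, it assumes GCC or Clang and uses `__get_cpuid` from `<cpuid.h>`, which as far as I know takes care of the EBX/PIC problem for you. MSVC would need `__cpuid` from `<intrin.h>` instead, and Watcom its own inline assembly. `crc32c_ref`, `cpu_has_sse42`, `crc32c_sse42` and `crc32c_x86` are hypothetical names, not libxmp functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: bitwise CRC-32C, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_ref(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
    }
    return ~crc;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#include <nmmintrin.h>

/* CPUID leaf 1, ECX bit 20 reports SSE4.2 (which includes CRC32). */
static int cpu_has_sse42(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (ecx >> 20) & 1;
}

/* Only this function is compiled for SSE4.2; call it only after
 * cpu_has_sse42() returned nonzero. */
static __attribute__((target("sse4.2")))
uint32_t crc32c_sse42(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = _mm_crc32_u8(crc, *p++); /* one hardware CRC-32C step */
    return ~crc;
}
#else
static int cpu_has_sse42(void) { return 0; }
static uint32_t crc32c_sse42(const void *buf, size_t len)
{
    return crc32c_ref(buf, len);
}
#endif

/* Pick the hardware path when the CPU supports it. */
static uint32_t crc32c_x86(const void *buf, size_t len)
{
    return cpu_has_sse42() ? crc32c_sse42(buf, len) : crc32c_ref(buf, len);
}
```

The byte-at-a-time loop keeps the sketch short. A real implementation would consume 8 bytes per instruction with `_mm_crc32_u64` on x86-64.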
For RISC stuff, CC: @ccawley2011
Codecov Report
Additional details and impacted files:

@@          Coverage Diff          @@
##          master     #651    +/-  ##
======================================
- Coverage      84%      84%     -0%
======================================
  Files         154      156      +2
  Lines       30051    30135     +84
======================================
+ Hits        25302    25356     +54
- Misses       4749     4779     +30
I don't know anything about RISC-V. I'm more familiar with RISC OS, which is unrelated. That said, it would be interesting to get the ARMv8 code working on RISC OS, although GCC 4.7.4 predates the …
I suspect supporting all of these is going to get messy; maybe an …
Does RISC OS allow detection of the required CPU features at runtime? (System call, user code runs at EL1 or higher, RISC OS guarantees access to …
It should be possible using the OS_PlatformFeatures SWI.
Branch updated from c6276a5 to ca34b1e.
I fixed the RISC-V CRC-32C implementation and fixed minor issues that caused some of the Linux regression tests to fail. I have a much better idea of how to handle the vectorized x86 implementation now, and I'll fix the MSVC/Watcom tests when I get to that. I found this regarding an ARM vectorized implementation though:
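Since there's basically no Zbc hardware to test on (QEMU aside), one option is a software model of the three Zbc carry-less multiply instructions for XLEN = 64. That way the arithmetic of the RISC-V path can be checked on any host. This is a sketch written from the Zbc definitions: clmul returns the low half of the carry-less product, clmulh the high half, and clmulr bits 126..63. The `zbc_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* clmul: low 64 bits of the 128-bit carry-less (GF(2)) product. */
static uint64_t zbc_clmul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* clmulh: high 64 bits of the product (bit i of b contributes
 * the bits of a that were shifted past bit 63). */
static uint64_t zbc_clmulh(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 1; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a >> (64 - i);
    return r;
}

/* clmulr: bits 126..63 of the product, i.e. the full product
 * shifted right by 63 and truncated to 64 bits. */
static uint64_t zbc_clmulr(uint64_t a, uint64_t b)
{
    return (zbc_clmulh(a, b) << 1) | (zbc_clmul(a, b) >> 63);
}
```

clmulr is the variant that suits bit-reflected CRCs like CRC-32C. It equals bit-reversing both inputs, doing a clmul, and bit-reversing the result.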
Branch updated from 4683847 to 799ab5f.
Adds a CRC-32C calculation function simplified from Mark Adler's fast CRC-32C function, with fast hardware ARM and RISC-V Zbc implementations added.

Also adds an optional fast module hashing mode to libxmp, which skips the MD5 unless a module matches both the expected length and CRC-32C of a module in the quirks list. This significantly reduces loading time for most modules, at the cost of users who enable it no longer being able to use the MD5 for their own module quirks lists. Fast hashing mode is intended to speed up fuzzing and loading on embedded platforms, and is disabled by default.

Finally, I've removed an old copy of nebulos by Audiomonster from the quirks list. I've been unable to locate this copy, and all existing copies are correctly detected as VBlank by scan compare.

See libxmp issue #400 for more information.

TODO:
* Locate missing copy of "No Mercy" by Alf.
* Documentation for xmp_set_player setting to configure hashing mode.
* Vectorized x86, ASM vectorized ARM.
Branch updated from 799ab5f to 04fcb51.
Adds a CRC-32C calculation function simplified from Mark Adler's fast CRC-32C function, with fast hardware ARM and RISC-V Zbc implementations added.
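For reference, here is a minimal byte-at-a-time, table-driven CRC-32C. It's only a sketch. Mark Adler's code, and libxmp_crc32c, whose real signature isn't shown here, are more elaborate. The names `crc32c_init_table` and `crc32c_sw` are hypothetical. The sketch uses the reflected Castagnoli polynomial 0x82F63B78 with the usual pre- and post-inversion, so the standard check value for "123456789" is 0xE3069283.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial 0x1EDC6F41, bit-reversed for the
 * LSB-first ("reflected") form that CRC-32C uses. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

static uint32_t crc32c_table[256];

/* Entry n is the CRC register after shifting byte n through the
 * polynomial one bit at a time. Call once before crc32c_sw(). */
static void crc32c_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
        crc32c_table[n] = c;
    }
}

/* Continue a CRC-32C over len more bytes; start with crc = 0.
 * The ~ on entry and exit implements the standard 0xFFFFFFFF
 * initial value and final XOR, so results can be chained. */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    assert(crc32c_table[1] != 0 && "call crc32c_init_table() first");
    crc = ~crc;
    while (len--)
        crc = (crc >> 8) ^ crc32c_table[(crc ^ *p++) & 0xFF];
    return ~crc;
}
```

The inversion is applied on entry and exit, so the return value can be passed back in as `crc` to continue over more data. For example, `crc32c_sw(crc32c_sw(0, "1234", 4), "56789", 5)` gives the same result as a single call over "123456789".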
Also adds an optional fast module hashing mode to libxmp, which skips the MD5 unless a module matches both the expected length and CRC-32C of a module in the quirks list. This significantly reduces loading time for most modules, at the cost of users who enable it no longer being able to use the MD5 for their own module quirks lists. Fast hashing mode is intended to speed up fuzzing and loading on embedded platforms, and is disabled by default.
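Here is a rough sketch of how the fast mode's pre-filter could work. It assumes a hypothetical quirks table that stores each module's length and CRC-32C next to its MD5. The struct, its fields and `fast_prefilter()` are illustrative, not libxmp's actual quirks code. The length comparison costs nothing. The CRC-32C is computed at most once, and only if some entry has a matching length. The slower MD5 would only be computed when a candidate is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirks-list entry (illustrative, not libxmp's layout). */
struct quirk_entry {
    size_t length;        /* module file size in bytes */
    uint32_t crc32c;      /* CRC-32C of the whole file */
    const char *md5_hex;  /* MD5, still checked before applying quirks */
    int flags;            /* quirk flags to apply on a confirmed match */
};

/* Plain bitwise CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_bitwise(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
    }
    return ~crc;
}

/* Return an entry whose length and CRC-32C both match, or NULL.
 * Only a non-NULL result would justify computing the MD5. */
static const struct quirk_entry *fast_prefilter(const struct quirk_entry *list,
                                                size_t count,
                                                const void *data, size_t len)
{
    uint32_t crc = 0;
    int have_crc = 0;

    for (size_t i = 0; i < count; i++) {
        if (list[i].length != len)
            continue;               /* cheap check, no hashing needed */
        if (!have_crc) {            /* hash lazily, at most once */
            crc = crc32c_bitwise(data, len);
            have_crc = 1;
        }
        if (list[i].crc32c == crc)
            return &list[i];
    }
    return NULL;
}
```

If several entries share the same length and CRC-32C, a real implementation would have to check the MD5 against each of them. This sketch just returns the first candidate.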
Finally, I've removed an old copy of nebulos by Audiomonster from the quirks list. I've been unable to locate this copy, and all existing copies are correctly detected as VBlank by scan compare.
See libxmp issue #400 for more information.
TODO:
* LIBXMP_NO_HARDWARE_CRC in Makefile.vc and watcom.mif?
* getauxval / asm/hwcap.h / sysctlbyname.
* Vectorized ARM CRC-32C implementation. This needs an extra sysctlbyname call, as PMULL is optional.
  Done with pure C right now, but it might be helpful to also have a GCC/clang inline assembly implementation too.
* Fix unfinished RISC-V Zbc CRC-32C implementation.
  I'm not aware of any useful boards that implement B yet, and Linux hasn't added a HWCAP flag for B or Zbc yet, so this is mostly just for QEMU :^)

When the fast routine is enabled, I've seen this reduce full loads of the Modland Protracker directory from ~50s to ~30s (i7-7700, Windows 10, test-dev/xmpchk, filesystem cached in RAM). Fuzzing improvement won't be quite this good, since most inputs already fail to load, and the fuzzing routine now uses the CRC-32C to seed player settings. All fast hashing code is currently disabled in the patch to pass the regression tests.
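Here is a sketch of the getauxval/sysctlbyname runtime check from the TODO list, with a function-pointer dispatch that falls back to software. It makes several assumptions:

* The file is built with CRC support enabled (for example `-march=armv8-a+crc`), so that `__ARM_FEATURE_CRC32` and `__crc32cb` from `<arm_acle.h>` are available.
* `LIBXMP_NO_HARDWARE_CRC` is treated as a compile-time opt-out.
* The macOS sysctl name reflects my understanding of Apple's naming.

PMULL detection for the vectorized path would follow the same pattern, using `HWCAP_PMULL` or `hw.optional.arm.FEAT_PMULL`. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software fallback: bitwise CRC-32C, chainable like crc32c_armv8. */
static uint32_t crc32c_soft(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
    }
    return ~crc;
}

#if !defined(LIBXMP_NO_HARDWARE_CRC) && defined(__aarch64__) && \
    defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>

/* ARMv8 CRC32CB instruction via the ACLE intrinsic. */
static uint32_t crc32c_armv8(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--)
        crc = __crc32cb(crc, *p++);
    return ~crc;
}

#if defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_CRC32
#include <asm/hwcap.h>
#endif
static int arm_has_crc32(void)
{
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
}
#elif defined(__APPLE__)
#include <sys/sysctl.h>
static int arm_has_crc32(void)
{
    int val = 0;
    size_t size = sizeof(val);

    if (sysctlbyname("hw.optional.armv8_crc32", &val, &size, NULL, 0) != 0)
        return 0;                   /* unknown name: assume absent */
    return val != 0;
}
#else
static int arm_has_crc32(void) { return 0; }
#endif
#define HAVE_ARM_CRC32_PATH 1
#endif

typedef uint32_t (*crc32c_fn)(uint32_t, const uint8_t *, size_t);

/* Choose an implementation at runtime; software is always available. */
static crc32c_fn crc32c_select(void)
{
#ifdef HAVE_ARM_CRC32_PATH
    if (arm_has_crc32())
        return crc32c_armv8;
#endif
    return crc32c_soft;
}

static uint32_t crc32c_any(uint32_t crc, const void *buf, size_t len)
{
    return crc32c_select()(crc, buf, len);
}
```

On hosts where the hardware path isn't compiled in (including x86), `crc32c_select()` simply returns the software routine, so the results are identical either way. In a library you'd probably cache the selected pointer rather than re-running detection on every call.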
Other useless things:
* vpmsumd (intrinsic vec_pmsum_be), which is almost equivalent to the entire fold function in crc32c_arm.h (the products are XORed into bits 1-127 of the result, so the required modulos are different, and the result still needs to be XORed with the next input). This seems to have been added in Power ISA 2.07 (POWER8 and up), which as far as I can tell is used exclusively in servers and expensive workstations, so this is very low priority.
* xmulx, xmulxhi, and xmpmul, which can perform the required polynomial math for this. Buy me a workstation that supports these instructions and I will implement it :)
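All of these instructions boil down to carry-less (GF(2) polynomial) multiplication, the same operation PMULL performs on ARM. Below is a portable model, meant only as a reference for testing. It also models vpmsumd as the XOR of the two lane products. It ignores POWER's big-endian lane and bit numbering, and `clmul64`/`pmsumd_model` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Full 64x64 -> 128-bit carry-less multiply: like ordinary long
 * multiplication, but partial products are combined with XOR. */
static u128 clmul64(uint64_t a, uint64_t b)
{
    u128 r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i > 0)
                r.hi ^= a >> (64 - i);  /* bits shifted past bit 63 */
        }
    }
    return r;
}

/* Model of vpmsumd: each doubleword lane is multiplied carry-lessly
 * and the two (at most 127-bit) products are XORed together. */
static u128 pmsumd_model(const uint64_t a[2], const uint64_t b[2])
{
    u128 p0 = clmul64(a[0], b[0]);
    u128 p1 = clmul64(a[1], b[1]);
    u128 r = { p0.lo ^ p1.lo, p0.hi ^ p1.hi };

    return r;
}
```

For example, clmul64(3, 3) is 5, because (x + 1)^2 = x^2 + 1 when coefficients are added with XOR.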