Optimize literals encoding during compression #1411

Nicoshev · 2024-05-14T00:39:29Z

Hello, I hope you are doing well : )

I wanted to propose a change into the way the literals are written onto the output buffer during compression.

The intent is to take advantage of the existing execution fork yielded by 'if (litLength >= RUN_MASK)'
The wildCopy loop is now ommited when less than 15 literals need to be copied.
The precondition of having to copy at least 15 literals is used to process them chunks of 16 bytes.

Regarding performance, the same two scenarios of PR#1407 were tested.

Scenario 1:
CPU: Ryzen 5900X
Compilers: gcc-11, clang-8.
Files: enwik9, each file within the silesia corpus, a custom memory dump retrieved from my PC.
Results: Decompression throughput improved in all combinations of files and compilers.

Scenario 2:
CPU: i5-1245U
Compilers: gcc-[9-12], clang-[11-14].
Files: enwik9, 'xml' from the silesia corpus, the custom memory dump retrieved from my PC.
Results: Decompression throughput improved in all combinations of files and compilers.

On the Ryzen 5900:
The change yielded compression throughput improvements for all files when clang was used.
Regarding gcc, throughput decreased for 'sao' and 'x-ray' and increased for all other 12 files.
There seems to be an inclination towards the change improving throughput.

On the weaker i5-1245U:
on gcc-9: 10 files showed better throuhgput, 4 files worsened
on gcc-10: 12 files showed better throuhgput, 2 files worsened
on gcc-11: 13 files showed better throuhgput, 1 file worsened
on gcc-12: 13 files showed better throuhgput, 1 file worsened
on clang-11: 10 files showed better throuhgput, 4 files worsened
on clang-12: 12 files showed better throuhgput, 2 files worsened
on clang-13: 13 files showed better throuhgput, 1 files worsened
on clang-14: 12 files showed better throuhgput, 2 files worsened
Although it has more mixed results, it also showed a noticeable inclination towards including the change

Overall, there seems to be a trend towards improving performance by including the change.
It is relevant to remark that the trend improves as more powerful CPUs and more modern compilers are used.

Thanks,
Nicolas

Cyan4973 · 2024-05-14T01:28:47Z

Differentiating literal copy operations depending on their lengths seems a good idea.
It's obvious that branch statistics will be different depending on knowing in advance if the length is short or long(er).

In term of execution, I still see a number of branches in this PR, and I would likely consider a slightly different variant, trying to reach some wildcopy16 or equivalent (note: it doesn't exist yet, we only have wildcopy8 and wildcopy32), though it strongly depends on guarantees of distance from end of buffer, which are not obvious at this point in the code base.

Nicoshev · 2024-05-14T01:40:03Z

The existing wildCopyX functions can overwrite up to X bytes after the end of the passed buffer.
In this case, after the space for the literals, we can only guarantee 13 bytes.
That is why the extra 8 byte copy is appended after the loop: to handle the cases where (litLength%16)<8.
Since litLength is at least 15, the first 16 bytes written are guaranteed to be within the output buffer limits.

I do not mind adding a function with this behaviour:
Copies in chunks of 16 bytes, only overwriting up to 8 bytes after the first iteration.

But just wanted to make sure that you are fine with having a slightly different wildcopy function.

Nicoshev added 6 commits May 13, 2024 17:16

Optimize literals encoding

0b223a3

Add cast

50a0c08

Refactor loop

383b2fc

Try two 8-byte writes

854d271

Move pointer declaration

46ae041

Fix merge

75eaff3

Cyan4973 self-assigned this May 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize literals encoding during compression #1411

Optimize literals encoding during compression #1411

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Optimize literals encoding during compression #1411

Are you sure you want to change the base?

Optimize literals encoding during compression #1411

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!