8000 [bug-fix] Fix rare corruption bug affecting the block splitter by terrelln · Pull Request #3517 · facebook/zstd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[bug-fix] Fix rare corruption bug affecting the block splitter #3517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 23, 2023

Conversation

terrelln
Copy link
Contributor

TLDR: This is a very rare bug affecting the block splitter. Only levels using the optimal parser (levels 13 and above depending on the source size), or users who explicitly enable the block splitter, are affected.

The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!

@terrelln terrelln force-pushed the 2023-02-21-corruption-fix branch 2 times, most recently from 8e13002 to cbeb60f Compare February 22, 2023 23:22
Copy link
Contributor
@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix @terrelln ! That's indeed a pretty rare corner case.

@danlark1
Copy link
Contributor

Thanks a lot, am I right it was introduced with zstd 1.5.0 when blocksplitter was released?

@terrelln
Copy link
Contributor Author

Thanks a lot, am I right it was introduced with zstd 1.5.0 when blocksplitter was released?

Correct, the first affected version was zstd v1.5.0.

The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It also is only triggered if the
block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
@terrelln terrelln force-pushed the 2023-02-21-corruption-fix branch from cbeb60f to 07ae766 Compare February 23, 2023 17:47
@terrelln terrelln merged commit 395a2c5 into facebook:dev Feb 23, 2023
@georgthegreat
Copy link
Contributor

@terrelln, could you tag a release version with this fix?

@Cyan4973
Copy link
Contributor

A v1.5.5 release version will be produced in March including this fix.

@rationalex
Copy link

@terrelln could you please share with us a snippet that reproduces this bug? I'd like to use it to check whether another zstd implementation (https://github.com/klauspost/compress/tree/master/zstd to be specific) has the same issue.

ligurio added a commit to ligurio/tarantool that referenced this pull request Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes tarantool#8391

NO_DOC=build
NO_TEST=build
locker pushed a commit to ligurio/tarantool that referenced this pull request Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes tarantool#8391

NO_DOC=build
NO_TEST=build
locker pushed a commit to ligurio/tarantool that referenced this pull request Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes tarantool#8391

NO_DOC=build
NO_TEST=build
ligurio added a commit to ligurio/tarantool that referenced this pull request Mar 2, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes tarantool#8391

NO_DOC=build
NO_TEST=build
locker pushed a commit to tarantool/tarantool that referenced this pull request Mar 3, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes #8391

NO_DOC=build
NO_TEST=build
locker pushed a commit to tarantool/tarantool that referenced this pull request Mar 3, 2023
Updated third_party/zstd submodule from v1.5.2 to pre-1.5.5 version.
The new version fixes a rare bug that was introduced in 1.5.0 [1].

A v1.5.5 release version will be produced in March.

1. facebook/zstd#3517

Fixes #8391

NO_DOC=build
NO_TEST=build

(cherry picked from commit ecbbc92)
james-rms added a commit to ros2/rosbag2 that referenced this pull request Apr 5, 2023
Facebook recently published a patch update to fix a data corruption bug in their
zstd compression library. See the release notes here:

https://github.com/facebook/zstd/releases/tag/v1.5.5

See also the bug-fix patch here:

facebook/zstd#3517

Signed-off-by: James Smith <james@foxglove.dev>
herbygillot added a commit to macports/macports-ports that referenced this pull request Apr 5, 2023
- remove rare corruption patch, as fix has been merged upstream
  (facebook/zstd#3517)
bob-beck pushed a commit to openbsd/ports that referenced this pull request Apr 5, 2023
This release corrects a corruption bug in high compression mode. More
information can be found at facebook/zstd#3517.
Overview on other changes can be found at
https://github.com/facebook/zstd/releases/tag/v1.5.5.

Minor of SHLIB has been bumped because of addition of new symbols.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0