contrib: add xor-blocks tool to obfuscate blocks directory #32451
That seems likely to cause issues. It seems better if we track the height at which the xor starts. A script could then work backwards and lower the height. And in that case it might as well be an RPC command.
I really like this idea. "Start XORing from now on" as a default would be enough to prevent new problems, and potentially there could be a background process that updates historical blocks (which would be far less performance-critical, as no one is waiting for it). That said, it's more complicated and more error-prone; I understand why the decision was made to do all or nothing.
This is a great tool, thanks for implementing it!
Consider obtaining the `.lock` file in the data directory in the same way `bitcoind` does. This will prevent `bitcoind` from starting if this tool is running, and vice versa.

Just to illustrate, if I try to start a second `bitcoind`, I get this error, so maybe do the same thing here:

```
$ bitcoind
Error: Cannot obtain a lock on directory /home/larry/.bitcoin. Bitcoin Core is probably already running.
```
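A minimal sketch of datadir locking, assuming a plain lock-file approach (note: `bitcoind` itself uses advisory flock-style locks, which the OS releases automatically if the process dies; the `acquire_lock` helper name here is illustrative):

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Try to create `.lock` in `datadir` exclusively; fail if it already exists.
/// NOTE: simplified sketch. Unlike an advisory flock, a plain `create_new`
/// lock file can be left behind after a crash and must be cleaned up by hand.
fn acquire_lock(datadir: &Path) -> io::Result<fs::File> {
    OpenOptions::new()
        .write(true)
        .create_new(true) // fails with AlreadyExists if the file is present
        .open(datadir.join(".lock"))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("xor-blocks-demo");
    fs::create_dir_all(&dir)?;
    let _ = fs::remove_file(dir.join(".lock"));
    let _guard = acquire_lock(&dir)?;
    // A second attempt fails while the lock file exists.
    assert!(acquire_lock(&dir).is_err());
    fs::remove_file(dir.join(".lock"))?;
    println!("lock acquired and released");
    Ok(())
}
```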
contrib/xor-blocks/src/main.rs (outdated):

```rust
let blocks_path = if env::args().len() > 1 {
    env::args().nth(1).expect("the arg to exist").into()
```
Consider accepting this argument in the same way it's specified with `bitcoind` and `bitcoin-cli`, that is, `-datadir=<dir>`. Besides being consistent with these existing executables, it would also be cleaner if more arguments were added in the future (because the data directory wouldn't be specified in a different way from other arguments).
Sadly, that's not really that easy because of damn Windows non-compliant UTF-16. I wrote a crate for it called `parse_arg`, but it's soft-deprecated there. Feel free to copy-paste or just depend on it. Maybe I'll end up putting it into another dedicated crate.

But what I'd like to see here is a proper usage page if no arguments are provided. `clap` or `configure_me` can do that, but for a simple tool a bit of manual code is likely fine.
I added checking for `-datadir=` and `--datadir=`, but not `datadir=` or `---datadir=`. Just like `bitcoind` :).

> I'd like to see here is having proper usage page if no arguments are provided

If no args are provided, it just picks up the default datadir. IMO that's the best behavior. If you give more than 1 arg, it prints out a usage instruction as an error.

Is that what you had in mind?
I have an idea for a follow-up PR (haven't started implementing yet) which would cause […]. That way, one could start this […]
Concept ACK

Could be a useful tool for paranoid users who can run this tool. Is it possible to add this feature in bitcoin core itself?
```rust
loop {
    let key: [u8; 8] = rand::random();
    // Don't use keys with 4 bytes of leading zeros
    // They won't let us detect the first 4 bytes of magic in the files
```
Why though? After "unxoring", the magic bytes will just be the same. Anyway, it might still be a good idea to mask it better.
We would reapply the xor on restart in that case, since we wouldn't detect that we've processed it already.

Alternatively, we could keep a counter of processed blocks: modify the `xor.dat` to make sure we can't start `bitcoind` (or @LarryRuane's lock would probably also do the same) and add a height of how many blocks we've processed so far (though that would make parallelization more difficult). When we're done, we'd restore the `xor.dat` file (which we have to modify anyway if we're xor-ing).
Oh, I see, that's clever! But then yes, this code tries to be atomic but it isn't, because it doesn't call `sync_data`.
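To illustrate the detection issue discussed above, here is a small sketch (the `looks_unprocessed` and `xor4` helper names are illustrative, not from the PR): if the first four key bytes are zero, xoring leaves the magic bytes unchanged, so a rerun cannot tell that the file has already been processed.

```rust
// Mainnet network magic, as in the PR's constant.
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

// A blk file is treated as "not yet xored" if it still starts with the magic.
fn looks_unprocessed(first4: &[u8; 4]) -> bool {
    *first4 == MAGIC
}

// XOR four bytes with the first four key bytes.
fn xor4(data: [u8; 4], key: [u8; 4]) -> [u8; 4] {
    let mut out = data;
    for (b, k) in out.iter_mut().zip(key) {
        *b ^= k;
    }
    out
}

fn main() {
    // With a normal key, the xored header no longer matches the magic.
    let good_key = [0x11, 0x22, 0x33, 0x44];
    assert!(!looks_unprocessed(&xor4(MAGIC, good_key)));

    // With 4 leading zero key bytes the header is unchanged, so a rerun
    // would see "unprocessed" and xor the file a second time.
    let bad_key = [0x00, 0x00, 0x00, 0x00];
    assert!(looks_unprocessed(&xor4(MAGIC, bad_key)));
    println!("ok");
}
```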
I wonder if `-reindex` could be taught to do this instead. This may be simpler for users to set. Also, it will be atomic, safe, and interruptible. However, it will eat more CPU and thus take a longer time. I'd say this is probably fine, because the only users will be one-time users that have an old leftover datadir. The benefit on top would be that anyone doing a reindex for other reasons will likely get the obfuscation "for free".
contrib/xor-blocks/src/main.rs (outdated):

```rust
let mut block = fs::read(path)?;
block
    .iter_mut()
```
Would it help if we used `.par_iter()` here instead? (Not sure if that would require adding a new lib or not, but it seems like a naturally parallelizable task.)
Requires the `rayon` dependency. But I strongly suspect the entire thing is bottlenecked on IO.
Thank you @l0rinc @Sjors @laanwj @LarryRuane @1440000bytes @Kixunil @maflcko @yancyribbens for your reviews and suggestions! The latest version changes the following: […]
@LarryRuane great suggestion. Unfortunately file locking is still experimental in Rust (?), so users would have to run nightly to be able to run the program. I think we should wait until it's stabilized instead of pulling in third party deps to do the locking.
@Sjors @laanwj There are many edge cases to track here. If you had to service an RPC or p2p request for a block from a pre-xor'd file, you'd have to switch on the fly between xoring and not. Similarly if you got reorged from a block in an xor'd file to one that isn't, or if you requested a pruned block from a peer and now store it in a newly xor'd file. I think all or nothing is a saner approach for something that should only be a temporary solution (ideally everyone resyncs at some point).
@1440000bytes Yes, I suppose it's possible. It would be similar to how the UTXO db was migrated from per-tx values to per-output. That was a 10-minute or so migration when the UTXO set was much smaller. This tool unfortunately takes a while longer (20 minutes or so on my NVMe, many hours for spinning disk drives).
@maflcko That would be convenient, but a reindex is a much longer task than this tool. So I don't think "instead", but rather "also".
Co-Authored-By: Larry Ruane <larryruane@gmail.com>
Sadly, this new implementation is quite broken in multiple ways.
```rust
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];
const XOR_FILE_NAME: &str = "xor.dat";

#[derive(Debug)]
```
I see you're using this to display the messages from `main`. However, that will lead to horrible messages containing `Error { msg: "..." }`. I suggest you implement `Debug` manually, just displaying the string.
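A sketch of that suggestion, assuming an `Error` type shaped like the PR's (the exact type isn't shown here): forwarding `Debug` to the plain message avoids the derived `Error { msg: "..." }` output when `main` returns `Err`.

```rust
use std::fmt;

// Hypothetical error type mirroring the one in the PR. main's Result error
// is printed with Debug, so forwarding Debug to the message keeps the
// output clean.
struct Error {
    msg: String,
}

impl Error {
    fn new(msg: impl Into<String>) -> Self {
        Error { msg: msg.into() }
    }
}

impl fmt::Debug for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Just the message, not the struct shape.
        f.write_str(&self.msg)
    }
}

fn main() {
    let e = Error::new("blocks dir not found");
    assert_eq!(format!("{:?}", e), "blocks dir not found");
    println!("{:?}", e);
}
```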
```rust
impl From<io::Error> for Error {
    fn from(value: io::Error) -> Self {
        Error::new(value.to_string())
```
This loses error sources. The proper way to do this is to create a mutable string and then append the sources in a loop with `": "` as a separator. (You could also do cargo-style `"\n\tcaused by: "`.)
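A sketch of that loop, with an illustrative `Wrapped` error type standing in for the PR's (the names are hypothetical):

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical two-level error for demonstration.
#[derive(Debug)]
struct Wrapped {
    msg: &'static str,
    source: Option<Box<dyn StdError>>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl StdError for Wrapped {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        self.source.as_deref()
    }
}

// Walk the source() chain, joining each level with ": ".
fn full_chain(err: &dyn StdError) -> String {
    let mut msg = err.to_string();
    let mut source = err.source();
    while let Some(src) = source {
        msg.push_str(": ");
        msg.push_str(&src.to_string());
        source = src.source();
    }
    msg
}

fn main() {
    let io_err = std::io::Error::new(std::io::ErrorKind::PermissionDenied, "permission denied");
    let outer = Wrapped { msg: "failed to open xor.dat", source: Some(Box::new(io_err)) };
    assert_eq!(full_chain(&outer), "failed to open xor.dat: permission denied");
    println!("{}", full_chain(&outer));
}
```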
```rust
let start = Instant::now();

let data_dir_path = if env::args().len() == 2 {
    let arg = env::args().nth(1).expect("the arg to exist");
```
Using `args` is problematic for paths because it breaks for some values. `args_os` is the appropriate solution.
```rust
}
let (_, datadir) = arg.split_once("-datadir=").expect("split to be some");
datadir.into()
} else if env::args().len() == 1 {
```
This kind of C-style handling is not idiomatic Rust. I'd do something like:

```rust
let mut args = std::env::args_os();
args.next().ok_or("Missing program name")?;
let data_dir_path = if let Some(first_arg) = args.next() {
    let data_dir = if first_arg == "-datadir" || first_arg == "--datadir" {
        args.next().ok_or("the datadir argument is missing a value")?
    } else if let Some(data_dir) = first_arg.strip_prefix("-datadir=") {
        data_dir.to_owned()
    } else if let Some(data_dir) = first_arg.strip_prefix("--datadir=") {
        data_dir.to_owned()
    } else {
        return Err(Error::new(format!("Unknown argument: {:?}", first_arg)));
    };
    if args.next().is_some() {
        return Err("Too many arguments".into());
    }
    data_dir
} else {
    // fallback to home here.
}
```
```rust
    return Err("Invalid number of arguments. Run with `cargo run --release` if using default datadir. Otherwise run with `cargo run --release -- -datadir=<custom datadir>`.".into());
};

let blocks_path = data_dir_path.join("blocks");
```
Reminds me that non-mainnet chains use paths like `/regtest/blocks`; perhaps we should accept a `-chain` argument as well.
```rust
}
let mut out_file = File::options().write(true).open(&key_path)?;
out_file.write_all(&key)?;
out_file.sync_data()?;
```
To be atomic, this should create a temporary file first, write it, sync it, and then rename.
Does it matter, though? If it crashes before `sync_all`, we either have all zeros in `xor.dat` or a random key that has not yet been used for xoring anything. On the next run, it will either write a new key if the file is still zero or start using the previously written random key.
On the next run it'll crash failing to convert into `[u8; 8]`. While not catastrophic, it makes the program more annoying for end users.
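A sketch of the temp-file-then-rename pattern suggested above (the `atomic_write` helper name is illustrative; for full durability the containing directory should also be fsynced after the rename, which is omitted here):

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// Write to a temp file in the same directory, sync it, then rename over
// the destination. On POSIX, rename within one filesystem is atomic, so
// readers see either the old contents or the new, never a partial write.
fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_data()?; // flush file contents to disk before the rename
    fs::rename(&tmp, path)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("xor-demo.dat");
    atomic_write(&path, &[0xab; 8])?;
    assert_eq!(fs::read(&path)?, vec![0xab; 8]);
    fs::remove_file(&path)?;
    println!("atomic write ok");
    Ok(())
}
```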
```rust
fn xor_file(path: &Path, key: u128) -> Result<(), io::Error> {
    let mut buf_u128 = 0u128;
    let buf = unsafe { (&mut buf_u128 as *mut _ as *mut [u8; 16]).as_mut() }.unwrap();
```
This seems over-complicated and not needed. You're not saving anything by "avoiding the conversion"; the optimizer would take care of that for you.
```rust
loop {
    buf_u128 ^= key;
    writer.write_all(buf)?;
```
This is clearly unsound: you're doing shared mutability, which is forbidden in Rust.
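A safe alternative, as a sketch: keep the value as a `u128` and materialize bytes with `to_ne_bytes` at each step instead of aliasing the integer's memory (the `xor_chunk` helper is illustrative; the optimizer compiles this to the same register moves without any `unsafe`):

```rust
// XOR a 16-byte chunk against the key without aliased pointers:
// round-trip through u128 using the safe byte-conversion methods.
fn xor_chunk(chunk: [u8; 16], key: u128) -> [u8; 16] {
    (u128::from_ne_bytes(chunk) ^ key).to_ne_bytes()
}

fn main() {
    let key = 0x0102030405060708090a0b0c0d0e0f10u128;
    let data = [0x5au8; 16];
    let xored = xor_chunk(data, key);
    // XOR is its own inverse: applying the key twice round-trips.
    assert_eq!(xor_chunk(xored, key), data);
    println!("round trip ok");
}
```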
```rust
        writer.write_all(&buf[..n])?;
        break;
    }
}
```
The `read` API doesn't guarantee that `n < 16` means EOF, so the loop may break prematurely, leading to garbage data.

It's also not as fast as it could be: reading larger chunks (e.g. 4096 bytes) and then processing them with `chunks_exact` would allow auto-vectorization. Also, you wouldn't need `BufReader` in that case.
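A sketch combining both points (the `xor_stream` helper is illustrative; note that a real implementation must also keep the key aligned to the absolute file offset, which this simplification only preserves when reads return multiples of 8 bytes): only `Ok(0)` is treated as EOF, and `chunks_exact_mut` lets the inner loop auto-vectorize.

```rust
use std::io::{self, Read, Write};

// Read in 4096-byte chunks; a short read does NOT mean EOF.
fn xor_stream<R: Read, W: Write>(mut reader: R, mut writer: W, key: [u8; 8]) -> io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // only a zero-length read signals EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let chunk = &mut buf[..n];
        // Fixed-size sub-chunks allow the compiler to vectorize this loop.
        let mut iter = chunk.chunks_exact_mut(8);
        for block in &mut iter {
            for (b, k) in block.iter_mut().zip(key) {
                *b ^= k;
            }
        }
        // Handle the tail that is shorter than the key.
        for (b, k) in iter.into_remainder().iter_mut().zip(key) {
            *b ^= k;
        }
        writer.write_all(chunk)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let data = vec![0xffu8; 20]; // not a multiple of 8: exercises the remainder
    let key = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut out = Vec::new();
    xor_stream(&data[..], &mut out, key)?;
    let mut round_trip = Vec::new();
    xor_stream(&out[..], &mut round_trip, key)?;
    assert_eq!(round_trip, data);
    println!("xor stream round trip ok");
    Ok(())
}
```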
```rust
    }
}

writer.into_inner()?.sync_data()?;
```
You need to call `flush()` before `into_inner`.
`into_inner` flushes before returning: https://doc.rust-lang.org/std/io/struct.BufWriter.html#method.into_inner
Yes, but it loses the error information, so you'd silently continue even if a write failed, leading to data corruption.

Sorry, I misremembered what it does.

You can just define […]
Maybe just from a code-review/maintenance perspective this is preferable, given that the feature implemented here is quite tricky to get right for all edge cases by (re-)implementing the C++ logic.
@andrewtoth, if you've closed it on purpose, do you want to explain the reason? Is it because you agree with @maflcko that making it a built-in functionality might be simpler? I'm also leaning in that direction, especially after #31144, where obfuscation should become a lot cheaper.
A bigger reason to do this in bitcoind is that it may be able to run in the background and lead to a better user experience.
@l0rinc yes, I agree with other reviewers that a feature like this should be added directly to bitcoind. If needed, this tool is still available to be run at github.com/andrewtoth/blocks-xor.
I wrote a tool in Rust to xor the blocks directory with a random key. It was pointed out to me that there already exists some Rust code in contrib, so this might be a welcome addition to the toolkit here.
This lets you obfuscate the `blk*.dat` and `rev*.dat` files in the blocks directory if you synced with a version prior to v28.

It checks if a `xor.dat` file exists, and if it is zero it overwrites it with a non-zero random key. It then goes through each `*.dat` file in the blocks directory, checking if the first 4 bytes are the magic bytes. If so, it reads the whole file into memory, xors all bytes with the key, then writes the result to a temporary file. It then renames the temporary file over the dat file it xor'd. This lets users safely run this on any blocks directory, as long as they let it completely finish once before starting `bitcoind`.
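The per-file flow described above can be sketched as follows (a condensed illustration, not the PR's actual code; the `xor_dat_file` helper and the key value are hypothetical):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Mainnet network magic.
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

// Check the magic, xor in memory, write a temp file, rename over the
// original. Returns Ok(true) if the file was xored, Ok(false) if skipped.
fn xor_dat_file(path: &Path, key: [u8; 8]) -> io::Result<bool> {
    let mut data = fs::read(path)?;
    if data.len() < 4 || data[..4] != MAGIC {
        return Ok(false); // already xored (or not a block file): skip
    }
    for (i, b) in data.iter_mut().enumerate() {
        *b ^= key[i % key.len()];
    }
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, &data)?;
    fs::rename(&tmp, path)?; // atomic replace: never leaves a half-xored file
    Ok(true)
}

fn main() -> io::Result<()> {
    // Key with non-zero leading bytes so the xored header no longer
    // matches the magic (the detection requirement discussed above).
    let key = [0x42u8, 0x13, 0x37, 0x5a, 0x01, 0x02, 0x03, 0x04];
    let path = std::env::temp_dir().join("blk-demo.dat");
    fs::write(&path, [&MAGIC[..], &[0xaa; 12]].concat())?;
    assert!(xor_dat_file(&path, key)?); // first pass xors the file
    assert!(!xor_dat_file(&path, key)?); // second pass detects it and skips
    fs::remove_file(&path)?;
    println!("ok");
    Ok(())
}
```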