contrib: add xor-blocks tool to obfuscate blocks directory #32451
That seems likely to cause issues. It seems better if we track the height at which the xor starts. A script could then work backwards and lower the height. And in that case it might as well be an RPC command.
I really like this idea. "Start XORing from now on" as a default would be enough to prevent new problems, and potentially there could be a background process that updates historical blocks (which would be far less performance-critical, as no one is waiting for it). That said, it's more complicated and more error-prone; I understand why the decision was made to do all or nothing.
This is a great tool, thanks for implementing it!
Consider obtaining the `.lock` file in the data directory in the same way `bitcoind` does. This will prevent `bitcoind` from starting if this tool is running, and vice versa.

Just to illustrate, if I try to start a second `bitcoind`, I get this error, so maybe do the same thing here:

```
$ bitcoind
Error: Cannot obtain a lock on directory /home/larry/.bitcoin. Bitcoin Core is probably already running.
```
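A minimal sketch of datadir locking, assuming a plain lock-file approach (note: `bitcoind` itself uses advisory flock-style locks, which the OS releases automatically if the process dies; the `acquire_lock` helper name here is illustrative):

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Try to create `.lock` in `datadir` exclusively; fail if it already exists.
/// NOTE: simplified sketch. Unlike an advisory flock, a plain `create_new`
/// lock file can be left behind after a crash and must be cleaned up by hand.
fn acquire_lock(datadir: &Path) -> io::Result<fs::File> {
    OpenOptions::new()
        .write(true)
        .create_new(true) // fails with AlreadyExists if the file is present
        .open(datadir.join(".lock"))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("xor-blocks-demo");
    fs::create_dir_all(&dir)?;
    let _ = fs::remove_file(dir.join(".lock"));
    let _guard = acquire_lock(&dir)?;
    // A second attempt fails while the lock file exists.
    assert!(acquire_lock(&dir).is_err());
    fs::remove_file(dir.join(".lock"))?;
    println!("lock acquired and released");
    Ok(())
}
```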
contrib/xor-blocks/src/main.rs (outdated):

```rust
let blocks_path = if env::args().len() > 1 {
    env::args().nth(1).expect("the arg to exist").into()
```
Consider accepting this argument in the same way it's specified with `bitcoind` and `bitcoin-cli`, that is, `-datadir=<dir>`. Besides being consistent with these existing executables, it would also be cleaner if more arguments were added in the future (because the data directory wouldn't be specified in a different way from other arguments).
Sadly, that's not really that easy because of damn Windows non-compliant UTF-16. I wrote a crate for it called `parse_arg`, but it's soft-deprecated there. Feel free to copy-paste or just depend on it. Maybe I'll end up putting it into another dedicated crate.

But what I'd like to see here is a proper usage page if no arguments are provided. `clap` or `configure_me` can do that, but for a simple tool a bit of manual code is likely fine.
I added checking for `-datadir=` and `--datadir=`, but not `datadir=` or `---datadir=`. Just like `bitcoind` :).

> I'd like to see here is having proper usage page if no arguments are provided

If no args are provided, it just picks up the default datadir. IMO that's the best behavior. If you give more than 1 arg, it prints out a usage instruction as an error.

Is that what you had in mind?
I have an idea for a follow-up PR (haven't started implementing yet) which would cause […]. That way, one could start this […]
Concept ACK

Could be a useful tool for paranoid users who can run this tool. Is it possible to add this feature in bitcoin core itself?
```rust
loop {
    let key: [u8; 8] = rand::random();
    // Don't use keys with 4 bytes of leading zeros
    // They won't let us detect the first 4 bytes of magic in the files
```
Why though? After "unxoring", the magic bytes will just be the same. Anyway, it might still be a good idea to mask it better.
We would reapply the xor on restart in that case, since we wouldn't detect that we've processed it already.

Alternatively, we could keep a counter of processed blocks: modify the `xor.dat` to make sure we can't start `bitcoind` (or @LarryRuane's lock would probably also do the same) and add a height of how many blocks we've processed so far (though that would make parallelization more difficult). When we're done, we'd restore the `xor.dat` file (which we have to modify anyway if we're xor-ing).
Oh, I see, that's clever! But then yes, this code tries to be atomic but it isn't, because it doesn't call `sync_data`.
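To illustrate the detection issue discussed above, here is a small sketch (the `looks_unprocessed` and `xor4` helper names are illustrative, not from the PR): if the first four key bytes are zero, xoring leaves the magic bytes unchanged, so a rerun cannot tell that the file has already been processed.

```rust
// Mainnet network magic, as in the PR's constant.
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

// A blk file is treated as "not yet xored" if it still starts with the magic.
fn looks_unprocessed(first4: &[u8; 4]) -> bool {
    *first4 == MAGIC
}

// XOR four bytes with the first four key bytes.
fn xor4(data: [u8; 4], key: [u8; 4]) -> [u8; 4] {
    let mut out = data;
    for (b, k) in out.iter_mut().zip(key) {
        *b ^= k;
    }
    out
}

fn main() {
    // With a normal key, the xored header no longer matches the magic.
    let good_key = [0x11, 0x22, 0x33, 0x44];
    assert!(!looks_unprocessed(&xor4(MAGIC, good_key)));

    // With 4 leading zero key bytes the header is unchanged, so a rerun
    // would see "unprocessed" and xor the file a second time.
    let bad_key = [0x00, 0x00, 0x00, 0x00];
    assert!(looks_unprocessed(&xor4(MAGIC, bad_key)));
    println!("ok");
}
```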
I wonder if `-reindex` could be taught to do this instead. This may be simpler for users to set. Also, it will be atomic, safe, and interruptible. However, it will eat more CPU and thus take a longer time. I'd say this is probably fine, because the only users will be one-time users that have an old leftover datadir. The benefit on top would be that anyone doing a reindex for other reasons will likely get the obfuscation "for free".
contrib/xor-blocks/src/main.rs (outdated):

```rust
let mut block = fs::read(path)?;
block
    .iter_mut()
```
Would it help if we used `.par_iter()` here instead? (Not sure if that would require adding a new lib or not, but it seems like a naturally parallelizable task.)
Requires the `rayon` dependency. But I strongly suspect the entire thing is bottlenecked on IO.
Thank you @l0rinc @Sjors @laanwj @LarryRuane @1440000bytes @Kixunil @maflcko @yancyribbens for your reviews and suggestions! The latest version changes the following: […]
@LarryRuane great suggestion. Unfortunately file locking is still experimental in Rust (?), so users would have to run nightly to be able to run the program. I think we should wait until it's stabilized instead of pulling in third party deps to do the locking.
@Sjors @laanwj There are many edge cases to track here. If you had to service an RPC or p2p request for a block from a pre-xor'd file, you'd have to switch on the fly between xoring and not. Similarly if you got reorged from a block in an xor'd file to one that isn't, or if you requested a pruned block from a peer and now store it in a newly xor'd file. I think all or nothing is a saner approach for something that should only be a temporary solution (ideally everyone resyncs at some point).
@1440000bytes Yes, I suppose it's possible. It would be similar to how the UTXO db was migrated from per-tx values to per-output. That was a 10-minute or so migration when the UTXO set was much smaller. This tool unfortunately takes a while longer (20 minutes or so on my NVMe, many hours for spinning disk drives).
@maflcko That would be convenient, but a reindex is a much longer task than this tool. So I don't think "instead", but rather "also".
Co-Authored-By: Larry Ruane <larryruane@gmail.com>
Sadly, this new implementation is quite broken in multiple ways.
```rust
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];
const XOR_FILE_NAME: &str = "xor.dat";

#[derive(Debug)]
```
I see you're using this to display the messages from `main`. However, that will lead to horrible messages containing `Error { msg: "..." }`. I suggest you implement `Debug` manually, just displaying the string.
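A sketch of that suggestion, assuming an `Error` type shaped like the PR's (the exact type isn't shown here): forwarding `Debug` to the plain message avoids the derived `Error { msg: "..." }` output when `main` returns `Err`.

```rust
use std::fmt;

// Hypothetical error type mirroring the one in the PR. main's Result error
// is printed with Debug, so forwarding Debug to the message keeps the
// output clean.
struct Error {
    msg: String,
}

impl Error {
    fn new(msg: impl Into<String>) -> Self {
        Error { msg: msg.into() }
    }
}

impl fmt::Debug for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Just the message, not the struct shape.
        f.write_str(&self.msg)
    }
}

fn main() {
    let e = Error::new("blocks dir not found");
    assert_eq!(format!("{:?}", e), "blocks dir not found");
    println!("{:?}", e);
}
```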
```rust
impl From<io::Error> for Error {
    fn from(value: io::Error) -> Self {
        Error::new(value.to_string())
```
This loses error sources. The proper way to do this is to create a mutable string and then append the sources in a loop with `": "` as a separator. (You could also do cargo-style `"\n\tcaused by: "`.)
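A sketch of that loop, with an illustrative `Wrapped` error type standing in for the PR's (the names are hypothetical):

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical two-level error for demonstration.
#[derive(Debug)]
struct Wrapped {
    msg: &'static str,
    source: Option<Box<dyn StdError>>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl StdError for Wrapped {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        self.source.as_deref()
    }
}

// Walk the source() chain, joining each level with ": ".
fn full_chain(err: &dyn StdError) -> String {
    let mut msg = err.to_string();
    let mut source = err.source();
    while let Some(src) = source {
        msg.push_str(": ");
        msg.push_str(&src.to_string());
        source = src.source();
    }
    msg
}

fn main() {
    let io_err = std::io::Error::new(std::io::ErrorKind::PermissionDenied, "permission denied");
    let outer = Wrapped { msg: "failed to open xor.dat", source: Some(Box::new(io_err)) };
    assert_eq!(full_chain(&outer), "failed to open xor.dat: permission denied");
    println!("{}", full_chain(&outer));
}
```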
```rust
let start = Instant::now();

let data_dir_path = if env::args().len() == 2 {
    let arg = env::args().nth(1).expect("the arg to exist");
```
Using `args` is problematic for paths because it breaks for some values. `args_os` is the appropriate solution.
```rust
}
let (_, datadir) = arg.split_once("-datadir=").expect("split to be some");
datadir.into()
} else if env::args().len() == 1 {
```
This kind of C-style handling is not idiomatic Rust. I'd do something like:

```rust
let mut args = std::env::args_os();
args.next().ok_or("Missing program name")?;
let data_dir_path = if let Some(first_arg) = args.next() {
    let data_dir = if first_arg == "-datadir" || first_arg == "--datadir" {
        args.next().ok_or("the datadir argument is missing a value")?
    } else if let Some(data_dir) = first_arg.strip_prefix("-datadir=") {
        data_dir.to_owned()
    } else if let Some(data_dir) = first_arg.strip_prefix("--datadir=") {
        data_dir.to_owned()
    } else {
        return Err(Error::new(format!("Unknown argument: {:?}", first_arg)));
    };
    if args.next().is_some() {
        return Err("Too many arguments".into());
    }
    data_dir
} else {
    // fallback to home here.
}
```
```rust
    return Err("Invalid number of arguments. Run with `cargo run --release` if using default datadir. Otherwise run with `cargo run --release -- -datadir=<custom datadir>`.".into());
};

let blocks_path = data_dir_path.join("blocks");
```
Reminds me that non-mainnet chains use paths like `/regtest/blocks`; perhaps we should accept a `-chain` argument as well.
```rust
}
let mut out_file = File::options().write(true).open(&key_path)?;
out_file.write_all(&key)?;
out_file.sync_data()?;
```
To be atomic, this should create a temporary file first, write it, sync it, and then rename.
Does it matter, though? If it crashes before `sync_all`, we either have all zeros in `xor.dat` or a random key that has not yet been used for xoring anything. On the next run, it will either write a new key if the file is still zero or start using the previously written random key.
On the next run it'll crash failing to convert into `[u8; 8]`. While not catastrophic, it makes the program more annoying for end users.
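A sketch of the temp-file-then-rename pattern suggested above (the `atomic_write` helper name is illustrative; for full durability the containing directory should also be fsynced after the rename, which is omitted here):

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// Write to a temp file in the same directory, sync it, then rename over
// the destination. On POSIX, rename within one filesystem is atomic, so
// readers see either the old contents or the new, never a partial write.
fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_data()?; // flush file contents to disk before the rename
    fs::rename(&tmp, path)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("xor-demo.dat");
    atomic_write(&path, &[0xab; 8])?;
    assert_eq!(fs::read(&path)?, vec![0xab; 8]);
    fs::remove_file(&path)?;
    println!("atomic write ok");
    Ok(())
}
```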
```rust
fn xor_file(path: &Path, key: u128) -> Result<(), io::Error> {
    let mut buf_u128 = 0u128;
    let buf = unsafe { (&mut buf_u128 as *mut _ as *mut [u8; 16]).as_mut() }.unwrap();
```
This seems over-complicated and not needed. You're not saving anything by "avoiding the conversion"; the optimizer would take care of that for you.
```rust
loop {
    buf_u128 ^= key;
    writer.write_all(buf)?;
```
This is clearly unsound: you're doing shared mutability, which is forbidden in Rust.
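A safe alternative, as a sketch: keep the value as a `u128` and materialize bytes with `to_ne_bytes` at each step instead of aliasing the integer's memory (the `xor_chunk` helper is illustrative; the optimizer compiles this to the same register moves without any `unsafe`):

```rust
// XOR a 16-byte chunk against the key without aliased pointers:
// round-trip through u128 using the safe byte-conversion methods.
fn xor_chunk(chunk: [u8; 16], key: u128) -> [u8; 16] {
    (u128::from_ne_bytes(chunk) ^ key).to_ne_bytes()
}

fn main() {
    let key = 0x0102030405060708090a0b0c0d0e0f10u128;
    let data = [0x5au8; 16];
    let xored = xor_chunk(data, key);
    // XOR is its own inverse: applying the key twice round-trips.
    assert_eq!(xor_chunk(xored, key), data);
    println!("round trip ok");
}
```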
```rust
        writer.write_all(&buf[..n])?;
        break;
    }
}
```
The `read` API doesn't guarantee that `n < 16` means EOF, so the loop may break prematurely, leading to garbage data.

It's also not as fast as it could be: reading larger chunks (e.g. 4096 bytes) and then processing them with `chunks_exact` would allow auto-vectorization. Also, you wouldn't need `BufReader` in that case.
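A sketch combining both points (the `xor_stream` helper is illustrative; note that a real implementation must also keep the key aligned to the absolute file offset, which this simplification only preserves when reads return multiples of 8 bytes): only `Ok(0)` is treated as EOF, and `chunks_exact_mut` lets the inner loop auto-vectorize.

```rust
use std::io::{self, Read, Write};

// Read in 4096-byte chunks; a short read does NOT mean EOF.
fn xor_stream<R: Read, W: Write>(mut reader: R, mut writer: W, key: [u8; 8]) -> io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // only a zero-length read signals EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let chunk = &mut buf[..n];
        // Fixed-size sub-chunks allow the compiler to vectorize this loop.
        let mut iter = chunk.chunks_exact_mut(8);
        for block in &mut iter {
            for (b, k) in block.iter_mut().zip(key) {
                *b ^= k;
            }
        }
        // Handle the tail that is shorter than the key.
        for (b, k) in iter.into_remainder().iter_mut().zip(key) {
            *b ^= k;
        }
        writer.write_all(chunk)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let data = vec![0xffu8; 20]; // not a multiple of 8: exercises the remainder
    let key = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut out = Vec::new();
    xor_stream(&data[..], &mut out, key)?;
    let mut round_trip = Vec::new();
    xor_stream(&out[..], &mut round_trip, key)?;
    assert_eq!(round_trip, data);
    println!("xor stream round trip ok");
    Ok(())
}
```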
```rust
    }
}

writer.into_inner()?.sync_data()?;
```
You need to call `flush()` before `into_inner`.
`into_inner` flushes before returning: https://doc.rust-lang.org/std/io/struct.BufWriter.html#method.into_inner
Yes, but it loses the error information, so you'd silently continue even if a write failed, leading to data corruption.

Sorry, I misremembered what it does.

You can just define […]
Maybe just from a code-review/maintenance perspective this is preferable, given that the feature implemented here is quite tricky to get right for all edge cases by (re-)implementing the C++ logic.
@andrewtoth, if you've closed it on purpose, do you want to explain the reason? Is it because you agree with @maflcko that making it a built-in functionality might be simpler? I'm also leaning in that direction, especially after #31144, where obfuscation should become a lot cheaper.
A bigger reason to do this in bitcoind is that it may be able to run in the background and lead to a better user experience.
@l0rinc yes, I agree with other reviewers that a feature like this should be added directly to bitcoind. If needed, this tool is still available to be run at github.com/andrewtoth/blocks-xor.
I wrote a tool in Rust to xor the blocks directory with a random key. It was pointed out to me that there already exists some Rust code in contrib, so this might be a welcome addition to the toolkit here.
This lets you obfuscate the `blk*.dat` and `rev*.dat` files in the blocks directory if you synced with a version prior to v28.

It checks if a `xor.dat` file exists, and if it is zero it overwrites it with a non-zero random key. It then goes through each `*.dat` file in the blocks directory, checking if the first 4 bytes are the magic bytes. If so, it reads the whole file into memory, xors all bytes with the key, then writes the result to a temporary file. It then renames the temporary file over the dat file it xor'd. This lets users safely run this on any blocks directory, as long as they let it completely finish once before starting `bitcoind`.
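The per-file flow described above can be sketched as follows (a condensed illustration, not the PR's actual code; the `xor_dat_file` helper and the key value are hypothetical):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Mainnet network magic.
const MAGIC: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

// Check the magic, xor in memory, write a temp file, rename over the
// original. Returns Ok(true) if the file was xored, Ok(false) if skipped.
fn xor_dat_file(path: &Path, key: [u8; 8]) -> io::Result<bool> {
    let mut data = fs::read(path)?;
    if data.len() < 4 || data[..4] != MAGIC {
        return Ok(false); // already xored (or not a block file): skip
    }
    for (i, b) in data.iter_mut().enumerate() {
        *b ^= key[i % key.len()];
    }
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, &data)?;
    fs::rename(&tmp, path)?; // atomic replace: never leaves a half-xored file
    Ok(true)
}

fn main() -> io::Result<()> {
    // Key with non-zero leading bytes so the xored header no longer
    // matches the magic (the detection requirement discussed above).
    let key = [0x42u8, 0x13, 0x37, 0x5a, 0x01, 0x02, 0x03, 0x04];
    let path = std::env::temp_dir().join("blk-demo.dat");
    fs::write(&path, [&MAGIC[..], &[0xaa; 12]].concat())?;
    assert!(xor_dat_file(&path, key)?); // first pass xors the file
    assert!(!xor_dat_file(&path, key)?); // second pass detects it and skips
    fs::remove_file(&path)?;
    println!("ok");
    Ok(())
}
```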