Description
#377 added support for recovering dependency lists from Rust binaries built with cargo auditable. However, cargo auditable is not universally adopted; while many Linux distributions build all their packages with it, there are plenty of non-auditable binaries out there.
The cargo audit scanner has long supported a fallback mode for scanning binaries, used when cargo auditable data is not available. It relies on the observation that Rust panic messages include the path to the source file, which for packages coming from a package registry contains the name and version of the package in a very predictable format. Not all packages panic, so the full list cannot always be recovered; but on the plus side, there are hardly any false positives.
The Rust implementation of this approach can be used as a reference. It is really straightforward, at just ~30 SLOC: https://github.com/rustsec/rustsec/blob/1e9b1a26401a07ff10045915619c6e538861eeef/quitters/src/lib.rs
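To illustrate the idea, here is a minimal Python sketch of the extraction. It is not a port of the Rust code linked above: the regular expression and the names CRATE_PATH and extract_crates are assumptions made for illustration, based on registry sources being unpacked under cargo/registry/src/<index directory>/<name>-<version>/.

```python
import re

# Hypothetical pattern, assuming registry sources live under
#   .../cargo/registry/src/<index directory>/<name>-<version>/...
# Both "/" and "\" are accepted so that paths embedded on Windows also match.
CRATE_PATH = re.compile(
    rb"cargo[/\\]registry[/\\]src[/\\][^/\\]+[/\\]"
    rb"(?P<name>[0-9A-Za-z_-]+?)-"
    rb"(?P<version>[0-9]+\.[0-9]+\.[0-9]+[0-9A-Za-z+.-]*)[/\\]"
)


def extract_crates(data: bytes) -> set:
    """Return the set of (name, version) pairs found in embedded source paths."""
    return {
        (m["name"].decode("ascii"), m["version"].decode("ascii"))
        for m in CRATE_PATH.finditer(data)
    }
```

The pattern is compiled once at module level, so its compilation cost is paid only once when many files are scanned in a single run.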
Performance considerations
I've measured the Rust implementation of the extractor. With a warm filesystem cache it only takes 7 ms to scan a 15 MB Rust executable. That was measured with hyperfine and includes process startup costs and regex compilation time, which will be amortized when scanning multiple files. So CPU load is not a concern, but disk throughput could become a bottleneck and may be worth optimizing for.
#377 includes an extension blacklist that entirely skips some files that aren't going to be executable. That is a good first step.
Another easy optimization is to load only the first 8 bytes of a file and inspect the magic number. If the file is not PE, ELF, Mach-O or WASM, it can be discarded quickly without loading the rest of it. The extraction library used by #377 already implements this, as does cargo audit; this optimization should be reused.
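A minimal Python sketch of such a magic-number check follows. It does not reproduce the check in the extraction library or in cargo audit; the names EXECUTABLE_MAGICS, has_executable_magic and is_candidate are hypothetical, and the list of magic values is only meant to cover the four formats named above.

```python
# Magic numbers found at the very start of each executable format.
EXECUTABLE_MAGICS = (
    b"\x7fELF",          # ELF
    b"MZ",               # PE (DOS stub header)
    b"\x00asm",          # WebAssembly module
    b"\xfe\xed\xfa\xce", # Mach-O, 32-bit
    b"\xfe\xed\xfa\xcf", # Mach-O, 64-bit
    b"\xce\xfa\xed\xfe", # Mach-O, 32-bit, byte-swapped
    b"\xcf\xfa\xed\xfe", # Mach-O, 64-bit, byte-swapped
    b"\xca\xfe\xba\xbe", # Mach-O universal binary (also used by Java class files)
)


def has_executable_magic(header: bytes) -> bool:
    """Check whether the leading bytes of a file match a known executable format."""
    return header.startswith(EXECUTABLE_MAGICS)


def is_candidate(path: str) -> bool:
    """Read only the first 8 bytes of the file and decide whether to scan it."""
    with open(path, "rb") as f:
        return has_executable_magic(f.read(8))
```

Only files for which is_candidate returns True would then be read in full and passed to the string scan.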
Further optimizations are possible, but come with diminishing returns and trade-offs. I could not find a simple, fast and robust way to tell if a given executable is a Rust one or not. Loading only the symbol table from disk and checking it for the presence of rust_eh_personality could be one way, but requires some platform-specific binary parsing, and I am not 100% confident that this symbol is always present. It is also possible to only load and scan the binary section where these strings should reside, such as .rodata in ELF, but this relies on an implementation detail of where the compiler and linker place these strings, once again requires different handling for different platforms, and I'm not confident these will not change in the future. So I suggest pursuing these only if the initial implementation proves to be too slow in practice.