hashscan

hashscan is a utility for identifying hashes and other strings within large files. Hashscan utilizes Intel Hyperscan for high performance regex pattern matching.
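As a rough illustration of what regex-based hash identification means, the sketch below finds strings that look like MD5, SHA-1, or SHA-256 digests using Python's standard `re` module. The patterns and the `find_hash_candidates` helper are assumptions made for illustration; hashscan's actual pattern set and its Hyperscan-based matching are not shown here and may differ.

```python
import re

# Hypothetical patterns: hex strings whose length matches each digest size.
# These are illustrative only and are not taken from hashscan's source.
HASH_PATTERNS = {
    "MD5": re.compile(r"\b[0-9a-fA-F]{32}\b"),
    "SHA1": re.compile(r"\b[0-9a-fA-F]{40}\b"),
    "SHA256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def find_hash_candidates(text):
    """Return (name, matched_string) pairs for every hash-like string in text."""
    results = []
    for name, pattern in HASH_PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((name, match.group(0)))
    return results
```

The word boundaries keep a 32-character pattern from matching inside a longer hex string. A length-based match only identifies candidates: a 64-character hex string is not necessarily a SHA-256 digest.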

Hardware Requirements

x86 32-bit or 64-bit CPU
- Supplemental Streaming SIMD Extensions 3 (SSSE3)
- Intel Streaming SIMD Extensions 4.2 (SSE4.2) optional
- POPCNT optional
- Bit Manipulation Instructions BMI/BMI2 optional
- Intel Advanced Vector Extensions 2 (AVX2) optional
- Intel Advanced Vector Extensions 512 (AVX-512) optional

Hyperscan can make use of SSE4.2, POPCNT, BMI/BMI2, AVX2, and AVX-512 if present; which of these are used is determined when the library is compiled.
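To see which of the extensions listed above a Linux machine reports, the flags line of `/proc/cpuinfo` can be inspected. The sketch below is an assumption-laden helper, not part of hashscan: the function names are hypothetical, and the mapping uses the flag spellings Linux prints (for example, SSE3 appears as `pni`).

```python
# Feature name -> flag spelling used in the "flags" line of /proc/cpuinfo.
FEATURE_FLAGS = {
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE4.2": "sse4_2",
    "POPCNT": "popcnt",
    "BMI": "bmi1",
    "BMI2": "bmi2",
    "AVX2": "avx2",
    "AVX-512 (Foundation)": "avx512f",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_features(cpuinfo_text):
    """Map each feature name to True/False depending on whether its flag is present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}
```

On Linux, passing the contents of `/proc/cpuinfo` to `report_features` gives a per-feature yes/no table. This only shows what the CPU supports; what Hyperscan actually uses still depends on how the library was compiled.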

ARM support may be added in the future by using Vectorscan for ARM builds.

Dependencies

Intel Hyperscan
- CMake >= 2.8.11
- Ragel 6.9
- Python 2.7
- Boost >= 1.57

Building

Building hashscan first requires building Hyperscan, which is included as a git submodule in this repo. Follow the Hyperscan build instructions:

mkdir -p lib/hyperscan                            # Directory in which to build Hyperscan
cd lib/hyperscan
cmake [-G <generator>] [options] ../../hyperscan  # Pick a generator if you like; the default is make
make -j<jobs>                                     # Build in parallel with make; use your generator's build command if you chose another
./bin/unit-hyperscan                              # Run the Hyperscan unit tests

Compiling Hyperscan depends on a recent version of the Boost C++ header library. If the Boost libraries are installed on the build machine in the usual paths, CMake will find them. If they are not installed, the location of the Boost source tree can be specified during the CMake configuration step using the BOOST_ROOT variable.

Another alternative is to put a copy of (or a symlink to) the boost subdirectory in hyperscan/include/boost.

For example, for the Boost 1.59.0 release:

ln -s boost_1_59_0/boost hyperscan/include/boost

Refer to the Hyperscan build instructions for more detail on configuring the Hyperscan build.

Once Hyperscan is built, build hashscan with:

cd <repo root>
make

Usage

Run hashscan from the command line:

hashscan <file to scan>

Performance Metrics

The following figures were measured using Hyperfine on an Intel i9-13900H @ 5.400 GHz. The input file is a dataset of 75,000 lines, 2.9 MB in total.

Time (mean ± σ):      3.061 s ±  0.103 s    [User: 3.040 s, System: 0.021 s]
  Range (min … max):    2.750 s …  3.239 s    100 runs

  runs:        100
  mean:      3.061 s
  stddev:    0.103 s
  median:    3.033 s
  min:       2.750 s
  max:       3.239 s

  percentiles:
     P_05 .. P_95:    2.919 s .. 3.178 s
     P_25 .. P_75:    2.989 s .. 3.149 s  (IQR = 0.160 s)
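For a rough sense of throughput, the mean time above can be converted to per-second rates. This is a back-of-the-envelope sketch that assumes the file size is 2.9 megabytes; the variable names are illustrative.

```python
# Figures taken from the benchmark above; the size unit (MB) is an assumption.
line_count = 75_000
size_mb = 2.9
mean_seconds = 3.061

lines_per_second = line_count / mean_seconds
mb_per_second = size_mb / mean_seconds

print(f"{lines_per_second:.0f} lines/s, {mb_per_second:.2f} MB/s")
```

With the numbers above, this works out to roughly 24,500 lines per second, a little under 1 MB/s.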

Testing

You can test the supported patterns by running the Python scripts in the test folder. Each script generates a text file that can be passed to hashscan. For example, to test SHA256:

python3 test/sha256/generate_sha256.py >> test/sha256/sha256_test.txt
hashscan test/sha256/sha256_test.txt
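The contents of `generate_sha256.py` are not reproduced here. As a minimal sketch of how such a generator could work, assuming it prints one SHA-256 hex digest per line, the hypothetical `generate_sha256_lines` helper below hashes random bytes with Python's `hashlib`:

```python
import hashlib
import os


def generate_sha256_lines(count):
    """Return `count` SHA-256 hex digests, each computed over 32 random bytes."""
    return [hashlib.sha256(os.urandom(32)).hexdigest() for _ in range(count)]


if __name__ == "__main__":
    # Print one digest per line so the output can be redirected to a file.
    for digest in generate_sha256_lines(10):
        print(digest)
```

Each digest is 64 lowercase hex characters, so every line of the generated file should be reported as a SHA-256 match if the pattern works as intended.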
