An inter-package analysis technique for supply chain protection, that combines three analyses to identify malicious packages with high precision and high recall. Our approach incorporates an integrity check based on AST differentiation analysis, that can identify discrepancies between the distributed artifacts and the source repository. We then utilize CodeQL to detect malicious behavior using data-flow analysis. Lastly, we implement a transitive dependency analysis to identify malicious packages installed as part of the dependency resolution.
Hercule is a supply chain protection tool for Python packages that can accurately identify benign and malicious packages. Hercule is:
- Accurate: Achieves a f1-score of 0.95 scanning nearly 5000 packages
- Extensible: Hercule is designed so that it can be easily extended to support advanced behavior analysis using CodeQL queries
- Scalable: Hercule can analyze large Python packages including but not limited to scipy, numpy, pandas etc
Note: See the Experiment Guide to replicate the experiments for ICSE'25.
- Python - 3.8.10+
- Docker - 27.3.1+
- CodeQL - 2.15.0+
- GumTree - 3.1.0+
- PythonParser - 1.4+
- Parso - 0.8.3+
Building Hercule is easy with the provided Dockerfile, which has all the build dependencies and libraries required. We provide two Dockerfiles
- Dockerfile: This will build the environment necessary to run the stand-alone tool
- experiments/ICSE25/Dockerfile: This environment includes all necessary dependencies to reproduce the experiments
You can use the following command to build hercule image:
cd Hercule
docker build -t rshariffdeen/hercule .
# start docker
docker run -it rshariffdeen/hercule /bin/bash
The experiments/ICSE25/Dockerfile depends on the hercule base image. The instructions to build and execute experiments/ICSE25/Dockerfile can be found here.
Hercule supports analyzing packages in different formats, i.e., wheels, zips, .tar.gz files, by passing the flag --file/-F
. To invoke the tool, call
hercule -F target_package
Hercule can be invoked on a collection of collected packages using the --dir/-D
flag to allow for batch mode processing, e.g.
hercule -D directory_of_packages
To allow for a comparison with similar tools, Hercule has two additional modes - Bandit4Mal (enabled using the flag --banditmal
) and LastPyMile(enabled using the flag --lastpymile
), which act as the respective tools.
See Example Page for a detailed description
- Current implementation is based on CodeQL, and thus inherits the limitations of that framework. Please note that CodeQL license is limited to academic research purpose only.
- Hercule does not perform dynamic analysis
Hercule should be considered alpha-quality software. Bugs can be reported here:
https://github.com/rshariffdeen/Hercule/issues
We welcome contributions to improve this work, see details
Detecting Python Malware in the Software Supply Chain with Program Analysis
Ridwan Shariffdeen, Behnaz Hassanshahi, Martin Mirchev, Ali El Husseini, Abhik Roychoudhury
47th International Conference on Software Engineering (ICSE), Software Engineering in Practice track (SEIP)
This research was supported by the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (Fuzz Testing NRF-NCR25-Fuzz-0001). Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore, and Cyber Security Agency of Singapore.
This project is licensed under the MIT License - see the LICENSE file for details
If you use Hercule in your research work, we would highly appreciate it if you cited the following paper:
@inproceedings{hercule,
author = {Shariffdeen, Ridwan and Hassanshahi, Behnaz and Mirchev, Martin and Husseini, Ali El and Roychoudhury, Abhik},
title = {Detecting Python Malware in the Software Supply Chain with Program Analysis},
year = {2025},
doi = {},
booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)},
pages = {},
numpages = {},
keywords = {Supply Chain Protection, Program Analysis, Malware Detection},
location = {Ottawa, Canada},
series = {ICSE SEIP 2025}
}