10000 GitHub - rshariffdeen/Hercule: An inter-package analysis techinque for supply chain protection, that combines three analyses to identify malicious packages with high precision and high recall
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

An inter-package analysis techinque for supply chain protection, that combines three analyses to identify malicious packages with high precision and high recall

License

Notifications You must be signed in to change notification settings

rshariffdeen/Hercule

Repository files navigation

Hercule

Docker Pulls DOI

Artifact Available Artifact Functional Artifact Reusable

An inter-package analysis technique for supply chain protection, that combines three analyses to identify malicious packages with high precision and high recall. Our approach incorporates an integrity check based on AST differentiation analysis, that can identify discrepancies between the distributed artifacts and the source repository. We then utilize CodeQL to detect malicious behavior using data-flow analysis. Lastly, we implement a transitive dependency analysis to identify malicious packages installed as part of the dependency resolution.

Hercule is a supply chain protection tool for Python packages that can accurately identify benign and malicious packages. Hercule is:

  • Accurate: Achieves a f1-score of 0.95 scanning nearly 5000 packages
  • Extensible: Hercule is designed so that it can be easily extended to support advanced behavior analysis using CodeQL queries
  • Scalable: Hercule can analyze large Python packages including but not limited to scipy, numpy, pandas etc

Note: See the Experiment Guide to replicate the experiments for ICSE'25.

Dependencies

  • Python - 3.8.10+
  • Docker - 27.3.1+
  • CodeQL - 2.15.0+
  • GumTree - 3.1.0+
  • PythonParser - 1.4+
  • Parso - 0.8.3+

Build using Dockerfile

Building Hercule is easy with the provided Dockerfile, which has all the build dependencies and libraries required. We provide two Dockerfiles

  • Dockerfile: This will build the environment necessary to run the stand-alone tool
  • experiments/ICSE25/Dockerfile: This environment includes all necessary dependencies to reproduce the experiments

You can use the following command to build hercule image:

cd Hercule
docker build -t rshariffdeen/hercule .
# start docker
docker run -it rshariffdeen/hercule /bin/bash              

The experiments/ICSE25/Dockerfile depends on the hercule base image. The instructions to build and execute experiments/ICSE25/Dockerfile can be found here.

Example Usage

Single Package

Hercule supports analyzing packages in different formats, i.e., wheels, zips, .tar.gz files, by passing the flag --file/-F . To invoke the tool, call

hercule -F target_package

Batch Mode

Hercule can be invoked on a collection of collected packages using the --dir/-D flag to allow for batch mode processing, e.g.

hercule -D directory_of_packages

To allow for a comparison with similar tools, Hercule has two additional modes - Bandit4Mal (enabled using the flag --banditmal) and LastPyMile(enabled using the flag --lastpymile), which act as the respective tools. See Example Page for a detailed description

Limitations

  • Current implementation is based on CodeQL, and thus inherits the limitations of that framework. Please note that CodeQL license is limited to academic research purpose only.
  • Hercule does not perform dynamic analysis

Bugs

Hercule should be considered alpha-quality software. Bugs can be reported here:

https://github.com/rshariffdeen/Hercule/issues

Documentation

Contributions

We welcome contributions to improve this work, see details

Publication

Detecting Python Malware in the Software Supply Chain with Program Analysis
Ridwan Shariffdeen, Behnaz Hassanshahi, Martin Mirchev, Ali El Husseini, Abhik Roychoudhury
47th International Conference on Software Engineering (ICSE), Software Engineering in Practice track (SEIP)

Acknowledgements

This research was supported by the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (Fuzz Testing NRF-NCR25-Fuzz-0001). Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore, and Cyber Security Agency of Singapore.

License

This project is licensed under the MIT License - see the LICENSE file for details

Citation

If you use Hercule in your research work, we would highly appreciate it if you cited the following paper:

@inproceedings{hercule,
author = {Shariffdeen, Ridwan and Hassanshahi, Behnaz and Mirchev, Martin  and Husseini, Ali El and Roychoudhury, Abhik},
title = {Detecting Python Malware in the Software Supply Chain with Program Analysis},
year = {2025},
doi = {},
booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)},
pages = {},
numpages = {},
keywords = {Supply Chain Protection, Program Analysis, Malware Detection},
location = {Ottawa, Canada},
series = {ICSE SEIP 2025}
}

About

An inter-package analysis techinque for supply chain protection, that combines three analyses to identify malicious packages with high precision and high recall

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  
0