8000 GitHub - a0x77n/orbit-dataset: Dataset Repository for Research Project "Orbit"
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

a0x77n/orbit-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dataset for Research Project "Orbit"

This repository contains all the data used in the evaluation of the research paper "On the Role of Pre-trained Embeddings in Binary Code Analysis" published in the Proceedings of the 19th ACM Asia Conference on Computer and Communications Security.

Raw Debian Packages

The debs directory contains the raw Debian packages used in our evaluation. Multiple compilation configurations exist for each source package, and depending on the source package, multiple binary packages may be generated.

The directory structure is as follows: ./debs/<source_pkg>/<compilation_id>/<binary_pkg>.deb.

JSON Dump

The fundump directory contains a JSON file for each binary package from the debs directory, including data and labels for each function extracted from the package. It also contains additional metadata.

Each file consists of a list of JSON objects representing a function. Below is an example structure:

{
    "src_deb": "...",
    "deb": "...",
    "version": "...",
    "arch": "...",
    "producer": "...",
    "optimization_option": "...",
    "compiler": "...",
    "binary": "...",
    "function": "...",
    "full_path": "...",
    "line": "...",
    "column": "...",
    "bytes": "...",
    "addr": "...",
    "signature": {
        "return_type": "...",
        "parameter_list": [
            {
                "index": 1,
                "name": "...",
                "type": "..."
            },
            {
                "index": 2,
                "name": "...",
                "type": "..."
            }
        ]
    }
}

The directory structure is as follows: ./fundump/<source_pkg>/<compilation_id>/<binary_pkg>.json.xz.

Function Similarity Dataset

The directory function-similarity contains the JSON files train-data.json.xz and test-data.json.xz with the training and test data, respectively, for the function similarity experiments.

Each file consists of a list of labeled function pairs. An example structure is shown below:

{
    "function1": {
        "bytes": "...",
        "addr": "..."
    },
    "function2": {
        "bytes": "...",
        "addr": "..."
    },
    "label": "..."
}

Cloning the Repository with Git LFS

This repository uses Git LFS (Large File Storage) to manage the dataset files. To ensure that you properly clone all the large files along with the rest of the repository, please make sure you have Git LFS installed.

Contact

For any inquiries, please contact Alwin Maier at mlsec-alwin@posteo.de.

About

Dataset Repository for Research Project "Orbit"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0