Introduction

This project constructs a knowledge graph from the CVE Project's cvelistV5 repository. It parses the raw CVE JSON records and loads them into a Neo4j graph database, where they can be queried with Cypher.

Note: constructing the complete graph for the first time takes many hours. During testing, the overall rate was about 4.5 CVEs per second, for a total execution time of ~17.7 hours. To skip this process, load the most recent neo4j.dump file into Neo4j instead (see "Viewing the database dump" below).
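As a rough check on these figures, the run time follows directly from the record count and the throughput. The sketch below is illustrative only: the record count is back-calculated from the quoted rate and duration rather than measured, and the function name is hypothetical.

```python
def estimated_hours(cve_count: int, cves_per_second: float) -> float:
    """Return the estimated wall-clock hours needed to ingest cve_count records."""
    if cves_per_second <= 0:
        raise ValueError("cves_per_second must be positive")
    return cve_count / cves_per_second / 3600


# Assumption: ~286,740 records, back-calculated from 4.5 CVEs/s over 17.7 hours.
ASSUMED_CVE_COUNT = 286_740

if __name__ == "__main__":
    hours = estimated_hours(ASSUMED_CVE_COUNT, 4.5)
    print(f"Estimated full build time: {hours:.1f} hours")
```

Plugging in your own file count and an observed rate gives a quick estimate before committing to a full build.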

For a detailed description of the project, see the paper (paper.pdf in the repository root).

Schema

Usage

There are two usage scenarios:

- Viewing the existing database dump: load the provided dump into Neo4j. You do not need to run the script or set up the Python environment.
- Constructing the graph: build the graph from scratch or update an existing one. This requires running the script and can take a long time (17+ hours) when starting from scratch.

Viewing the database dump

The database dump file can be found in the /dump directory of the repository.

Docker

If you want to run Neo4j as a standalone Docker container, you can execute the following commands:

Load the dump into a database:

docker run --interactive --tty --rm \
  -v ./dump:/dump -v ./neo4j-data:/data \
  neo4j/neo4j-admin:latest \
  neo4j-admin database load neo4j --from-path=/dump

Then, launch a container that uses the loaded database:

docker run -d \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -v ./neo4j-data:/data \
  --name=cvegraph-neo4j \
  neo4j:latest
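A container started with -d may need a few seconds before Neo4j accepts connections. Below is a minimal sketch for waiting until the Bolt port (7687, as published in the command above) is reachable, using only the Python standard library. The function name, timeout, and polling interval are illustrative assumptions. An open port only shows that something is listening, not that the dump loaded correctly.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (7687 is the Bolt port published by the docker run command above):
#   wait_for_port("localhost", 7687)
```

Once the port is open, the Neo4j Browser should be reachable on port 7474 with the credentials set in NEO4J_AUTH.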

Neo4j Desktop

Start by downloading Neo4j Desktop. Once it is running, create a new project, but do not add a DBMS yet. Click the "Add" button, select "File", and choose the dump file from the /dump directory.

Then, open the dropdown menu for the file and click "Create new DBMS from dump".

You can now use all Neo4j Desktop features with the newly created DBMS. To confirm that the import succeeded, connect to the DBMS, select the neo4j database, and check the details shown in the panel on the right.

Constructing the graph

Environment Setup

Install dependencies
- uv
  - curl -LsSf https://astral.sh/uv/install.sh | sh
- Docker
  - curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh ./get-docker.sh
  - Allow your user to run Docker commands without root by setting up rootless mode: sudo apt install -y uidmap && dockerd-rootless-setuptool.sh install
Set up Python environment: uv sync
Clone submodule: git submodule update --init

Run

uv run cvegraph.py

Roadmap

Additional data sources
- NVD
- CWE
- Exploit-DB
- CPE
- ATT&CK
  - Presents a much larger problem than the others because CVEs are difficult to map to ATT&CK TTPs. However, it would be immensely valuable and would make it possible to include data on APT/threat actor groups.
Asynchronicity
- The logic is already implemented in async_cvegraph.py, but records go missing. For instance, a test run using only CVE-2024-* files finds 36,080 files on disk, but only ~28k are sent to Neo4j.
Add configuration options to use an external Neo4j database
New ingestion options
- Full: check every single CVE file (current method)
- Quick: find the first and last CVEs already in the database and skip all files within that range
Investigate CSV database import
- According to Neo4j documentation, this is the fastest possible way to perform bulk import
- May be able to speed up the construction of the database from scratch by first writing records to CSV and then importing them, rather than the current approach of sending batches of 1000 at a time
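
For the CSV idea above, a minimal sketch of the first half (writing node records to a CSV file) might look like the following. The header follows the neo4j-admin import convention of marking the unique ID column with :ID and the label column with :LABEL. The property names (cveId, state, datePublished, borrowed from the cveMetadata block of a CVE JSON record), the CVE label, and the function name are assumptions for illustration and may not match the project's actual schema.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping

# Header in neo4j-admin import style: ":ID" marks the unique node ID column
# and ":LABEL" the node label column. Property names are assumptions.
HEADER = ["cveId:ID", "state", "datePublished", ":LABEL"]


def write_cve_nodes(records: Iterable[Mapping[str, str]], path: Path) -> int:
    """Write one CSV row per CVE record and return the number of rows written."""
    count = 0
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADER)
        for rec in records:
            writer.writerow([
                rec["cveId"],
                rec.get("state", ""),
                rec.get("datePublished", ""),
                "CVE",  # assumed node label
            ])
            count += 1
    return count
```

The resulting file could then be handed to neo4j-admin database import with a --nodes argument. Relationship files would need their own :START_ID and :END_ID columns, which this sketch does not cover.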
