BED-5682 generic ingest schema validation by brandonshearin · Pull Request #1352 · SpecterOps/BloodHound · GitHub

BED-5682 generic ingest schema validation #1352

Merged: brandonshearin merged 33 commits into main from BED-5682 on Apr 22, 2025

Conversation

@brandonshearin (Contributor) commented Apr 11, 2025

Description

  • Injected an IngestSchema dependency onto Resources so the schemas are compiled only once
  • Augmented ValidateMetaTag to validate generic files, and moved it to a new file, stream_decoder.go, to live with the other stream-decoding funcs
  • Added the third-party package github.com/santhosh-tekuri/jsonschema/v6 v6.0.1 for JSON Schema validation
  • Defined JSON Schemas for nodes and edges in node.json and edge.json
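The compile-once injection described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: `compiledSchema`, `compile`, `LoadIngestSchema`, the `Node`/`Edge` field names, and the inline schema strings are all hypothetical stand-ins, and the toy validator only checks required keys where the real change uses the jsonschema package with node.json and edge.json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-in schemas; the PR's real node.json and edge.json are not shown here.
const (
	nodeSchemaJSON = `{"required": ["id", "kinds"]}`
	edgeSchemaJSON = `{"required": ["start", "end", "kind"]}`
)

// compiledSchema is a toy "compiled" form that only checks required keys.
type compiledSchema struct {
	required []string
}

// compile parses a schema document once so handlers can reuse the result.
func compile(raw string) (*compiledSchema, error) {
	var doc struct {
		Required []string `json:"required"`
	}
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return nil, fmt.Errorf("compile schema: %w", err)
	}
	return &compiledSchema{required: doc.Required}, nil
}

// Validate reports the first required key missing from obj.
func (s *compiledSchema) Validate(obj map[string]any) error {
	for _, key := range s.required {
		if _, ok := obj[key]; !ok {
			return fmt.Errorf("missing required property %q", key)
		}
	}
	return nil
}

// IngestSchema bundles the compiled validators (field names are assumptions).
type IngestSchema struct {
	Node *compiledSchema
	Edge *compiledSchema
}

// LoadIngestSchema compiles both schemas; it is meant to be called once at startup.
func LoadIngestSchema() (IngestSchema, error) {
	node, err := compile(nodeSchemaJSON)
	if err != nil {
		return IngestSchema{}, err
	}
	edge, err := compile(edgeSchemaJSON)
	if err != nil {
		return IngestSchema{}, err
	}
	return IngestSchema{Node: node, Edge: edge}, nil
}

// Resources mimics a handler container that receives the schema as a dependency.
type Resources struct {
	IngestSchema IngestSchema
}

func main() {
	schema, err := LoadIngestSchema()
	if err != nil {
		panic(err)
	}
	res := Resources{IngestSchema: schema}

	// Every request reuses the already-compiled validators.
	err = res.IngestSchema.Node.Validate(map[string]any{"id": "1"})
	fmt.Println(err) // missing required property "kinds"
}
```

Passing the compiled schemas in through the struct, rather than compiling inside each handler call, keeps the per-request cost down to validation alone.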

Motivation and Context

This PR addresses: BED-5682 and BED-5593
We want to ingest arbitrary nodes and edges into the graph. We want to accept generic payloads that describe a graph snapshot that conforms to a standardized node/edge JSON schema.

This changeset augments the existing file upload workflow to support generic files. The scope of these changes is to write generic payloads to the file system, if the files pass validation. A subsequent PR containing datapipe changes will actually process the generic files on disk and write entities to the graph db.

The existing file ingest handler uses a streaming JSON decoder to introspect the payload and reject any file that does not have valid data and meta tags (this logic lives in ValidateMetaTag()). This changeset extends ValidateMetaTag to validate the nodes and edges in a generic payload and reject any request that fails validation. We collect up to 15 errors before returning to the caller.

How Has This Been Tested?

  • Uploaded generic files through the UI to verify that the API correctly accepts good files and rejects bad ones, and that the UI displays the resulting errors (screenshot included)
  • Verified that valid generic files are ingested and written to the filesystem in the docker container, in the bhapi/work/tmp directory
  • Verified that any IngestTask record created for a generic file has its is_generic flag properly set in PG
  • An extensive unit test suite has been added at stream_decoder_test.go to cover the positive and negative cases for validation
  • Perf test with parca to ensure that large requests do not blow out memory

Screenshots (optional):

The file upload UI for a mixed bag of sharphound/generic ingest files. "bad_aicas" is a wrongly-formatted sharphound collection, so you can see what we currently display for rejected files. "payload1" is a wrongly-formatted generic ingest file. You can see the API response with the schema violations that "payload1" had. Each violation gets its own message.
Screenshot 2025-04-11 at 1 19 31 PM

This is a parca profile of the API during the ingestion of a 0.5 GB file and a 1 GB file (2 million and 4 million nodes). I wanted to make sure that the stream was working correctly and that we weren't blowing out memory as we streamed in the payload. As you can see, there was no significant impact on memory pressure.
Screenshot 2025-04-11 at 2 52 01 PM

Types of changes

  • Chore (a change that does not modify the application functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Database Migrations

Checklist:

@brandonshearin brandonshearin requested a review from wes-mil April 11, 2025 19:34
@brandonshearin brandonshearin added the enhancement (New feature or request) and api (A pull request containing changes affecting the API code.) labels Apr 11, 2025
@brandonshearin brandonshearin self-assigned this Apr 11, 2025
@brandonshearin brandonshearin changed the title from "draft: BED-5682 generic ingest schema validation" to "BED-5682 generic ingest schema validation" Apr 15, 2025
@brandonshearin brandonshearin merged commit 41faca9 into main Apr 22, 2025
8 checks passed
@brandonshearin brandonshearin deleted the BED-5682 branch April 22, 2025 17:21
@github-actions github-actions bot locked and limited conversation to collaborators Apr 22, 2025
Labels: api, enhancement