BED-5682 generic ingest schema validation #1352
Merged
wes-mil
approved these changes
Apr 17, 2025
Description
- `Resources` for only-once compilation
- `ValidateMetaTag` augmented to validate generic files. Moved it to a new file, `stream_decoder.go`, to live with other stream-decoding funcs
- `github.com/santhosh-tekuri/jsonschema/v6 v6.0.1` package added for JSON Schema validation
- `edge.json` and `node.json`.
Motivation and Context
This PR addresses: BED-5682 and BED-5593
We want to ingest arbitrary nodes and edges into the graph. We want to accept generic payloads that describe a graph snapshot that conforms to a standardized node/edge JSON schema.
This changeset augments the existing file upload workflow to support generic files. The scope of these changes is to write generic payloads to the file system, if the files pass validation. A subsequent PR containing datapipe changes will actually process the generic files on disk and write entities to the graph db.
The existing file ingest handler used a streaming JSON decoder to introspect the payload and reject any files that did not have valid `data` and `meta` tags (this can be found inside of `ValidateMetaTag()`). This changeset extends `ValidateMetaTag` to validate nodes and edges in a generic payload and reject any request that fails validation. We will collect up to 15 errors before returning to the caller.
How Has This Been Tested?
- `IngestTask` record created for a generic file has its `is_generic` flag properly set in PG
- `stream_decoder_test.go` to cover the positive and negative cases for validation
Screenshots (optional):
The file upload UI for a mixed bag of sharphound/generic ingest files. "bad_aicas" is a wrongly-formatted sharphound collection, so you can see what we currently display for rejected files. "payload1" is a wrongly-formatted generic ingest file. You can see the API response with the schema violations that "payload1" had. Each violation gets its own message.

This is a parca profile of the API during the ingestion of a 0.5 GB file and a 1 GB file (2 million and 4 million nodes). I wanted to make sure that the stream was working correctly and that we weren't blowing out memory as we streamed in the payload. As you can see, there was no significant impact on memory pressure.
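For readers unfamiliar with the approach, here is a minimal sketch of streaming validation with a capped error list, which is the general reason memory stays flat: only one element is decoded at a time. This is not the code in `stream_decoder.go`. The `node` shape, `checkNode`, `validateNodes`, and `validateNodesString` are hypothetical names, and a hand-written check stands in for the JSON Schema validation so the example needs only the standard library. It also assumes a bare top-level array rather than the real payload layout.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxErrors mirrors the cap of 15 validation errors mentioned in this PR.
const maxErrors = 15

// node is a hypothetical, simplified shape; the real schema lives in node.json.
type node struct {
	ID    string   `json:"id"`
	Kinds []string `json:"kinds"`
}

// checkNode is an assumed stand-in for JSON Schema validation of one node.
func checkNode(n node) error {
	if n.ID == "" {
		return errors.New("id is required")
	}
	if len(n.Kinds) == 0 {
		return errors.New("at least one kind is required")
	}
	return nil
}

// validateNodes streams a top-level JSON array, decoding one element at a
// time, and stops early once maxErrors validation problems are collected.
// The second return value reports malformed JSON rather than schema problems.
func validateNodes(r io.Reader) ([]error, error) {
	dec := json.NewDecoder(r)

	// Consume the opening '[' without buffering the whole array.
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return nil, errors.New("expected a JSON array")
	}

	var problems []error
	for i := 0; dec.More(); i++ {
		var n node
		if err := dec.Decode(&n); err != nil {
			return problems, err
		}
		if err := checkNode(n); err != nil {
			problems = append(problems, fmt.Errorf("nodes[%d]: %w", i, err))
			if len(problems) >= maxErrors {
				return problems, nil // cap reached; skip the rest of the stream
			}
		}
	}

	// Consume the closing ']'.
	if _, err := dec.Token(); err != nil {
		return problems, err
	}
	return problems, nil
}

// validateNodesString is a small convenience wrapper for trying the sketch.
func validateNodesString(s string) ([]error, error) {
	return validateNodes(strings.NewReader(s))
}
```

Calling `validateNodesString` on an array with two bad elements returns two messages, one per violation, each prefixed with the element index. An input with more than 15 bad elements returns exactly 15.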

Types of changes
Checklist: