feat: presigned upload flow #33746
Conversation
PR Summary
Implements a new presigned URL upload flow for handling large symbol sets in error tracking, bypassing Django's file size limitations.
- Modified object_storage.py to support presigned S3 URLs and head_object checks for better file handling
- Reduced the maximum file size from 1GB to 100MB in api/error_tracking.py with a new multi-step upload process
- Updated the Rust CLI in cli/src/commands/sourcemap/upload.rs to implement a three-step upload process with content hash validation
- Added a content_hash requirement in cymbal/src/symbol_store/saving.rs to ensure symbol set integrity before loading
4 files reviewed, 3 comments
Greptile
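For reviewers who want a concrete picture of the flow the summary describes, here is a minimal boto3 sketch of the presigned-upload-plus-verification steps. The bucket, key, and helper names are illustrative only; they are not the actual object_storage.py API.

```python
import boto3

# Matches the cap used in api/error_tracking.py; everything else here is illustrative.
ONE_HUNDRED_MEGABYTES = 100 * 1024 * 1024

s3 = boto3.client("s3")

def create_presigned_upload(bucket: str, key: str, expires_in: int = 3600) -> str:
    # Step 1: hand the client a short-lived PUT URL so the payload bypasses
    # the Django/API gateway request-size limits entirely.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )

def verify_uploaded_object(bucket: str, key: str) -> bool:
    # Step 3 (after the client PUTs to the URL in step 2): confirm the object
    # exists and is within the size cap using a head_object check.
    head = s3.head_object(Bucket=bucket, Key=key)
    content_length = head.get("ContentLength")
    return bool(content_length) and content_length <= ONE_HUNDRED_MEGABYTES
```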
@@ -266,7 +266,7 @@ impl SymbolSetRecord {
     SymbolSetRecord,
     r#"SELECT id, team_id, ref as set_ref, storage_ptr, created_at, failure_reason, content_hash, last_used
     FROM posthog_errortrackingsymbolset
-    WHERE team_id = $1 AND ref = $2"#,
+    WHERE content_hash is not null AND team_id = $1 AND ref = $2"#,
@oliverb123 I can't seem to get the error here to go away. I've tried DATABASE_URL='postgres://posthog:posthog@localhost:5432/posthog' cargo sqlx prepare --workspace from the Rust root, and DATABASE_URL='postgres://posthog:posthog@localhost:5432/posthog' cargo sqlx prepare from the cymbal directory, but neither seems to make the error go away or regenerate the files I need.
Just ran cargo sqlx prepare --workspace and it seemed to work? Pushed the updated query snapshots.
…sthog into err/dn-feat/presigned-uploads
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted.
Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted.
Caution: detected flapping snapshots. These snapshots have auto-updated more than once since the last human commit. The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers.
Triggered by this commit.
posthog/api/error_tracking.py
Outdated
content_length = s3_upload.get("ContentLength")

if not content_length or content_length > ONE_HUNDRED_MEGABYTES:
    # TODO: remove from s3 (reuse dagster job)
The symbol set's delete function deletes the S3 object too, if you set the storage_ptr first.
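A sketch of what that suggestion could look like in the error_tracking endpoint. The ErrorTrackingSymbolSet delete() behaviour and the surrounding names are assumed from this conversation, not taken from the diff:

```python
from rest_framework.exceptions import ValidationError

ONE_HUNDRED_MEGABYTES = 100 * 1024 * 1024

def validate_presigned_upload(symbol_set, s3_upload: dict, upload_key: str) -> None:
    # Assumption: symbol_set is the ErrorTrackingSymbolSet row, and its delete()
    # also removes the S3 object referenced by storage_ptr, as described above.
    content_length = s3_upload.get("ContentLength")
    if not content_length or content_length > ONE_HUNDRED_MEGABYTES:
        # Point storage_ptr at the uploaded object first, so delete() knows
        # which S3 object to clean up instead of deferring to a dagster job.
        symbol_set.storage_ptr = upload_key
        symbol_set.save()
        symbol_set.delete()
        raise ValidationError("Uploaded symbol set is missing or exceeds the 100MB limit")
```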
Size Change: 0 B. Total Size: 2.57 MB.
…sthog into err/dn-feat/presigned-uploads
Looks good to me! It may be out of scope, but I think we should also consider adding a status column for symbol sets to differentiate between the different states (fetched, uploading, uploaded, ...) instead of relying on content_hash being present or not.
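Purely as an illustration of that idea, a status column could look something like the sketch below; the field names, states, and defaults are hypothetical and not part of this PR:

```python
from django.db import models

class SymbolSetStatus(models.TextChoices):
    # Hypothetical states mirroring the suggestion above.
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    FETCHED = "fetched"
    FAILED = "failed"

class ErrorTrackingSymbolSet(models.Model):
    # Existing fields elided; only the suggested column is shown.
    content_hash = models.TextField(null=True, blank=True)
    status = models.CharField(
        max_length=16,
        choices=SymbolSetStatus.choices,
        default=SymbolSetStatus.UPLOADING,
    )
```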
@@ -40,13 +63,13 @@ pub fn upload(
     &host,
     &token,
     Some(directory.clone()),
-    Some(content_hash(&uploads)),
+    Some(content_hash(uploads.iter().map(|upload| &upload.data))),
The content_hash calculation includes all uploads before any are actually uploaded. This creates a potential inconsistency if some uploads are later skipped due to size constraints in the new implementation. The hash would then represent data that wasn't actually uploaded to the server, potentially causing validation issues downstream. Consider calculating the hash only after determining which uploads will actually be processed, or implementing a mechanism to recalculate the hash if any uploads are skipped.
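To make the ordering concern concrete, here is a small Python illustration (the actual CLI code is Rust, and the hash algorithm and size cap here are placeholders): the hash is computed only over the uploads that survive filtering, so it always describes exactly what gets sent.

```python
import hashlib

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # placeholder cap, not the CLI's real constant

def content_hash(payloads) -> str:
    # Hash the payloads in a stable order so client and server agree.
    hasher = hashlib.sha512()
    for payload in payloads:
        hasher.update(payload)
    return hasher.hexdigest()

def hash_after_filtering(uploads: list[bytes]) -> str:
    # Filter out uploads that would be skipped *before* hashing, so the hash
    # never covers data that is not actually uploaded.
    accepted = [data for data in uploads if len(data) <= MAX_UPLOAD_BYTES]
    return content_hash(accepted)
```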
Spotted by Diamond
Problem
Uploading large symbol sets via the Django app is not possible because of the max file sizes on our API gateway.
Changes
- Add content_hash to the symbol set
- Require content_hash in cymbal
Tests