feat: presigned upload flow by daibhin · Pull Request #33746 · PostHog/posthog · GitHub

feat: presigned upload flow #33746


Merged · 26 commits merged into master on Jun 17, 2025

Conversation

daibhin (Contributor) commented Jun 16, 2025


Problem

Uploading large symbol sets via the Django app is not possible because of the maximum file size enforced by our API gateway.

Changes

  • Fetch a presigned URL from S3
  • Upload the file directly to S3
  • Finish the upload by adding a content_hash to the symbol set
  • Only fetch symbol sets that have a content_hash in cymbal
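The three-step flow above (fetch a presigned URL, upload directly to S3, then finalize with a content hash) can be sketched as follows. This is a minimal illustration using only the standard library; the endpoint paths, response field names ("presigned_url", "id"), the Bearer-token header, and SHA-256 as the hash are all assumptions for illustration, not PostHog's actual API surface.

```python
# Hypothetical sketch of the three-step presigned upload flow.
import hashlib
import json
import urllib.request


def content_hash(chunks):
    # Hash the raw bytes of every chunk that will be uploaded.
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def _call(url, method, data=None, headers=None):
    req = urllib.request.Request(url, data=data, method=method,
                                 headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")


def upload_symbol_set(api_base, token, data):
    auth = {"Authorization": f"Bearer {token}"}
    # 1. Ask the Django API for a presigned S3 URL (hypothetical endpoint).
    start = _call(f"{api_base}/error_tracking/symbol_sets/start_upload",
                  "POST", headers=auth)
    # 2. PUT the file straight to S3, bypassing the API gateway size limit.
    _call(start["presigned_url"], "PUT", data=data)
    # 3. Finish the upload by attaching the content hash to the symbol set.
    body = json.dumps({"content_hash": content_hash([data])}).encode()
    _call(f"{api_base}/error_tracking/symbol_sets/{start['id']}/finish_upload",
          "POST", data=body,
          headers={**auth, "Content-Type": "application/json"})
```

Because the file bytes never pass through the Django app, only the small start/finish requests are subject to the gateway's size limit.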

Tests

  • Added some
  • Verified locally we can upload files larger than 21MB now
(Screenshot: 2025-06-17 at 12:31:04)

greptile-apps (bot) left a comment

PR Summary

Implements a new presigned URL upload flow for handling large symbol sets in error tracking, bypassing Django's file size limitations.

  • Modified object_storage.py to support presigned S3 URLs and head_object checks for better file handling
  • Reduced maximum file size from 1GB to 100MB in api/error_tracking.py with new multi-step upload process
  • Updated Rust CLI in cli/src/commands/sourcemap/upload.rs to implement three-step upload process with content hash validation
  • Added content_hash requirement in cymbal/src/symbol_store/saving.rs to ensure symbol set integrity before loading

4 files reviewed, 3 comments

@daibhin daibhin requested a review from oliverb123 June 16, 2025 16:48
@@ -266,7 +266,7 @@ impl SymbolSetRecord {
     SymbolSetRecord,
     r#"SELECT id, team_id, ref as set_ref, storage_ptr, created_at, failure_reason, content_hash, last_used
     FROM posthog_errortrackingsymbolset
-    WHERE team_id = $1 AND ref = $2"#,
+    WHERE content_hash is not null AND team_id = $1 AND ref = $2"#,
daibhin (Contributor, Author) commented:

@oliverb123 I can't seem to get the error here to go away. I've tried DATABASE_URL='postgres://posthog:posthog@localhost:5432/posthog' cargo sqlx prepare --workspace from the Rust root and DATABASE_URL='postgres://posthog:posthog@localhost:5432/posthog' cargo sqlx prepare from the cymbal directory, but neither seems to make the error go away or regenerate the files I need.

Contributor replied:

Just ran cargo sqlx prepare --workspace and it seemed to work? Pushed the updated query snapshots.

@daibhin daibhin requested review from a team and hpouillot June 17, 2025 10:10
posthog-bot commented:

📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot commented:

📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

Caution

Detected flapping snapshots

These snapshots have auto-updated more than once since the last human commit:

  • scenes-app-insights-funnels--funnel-top-to-bottom-edit--dark.png (chromium, shard 2)

The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers.
Run pnpm storybook locally and make the fix now.
(Often, the cause is ResizeObserver being used instead of the better CSS container queries.)

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

content_length = s3_upload.get("ContentLength")

if not content_length or content_length > ONE_HUNDRED_MEGABYTES:
    # TODO: remove from s3 (reuse dagster job)
Contributor commented:

The symbol set's delete function deletes the S3 object too, if you set the storage_ptr first.
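The size check in the diff fragment above can be sketched end to end: after the client uploads, the server calls head_object on the new key and rejects the object if it is missing or over the limit. This is a hedged sketch assuming a boto3-style S3 client; bucket and key names are placeholders, and the validation helper is hypothetical.

```python
# Sketch of the server-side post-upload size check (boto3-style client assumed).
ONE_HUNDRED_MEGABYTES = 100 * 1024 * 1024


def is_valid_upload_size(content_length):
    # head_object may omit ContentLength; treat missing or oversized as invalid.
    return bool(content_length) and content_length <= ONE_HUNDRED_MEGABYTES


def check_uploaded_object(s3_client, bucket, key):
    # head_object fetches metadata (including ContentLength) without
    # downloading the body, so validating the size is cheap.
    s3_upload = s3_client.head_object(Bucket=bucket, Key=key)
    if not is_valid_upload_size(s3_upload.get("ContentLength")):
        # Per the review comment: setting storage_ptr first lets the symbol
        # set's delete function clean up the S3 object as well.
        raise ValueError("upload missing or larger than 100 MB")
    return s3_upload
```

The check has to happen server-side after the direct-to-S3 upload, since a presigned PUT bypasses any size limit the Django app could enforce inline.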

github-actions (bot) commented Jun 17, 2025

Size Change: 0 B

Total Size: 2.57 MB

Unchanged: frontend/dist/toolbar.js — 2.57 MB

compressed-size-action

hpouillot (Contributor) left a comment

Looks good to me! It may be out of scope, but I think we should also consider adding a status column for symbol sets to differentiate between the different states (fetched, uploading, uploaded, ...) instead of relying on whether content_hash is present.

@@ -40,13 +63,13 @@ pub fn upload(
     &host,
     &token,
     Some(directory.clone()),
-    Some(content_hash(&uploads)),
+    Some(content_hash(uploads.iter().map(|upload| &upload.data))),

The content_hash calculation includes all uploads before any are actually uploaded. This creates a potential inconsistency if some uploads are later skipped due to size constraints in the new implementation. The hash would then represent data that wasn't actually uploaded to the server, potentially causing validation issues downstream. Consider calculating the hash only after determining which uploads will actually be processed, or implementing a mechanism to recalculate the hash if any uploads are skipped.
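The reviewer's suggestion — decide which uploads will actually be sent before computing the hash — can be sketched as follows. SHA-256, the 100 MB cap, and the helper name are assumptions for illustration; the PR's actual hash function and filtering rules live in the Rust CLI.

```python
# Hypothetical sketch: filter uploads first, then hash only the survivors,
# so the reported content_hash always matches the bytes actually uploaded.
import hashlib

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumed cap, mirroring the PR's 100 MB limit


def select_and_hash(uploads):
    # Drop anything that would be skipped (here: oversized chunks) before
    # hashing, instead of hashing everything up front.
    selected = [data for data in uploads if len(data) <= MAX_UPLOAD_BYTES]
    h = hashlib.sha256()
    for data in selected:
        h.update(data)
    return selected, h.hexdigest()
```

Hashing after selection avoids the inconsistency the reviewer describes: a hash computed over the full candidate set would not validate if any member is later skipped.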

Spotted by Diamond


@daibhin daibhin merged commit 8c2ef1c into master Jun 17, 2025
102 checks passed
@daibhin daibhin deleted the err/dn-feat/presigned-uploads branch June 17, 2025 15:52
4 participants