fix(bigquery-dwh): Get correct primary keys from BigQuery by tomasfarias · Pull Request #33526 · PostHog/posthog · GitHub

fix(bigquery-dwh): Get correct primary keys from BigQuery #33526


Merged
merged 2 commits into master from fix/get-primary-keys-from-bigquery
Jun 11, 2025

Conversation

Contributor
@tomasfarias tomasfarias commented Jun 11, 2025

Important

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Problem

The field constraint_name from TABLE_CONSTRAINTS does not hold the column name when the constraint is a primary key (it's pk$ in that case). This means folks whose primary key column is not named id cannot sync incrementally.

Changes

This queries KEY_COLUMN_USAGE instead to actually get the primary key column name.
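
As a rough illustration of the approach described above, the sketch below builds a query against `INFORMATION_SCHEMA.KEY_COLUMN_USAGE` and reads `column_name` from the result rows. It is not the code from this PR: the function names, the join on `TABLE_CONSTRAINTS`, the `@table_name` query parameter, and the sample rows are all assumptions made for the example.

```python
from typing import Iterable, List, Mapping


def build_primary_key_query(project_id: str, dataset_id: str) -> str:
    """Build SQL that lists the primary key columns of one table.

    Hypothetical helper: the table name is expected to be bound to the
    @table_name query parameter by whichever client runs the query.
    """
    prefix = f"{project_id}.{dataset_id}.INFORMATION_SCHEMA"
    return (
        "SELECT kcu.column_name, kcu.ordinal_position\n"
        f"FROM `{prefix}.KEY_COLUMN_USAGE` AS kcu\n"
        f"JOIN `{prefix}.TABLE_CONSTRAINTS` AS tc\n"
        "  ON tc.constraint_name = kcu.constraint_name\n"
        " AND tc.table_name = kcu.table_name\n"
        # Assumption: primary key constraints are selected by constraint_type.
        "WHERE tc.constraint_type = 'PRIMARY KEY'\n"
        "  AND kcu.table_name = @table_name\n"
        "ORDER BY kcu.ordinal_position"
    )


def primary_key_columns(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return column names from KEY_COLUMN_USAGE rows, in key order."""
    ordered = sorted(rows, key=lambda row: row["ordinal_position"])
    return [str(row["column_name"]) for row in ordered]


# Made-up rows shaped like the query result, for a composite key.
sample_rows = [
    {"column_name": "event_id", "ordinal_position": 2},
    {"column_name": "team_id", "ordinal_position": 1},
]
print(primary_key_columns(sample_rows))  # ['team_id', 'event_id']
```

Reading `column_name` rather than `constraint_name` is the point here: the former carries the actual key column, so a primary key that is not named id can still be detected.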

Did you write or update any docs for this change?

How did you test this code?

@tomasfarias tomasfarias requested a review from a team June 11, 2025 09:14
Contributor
@greptile-apps greptile-apps bot left a comment


PR Summary

Fixed BigQuery data warehouse integration to correctly retrieve primary key column names by querying KEY_COLUMN_USAGE instead of relying on TABLE_CONSTRAINTS.constraint_name field.

  • Modified get_primary_keys in posthog/temporal/data_imports/pipelines/bigquery/source.py to use KEY_COLUMN_USAGE for accurate primary key column identification
  • Fixed an issue where constraint_name holds 'pk$' rather than the actual column name for primary key constraints
  • Lets tables whose primary key column is not named id sync incrementally

1 file reviewed, no comments

@tomasfarias tomasfarias force-pushed the fix/get-primary-keys-from-bigquery branch from 5234721 to aabb8a2 Compare June 11, 2025 11:39
The field `constraint_name` from TABLE_CONSTRAINTS is not the column
name when the constraint is a primary key (it's `pk$` in that case).

This queries KEY_COLUMN_USAGE instead to actually get the primary key
column name.
@tomasfarias tomasfarias force-pushed the fix/get-primary-keys-from-bigquery branch from aabb8a2 to e6fd6f1 Compare June 11, 2025 13:25
@tomasfarias tomasfarias merged commit f8484bf into master Jun 11, 2025
95 checks passed
@tomasfarias tomasfarias deleted the fix/get-primary-keys-from-bigquery branch June 11, 2025 14:57

2 participants