8000 feat(experiments): Support all internal and user filters for dw by danielbachhuber · Pull Request #27603 · PostHog/posthog · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(experiments): Support all internal and user filters for dw #27603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Jan 29, 2025

Conversation

danielbachhuber
Copy link
Contributor
@danielbachhuber danielbachhuber commented Jan 16, 2025

Fixes #27012
See https://posthog.slack.com/archives/C07PXH2GTGV/p1737044475404549

Changes

Ensures support for all permutations of test_account_filters when using a data warehouse table with a trends experiment.

How did you test this code?

Tests should pass. Includes tests for both test_account_filters + data warehouse trends experiment and test_account_filters + event trends experiment.

@danielbachhuber danielbachhuber requested review from a team and mariusandra January 28, 2025 15:10
@danielbachhuber danielbachhuber marked this pull request as ready for review January 28, 2025 15:10
Copy link
Collaborator
@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the generic HogQL changes could get a few generic tests to cover the changes - even if test_experiments_trends_query_builder.py indirectly tests them.

However I'd ping in @Gilbert09 for a quick look on the warehouse parts. Looks okay to me, but I'm not too familiar with the code around here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes here seem good to me, though 1) I'd ask @Gilbert09 to double check the DataWarehouseJoin part, and 2) I wonder if we need explicit tests (e.g. in test_database.py) for these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

56f88bf covers the happy path for both isolate_scope=True and events_join = next(

Comment on lines +678 to +681
table_type = node.type.table_type
while isinstance(table_type, ast.VirtualTableType):
table_type = table_type.table_type
self.scopes.append(ast.SelectQueryType(tables={node.type.name: table_type}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if this should be tested somehow in test_resolver.py, though it might be tricky and I'm happy to 🙈 here if that helps

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason behind not wanting a virtual table attached to the scope? Will this break existing behavior elsewhere at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason behind not wanting a virtual table attached to the scope?

Here's the relevant conversation #27603 (comment)

Will this break existing behavior elsewhere at all?

Existing tests seem to pass fine. I don't know enough about the application to evaluate whether there's some usage not covered by tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if this should be tested somehow in test_resolver.py, though it might be tricky and I'm happy to 🙈 here if that helps

@mariusandra It's implicitly covered by 56f88bf (new test), and that style of test is generally how I'd think to solve it (register a new database table and then verify the field resolves through a virtual table to it.

Is there some other style of test you had in mind?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I think it's fine. It would probably be very similar code in both places, so I'm happy with it as it is.

Comment on lines 709 to 712
expr = property_to_expr(property_clone, self.team)
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"):
expr.left.chain = ["events", *expr.left.chain]
filters.append(expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels kind of magical, but I really don't know anything about what is expected when is_data_warehouse_series is true, and it's not too dissimilar to other code around it. I'd ping @Gilbert09 again just in case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not so sure on this - looks like this was first introduced in #27011. What happens if the data warehouse table has no events table attached? Should we not be testing for modifiers.dataWarehouseEventsModifiers here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, updated with ec5aaca

@mariusandra mariusandra requested a review from Gilbert09 January 29, 2025 14:19
Copy link
Member
@Gilbert09 Gilbert09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping @mariusandra. A few questions before I'm happy to ✅ this

Comment on lines 416 to 417
for join in DataWarehouseJoin.objects.filter(team_id=team.pk).exclude(deleted=True)
if join.joining_table_name == "events"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we're not doing DataWarehouseJoin.objects.filter(team_id=team.pk, joining_table_name="events").exclude(deleted=True)? Will save us the for loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking for any warehouse join onto events (not specifically related to the warehouse_modifier.table_name table - is that intentional?

Copy link
Contributor Author
@danielbachhuber danielbachhuber Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naivete and why I appreciate code review 😄 updated with f0c2ad2

Comment on lines +426 to +428
warehouse[warehouse_modifier.table_name].fields["person_id"] = ExpressionField(
name="person_id",
expr=parse_expr(warehouse_modifier.distinct_id_field),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not want to fallback onto using something like pdi.person_id instead here? That is the true person_id and not distinct_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some discussion here: https://posthog.slack.com/archives/C019RAX2XBN/p1738074683173439?thread_ts=1738073053.553839&cid=C019RAX2XBN

I don't want to touch it on this PR but I've made note to look at it further.

Comment on lines +678 to +681
table_type = node.type.table_type
while isinstance(table_type, ast.VirtualTableType):
table_type = table_type.table_type
self.scopes.append(ast.SelectQueryType(tables={node.type.name: table_type}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason behind not wanting a virtual table attached to the scope? Will this break existing behavior elsewhere at all?

filters.append(property_to_expr(property_clone, self.team))
expr = property_to_expr(property_clone, self.team)
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"):
expr.left.chain = ["events", *expr.left.chain]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we wanna check that this is indeed the correct ast node and not just check whether it has a left attribute? I presume you're testing that expr is a ast.CompareOperation and that expr.left is a ast.Field node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, updated with 9d67b02

Comment on lines 709 to 712
expr = property_to_expr(property_clone, self.team)
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"):
expr.left.chain = ["events", *expr.left.chain]
filters.append(expr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not so sure on this - looks like this was first introduced in #27011. What happens if the data warehouse table has no events table attached? Should we not be testing for modifiers.dataWarehouseEventsModifiers here too?

Comment on lines 416 to 420
for join in DataWarehouseJoin.objects.filter(
team_id=team.pk,
source_table_name=warehouse_modifier.table_name,
joining_table_name="events",
).exclude(deleted=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use .first() instead of using the loop and next() - it'll return DataWarehouseJoin | None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -671,7 +671,9 @@ def _events_filter(
) -> ast.Expr:
series = self.series
filters: list[ast.Expr] = []
is_data_warehouse_series = isinstance(series, DataWarehouseNode)
is_data_warehouse_event_series = isinstance(series, DataWarehouseNode) and any(
series.table_name == modifier.table_name for modifier in self.modifiers.dataWarehouseEventsModifiers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll wanna check self.modifiers.dataWarehouseEventsModifiers is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielbachhuber
Copy link
Contributor Author

Keeping the train running. @mariusandra Let me know if you'd like more tests around resolver.py — happy to accommodate!

@danielbachhuber danielbachhuber merged commit 3397f68 into master Jan 29, 2025
92 checks passed
@danielbachhuber danielbachhuber deleted the experiments/dw-internal-filters branch January 29, 2025 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support all internal and test user filters for data warehouse Trends queries
3 participants
0