-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(experiments): Support all internal and user filters for dw #27603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
posthog/hogql_queries/experiments/test/test_experiment_trends_query_runner.py
Show resolved
Hide resolved
…posthog into experiments/dw-internal-filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of the generic HogQL changes could get a few generic tests to cover the changes - even if test_experiments_trends_query_builder.py
indirectly tests them.
However I'd ping in @Gilbert09 for a quick look on the warehouse parts. Looks okay to me, but I'm not too familiar with the code around here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes here seem good to me, though 1) I'd ask @Gilbert09 to double check the DataWarehouseJoin
part, and 2) I wonder if we need explicit tests (e.g. in test_database.py
) for these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
56f88bf covers the happy path for both isolate_scope=True
and events_join = next(
table_type = node.type.table_type | ||
while isinstance(table_type, ast.VirtualTableType): | ||
table_type = table_type.table_type | ||
self.scopes.append(ast.SelectQueryType(tables={node.type.name: table_type})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if this should be tested somehow in test_resolver.py
, though it might be tricky and I'm happy to 🙈 here if that helps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason behind not wanting a virtual table attached to the scope? Will this break existing behavior elsewhere at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason behind not wanting a virtual table attached to the scope?
Here's the relevant conversation #27603 (comment)
Will this break existing behavior elsewhere at all?
Existing tests seem to pass fine. I don't know enough about the application to evaluate whether there's some usage not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if this should be tested somehow in
test_resolver.py
, though it might be tricky and I'm happy to 🙈 here if that helps
@mariusandra It's implicitly covered by 56f88bf (new test), and that style of test is generally how I'd think to solve it (register a new database table and then verify the field resolves through a virtual table to it.
Is there some other style of test you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I think it's fine. It would probably be very similar code in both places, so I'm happy with it as it is.
expr = property_to_expr(property_clone, self.team) | ||
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"): | ||
expr.left.chain = ["events", *expr.left.chain] | ||
filters.append(expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels kind of magical, but I really don't know anything about what is expected when is_data_warehouse_series
is true, and it's not too dissimilar to other code around it. I'd ping @Gilbert09 again just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not so sure on this - looks like this was first introduced in #27011. What happens if the data warehouse table has no events
table attached? Should we not be testing for modifiers.dataWarehouseEventsModifiers
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, updated with ec5aaca
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping @mariusandra. A few questions before I'm happy to ✅ this
posthog/hogql/database/database.py
Outdated
for join in DataWarehouseJoin.objects.filter(team_id=team.pk).exclude(deleted=True) | ||
if join.joining_table_name == "events" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we're not doing DataWarehouseJoin.objects.filter(team_id=team.pk, joining_table_name="events").exclude(deleted=True)
? Will save us the for loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking for any warehouse join onto events
(not specifically related to the warehouse_modifier.table_name
table - is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naivete and why I appreciate code review 😄 updated with f0c2ad2
warehouse[warehouse_modifier.table_name].fields["person_id"] = ExpressionField( | ||
name="person_id", | ||
expr=parse_expr(warehouse_modifier.distinct_id_field), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not want to fallback onto using something like pdi.person_id
instead here? That is the true person_id
and not distinct_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some discussion here: https://posthog.slack.com/archives/C019RAX2XBN/p1738074683173439?thread_ts=1738073053.553839&cid=C019RAX2XBN
I don't want to touch it on this PR but I've made note to look at it further.
table_type = node.type.table_type | ||
while isinstance(table_type, ast.VirtualTableType): | ||
table_type = table_type.table_type | ||
self.scopes.append(ast.SelectQueryType(tables={node.type.name: table_type})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason behind not wanting a virtual table attached to the scope? Will this break existing behavior elsewhere at all?
filters.append(property_to_expr(property_clone, self.team)) | ||
expr = property_to_expr(property_clone, self.team) | ||
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"): | ||
expr.left.chain = ["events", *expr.left.chain] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we wanna check that this is indeed the correct ast
node and not just check whether it has a left
attribute? I presume you're testing that expr
is a ast.CompareOperation
and that expr.left
is a ast.Field
node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, updated with 9d67b02
expr = property_to_expr(property_clone, self.team) | ||
if property_clone["type"] in ("group", "element") and hasattr(expr, "left"): | ||
expr.left.chain = ["events", *expr.left.chain] | ||
filters.append(expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not so sure on this - looks like this was first introduced in #27011. What happens if the data warehouse table has no events
table attached? Should we not be testing for modifiers.dataWarehouseEventsModifiers
here too?
posthog/hogql/database/database.py
Outdated
for join in DataWarehouseJoin.objects.filter( | ||
team_id=team.pk, | ||
source_table_name=warehouse_modifier.table_name, | ||
joining_table_name="events", | ||
).exclude(deleted=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use .first()
instead of using the loop and next()
- it'll return DataWarehouseJoin | None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -671,7 +671,9 @@ def _events_filter( | |||
) -> ast.Expr: | |||
series = self.series | |||
filters: list[ast.Expr] = [] | |||
is_data_warehouse_series = isinstance(series, DataWarehouseNode) | |||
is_data_warehouse_event_series = isinstance(series, DataWarehouseNode) and any( | |||
series.table_name == modifier.table_name for modifier in self.modifiers.dataWarehouseEventsModifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll wanna check self.modifiers.dataWarehouseEventsModifiers
is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping the train running. @mariusandra Let me know if you'd like more tests around |
Fixes #27012
See https://posthog.slack.com/archives/C07PXH2GTGV/p1737044475404549
Changes
Ensures support for all permutations of
test_account_filters
when using a data warehouse table with a trends experiment.How did you test this code?
Tests should pass. Includes tests for both
test_account_filters
+ data warehouse trends experiment andtest_account_filters
+ event trends experiment.