-
Notifications
You must be signed in to change notification settings - Fork 6
Add RFC for query migrations #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
||
To be discusssed: | ||
- Should we add a version field to the queries, so that we know which migrations to run | ||
- Should we keep migrations forever or should we have a system in place that discards frontend side queries that are older than a certain time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you talk about a "migration" here, what does that look like in code? Are we gonna have a history of the model changes over time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I was thinking. More concretely I'm thinking of adding a folder with query migrations like:
posthog/schema_migrations/
0001_trends_query_bool_to_enum.py
0002_insight_query_date_range_refactor.py
...
and each file has a class/methods to determine on which type of query to run and how to modify from one version to the next. I'm thinking we want to have separate versions for each query type so that not all queries have to be "migrated" if just changing on kind.
|
||
Migrating the schema from one version to the next should be as easy as a database migration for developers. | ||
|
||
### Option a) Add a backend side migration and allow the frontend to use this in a new endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also experienced a bunch of issues from having conversions on both the frontend and backend, and so I think this makes sense - we'd need to make sure this endpoint was very fast though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 I'm thinking it's alright if we have a quick way to return queries that don't need to be touched. We'd then keep backend side queries up-to-date with a backfill job and frontend side queries that are important should be touched (and with that upgraded) often as well.
|
||
I'm favoring option a, where we add a backend side migration and call that from the frontend, as it seems to be the easiest (and as such, most scalable) solution for future developers adding query migrations. | ||
|
||
### Option d) Something easy I'm not seeing right now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered something like a library/package with the conversion code compiled to both Python and JS? Not sure if something like this exists, but it would help reduce code duplication and things getting out of sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, we could transpile code from one language to the other or run the migrations inside of a VM. I'm worrying that this is brittle and would blow up bundle size (if using python as source of truth) however.
|
||
## Imp 10000 lementation | ||
|
||
- Get in touch with customers to deprecate `filters` based insights (already happening). Remove all remaining `filters` conversions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big 👍 on getting filters
removed asap
|
||
### Option a) Add a backend side migration and allow the frontend to use this in a new endpoint | ||
|
||
We could add only a backend side migration and let the frontend call a `/query/validation` endpoint that also migrates queries to newer versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since queries are always retrieved from the backend, in most cases we can do something sneakier that avoids the roundtrip: In all of the call sites from the "Queries are persisted in various places" list, we'd do the /query/validation
transformation internally before the query ever gets to the user. This would mean the frontend never seeing old-style queries.
With one exception: filters in the URL – where the roundtrip would still be necessary, and would be quite painful (e.g. in some cases the frontend would not be able to support the query anymore, so it would have to wait for the validation round trip – a few hundred milliseconds of a wait doesn't sound so bad, but it can definitely add up to a slow experience!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, we definitely want to handle this backend side as much as possible. I'll probably find out how easy that is for notebooks, activity logs, etc when working on it 😆
We also have queries stored in local storage, so these need to do the roundtrip as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would migrating the query take few hundred ms? Correct me if wrong, but the /query/validation
should parse => apply migrations => serialize, it should not take more than 50-100ms?
It would also be easy to cache those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation itself should be quick, but there's also significant overhead to every individual API request, plus internet speeds vary – piece by piece these things add up to a slow app experience, translating to lower retention → less 💵.
- Implement the above. | ||
|
||
To be discusssed: | ||
- Should we add a version field to the queries, so that we know which migrations to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may start to version out API and put the version in each request. This would allow us to discover all situations where someone is calling our trunk API with old app version, whatever reason.
No description provided.