🚀 Feature: caching queries #6305
Comments
@AidenY69, thanks for raising this issue! 🙏🏼
Some questions for you: how large is the collection, what does the query look like, and do you have indexes that match it?
Interesting... I'm still surprised an indexed query would take 10-11s to respond with only 20 results 🧐 I would try to investigate why the query is so slow. As for caching, Appwrite doesn't cache query results. Some other members of the community have made use of Cloudflare to cache specific routes and then purge the cache on update. For more info, see Discord.
Oh sorry, maybe I misunderstood your question. The collection has 40 million items and the result is 20, so it has to process the entire collection (40M), sort by date, then present the 20.
I'm not sure I understand what you're saying here. The actual items don't change, but the count does: today there could be 40M, next week it might be 45M, and so on. The number of items only goes up as users use the platform, but the items themselves are fixed data at the time of creation. So I'm not sure what you mean by "put the data in a single document"?
I'll give this a check. I already use Cloudflare and have the site optimized to cache via NGINX, but since the actual database query is the bottleneck, I would think it would make sense for Appwrite to look into some built-in caching system for larger collections.
Query caching is an interesting feature! Just to get more context, @AidenY69, what permission model are you using for this collection? Is it collection-level or document-level permissions? Can you share more info to help us understand why the query is taking 10-11 seconds? Is this expected on your side due to potential query complexity?
It's collection-level perms. The speed is directly correlated with the size of the collection. When I had a few million items, the query was nearly instant; as it hit 10M+ it slowed down a bit but still took just a second or two; at 40M it's 10-11s. I expect to hit 50-60M in the next few weeks, so I would imagine the speed will slow down by another couple of seconds. Nothing on the server config has changed: no hardware changes, no backend changes, etc. I haven't even touched the codebase in well over a month, and the query has remained the same too. It's 100% correlated with the size of the database, which makes sense, but that's why I think having a cache of some kind would help.
Curious to learn more about how you would imagine this caching mechanism working: would caching be configured per query or per collection? Would the cache live on the server or the client? And would you want a way to flush it manually?
Would you be open to chatting more with our team about this feature? This is an interesting idea for us to explore.
In my use case, per query would be ideal. Think back to the example of "most recent": that query (and its results and permissions) doesn't change per user, but different queries run against the same collection. One query is "most recent", yet the same collection is also sorted by day/week/month time periods. I'd want "most recent" cached on a short interval so users can see that the website is active, while something like the daily or weekly chart makes sense to update on a longer interval.
I had assumed it would be entirely server-side. I use TypeScript with server-side rendering, so the user performs the action, the server does most of the processing, and then sends back the completed response. I guess client-side caching makes more sense for client-side rendering platforms, but I would prefer the server just keep a cache.
Yes, but I would assume it would be command-line. When I say cache, I was really thinking of a specialized database (MariaDB, MySQL, or MongoDB) dedicated to caching the results of queries, not a folder/file stored on the server. In that case the admin can just flush that database. The process would essentially be: whenever a query is made, check that "query database"; if no cache exists, go to the main collection and store the result in the query database for future use. That query database can then be cleared every X interval per query, or per collection, depending on what you choose.
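That flow is easy to prototype. Here is a minimal TypeScript sketch of it, with an in-memory Map standing in for the dedicated "query database" (Redis, MariaDB, etc. in practice); `cachedQuery`, `flushQueryCache`, and all other names are hypothetical, not anything Appwrite ships:

```ts
// A minimal sketch of the flow described above. An in-memory Map stands in
// for the dedicated "query database"; all names here are hypothetical.
type CacheEntry = { payload: string; expiresAt: number };

const queryCache = new Map<string, CacheEntry>();

async function cachedQuery<T>(
  cacheKey: string,            // e.g. "most-recent", or a hash of the query
  ttlMs: number,               // per-query TTL, as proposed in the thread
  runQuery: () => Promise<T>,  // the real (slow) collection query
): Promise<T> {
  const hit = queryCache.get(cacheKey);
  if (hit && hit.expiresAt > Date.now()) {
    return JSON.parse(hit.payload) as T; // cache hit: skip the collection scan
  }
  const result = await runQuery();       // cache miss: one request pays the cost
  queryCache.set(cacheKey, {
    payload: JSON.stringify(result),
    expiresAt: Date.now() + ttlMs,
  });
  return result;
}

// The manual flush mentioned above is then just:
function flushQueryCache(): void {
  queryCache.clear();
}
```

Usage would look like `cachedQuery('most-recent', 15 * 60 * 1000, () => runTheRealQuery())`, so each named query carries its own interval, matching the per-query TTLs discussed earlier in the thread.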
Sure! I would also suggest creating a working environment that mimics a larger setup. I doubt many people have collections in the double-digit millions, but it might be worth creating an instance with fake data in the 50M+ range just to see how things can be optimized for when that day happens; platforms grow like duckweed these days, so it's only a matter of time. I know I said 10-11s on my setup, but that's on a very powerful server with minimal users since I went back to a closed beta. I would think this would actually be a lot worse on platforms with a lot more traffic.
If there's an index that matches the sorting and queries, the database shouldn't have to process the entire collection 🧐
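For instance, with the node-appwrite server SDK, an index matching the sort can be created like this (a sketch only; the endpoint, IDs, index key, and the `createdAt` attribute name are assumptions, not taken from the thread):

```ts
// Hedged sketch using the node-appwrite server SDK: create a key index
// matching the sort so "most recent" becomes an index walk instead of a
// 40M-row scan. All IDs and attribute names below are hypothetical.
import { Client, Databases, IndexType } from 'node-appwrite';

const client = new Client()
  .setEndpoint('https://cloud.appwrite.io/v1') // assumption: your endpoint
  .setProject('<PROJECT_ID>')
  .setKey('<API_KEY>');

const databases = new Databases(client);

// Index the attribute the query sorts by, in the same direction as the sort.
await databases.createIndex(
  '<DATABASE_ID>',
  '<COLLECTION_ID>',
  'idx_created_desc',  // index key (hypothetical)
  IndexType.Key,
  ['createdAt'],       // assumption: the date attribute the sort uses
  ['DESC'],
);
```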
As a caching workaround in the meantime, you can have a cache collection with a value attribute that is a big string. Then, when your 20 most recent changes, stringify the new list and put it in a document with a fixed, well-known id.
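A sketch of that workaround with the node-appwrite server SDK. Assumptions (not from the thread): a cache collection whose documents have a string value attribute, an items collection being ranked, a cache document created once up front, and hypothetical IDs throughout:

```ts
// Sketch of the cache-collection workaround: one fixed-id document holds the
// stringified "20 most recent" list, refreshed whenever the list can change.
import { Client, Databases, Query } from 'node-appwrite';

const databases = new Databases(
  new Client()
    .setEndpoint('https://cloud.appwrite.io/v1') // assumption: your endpoint
    .setProject('<PROJECT_ID>')
    .setKey('<API_KEY>'),
);

const DB = '<DATABASE_ID>';
const CACHE_DOC_ID = 'most-recent'; // fixed, well-known id for this query

// Call this whenever the "20 most recent" can change (e.g. after a create).
async function refreshMostRecentCache(): Promise<void> {
  const page = await databases.listDocuments(DB, 'items', [
    Query.orderDesc('$createdAt'),
    Query.limit(20),
  ]);
  // Stringify the new list into the fixed-id cache document.
  await databases.updateDocument(DB, 'cache', CACHE_DOC_ID, {
    value: JSON.stringify(page.documents),
  });
}

// Page loads then do a single document read instead of a sorted scan.
async function getMostRecent(): Promise<unknown[]> {
  const doc = await databases.getDocument(DB, 'cache', CACHE_DOC_ID);
  return JSON.parse(doc.value);
}
```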
@eldadfux While Appwrite uses Redis, I can only advise you to switch to Redis Stack and use its search & JSON features. It could also improve your architecture for the cloud: MariaDB would receive fewer requests, and the load would shift to Redis (read requests are generally more frequent than write requests).
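As a rough illustration of that suggestion (not Appwrite's actual architecture), node-redis v4 against a Redis Stack server can hold a query result as native JSON with a TTL; the key name and data below are made up:

```ts
// Hedged illustration: cache a query result as a JSON value in Redis Stack
// so reads hit Redis instead of MariaDB. Key names and data are hypothetical.
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' }); // assumption
await redis.connect();

// Cache the 20 most recent items as a JSON document for 15 minutes.
await redis.json.set('cache:most-recent', '$', [{ id: '1', title: 'example' }]);
await redis.expire('cache:most-recent', 15 * 60);

// Page loads read the JSON back without touching MariaDB.
const cached = await redis.json.get('cache:most-recent');
```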
I have always thought query caching was called indexing... |
Query Caching: storing the result set of a query so that repeated identical queries can be served directly from the cache without re-executing against the underlying data. Results can go stale and must be invalidated or refreshed.
Indexing: maintaining an auxiliary data structure (such as a B-tree) alongside the collection so the database can locate and sort matching documents without scanning everything. Results are always current, but every query is still executed.
Comparison: an index speeds up every execution of a query, while a cache skips execution entirely for repeated queries. They solve different problems and complement each other.
🔖 Feature description
I'm putting this as a feature, and not a bug, but feel free to re-tag it.
I have a platform with 40+ million items in its database, growing exponentially. Whenever pages are loaded, the entire database is queried for the results; this becomes a very slow process, especially as the database grows. In my use case, most of the pages are rankings, like "most viewed today/week/month" or "most recent".
My suggestion is that instead of querying on every single hit, there should be some optimization that lets the developer cache the query. That way most users get an instant load, and only the occasional hit after the cache is cleared is slow: 99%+ of users get a fast load and one person takes the slow hit. Right now, every user gets a slow hit, and as we know, load speed affects the user's preference for the platform, which could result in them going to a faster competitor, rage quitting, etc.
This would not only decrease load times for the end user, but also put less stress on the server, especially in cloud setups that are billed per compute unit.
🎤 Pitch
For example, when a user clicks on "most recent", instead of the entire database being queried, a cache of the last result could be stored and refreshed every 15 minutes (developer configurable). Sure, the list will be out of date by X minutes/hours/days, but for lists that don't change often this would be very fast. If the page is "most viewed this week", that could be set to update daily.
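To make the pitch concrete, here is one purely hypothetical shape such an API could take. Query.cacheTTL does not exist in Appwrite today; it is invented here only to illustrate a developer-configurable, per-query cache interval:

```ts
// Purely hypothetical -- Query.cacheTTL is an invented name; nothing below is
// a real Appwrite API beyond the standard listDocuments call.
import { Client, Databases, Query } from 'node-appwrite';

const databases = new Databases(
  new Client()
    .setEndpoint('https://cloud.appwrite.io/v1') // assumption
    .setProject('<PROJECT_ID>')
    .setKey('<API_KEY>'),
);

const mostRecent = await databases.listDocuments('<DATABASE_ID>', '<COLLECTION_ID>', [
  Query.orderDesc('$createdAt'),
  Query.limit(20),
  // Hypothetical: the server would serve a stored result and re-run the
  // query at most once every 15 minutes, so only one request per window
  // pays the full cost.
  // Query.cacheTTL(15 * 60),
]);
```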
There are more examples too that don't really affect me but could for others, such as social-media-type platforms that want to sort their feeds based on activity period, or messaging platforms, etc.
👀 Have you spent some time to check if this issue has been raised before?
🏢 Have you read the Code of Conduct?