Allow setting a custom max context window for Google Gemini API provider (and/or universal max context window) #3717
Comments
@mrubens Approved?
It is now possible to set a threshold (percentage of the context window) at which automatic condensing is triggered. Does that satisfy this requirement?
Yeah
Closing as complete since it is now possible to set a threshold (percentage of the context window) at which automatic condensing is triggered, and there is a manual condense button.
@canrobins13 I think we should still be able to set the max context window at the API provider level. The context-condensing percentage threshold is not a provider-level setting, so capping Gemini at 200k (it gets more expensive past 200k) by setting the threshold to 80% will negatively affect the user when they switch to a model with a smaller context window, where they don't want to condense at that threshold.
@hannesrudolph Hi Hannes, happy to work on this issue! :)
@hannesrudolph Another option is a single maximum token count to condense at across all providers, condensing at whichever is lower: the percentage threshold or the absolute token count (see the sketch below). I worry a bit that people won't bother to keep track of provider-specific settings or will get confused by them.
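A minimal TypeScript sketch of this "whichever is less" rule, assuming a percentage threshold plus an optional universal token cap; the names `condenseTriggerPoint` and `absoluteTokenCap` are hypothetical illustrations, not Roo Code's actual API:

```typescript
// Hypothetical names throughout; a sketch of the rule, not Roo Code's implementation.
function condenseTriggerPoint(
  contextWindow: number, // model's advertised context window, in tokens
  percentThreshold: number, // e.g. 0.8 for 80%
  absoluteTokenCap?: number, // optional universal cap, e.g. 200_000
): number {
  const percentPoint = Math.floor(contextWindow * percentThreshold)
  // Condense at whichever limit is reached first.
  return absoluteTokenCap === undefined ? percentPoint : Math.min(percentPoint, absoluteTokenCap)
}

console.log(condenseTriggerPoint(1_000_000, 0.8, 200_000)) // 200000: the cap wins on a 1M Gemini window
console.log(condenseTriggerPoint(128_000, 0.8, 200_000)) // 102400: the percentage wins on a 128k model
```

Under this rule, a single global cap reins in the expensive large-window case without distorting behavior on smaller models, which speaks to the concern raised in the comment above.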
It would be nice to have the ability to set/store/restore the limit per model (especially valuable for local models) or per provider/model.
It's all yours!
Is this duplicative of the work we're doing to set provider-specific condensing thresholds? If we do need this, I'm not sure it should be Gemini-only. We should probably discuss.
Hi @mrubens @hannesrudolph! I looked into this a bit. If you are referring to PR #4456, then yes, there is overlap, but there are subtle differences. In my opinion, setting a context limit directly is clearer than setting a threshold when users have a specific token count they want to cap at. With thresholding, users must look up the model's limit and calculate the percentage themselves; if they already know their token budget, setting it as a context limit is easier and more direct. For example, anojndr has 250,000 TPM in mind: if he sets the context limit to 250,000, the job is done, whereas with thresholding he first has to work out the corresponding percentage. I think the latter is worse in terms of user experience. That said, a strategically set context limit can coexist with #4456 and could also support future cost-calculation and budgeting features, e.g., intelligent cost optimization and monitoring systems. Conclusion: profile-specific thresholding and a profile-specific context limit coexist well and together give users finer-grained control over cost. With that in mind, this should not be Gemini-specific.
What problem does this proposed feature solve?
Currently, users cannot change the max context window for the Google Gemini API provider in Roo Code—it is fixed at 1 million tokens. This is problematic for users on the Gemini free tier, where the tokens-per-minute (TPM) limit is 250k. As a result, it's easy to hit provider-side limits and get errors, especially when Roo tries to use the full 1M context window.
Additionally, Roo Code's "Intelligently condense the context window" feature is based on the max context window setting. If users could set a lower max context window (e.g., 250k), the condensation/summarization would trigger at the right time for their actual usage limits, making the feature much more useful and preventing API errors.
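To make the interaction concrete, here is a hedged TypeScript sketch of how a user-set cap could feed the condense trigger; the `maxContextWindow` field and both function names are hypothetical illustrations, not Roo Code's actual settings or code:

```typescript
// Hypothetical setting and function names; a sketch, not Roo Code's implementation.
interface ProviderSettings {
  modelContextWindow: number // advertised by the provider, e.g. 1_000_000 for Gemini
  maxContextWindow?: number // proposed user override, e.g. 250_000 for the free tier
}

function effectiveContextWindow(settings: ProviderSettings): number {
  return Math.min(settings.modelContextWindow, settings.maxContextWindow ?? Infinity)
}

function shouldCondense(currentTokens: number, settings: ProviderSettings, thresholdPct = 0.8): boolean {
  // Trigger condensing relative to the clamped window, not the advertised one.
  return currentTokens >= effectiveContextWindow(settings) * thresholdPct
}

const gemini: ProviderSettings = { modelContextWindow: 1_000_000, maxContextWindow: 250_000 }
console.log(shouldCondense(210_000, gemini)) // true: 210k >= 80% of the 250k cap
console.log(shouldCondense(210_000, { modelContextWindow: 1_000_000 })) // false without the override
```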
Describe the proposed solution in detail
Technical considerations or implementation details (optional)
No response
Describe alternatives considered (if any)
Additional Context & Mockups
Proposal Checklist
Are you interested in implementing this feature if approved?