
Allow setting a custom max context window for Google Gemini API provider (and/or universal max context window) #3717


Open · 3 of 4 tasks
anojndr opened this issue May 18, 2025 · 11 comments · May be fixed by #4360

Labels: enhancement (New feature or request) · feature request (Feature request, not a bug) · Issue - In Progress (Someone is actively working on this. Should link to a PR soon.)
anojndr commented May 18, 2025

What problem does this proposed feature solve?

Currently, users cannot change the max context window for the Google Gemini API provider in Roo Code—it is fixed at 1 million tokens. This is problematic for users on the Gemini free tier, where the tokens-per-minute (TPM) limit is 250k. As a result, it's easy to hit provider-side limits and get errors, especially when Roo tries to use the full 1M context window.

Additionally, Roo Code's "Intelligently condense the context window" feature is based on the max context window setting. If users could set a lower max context window (e.g., 250k), the condensation/summarization would trigger at the right time for their actual usage limits, making the feature much more useful and preventing API errors.

Describe the proposed solution in detail

  • Add an option in Roo Code settings to set a custom max context window for the Google Gemini API provider, similar to how it works for OpenAI-compatible providers.
    • This could be a per-provider setting, or (even better) a universal/global max context window setting that applies to all providers unless overridden.
  • When set, Roo Code should respect this limit for all context management, including:
    • How much context is sent in each request to Gemini
    • When to trigger "Intelligently condense the context window" (so it summarizes before hitting the user-defined limit, not the hardcoded 1M)
    • Any UI warnings or token usage displays should reflect the user-set limit
  • Ideally, this setting should be easy to find and adjust, with a sensible default (e.g., 1M for Gemini, but user-overridable); one possible shape for the setting is sketched below.
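For concreteness, a minimal sketch of one possible shape for such a setting. The names here are illustrative assumptions, not Roo Code's actual configuration schema:

```typescript
// Hypothetical setting shape -- illustrative names, not Roo Code's real schema.
interface ProviderContextSettings {
	/** The model's advertised context window, e.g. 1_048_576 for Gemini. */
	modelContextWindow: number;
	/** Optional user override; when set, it caps the window everywhere it is used. */
	maxContextWindowOverride?: number;
}

/** The effective window is the smaller of the model's limit and the user's cap. */
function effectiveContextWindow(s: ProviderContextSettings): number {
	return s.maxContextWindowOverride !== undefined
		? Math.min(s.maxContextWindowOverride, s.modelContextWindow)
		: s.modelContextWindow;
}

// A free-tier Gemini user capping at their 250k TPM limit:
effectiveContextWindow({ modelContextWindow: 1_048_576, maxContextWindowOverride: 250_000 });
// => 250000
```

Condensation triggers, per-request truncation, and UI token displays would then all read from effectiveContextWindow() rather than the raw model limit.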

Technical considerations or implementation details (optional)

No response

Describe alternatives considered (if any)

  • The only current workaround is to manually keep conversations short or start new threads, which is disruptive and doesn't allow users to take full advantage of Roo Code's context management features.
  • Another alternative is to only allow this for OpenAI-compatible providers, but this leaves Gemini users at a disadvantage.
  • A universal/global max context window setting would be a good alternative, as it would help users who switch between providers or use multiple models.

Additional Context & Mockups

  • This feature would especially benefit users on the Gemini free tier (250k TPM), but also anyone who wants more control over context size for cost or performance reasons.
  • It would make the "Intelligently condense the context window" feature much more effective, since condensation would happen at the right time for the user's actual limits.

Proposal Checklist

  • I have searched existing Issues and Discussions to ensure this proposal is not a duplicate.
  • This proposal is for a specific, actionable change intended for implementation (not a general idea).
  • I understand that this proposal requires review and approval before any development work begins.

Are you interested in implementing this feature if approved?

  • Yes, I would like to contribute to implementing this feature.
@anojndr anojndr added the enhancement New feature or request label May 18, 2025
@dosubot dosubot bot added the feature request Feature request, not a bug label May 18, 2025
@hannesrudolph hannesrudolph moved this from New to Issue [Unassigned] in Roo Code Roadmap May 21, 2025
hannesrudolph (Collaborator) commented May 21, 2025

@mrubens Approved?

@hannesrudolph hannesrudolph added the Issue - Unassigned / Actionable Clear and approved. Available for contributors to pick up. label May 21, 2025
@canrobins13

It is now possible to set a threshold (percentage of the context window) at which automatic condensing is triggered. Does that satisfy this requirement?

anojndr (Author) commented May 29, 2025

> It is now possible to set a threshold (percentage of the context window) at which automatic condensing is triggered. Does that satisfy this requirement?

Yeah

@canrobins13

Closing as complete since it is now possible to set a threshold (percentage of the context window) at which automatic condensing is triggered, and there is a manual condense button.

@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap May 29, 2025
@github-project-automation github-project-automation bot moved this from Issue [Unassigned] to Done in Roo Code Roadmap May 29, 2025
@hannesrudolph hannesrudolph reopened this Jun 2, 2025
@github-project-automation github-project-automation bot moved this from Done to New in Roo Code Roadmap Jun 2, 2025
@github-project-automation github-project-automation bot moved this from Done to Triage in Roo Code Roadmap Jun 2, 2025
@hannesrudolph (Collaborator)

@canrobins13 I think we should still be able to set the max context window at the API provider level, since the context-condensing % threshold is not an API-provider-level setting. Trying to cap Gemini at 200k (it gets more expensive after 200k) by setting the threshold to 80% will negatively impact the user when they switch to a model with a lower context window, since they won't want to condense at that threshold.
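To make the mismatch concrete, a quick back-of-the-envelope (illustrative numbers, not Roo Code's defaults): whatever percentage is chosen to cap one model's window lands at a very different absolute token count on another model's window.

```typescript
// Illustrative only: a single global percentage maps to very different
// absolute token counts across models with different context windows.
const threshold = 0.2; // picked so a ~1M-token Gemini window condenses near 200k

Math.round(1_048_576 * threshold); // 209715 -- roughly the intended 200k cap
Math.round(128_000 * threshold);   //  25600 -- a 128k model condenses far earlier than wanted
```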

@hannesrudolph hannesrudolph self-assigned this Jun 2, 2025
@hannesrudolph hannesrudolph moved this from Triage to Issue [In Progress] in Roo Code Roadmap Jun 2, 2025
HahaBill commented Jun 2, 2025

@hannesrudolph Hi Hannes, happy to work on this issue! :)

@canrobins13

@hannesrudolph Another option is a cross-provider maximum token count at which to condense, where we condense at whichever is lower: the percentage-derived count or the absolute token count. I worry a bit that people won't bother to keep track of provider-specific settings, or will get confused by them.
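A rough sketch of that "whichever is less" rule, with hypothetical names:

```typescript
// Condense at the lower of the percentage-derived count and the absolute cap.
function condenseTriggerTokens(
	contextWindow: number,
	thresholdPct: number, // e.g. 0.8 for 80%
	absoluteCap?: number, // optional cross-provider max-token setting
): number {
	const fromPercent = contextWindow * thresholdPct;
	return absoluteCap !== undefined ? Math.min(fromPercent, absoluteCap) : fromPercent;
}

condenseTriggerTokens(1_048_576, 0.8, 250_000); // 250000 -- the absolute cap wins for Gemini
condenseTriggerTokens(128_000, 0.8, 250_000);   // 102400 -- the percentage wins for a small model
```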

avtc commented Jun 2, 2025

It would be nice to have the ability to set/store/restore the limit per model (especially valuable for local models) or per provider/model.

@hannesrudolph (Collaborator)

> @hannesrudolph Hi Hannes, happy to work on this issue! :)

It's all yours!

@daniel-lxs daniel-lxs added Issue - In Progress Someone is actively working on this. Should link to a PR soon. and removed Issue - Unassigned / Actionable Clear and approved. Available for contributors to pick up. labels Jun 3, 2025
@HahaBill HahaBill linked a pull request Jun 5, 2025 that will close this issue
mrubens (Collaborator) commented Jun 9, 2025

Is this duplicative of the work we're doing to set provider-specific condensing thresholds?

If we do need this, I'm not sure it should be Gemini-only. We should probably discuss.

HahaBill commented Jun 9, 2025

> Is this duplicative of the work we're doing to set provider-specific condensing thresholds?
>
> If we do need this, I'm not sure it should be Gemini-only. We should probably discuss.

Hi @mrubens @hannesrudolph! I looked into this a bit, and if you are referring to PR #4456, then yes, it seems to be duplicate work, though there are subtle differences.

In my opinion, it seems clearer for users to set a context limit rather than a threshold when they have a specific token count they want to cap at. With thresholding, users currently must look up the model's limit and calculate the percentage themselves; if they already know their token budget, setting it via a context limit is easier and more direct.

For example, anojndr has a 250,000 TPM limit in mind: if he sets the context limit to 250,000, the job is done. If he has to do it via thresholding, he first needs to calculate the percentage and set it. I think the latter is worse in terms of user experience.
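Spelled out, the calculation a threshold-only setting pushes onto the user (assuming Gemini's advertised 1,048,576-token window):

```typescript
// The manual arithmetic a threshold-only UI requires for a 250k token budget.
const tokenBudget = 250_000;     // the TPM limit the user actually cares about
const contextWindow = 1_048_576; // must be looked up from the model's documentation

const requiredThresholdPct = (tokenBudget / contextWindow) * 100;
requiredThresholdPct.toFixed(1); // "23.8" -- vs. simply entering 250000 once
```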

However, I think this (strategically setting a context limit) can coexist with #4456 and would be helpful in the future for cost calculation and budgeting features, e.g., intelligent cost optimization and monitoring systems. It would give users finer-grained control and management over cost.

Conclusion

I think profile-specific thresholding and a profile-specific context limit will coexist well with each other and give users finer-grained control over cost. If we go this route with that in mind, then this should not be Gemini-specific.
