Local model API request fails if prompt ingestion takes more than 10 minutes #3621
Labels
bug: Something isn't working
Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
Comments
This may come from the OpenAI module's default 10-minute timeout: a custom timeout would need to be passed for this provider. Other OpenAI-compatible providers, such as Ollama and OpenAI Compatible, appear to face the same issue when used with a slow local model and a large prompt.
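For illustration only, here is a minimal sketch of how a longer timeout could be passed when constructing the client with the `openai` npm package; the base URL, API key, model name, and timeout value are placeholder assumptions for a local OpenAI-compatible server, not Cline's actual configuration:

```typescript
import OpenAI from "openai"

async function main() {
  // Sketch: the openai-node SDK defaults to a 10-minute (600,000 ms) request
  // timeout. Passing `timeout` in the client options raises it client-wide.
  const client = new OpenAI({
    baseURL: "http://localhost:1234/v1", // placeholder: local LM Studio / llama.cpp endpoint
    apiKey: "not-needed",                // placeholder: local servers usually ignore the key
    timeout: 60 * 60 * 1000,             // 1 hour instead of the 10-minute default
  })

  // The timeout can also be overridden per request via request options.
  const completion = await client.chat.completions.create(
    {
      model: "qwen3-32b", // placeholder model name
      messages: [{ role: "user", content: "Hello" }],
    },
    { timeout: 60 * 60 * 1000 },
  )

  console.log(completion.choices[0].message.content)
}

main()
```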
Sorry. Closed by accident.
App Version
3.16.6
API Provider
LM Studio
Model Used
Qwen3-32B
🔁 Steps to Reproduce
I am trying to use a local Qwen3-32B model via llama.cpp. To do so, I use the LM Studio integration and point it at the local server. Everything works fine, but after 10 minutes (600 seconds) the connection is dropped and I get an API Request Failed message. The inference runs on CPU and is quite slow, but I would be happy to let it crunch while I'm doing something else. If I use the tiny Qwen3 0.6B model, the inference is fast enough and everything works as expected (although with very mediocre results).
When it fails, llama.cpp finishes processing the prompt anyway. It then succeeds on retry, since the prompt is already cached.
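As a standalone illustration of the symptom outside Cline (a sketch; the endpoint and model name are placeholders), a request made with the SDK's default options is aborted client-side after about 10 minutes even though the local server keeps ingesting the prompt:

```typescript
import OpenAI from "openai"

async function reproduce() {
  // No explicit timeout, so the SDK's built-in 10-minute limit applies.
  // Placeholder endpoint for a local LM Studio / llama.cpp server.
  const client = new OpenAI({
    baseURL: "http://localhost:1234/v1",
    apiKey: "not-needed",
  })

  try {
    // A prompt large enough that ingestion on CPU takes longer than 10 minutes.
    await client.chat.completions.create({
      model: "qwen3-32b", // placeholder model name
      messages: [{ role: "user", content: "<very large prompt>" }],
    })
  } catch (err) {
    // The request is aborted on the client while the server keeps processing;
    // a retry succeeds because the prompt is then already cached.
    console.error("API request failed:", err)
  }
}

reproduce()
```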
💥 Outcome Summary (Optional)
No response
📄 Relevant Logs or Errors