Description
I am building an AI application where I would like to host the LLM on my own Mac Studio, using LM Studio as an OpenAI-compatible endpoint with MLX as the backend.
I use the `lms server start` command to run it on my machine.
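For reference, here is roughly how I'm calling the server from my application. This is a minimal sketch assuming LM Studio's default port (1234) and an example model identifier; the API key is a placeholder:

```python
# Minimal sketch of a call to the local OpenAI-compatible endpoint.
# The port and model name below are example values; adjust for your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server
    api_key="lm-studio",  # placeholder; the local server does not check the key
)

response = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # example model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```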
It works, but I have one question: does LM Studio have any support for concurrency? For example, if I get 3 requests at the same time, how are they handled?
Will they all just fall into some sort of “queue” where requests get processed sequentially?
Or, if there is enough room in RAM, will the server handle multiple requests concurrently?
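To make the question concrete, this is the kind of test I plan to run: firing 3 requests at once with `asyncio` and comparing their latencies (again, the port and model name are just example values):

```python
# Hypothetical concurrency test against the local endpoint.
# Port and model name are example values; adjust for your setup.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server
    api_key="lm-studio",  # placeholder; not checked by the local server
)

async def one_request(i: int) -> float:
    """Send one chat completion and return its wall-clock latency."""
    start = time.perf_counter()
    await client.chat.completions.create(
        model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # example model
        messages=[{"role": "user", "content": f"Request {i}: say hi."}],
    )
    return time.perf_counter() - start

async def main() -> None:
    # Fire all three requests at the same time.
    durations = await asyncio.gather(*(one_request(i) for i in range(3)))
    for i, d in enumerate(durations):
        print(f"request {i}: {d:.2f}s")

asyncio.run(main())
```

My reasoning: if the server queues requests, the three latencies should stack up (each later request waits for the earlier ones to finish), whereas if it handles them concurrently, all three should finish in roughly the time of the slowest single request.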