Description
I am building an AI application where I would like to host the LLM on my own Mac Studio, using LM Studio as an OpenAI-compatible endpoint with MLX as the backend.
I use the `lms server start` command to run it on my machine.
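For reference, here is roughly how I'm calling the server from my application. This is a minimal sketch assuming LM Studio's default port (1234) and an example model identifier; the API key is a placeholder:

```python
# Minimal sketch of a call to the local OpenAI-compatible endpoint.
# The port and model name below are example values; adjust for your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server
    api_key="lm-studio",  # placeholder; the local server does not check the key
)

response = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # example model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```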
It works, but I have one question: does LM Studio have any support for concurrency? For example, if I get 3 requests at the same time, how are they handled?
Will they all just fall into some sort of “queue” where requests get processed sequentially?
Or, if there is enough room in RAM, will the server handle multiple requests concurrently?
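To make the question concrete, this is the kind of test I plan to run: firing 3 requests at once with `asyncio` and comparing their latencies (again, the port and model name are just example values):

```python
# Hypothetical concurrency test against the local endpoint.
# Port and model name are example values; adjust for your setup.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's local server
    api_key="lm-studio",  # placeholder; not checked by the local server
)

async def one_request(i: int) -> float:
    """Send one chat completion and return its wall-clock latency."""
    start = time.perf_counter()
    await client.chat.completions.create(
        model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # example model
        messages=[{"role": "user", "content": f"Request {i}: say hi."}],
    )
    return time.perf_counter() - start

async def main() -> None:
    # Fire all three requests at the same time.
    durations = await asyncio.gather(*(one_request(i) for i in range(3)))
    for i, d in enumerate(durations):
        print(f"request {i}: {d:.2f}s")

asyncio.run(main())
```

My reasoning: if the server queues requests, the three latencies should stack up (each later request waits for the earlier ones to finish), whereas if it handles them concurrently, all three should finish in roughly the time of the slowest single request.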