-
Notifications
You must be signed in to change notification settings - Fork 6.5k
Pass parameters to custom routers through LLMConfig #53870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app
from ray.serve._private.request_router.prefix_aware_router import PrefixAwarePow2ReplicaRouter
llm_config = LLMConfig(
model_loading_config=dict(
model_id="deepseek",
model_source="qwen/Qwen2.5-7B-Instruct",
),
runtime_env=dict(
env_vars={"VLLM_USE_V1": "1"}
),
deployment_config=dict(
autoscaling_config=dict(min_replicas=1, max_replicas=1),
request_router_class=PrefixAwarePow2ReplicaRouter,
request_router_kwargs=dict(
imbalanced_threshold=9,
)
),
engine_kwargs=dict(
tensor_parallel_size=2,
pipeline_parallel_size=2,
gpu_memory_utilization=0.92,
dtype="auto",
max_num_seqs=40,
max_model_len=16384,
enable_chunked_prefill=True,
enable_prefix_caching=True,
trust_remote_code=True,
),
)
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables passing custom keyword arguments (request_router_kwargs
) through the Serve configuration to user‐provided request router classes. Additionally, it refactors the prefix‐aware router’s eviction loop from asyncio
tasks to a background thread and updates related tests.
- Add
request_router_kwargs
field in protobuf, config model, deployment API, and router - Wire serialization/deserialization of
request_router_kwargs
- Refactor eviction loop in
prefix_tree.py
fromasyncio
tothreading
- Update tests to call the router’s private selection method
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
src/ray/protobuf/serve.proto | Add new bytes request_router_kwargs field to DeploymentConfig |
python/ray/serve/deployment.py | Expose request_router_kwargs in options() and apply to config |
python/ray/serve/_private/router.py | Forward request_router_kwargs to router constructor and store it |
python/ray/serve/_private/config.py | Define request_router_kwargs , validate JSON, and handle proto I/O |
python/ray/llm/tests/serve/cpu/deployments/test_prefix_aware_request_router.py | Replace public call with private _choose_replica_for_request |
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py | Convert eviction loop from asyncio to a background threading.Thread |
Comments suppressed due to low confidence (2)
python/ray/serve/deployment.py:241
- [nitpick] Inserting a new parameter into the middle of the
options()
signature can break callers using positional arguments. Consider making it keyword-only or placing it at the end with a default.
request_router_kwargs: Default[Union[Dict, None]] = DEFAULT.VALUE,
python/ray/llm/tests/serve/cpu/deployments/test_prefix_aware_request_router.py:127
- [nitpick] The test now calls a private method (
_choose_replica_for_request
) instead of the public API (choose_replica_for_request
). It's better to test via the public interface to avoid coupling to internal implementation.
chosen = await prefix_request_router._choose_replica_for_request(req)
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py
Outdated
Show resolved
Hide resolved
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py
Outdated
Show resolved
Hide resolved
9e660d4
to
58174ac
Compare
Hi @kouroshHakha! This is ready for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two points:
- Let's separate out serve only changes from the eviction thread changes and review the serve changes with serve team
- Let's talk about the request router kwargs. The original intention of the design was to not expose the complexity of the constructor of the RequestRouter to the user. Right now the request_router_kwargs are passed through to the constructor which inflates the other kwargs that were supposed to stay hidden. Here is my proposal:
Modify the RequestRouter's constructor and interface this way:
class RequestRouter:
def __init__(self, ..., custom_init_kwargs=...)
...
self.init(**custom_init_kwargs)
def init(**kwargs):
# custom initialization for the Request Router. Called after the base constructor __init__ is done.
This way when I inherit this class I can simply do:
class MyRouter(RequestRouter)
def init(self, param1=None)
self.param1 = param1
def choose_replica(...):
# create a policy based on self.param1
@serve.deployment(request_router_class=MyRouter, request_router_init_kwargs={"param1": 10})
class MyDeployment:
....
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
768de7c
to
98e11ef
Compare
…ng user typos Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Called at the end of RequestRouter.__init__ with request_router_kwargs | ||
from DeploymentConfig . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs to be updated?
default=DEFAULT_REQUEST_ROUTER_PATH | ||
) | ||
# Keyword arguments that will be passed to the | ||
# request router class __init__ method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment needs to be updated?
// Timeout after which a replica started a record routing stats without a response. | ||
double request_routing_stats_timeout_s = 4; | ||
|
||
// kwargs which will be passed to the router class' __init__ method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here
Why are these changes needed?
Follow ups to #52725.
We also discussed passing kwargs directly to the derived class'
__init__
, but concerned that this may lead to typos getting swallowed by kwargs in RequestRouter.init. Instead, initialize_state without kwargs can throw a TypeError, e.g.:Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.