[<Ray component: Core|RLlib|etc...>] Ray Timeout Error running VLLM Multi-Node(tp_size=2) Online Server with Acl_Graph when handling curl request · Issue #53845 · ray-project/ray
Open
@Apocalypse990923-qshi

Description


What happened + What you expected to happen

The online server works fine when idle.

But when a curl request is sent, the following error occurs:

INFO 06-16 10:59:12 [loggers.py:116] Engine 000: Avg prompt throughput: 0.4 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
ERROR 06-16 10:59:13 [dump_input.py:68] Dumping input data
ERROR 06-16 10:59:13 [dump_input.py:70] V1 LLM engine (v0.8.5.dev648+g55aa7af99.d20250609) with config: model='/home/b30071341/online/Qwen2.5-0.5B-Instruct/', speculative_config=None, tokenizer='/home/b30071341/online/Qwen2.5-0.5B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/home/b30071341/online/Qwen2.5-0.5B-Instruct/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level": 3, "custom_ops": ["all"], "splitting_ops": ["vllm.unified_attention", "vllm.unified_attention_with_output", "vllm.unified_ascend_attention_with_output", "vllm.unified_ascend_mla_attention_with_output", "vllm.unified_ascend_attention_with_output", "vllm.unified_ascend_mla_attention_with_output"], "use_inductor": false, "compile_sizes": [], "use_cudagraph": true, "cudagraph_num_of_warmups": 1, "cudagraph_capture_sizes": [4, 2, 1], "max_capture_size": 4},
ERROR 06-16 10:59:13 [dump_input.py:78] Dumping scheduler output for model execution:
ERROR 06-16 10:59:13 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[],scheduled_cached_reqs=[CachedRequestData(req_id='{obj}',resumed_from_preemption=false,new_token_ids=[78],new_block_ids=[],num_computed_tokens=5)],num_scheduled_tokens={cmpl-5adf4710158f4cd8906aa5092b26365c-0: 1},total_num_scheduled_tokens=1,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={},num_common_prefix_blocks=1,finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
ERROR 06-16 10:59:13 [dump_input.py:81] SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, gpu_cache_usage=1.1388464510320162e-06, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None)
ERROR 06-16 10:59:13 [core.py:505] EngineCore encountered a fatal error.
ERROR 06-16 10:59:13 [core.py:505] Traceback (most recent call last):
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/dag/compiled_dag_node.py", line 2515, in _execute_until
ERROR 06-16 10:59:13 [core.py:505]     result = self._dag_output_fetcher.read(timeout)
ERROR 06-16 10:59:13 [core.py:505]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/common.py", line 309, in read
ERROR 06-16 10:59:13 [core.py:505]     outputs = self._read_list(timeout)
ERROR 06-16 10:59:13 [core.py:505]               ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/common.py", line 400, in _read_list
ERROR 06-16 10:59:13 [core.py:505]     raise e
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/common.py", line 382, in _read_list
ERROR 06-16 10:59:13 [core.py:505]     result = c.read(min(remaining_timeout, iteration_timeout))
ERROR 06-16 10:59:13 [core.py:505]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/shared_memory_channel.py", line 776, in read
ERROR 06-16 10:59:13 [core.py:505]     return self._channel_dict[self._resolve_actor_id()].read(timeout)
ERROR 06-16 10:59:13 [core.py:505]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/shared_memory_channel.py", line 612, in read
ERROR 06-16 10:59:13 [core.py:505]     output = self._buffers[self._next_read_index].read(timeout)
ERROR 06-16 10:59:13 [core.py:505]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/channel/shared_memory_channel.py", line 480, in read
ERROR 06-16 10:59:13 [core.py:505]     ret = self._worker.get_objects(
ERROR 06-16 10:59:13 [core.py:505]           ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/_private/worker.py", line 911, in get_objects
ERROR 06-16 10:59:13 [core.py:505]     ] = self.core_worker.get_objects(
ERROR 06-16 10:59:13 [core.py:505]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "python/ray/_raylet.pyx", line 3162, in ray._raylet.CoreWorker.get_objects
ERROR 06-16 10:59:13 [core.py:505]   File "python/ray/includes/common.pxi", line 106, in ray._raylet.check_status
ERROR 06-16 10:59:13 [core.py:505] ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read. ObjectID: 00cc316456d39201286bda103a37bbae72e42e380100000004e1f505
ERROR 06-16 10:59:13 [core.py:505]
ERROR 06-16 10:59:13 [core.py:505] The above exception was the direct cause of the following exception:
ERROR 06-16 10:59:13 [core.py:505]
ERROR 06-16 10:59:13 [core.py:505] Traceback (most recent call last):
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 496, in run_engine_core
ERROR 06-16 10:59:13 [core.py:505]     engine_core.run_busy_loop()
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 523, in run_busy_loop
ERROR 06-16 10:59:13 [core.py:505]     self._process_engine_step()
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 548, in _process_engine_step
ERROR 06-16 10:59:13 [core.py:505]     outputs = self.step_fn()
ERROR 06-16 10:59:13 [core.py:505]               ^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 233, in step
ERROR 06-16 10:59:13 [core.py:505]     model_output = self.execute_model(scheduler_output)
ERROR 06-16 10:59:13 [core.py:505]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 213, in execute_model
ERROR 06-16 10:59:13 [core.py:505]     raise err
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/engine/core.py", line 207, in execute_model
ERROR 06-16 10:59:13 [core.py:505]     return self.model_executor.execute_model(scheduler_output)
ERROR 06-16 10:59:13 [core.py:505]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/home/b30071341/online/vllm/vllm/v1/executor/ray_distributed_executor.py", line 61, in execute_model
ERROR 06-16 10:59:13 [core.py:505]     return refs[0].get()
ERROR 06-16 10:59:13 [core.py:505]            ^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/experimental/compiled_dag_ref.py", line 115, in get
ERROR 06-16 10:59:13 [core.py:505]     self._dag._execute_until(
ERROR 06-16 10:59:13 [core.py:505]   File "/usr/local/python3.11/lib/python3.11/site-packages/ray/dag/compiled_dag_node.py", line 2525, in _execute_until
ERROR 06-16 10:59:13 [core.py:505]     raise RayChannelTimeoutError(
ERROR 06-16 10:59:13 [core.py:505] ray.exceptions.RayChannelTimeoutError: System error: If the execution is expected to take a long time, increase RAY_CGRAPH_get_timeout which is currently 5 seconds. Otherwise, this may indicate that the execution is hanging.
INFO 06-16 10:59:13 [ray_distributed_executor.py:128] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
2025-06-16 10:59:13,465 INFO compiled_dag_node.py:2157 -- Tearing down compiled DAG
ERROR 06-16 10:59:13 [async_llm.py:403] AsyncLLM output_handler failed.
ERROR 06-16 10:59:13 [async_llm.py:403] Traceback (most recent call last):
ERROR 06-16 10:59:13 [async_llm.py:403]   File "/home/b30071341/online/vllm/vllm/v1/engine/async_llm.py", line 361, in output_handler
ERROR 06-16 10:59:13 [async_llm.py:403]     outputs = await engine_core.get_output_async()
ERROR 06-16 10:59:13 [async_llm.py:403]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-16 10:59:13 [async_llm.py:403]   File "/home/b30071341/online/vllm/vllm/v1/engine/core_client.py", line 806, in get_output_async
ERROR 06-16 10:59:13 [async_llm.py:403]     raise self._format_exception(outputs) from None
ERROR 06-16 10:59:13 [async_llm.py:403] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
2025-06-16 10:59:13,466 INFO compiled_dag_node.py:2162 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 286bda103a37bbae72e42e3801000000)
INFO 06-16 10:59:13 [async_llm.py:328] Request cmpl-5adf4710158f4cd8906aa5092b26365c-0 failed (engine dead).
2025-06-16 10:59:13,466 INFO compiled_dag_node.py:2162 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 75817d3c18f84417f2e95ef401000000)
INFO:     127.0.0.1:59172 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2025-06-16 10:59:13,475 INFO compiled_dag_node.py:2184 -- Waiting for worker tasks to exit
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [2566602]
*** SIGTERM received at time=1750042753 on cpu 168 ***
PC: @     0xffffb7041ea0  (unknown)  (unknown)
    @     0xfffd4383c948        464  absl::lts_20230802::AbslFailureSignalHandler()
    @     0xffffb72607dc  742831744  (unknown)
    @     0xffffb7044e50        176  pthread_cond_timedwait
    @     0xfffd42e6bdfc         96  ray::core::GetRequest::Wait()
    @     0xfffd42e6e408       1184  ray::core::CoreWorkerMemoryStore::GetImpl()
    @     0xfffd42e6ed60       1440  ray::core::CoreWorkerMemoryStore::Get()
    @     0xfffd42e6efc4         32  ray::core::CoreWorkerMemoryStore::Get()
    @     0xfffd42d9755c        208  ray::core::CoreWorker::GetObjects()
    @     0xfffd42d9cf64       1744  ray::core::CoreWorker::Get()
    @     0xfffd42ccc584        176  __pyx_pw_3ray_7_raylet_10CoreWorker_43get_objects()
    @     0xaaaad75d4cb8       1232  _PyEval_EvalFrameDefault
    @     0xaaaad75393cc        208  _PyFunction_Vectorcall
    @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
    @     0xaaaad75393cc        208  _PyFunction_Vectorcall
    @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
    @     0xaaaad75393cc        208  _PyFunction_Vectorcall
    @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
    @     0xaaaad76ee88c        208  _PyEval_Vector
    @     0xaaaad76ed614        160  PyEval_EvalCode
    @     0xaaaad773c120         64  run_mod
    @     0xaaaad773c270         80  PyRun_SimpleStringFlags
    @     0xaaaad7753224         64  pymain_run_command
    @     0xaaaad7752c20        112  Py_RunMain
    @     0xaaaad743ee68         96  main
    @     0xffffb6fe84c4        272  (unknown)
    @     0xffffb6fe8598         16  __libc_start_main
[2025-06-16 10:59:13,704 E 2567087 2567087] logging.cc:496: *** SIGTERM received at time=1750042753 on cpu 168 ***
[2025-06-16 10:59:13,704 E 2567087 2567087] logging.cc:496: PC: @     0xffffb7041ea0  (unknown)  (unknown)
[2025-06-16 10:59:13,711 E 2567087 2567087] logging.cc:496:     @     0xfffd4383c970        464  absl::lts_20230802::AbslFailureSignalHandler()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xffffb72607dc  742831744  (unknown)
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xffffb7044e50        176  pthread_cond_timedwait
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42e6bdfc         96  ray::core::GetRequest::Wait()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42e6e408       1184  ray::core::CoreWorkerMemoryStore::GetImpl()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42e6ed60       1440  ray::core::CoreWorkerMemoryStore::Get()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42e6efc4         32  ray::core::CoreWorkerMemoryStore::Get()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42d9755c        208  ray::core::CoreWorker::GetObjects()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42d9cf64       1744  ray::core::CoreWorker::Get()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xfffd42ccc584        176  __pyx_pw_3ray_7_raylet_10CoreWorker_43get_objects()
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75d4cb8       1232  _PyEval_EvalFrameDefault
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75393cc        208  _PyFunction_Vectorcall
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75393cc        208  _PyFunction_Vectorcall
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75393cc        208  _PyFunction_Vectorcall
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad75d95c8       1120  _PyEval_EvalFrameDefault
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad76ee88c        208  _PyEval_Vector
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad76ed614        160  PyEval_EvalCode
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad773c120         64  run_mod
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad773c270         80  PyRun_SimpleStringFlags
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad7753224         64  pymain_run_command
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad7752c20        112  Py_RunMain
[2025-06-16 10:59:13,715 E 2567087 2567087] logging.cc:496:     @     0xaaaad743ee68         96  main
[2025-06-16 10:59:13,717 E 2567087 2567087] logging.cc:496:     @     0xffffb6fe84c4        272  (unknown)
[2025-06-16 10:59:13,717 E 2567087 2567087] logging.cc:496:     @     0xffffb6fe8598         16  __libc_start_main
2025-06-16 10:59:14,476 INFO compiled_dag_node.py:2157 -- Tearing down compiled DAG
2025-06-16 10:59:14,476 INFO compiled_dag_node.py:2162 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 286bda103a37bbae72e42e3801000000)
2025-06-16 10:59:14,477 INFO compiled_dag_node.py:2162 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 75817d3c18f84417f2e95ef401000000)
2025-06-16 10:59:14,483 INFO compiled_dag_node.py:2184 -- Waiting for worker tasks to exit
/usr/local/python3.11/lib/python3.11/tempfile.py:934: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp45p4rged'>
  _warnings.warn(warn_message, ResourceWarning)
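
The RayChannelTimeoutError above points at RAY_CGRAPH_get_timeout, which is 5 seconds in this run (the reproduction script below exports it twice, and the last export sets it to 5). As a first diagnostic, not a verified fix, the timeout can be raised on every node before starting Ray and the API server; the 120-second value below is an arbitrary placeholder:

# Hypothetical diagnostic sketch: raise the compiled-graph read timeout named in the
# error message. Export on both nodes (head and worker) before `ray start` and before
# launching vllm.entrypoints.openai.api_server.
export RAY_CGRAPH_get_timeout=120   # arbitrary value; this run used 5 seconds

ray stop --force
ray start --head --port=6379 --num-gpus=8   # head node; the worker node re-runs its own `ray start --address=...`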

Versions / Dependencies

Environment:
python: 3.11.11
pytorch: 2.5.1
vllm: HEAD detached at 55aa7af99
vllm-ascend: On branch graph
ray: 2.47.0
cann: 
Version=7.8.T5.0.B028
version_dir=8.2.RC1
timestamp=20250522_142641356
runtime_acl_version=1.0
runtime_dvpp_version=1.0
required_driver_ascendhal_version=4.0.0
required_driver_dvppkernels_version=1.1
required_driver_tsfw_version=1.0
required_opp_abi_version=">=6.3, <=7.8"
required_package_amct_acl_version="7.8"
required_package_aoe_version="7.8"
required_package_compiler_version="7.8"
required_package_fwkplugin_version="7.8"
required_package_hccl_version="7.8"
required_package_nca_version="7.8"
required_package_ncs_version="7.8"
required_package_opp_version="7.8"
required_package_opp_kernel_version=">=7.6, <=7.8"
required_package_toolkit_version="7.8"

Reproduction script

Start the Ray head node:
export HCCL_PORT=9301
export HCCL_IF_IP=xxx.xxx.xxx.xxx
export GLOO_SOCKET_IFNAME=enp189s0f0
export TP_SOCKET_IFNAME=enp189s0f0
export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1
# export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export ASCEND_RT_VISIBLE_DEVICES=7
export VLLM_USE_V1=1
export VLLM_LOGGING_LEVEL="DEBUG"
export HCCL_ENTRY_LOG_ENABLE=1
export PYTHONPATH="${PYTHONPATH}:/home/online/vllm/:/home/online/vllm-ascend/"
export RAY_TMPDIR=/tmp/ray
export ASCEND_HOST_LOG_FILE_NUM=1000
export RAY_CGRAPH_get_timeout=10
export TASK_QUEUE_ENABLE=0
export HCCL_ASYNC_ERROR_HANDLING=0
export HCCL_EXEC_TIMEOUT=120
export HCCL_CONNECT_TIMEOUT=120
# export ACL_DEVICE_SYNC_TIMEOUT=120
export RAY_CGRAPH_get_timeout=5  # overrides the earlier export of 10; the error above reports a 5-second timeout
# export VLLM_TORCH_PROFILER_DIR=/home/online/profile
export ASCEND_GLOBAL_LOG_LEVEL=1
export ASCEND_SLOG_PRINT_TO_STDOUT=0
export ASCEND_GLOBAL_EVENT_ENABLE=1

rm -f log.txt
ray stop --force
ray start --head --port=6379 --num-gpus=8
Start the Ray worker node:
export HCCL_PORT=9301
export HCCL_IF_IP=xxx.xxx.xxx.xxx
export GLOO_SOCKET_IFNAME=enp189s0f0
export TP_SOCKET_IFNAME=enp189s0f0
export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1
# export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export ASCEND_RT_VISIBLE_DEVICES=7
export VLLM_USE_V1=1
export VLLM_LOGGING_LEVEL="DEBUG"
export HCCL_ENTRY_LOG_ENABLE=1
export PYTHONPATH="${PYTHONPATH}:/home/online/vllm/:/home/online/vllm-ascend/"
export RAY_TMPDIR=/tmp/ray
export ASCEND_HOST_LOG_FILE_NUM=1000
export RAY_CGRAPH_get_timeout=10
export TASK_QUEUE_ENABLE=0
export HCCL_ASYNC_ERROR_HANDLING=0
export HCCL_EXEC_TIMEOUT=120
export HCCL_CONNECT_TIMEOUT=120
# export ACL_DEVICE_SYNC_TIMEOUT=120
# export VLLM_TORCH_PROFILER_DIR=/home/online/profile
export RAY_CGRAPH_get_timeout=5  # overrides the earlier export of 10
export ASCEND_GLOBAL_LOG_LEVEL=1
export ASCEND_SLOG_PRINT_TO_STDOUT=0
export ASCEND_GLOBAL_EVENT_ENABLE=1

rm -f log.txt
ray stop --force
ray start --address='master_node_ip:port' --num-gpus=8 --node-ip-address=xxx.xxx.xxx.xxx
Start the server:
python -m vllm.entrypoints.openai.api_server \
       --model="/home/online/Qwen2.5-0.5B-Instruct/" \
       --trust-remote-code \
       --distributed_executor_backend "ray" \
       --tensor-parallel-size 2 \
       --disable-frontend-multiprocessing \
       --port 8888
Send the curl request:
curl -X POST http://127.0.0.1:8888/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
         "model": "/home/online/Qwen2.5-0.5B-Instruct/",
         "prompt": "AI的未来是",
         "max_tokens": 24
     }'
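
For reference, one assumed sanity check (not part of the original report) is to confirm the server is reachable while idle via the OpenAI-compatible model listing endpoint, before sending the completion request above:

# Assumed sanity check: list the served models while the server is idle.
# Only the /v1/completions request above triggers the timeout.
curl http://127.0.0.1:8888/v1/models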

Issue Severity

None

Labels

P1, bug, compiled-graphs, core, llm, question, stability
