Model Series
Qwen3
What are the models used?
Qwen3-235B-A22B
What is the scenario where the problem happened?
vLLM
Is this a known issue?
- I have followed the GitHub README.
- I have checked the Qwen documentation and cannot find an answer there.
- I have checked the documentation of the related framework and cannot find useful information.
- I have searched the issues and there is not a similar one.
Information about environment
vllm/vllm-openai:v0.8.5
Log output
The log output is as follows:
INFO 04-30 02:37:31 [__init__.py:239] Automatically detected platform cuda.
INFO 04-30 02:37:37 [api_server.py:1043] vLLM API server version 0.8.5
INFO 04-30 02:37:37 [api_server.py:1044] args: Namespace(subparser='serve', model_tag='/root/.cache/huggingface/Qwen3-235B-A22B', config='', host='0.0.0.0', port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/root/.cache/huggingface/Qwen3-235B-A22B', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config={}, use_tqdm_on_load=True, config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', max_model_len=None, guided_decoding_backend='auto', reasoning_parser='deepseek_r1', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=2, tensor_parallel_size=8, data_parallel_size=1, enable_expert_parallel=True, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, gpu_memory_utilization=0.9, swap_space=4, kv_cache_dtype='auto', num_gpu_blocks_override=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', cpu_offload_gb=0, calculate_kv_scales=False, disable_sliding_window=False, use_v2_block_manager=True, seed=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config={}, limit_mm_per_prompt={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=None, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=None, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', speculative_config=None, ignore_patterns=[], served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, max_num_batched_tokens=None, max_num_seqs=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, num_lookahead_slots=0, scheduler_delay_factor=0.0, preemption_mode=None, num_scheduler_steps=1, multi_step_stream_outputs=True, scheduling_policy='fcfs', enable_chunked_prefill=None, disable_chunked_mm_input=False, scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, additional_config=None, enable_reasoning=True, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, 
enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7f99737c7e20>)
INFO 04-30 02:37:45 [config.py:717] This model supports multiple tasks: {'embed', 'reward', 'score', 'generate', 'classify'}. Defaulting to 'generate'.
WARNING 04-30 02:37:45 [arg_utils.py:1658] Pipeline Parallelism without Ray distributed executor is not supported by the V1 Engine. Falling back to V0.
WARNING 04-30 02:37:45 [arg_utils.py:1525] Chunked prefill is enabled by default for models with max_model_len > 32K. Chunked prefill might not work with some features or models. If you encounter any issues, please disable by launching with --enable-chunked-prefill=False.
INFO 04-30 02:37:45 [config.py:1770] Defaulting to use ray for distributed inference
INFO 04-30 02:37:45 [config.py:2003] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 04-30 02:37:45 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5) with config: model='/root/.cache/huggingface/Qwen3-235B-A22B', speculative_config=None, tokenizer='/root/.cache/huggingface/Qwen3-235B-A22B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=40960, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend='deepseek_r1'), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/root/.cache/huggingface/Qwen3-235B-A22B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=True, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
2025-04-30 02:37:46,000 INFO worker.py:1654 -- Connecting to existing Ray cluster at address: 192.168.40.11:6379...
2025-04-30 02:37:46,012 INFO worker.py:1841 -- Connected to Ray cluster.
INFO 04-30 02:37:48 [ray_utils.py:335] No current placement group found. Creating a new placement group.
INFO 04-30 02:37:48 [ray_distributed_executor.py:176] use_ray_spmd_worker: False
(pid=717) INFO 04-30 02:37:52 [__init__.py:239] Automatically detected platform cuda.
(pid=289, ip=192.168.40.19) INFO 04-30 02:37:58 [__init__.py:239] Automatically detected platform cuda. [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
INFO 04-30 02:38:01 [ray_distributed_executor.py:352] non_carry_over_env_vars from config: set()
INFO 04-30 02:38:01 [ray_distributed_executor.py:354] Copying the following environment variables to workers: ['LD_LIBRARY_PATH', 'VLLM_USAGE_SOURCE', 'VLLM_WORKER_MULTIPROC_METHOD', 'VLLM_USE_V1']
INFO 04-30 02:38:01 [ray_distributed_executor.py:357] If certain env vars should NOT be copied to workers, add them to /root/.config/vllm/ray_non_carry_over_env_vars.json file
INFO 04-30 02:38:01 [cuda.py:292] Using Flash Attention backend.
(RayWorkerWrapper pid=722) INFO 04-30 02:38:02 [cuda.py:292] Using Flash Attention backend.
(pid=295, ip=192.168.40.19) INFO 04-30 02:37:58 [__init__.py:239] Automatically detected platform cuda. [repeated 7x across cluster]
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] Error executing method 'init_device'. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] Traceback (most recent call last):
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 612, in execute_method
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] return run_method(self, method, args, kwargs)
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2456, in run_method
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] return func(*args, **kwargs)
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] return method(self, *_args, **_kwargs)
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 604, in init_device
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] self.worker.init_device() # type: ignore
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 186, in init_device
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] init_worker_distributed_environment(self.vllm_config, self.rank,
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 525, in init_worker_distributed_environment
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] init_distributed_environment(parallel_config.world_size, rank,
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 909, in init_distributed_environment
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] _WORLD = init_world_group(ranks, local_rank, backend)
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 771, in init_world_group
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] return GroupCoordinator(
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 225, in init
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] cpu_group = torch.distributed.new_group(ranks, backend="gloo")
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 95, in wrapper
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] func_return = func(*args, **kwargs)
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 4981, in new_group
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] return _new_group_with_tag(
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 5071, in _new_group_with_tag
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] pg, pg_store = _new_process_group_helper(
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] File "/usr/local/lib/python3.12/dist-packages/torch/distributed/distributed_c10d.py", line 1953, in _new_process_group_helper
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] backend_class = ProcessGroupGloo(
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] ^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=721) ERROR 04-30 02:38:09 [worker_base.py:620] RuntimeError: [enforce fail at /pytorch/third_party/gloo/gloo/transport/tcp/device.cc:83] ifa != nullptr. Unable to find address for: enp199s0f0
(RayWorkerWrapper pid=295, ip=192.168.40.19) INFO 04-30 02:38:03 [cuda.py:292] Using Flash Attention backend. [repeated 14x across cluster]
Description
The image is vllm/vllm-openai:v0.8.5.
Two nodes were created with examples/online_serving/run_cluster.sh (https://github.com/vllm-project/vllm/blob/main/examples/online_serving/run_cluster.sh).
Then, on the head node, I ran: NCCL_SOCKET_IFNAME=enp199s0f0 GLOO_SOCKET_IFNAME=enp199s0f0 vllm serve /root/.cache/huggingface/Qwen3-235B-A22B --tensor-parallel-size 8 --pipeline-parallel-size 2 --enable-reasoning --reasoning-parser deepseek_r1 --trust-remote-code --host 0.0.0.0 --enable-expert-parallel
Running ifconfig on both machines shows that the network interface on each is enp199s0f0.
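For reference, a minimal sketch of the two-node setup is shown below. It assumes the upstream run_cluster.sh argument order (image, head node IP, --head/--worker, HF cache path, extra `docker run` arguments); the node IPs and cache path are taken from the log above. The `-e ..._SOCKET_IFNAME` flags are an illustration of how the interface variables could be made visible inside both containers (variables prefixed only to `vllm serve` in the head container are not automatically seen by the Ray worker containers on the other node); they are not part of the original commands.

```bash
# Hypothetical reconstruction, not the exact commands used in this report.
# Assumes run_cluster.sh forwards trailing arguments to `docker run`.

# On the head node (192.168.40.11):
bash run_cluster.sh vllm/vllm-openai:v0.8.5 192.168.40.11 --head \
    /root/.cache/huggingface \
    -e NCCL_SOCKET_IFNAME=enp199s0f0 \
    -e GLOO_SOCKET_IFNAME=enp199s0f0

# On the worker node (192.168.40.19):
bash run_cluster.sh vllm/vllm-openai:v0.8.5 192.168.40.11 --worker \
    /root/.cache/huggingface \
    -e NCCL_SOCKET_IFNAME=enp199s0f0 \
    -e GLOO_SOCKET_IFNAME=enp199s0f0

# The `vllm serve ...` command from the description is then run inside the
# container on the head node.
```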