Issue with the realtime server
CUDA_VISIBLE_DEVICES has no effect with vLLM, so both models load onto the same GPU, which causes an OOM:
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
llm = LLM(
    model=engine_args,
    dtype="float16",
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={"image": 256, "audio": 50},
)
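CUDA_VISIBLE_DEVICES is read when a process first initializes CUDA, so assigning it after torch or vLLM has already touched a GPU in the same process has no effect (a likely explanation for the behavior above, though the traces here don't confirm it). One reliable pattern is to launch each model in its own child process with the variable set before the child starts. A minimal sketch; the launcher helper and the script names are hypothetical:

```python
import os
import subprocess
import sys

def launch_on_gpu(script: str, gpu: str) -> subprocess.Popen:
    """Start `script` in a child process that sees only GPU `gpu`.

    CUDA_VISIBLE_DEVICES is placed in the child's environment before it
    starts, so when the child initializes CUDA it is given exactly one
    visible device (re-indexed inside that process as cuda:0).
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return subprocess.Popen([sys.executable, script], env=env)

# Hypothetical serving scripts, one vLLM instance each on its own GPU:
# proc_a = launch_on_gpu("serve_model_a.py", "0")
# proc_b = launch_on_gpu("serve_model_b.py", "1")
```

Inside each child, the single visible GPU appears as cuda:0, so the vLLM code above needs no `device` argument at all.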
Passing `device` explicitly raises a different error:
llm = LLM(
    model=engine_args,
    dtype="float16",
    device=cuda_devices,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={"image": 256, "audio": 50},
)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Also, it seems two 80 GB A100s don't have enough memory:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 79.33 GiB of which 802.31 MiB is free. Process 4042162 has 49.78 GiB memory in use. Process 4042163 has 2.97 GiB memory in use. Process 4045381 has 22.37 GiB memory in use. Process 4045380 has 414.00 MiB memory in use. Process 4045382 has 2.97 GiB memory in use. Of the allocated memory 20.31 GiB is allocated by PyTorch, and 1.56 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
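The traceback numbers are consistent with both instances landing on GPU 0 rather than the cards being too small: with `gpu_memory_utilization=0.85`, each vLLM instance tries to claim roughly 85% of the card's memory, so two instances cannot share one 80 GB GPU. A quick sanity check on the arithmetic:

```python
# Sanity check on the traceback numbers: with gpu_memory_utilization=0.85,
# each vLLM instance tries to reserve roughly 85% of the card, so two
# instances cannot coexist on a single 80 GB GPU.
total_gib = 79.33                        # capacity reported in the traceback
per_instance_gib = 0.85 * total_gib      # what one instance tries to reserve
print(round(per_instance_gib, 2))        # roughly 67.43 GiB per instance
print(2 * per_instance_gib > total_gib)  # True: two instances overcommit the card
```

Once each model is pinned to its own GPU, 85% of each 80 GB card should be available to its instance.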
With model parallelism (tensor_parallel_size=2), there are also CUDA initialization errors.
1. How do I assign each model to a specific GPU?
2. How do I deploy the models in parallel?
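For the tensor_parallel_size=2 initialization error, one common workaround (an assumption here; the exact behavior depends on the vLLM version) is to make sure CUDA is never initialized in the parent process before vLLM spawns its workers, either by launching the standalone OpenAI-compatible server instead of constructing `LLM` in-process, or by forcing the spawn start method. Model path is a placeholder:

```shell
# Option 1: let vLLM's standalone server own process setup.
# (Flags below exist in recent vLLM releases; verify against your version.)
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code

# Option 2: for in-process use, force spawned workers so they do not
# inherit an already-initialized CUDA context from the parent:
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```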