
realtime server: CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models load on one GPU #83

Open
Coding-Zuo opened this issue Jan 9, 2025 · 1 comment

@Coding-Zuo

A question about the realtime server:
CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models load on the same GPU and trigger an OOM.

```python
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
llm = LLM(
    model=engine_args,
    dtype="float16",
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50}
)
```
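One caveat worth noting (my reading of the CUDA behavior, not something stated in this repo): CUDA_VISIBLE_DEVICES is only honored if it is set before CUDA is first initialized in the process, so assigning `os.environ` after torch has already touched a GPU does nothing. A minimal sketch of launching each model server as a separate child process with the variable set at spawn time instead (`realtime_server.py` and its `--model` flag are hypothetical placeholders):

```python
import os
import subprocess
import sys

def launch(cmd, gpu_id):
    """Start `cmd` as a child process that can only see GPU `gpu_id`.

    The variable is placed in the child's environment before its
    interpreter starts, so torch/vLLM inside it never sees the other card.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    return subprocess.Popen(cmd, env=env)

# Hypothetical usage -- one server per model, one GPU per server:
#   launch([sys.executable, "realtime_server.py", "--model", "model-a"], "0")
#   launch([sys.executable, "realtime_server.py", "--model", "model-b"], "1")
```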

Passing `device` explicitly raises a different error:

```python
llm = LLM(
    model=engine_args,
    dtype="float16",
    device=cuda_devices,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50}
)
```

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

Also, two 80 GB A100s don't seem to have enough memory:

```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 79.33 GiB of which 802.31 MiB is free. Process 4042162 has 49.78 GiB memory in use. Process 4042163 has 2.97 GiB memory in use. Process 4045381 has 22.37 GiB memory in use. Process 4045380 has 414.00 MiB memory in use. Process 4045382 has 2.97 GiB memory in use. Of the allocated memory 20.31 GiB is allocated by PyTorch, and 1.56 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

With model parallelism (tensor_parallel_size=2), there are also CUDA initialization errors.
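One thing that may be worth trying for the tensor_parallel_size=2 case (an assumption on my part, not confirmed in this thread): vLLM reads the VLLM_WORKER_MULTIPROC_METHOD environment variable to decide how it starts its tensor-parallel workers, and forcing `spawn` avoids re-initializing CUDA inside a forked child:

```python
import os

# Must be set before the LLM engine is constructed; "spawn" starts the
# worker processes with a fresh interpreter instead of fork()ing one
# that may already hold CUDA state. Treat this as a workaround to try,
# not a guaranteed fix.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

# from vllm import LLM   # import/construct only after the variable is set
# llm = LLM(model=engine_args, tensor_parallel_size=2, ...)
```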

1. How do I pin each model to a specific GPU?
2. How do I deploy the two models in parallel?
@lxysl
Contributor

lxysl commented Jan 12, 2025

I've pushed the latest code to fix deployment across two GPUs. The key change is to load the torch-related packages only after the subprocess has started.
