Issue with the realtime server
CUDA_VISIBLE_DEVICES has no effect with vLLM, so both models load onto the same GPU, which causes an OOM:
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
llm = LLM(
    model=engine_args,
    dtype="float16",
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={"image": 256, "audio": 50},
)
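CUDA_VISIBLE_DEVICES is read when a process first initializes CUDA, so assigning it after torch or vLLM has already touched a GPU in the same process has no effect (a likely explanation for the behavior above, though the traces here don't confirm it). One reliable pattern is to launch each model in its own child process with the variable set before the child starts. A minimal sketch; the launcher helper and the script names are hypothetical:

```python
import os
import subprocess
import sys

def launch_on_gpu(script: str, gpu: str) -> subprocess.Popen:
    """Start `script` in a child process that sees only GPU `gpu`.

    CUDA_VISIBLE_DEVICES is placed in the child's environment before it
    starts, so when the child initializes CUDA it is given exactly one
    visible device (re-indexed inside that process as cuda:0).
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return subprocess.Popen([sys.executable, script], env=env)

# Hypothetical serving scripts, one vLLM instance each on its own GPU:
# proc_a = launch_on_gpu("serve_model_a.py", "0")
# proc_b = launch_on_gpu("serve_model_b.py", "1")
```

Inside each child, the single visible GPU appears as cuda:0, so the vLLM code above needs no `device` argument at all.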
Passing `device` explicitly raises a different error:
llm = LLM(
    model=engine_args,
    dtype="float16",
    device=cuda_devices,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={"image": 256, "audio": 50},
)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Also, it seems two 80 GB A100s don't have enough memory:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 79.33 GiB of which 802.31 MiB is free. Process 4042162 has 49.78 GiB memory in use. Process 4042163 has 2.97 GiB memory in use. Process 4045381 has 22.37 GiB memory in use. Process 4045380 has 414.00 MiB memory in use. Process 4045382 has 2.97 GiB memory in use. Of the allocated memory 20.31 GiB is allocated by PyTorch, and 1.56 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
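The traceback numbers are consistent with both instances landing on GPU 0 rather than the cards being too small: with `gpu_memory_utilization=0.85`, each vLLM instance tries to claim roughly 85% of the card's memory, so two instances cannot share one 80 GB GPU. A quick sanity check on the arithmetic:

```python
# Sanity check on the traceback numbers: with gpu_memory_utilization=0.85,
# each vLLM instance tries to reserve roughly 85% of the card, so two
# instances cannot coexist on a single 80 GB GPU.
total_gib = 79.33                        # capacity reported in the traceback
per_instance_gib = 0.85 * total_gib      # what one instance tries to reserve
print(round(per_instance_gib, 2))        # roughly 67.43 GiB per instance
print(2 * per_instance_gib > total_gib)  # True: two instances overcommit the card
```

Once each model is pinned to its own GPU, 85% of each 80 GB card should be available to its instance.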
With model parallelism (tensor_parallel_size=2), there are also CUDA initialization errors.
1. How do I assign each model to a specific GPU?
2. How do I deploy the models in parallel?
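For the tensor_parallel_size=2 initialization error, one common workaround (an assumption here; the exact behavior depends on the vLLM version) is to make sure CUDA is never initialized in the parent process before vLLM spawns its workers, either by launching the standalone OpenAI-compatible server instead of constructing `LLM` in-process, or by forcing the spawn start method. Model path is a placeholder:

```shell
# Option 1: let vLLM's standalone server own process setup.
# (Flags below exist in recent vLLM releases; verify against your version.)
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code

# Option 2: for in-process use, force spawned workers so they do not
# inherit an already-initialized CUDA context from the parent:
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```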