Merge EmbeddedLLM/vllm-rocm into vLLM main #1749
Conversation
* port dtype_float16.cuh and cache_kernels.cu
* port dtype_bfloat16.cuh
* port attention_utils.cuh
* port more kernels
* fix typo
* add cuda_compat.h
* sync branches
* update
* update
* fixes
* cleanup
* update
* update
* update
* fmt
* cleanup
* refactor
* update
* detecting rocm and adding flag for compiling
* using asm volatile instead of hip api
* using asm volatile for type casting of f16

Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Amir Balwel <[email protected]>
Thank you for upstreaming this! We will review soon.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
Please make a new Dockerfile.
&& git clone https://github.com/ROCmSoftwarePlatform/flash-attention.git \
&& cd flash-attention \
&& git submodule update --init \
&& sed -i -e "s/--offload-arch=native/--offload-arch=$(/opt/rocm/llvm/bin/amdgpu-offload-arch)/g" setup.py \
Thank you for the pull request.
This line is a no-op, since I don't see any reference to offload-arch in the setup.py file. As a result, when I tested this pull request and built the Docker image using this Dockerfile, the build failed.
Can you check the setup.py file?
/opt/rocm/bin/hipcc -I/app/libs/flash-attention/csrc/flash_attn_rocm -I/app/libs/flash-attention/csrc/flash_attn_rocm/src -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/include -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/library/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /app/libs/flash-attention/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.hip -o /app/libs/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -fno-gpu-rdc
clang++: error: cannot determine amdgcn architecture: /opt/rocm/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
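For illustration, one way to make that check explicit in the build itself: a minimal guard, assuming the flash-attention checkout lives under /app/libs/flash-attention as in the log above, that fails fast when the sed target is missing instead of silently becoming a no-op.

```dockerfile
# Illustrative guard only (not part of this PR): abort the image build early
# if setup.py no longer contains the flag that the sed command is meant to rewrite.
RUN grep -q -- "--offload-arch=native" /app/libs/flash-attention/setup.py \
    || (echo "--offload-arch=native not found in setup.py; the sed would be a no-op" && false)
```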
We implemented a temporary solution during the build process with ROCm/flash-attention@edc7698. The issue with the hardcoded --offload-arch=native has been resolved by commit ROCm/flash-attention@5f1ae07, so the temporary fix appears to be no longer necessary. After testing the most recent version of flash-attention, we plan to revise the Dockerfile accordingly.
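To make the workaround concrete, here is a hedged sketch of the pattern under discussion: substitute an explicitly chosen GPU architecture for the hardcoded native value instead of detecting it inside the build container, where the clang++ error above shows amdgpu-arch may find no device. The GFX_ARCH build argument and its gfx90a default are illustrative assumptions, and which file actually carries the flag depends on the flash-attention revision; setup.py is used here only to mirror the sed line in this PR.

```dockerfile
# Illustrative sketch, not the Dockerfile from this PR: select the target GPU
# architecture at build time, e.g.
#   docker build --build-arg GFX_ARCH=gfx90a -f Dockerfile.rocm .
ARG GFX_ARCH=gfx90a

# Rewrite the hardcoded "native" offload arch to the requested target before building.
RUN cd /app/libs/flash-attention \
    && sed -i "s/--offload-arch=native/--offload-arch=${GFX_ARCH}/g" setup.py \
    && python3 setup.py install
```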
Yes, use a specific commit of a named branch to achieve a stable and reproducible result rather than the default branch, since the default branch may keep changing. Regarding the name of the Dockerfile, you might want to rename it to Dockerfile.rocm_xxx, with xxx corresponding to the ROCm version you are using.
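As a hedged illustration of that suggestion, here is a fragment of what a pinned, ROCm-specific Dockerfile (for example Dockerfile.rocm_5.7, matching the rocm5.7 base image above) might look like; the FLASH_ATTENTION_COMMIT argument is a placeholder introduced for this sketch and would be set to the revision that was actually tested.

```dockerfile
# Hypothetical Dockerfile.rocm_5.7 fragment: pin dependencies to fixed revisions
# so the image builds reproducibly even if upstream default branches move.
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1

# Required build argument: the flash-attention commit that was actually tested.
ARG FLASH_ATTENTION_COMMIT

RUN git clone https://github.com/ROCmSoftwarePlatform/flash-attention.git /app/libs/flash-attention \
    && cd /app/libs/flash-attention \
    && git checkout "${FLASH_ATTENTION_COMMIT}" \
    && git submodule update --init
```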
@hongxiayang @WoosukKwon @simon-mo
The full version is merged in #1836!