
Merge EmbeddedLLM/vllm-rocm into vLLM main #1749

Closed
wants to merge 20 commits into vLLM main from the vllm-rocm-merge-to-vllm branch

Conversation

@tjtanaa (Contributor) commented Nov 22, 2023

Checklist:

  • Merge changes from upstream vllm commit 094f716
  • Dynamic code path selection for CUDA or ROCm in PyTorch (see the sketch after this list)
  • Pass all unit tests
  • ROCm Dockerfile
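
As a minimal sketch (not the actual vLLM implementation), the CUDA/ROCm selection can key off the fact that ROCm builds of PyTorch expose torch.version.hip, which is None on CUDA builds; the backend names below are hypothetical and used only for illustration.

# Minimal sketch of runtime CUDA/ROCm path selection; not the actual vLLM code.
import torch

def is_rocm() -> bool:
    # ROCm (HIP) builds of PyTorch set torch.version.hip to a version string;
    # CUDA builds leave it as None.
    return torch.version.hip is not None

def attention_backend() -> str:
    # Hypothetical backend names, for illustration only.
    return "rocm_flash_attn" if is_rocm() else "cuda_flash_attn"

if __name__ == "__main__":
    print(f"ROCm build: {is_rocm()}; selected backend: {attention_backend()}")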

@simon-mo (Collaborator) left a comment

Thank you for upstreaming this! We will review soon.


ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
A collaborator commented:

Please make a new Dockerfile.

@tjtanaa mentioned this pull request Nov 22, 2023
&& git clone https://github.com/ROCmSoftwarePlatform/flash-attention.git \
&& cd flash-attention \
&& git submodule update --init \
&& sed -i -e "s/--offload-arch=native/--offload-arch=$(/opt/rocm/llvm/bin/amdgpu-offload-arch)/g" setup.py \
@hongxiayang (Collaborator) commented Nov 26, 2023

Thank you for the pull request.
This line is a no-op, since I don't see any reference to offload-arch in the setup.py file.
Therefore, when I tested this pull request and built the Docker image using this Dockerfile, the build failed because of that.
Can you check the setup.py file?

/opt/rocm/bin/hipcc  -I/app/libs/flash-attention/csrc/flash_attn_rocm -I/app/libs/flash-attention/csrc/flash_attn_rocm/src -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/include -I/app/libs/flash-attention/csrc/flash_attn_rocm/composable_kernel/library/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /app/libs/flash-attention/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.hip -o /app/libs/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn_rocm/src/flash_bwd_runner_batched_hdim32_bf16_causal_gfx9x_hip.o -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++20 -DNDEBUG -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -fno-gpu-rdc
clang++: error: cannot determine amdgcn architecture: /opt/rocm/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'

Reply from the PR authors:

We implemented a temporary solution during the build process with ROCm/flash-attention@edc7698.
The issue with the hardcoded --offload-arch=native has been resolved by commit ROCm/flash-attention@5f1ae07, so the temporary fix appears to be no longer necessary. After testing the most recent version of flash-attention, we plan to revise the Dockerfile accordingly.
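
For reference, a minimal Python sketch of what the sed replacement above was meant to achieve: resolving the GPU architecture at build time instead of relying on the hardcoded --offload-arch=native. This is an illustrative assumption, not the actual ROCm/flash-attention setup.py; the fallback architecture gfx90a is a placeholder.

# Hypothetical sketch, not the real ROCm/flash-attention setup.py: determine the
# GPU offload architecture at build time instead of passing --offload-arch=native.
import subprocess

def detect_offload_arch(fallback: str = "gfx90a") -> str:
    # amdgpu-offload-arch prints the architecture of the visible GPU (e.g. gfx90a).
    try:
        out = subprocess.check_output(
            ["/opt/rocm/llvm/bin/amdgpu-offload-arch"], text=True
        )
        return out.strip().splitlines()[0]
    except (OSError, subprocess.CalledProcessError, IndexError):
        # No GPU visible during `docker build`; fall back to a placeholder arch.
        return fallback

hipcc_flags = ["-O3", "-std=c++20", f"--offload-arch={detect_offload_arch()}"]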

@hongxiayang (Collaborator) commented Nov 27, 2023

Yes, use a specific commit of a named branch to achieve a stable and reproducible result, rather than the default branch, since the default branch may keep changing. Regarding the name of the Dockerfile, you might want to rename it to Dockerfile.rocm_xxx, with xxx corresponding to the ROCm version you are using.

@tjtanaa (Contributor, Author) commented Nov 29, 2023

@hongxiayang @WoosukKwon @simon-mo
I am closing this PR and will continue the work in PR EmbeddedLLM#17
🙏

@tjtanaa closed this Nov 29, 2023
@kliuae deleted the vllm-rocm-merge-to-vllm branch December 1, 2023 17:21
@simon-mo (Collaborator) commented:
The full version is merged in #1836!
