Feat/blockwise fp8 quant #1668

Degnel · 2025-02-05T15:08:53Z

Feat: Implementation of the DeepSeek blockwise quantization for fp8 tensors

WARNING: The code has only been tested with the following test files:

pytest test/float8/test_base.py
pytest test/float8/test_compile.py
pytest test/float8/test_numerics_integration.py

However, the following tests have not been run because of limitations on my side (Triton is unavailable on Windows and I don't own an NVIDIA GPU):

./test/float8/test_fsdp.sh
./test/float8/test_dtensor.sh
python test/float8/test_fsdp2/test_fsdp2.py

- first implementation of the DeepSeek blockwise quantizer (not fully functional)
- amax has been updated
- 2 more quantization recipes have been added
- a couple of small changes here and there to keep things consistent
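For readers new to the scheme, here is a minimal sketch of blockwise amax scaling. It is not the code from this PR. It is a pure-Python illustration under these assumptions: the target format is float8 e4m3fn with a maximum finite value of 448, each tile's scale is its amax divided by that maximum, and the fp8 cast is only emulated by rounding. The names `round_to_e4m3` and `quantize_blockwise` are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def round_to_e4m3(x):
    """Emulate a saturating round-to-nearest-even onto the e4m3fn grid."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))  # saturate instead of overflowing
    _, e = math.frexp(x)       # |x| = m * 2**e with 0.5 <= m < 1
    e = max(e - 1, -6)         # leading-bit exponent, clamped for subnormals
    step = 2.0 ** (e - 3)      # 3 mantissa bits give 8 steps per binade
    return round(x / step) * step


def quantize_blockwise(matrix, block_rows, block_cols):
    """Return (quantized values, per-tile scales) for a list-of-lists matrix."""
    rows, cols = len(matrix), len(matrix[0])
    assert rows % block_rows == 0 and cols % block_cols == 0
    quantized = [[0.0] * cols for _ in range(rows)]
    scales = []
    for r0 in range(0, rows, block_rows):
        scale_row = []
        for c0 in range(0, cols, block_cols):
            tile = [(r, c)
                    for r in range(r0, r0 + block_rows)
                    for c in range(c0, c0 + block_cols)]
            amax = max(abs(matrix[r][c]) for r, c in tile)
            scale = max(amax, 1e-12) / FP8_E4M3_MAX  # tile amax maps to 448
            for r, c in tile:
                quantized[r][c] = round_to_e4m3(matrix[r][c] / scale)
            scale_row.append(scale)
        scales.append(scale_row)
    return quantized, scales


# Small tiles stand in for the real 1x128 and 128x128 tiles.
x = [[0.01 * (r + 1) * (c + 1) for c in range(8)] for r in range(4)]
q_rowwise_tiles, s_rowwise_tiles = quantize_blockwise(x, 1, 4)  # 4 x 2 scales
q_square_tiles, s_square_tiles = quantize_blockwise(x, 4, 4)    # 1 x 2 scales
```

Dequantizing is the reverse step: multiply each quantized value by the scale of the tile it belongs to. A real implementation would cast to an fp8 dtype on the device rather than emulate the rounding.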

pytorch-bot · 2025-02-05T15:08:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1668

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

ROCM Infra failures during checkout of PyTorch

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2025-02-05T15:46:52Z

could you share what gemm kernel you plan to use in this PR? I think a good first step here is to have a fast gemm.

we have an issue tracking this here: #1594

cassanof · 2025-02-06T06:54:55Z

this might be a good place to start: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/1d044fd82b15f1cedb197a288e50cc96a2c27205/inference/kernel.py#L63

vkuzo · 2025-02-06T16:45:56Z

Overall it would be great to be able to support this recipe in torchao. I think having a gemm with compelling performance that supports 128x1 and 128x128 scaling is something we need first, with benchmarks comparing to other recipes such as rowwise scaled, etc.
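To make the requirement concrete, here is a slow reference for what such a gemm has to compute, useful only for checking a fast kernel's numerics. It is a sketch under assumptions, not the kernel discussed here: it assumes, as in the DeepSeek-V3 reference kernel, that the left operand carries one scale per row per K block and the right operand one scale per square tile, and it uses plain Python lists. The name `blockwise_scaled_gemm` is hypothetical.

```python
def blockwise_scaled_gemm(a_q, a_scales, b_q, b_scales, block):
    """Reference C = A @ B for blockwise-scaled operands.

    a_q:      M x K quantized values
    a_scales: M x (K // block), one scale per row per K block
    b_q:      K x N quantized values
    b_scales: (K // block) x (N // block), one scale per square tile
    """
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    assert len(a_q[0]) == k
    assert k % block == 0 and n % block == 0
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for kb in range(k // block):
                # Accumulate the unscaled partial product over one K block...
                partial = sum(a_q[i][kk] * b_q[kk][j]
                              for kk in range(kb * block, (kb + 1) * block))
                # ...then apply both scales once per block.
                out[i][j] += partial * a_scales[i][kb] * b_scales[kb][j // block]
    return out
```

Applying the scales once per K block, rather than dequantizing every element first, is what lets a fast kernel keep the inner product in low precision. A benchmark against rowwise scaling would compare against a kernel with a single scale per row and per column.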

supriyar · 2025-02-06T17:17:24Z

Relevant PR in SGLang that adds the triton kernels - sgl-project/sglang#2575 (thanks to @HandH1998). I think it makes sense to add this as a starting point to torchao.

Degnel and others added 8 commits February 1, 2025 14:25

Feat: blockwise fp8 quantizer

55dab5f

- first implementation of the DeepSeek blockwise quantizer (not fully functional)
- amax has been updated
- 2 more quantization recipes have been added
- a couple of small changes here and there to keep things consistent

Feat: fp8 linear layer with blockwise quantization

5ab1eb2

Merge branch 'pytorch:main' into feat/blockwise_fp8_quant

4af3c34

Feat: adding assertions in the ops file

aa7dc87

Feat: adding some tests for blockwise fp8 quant

167fdce

Fix: fixes for the blockwise_fp8_quantization

9e9d16e

Merge branch 'pytorch:main' into feat/blockwise_fp8_quant

cf5802a

linting

6c9246a

facebook-github-bot added the CLA Signed label Feb 5, 2025

Degnel mentioned this pull request Feb 5, 2025

[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3 #1594

Open
