Run InternLM-XComposer-2d5-4bit with LMDeploy

Thanks to the LMDeploy team for providing AWQ quantization support (https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/xcomposer2d5.md#quantization). We compare the GPU memory usage of the FP16 model and the 4-bit model. Both runs set cache_max_entry_count=0.01 to keep the k/v cache small, so that the measured usage mainly reflects the model itself and the memory savings are easier to observe. The program was tested on PyTorch 2.2.2+cu118.
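A minimal sketch of how such a comparison might be set up with the LMDeploy Python API is shown below. The model IDs, the helper names `engine_kwargs` and `describe_image`, the prompt, and the image path are assumptions for illustration and are not taken from this README; only `cache_max_entry_count=0.01` and the use of AWQ come from the text above.

```python
def engine_kwargs(four_bit: bool, cache_fraction: float = 0.01) -> dict:
    """Build keyword arguments for TurbomindEngineConfig.

    cache_fraction is passed as cache_max_entry_count (0.01 in this README).
    For the 4-bit model, model_format="awq" tells LMDeploy the weights are
    AWQ-quantized.
    """
    kwargs = {"cache_max_entry_count": cache_fraction}
    if four_bit:
        kwargs["model_format"] = "awq"
    return kwargs


def describe_image(model_id: str, four_bit: bool, image_path: str,
                   prompt: str = "describe this image") -> str:
    """Hypothetical helper: load the model and run one image prompt."""
    # Imported lazily so the helper above can be used without LMDeploy installed.
    from lmdeploy import TurbomindEngineConfig, pipeline
    from lmdeploy.vl import load_image

    config = TurbomindEngineConfig(**engine_kwargs(four_bit))
    pipe = pipeline(model_id, backend_config=config)
    image = load_image(image_path)
    return pipe((prompt, image)).text


# Example calls (assumed model IDs; require a GPU and LMDeploy):
# describe_image("internlm/internlm-xcomposer2d5-7b", False, "example.jpg")
# describe_image("internlm/internlm-xcomposer2d5-7b-4bit", True, "example.jpg")
```

GPU memory can then be read with a tool such as `nvidia-smi` while each pipeline is loaded.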

| Model | GPU Memory (GB) |
| --- | --- |
| IXC2.5-lmdeploy | 31.81 |
| IXC2.5-lmdeploy-4bit | 23.21 |
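
For reference, the saving implied by the two measurements above can be worked out directly. The variable names are illustrative; the two numbers are the ones reported in this README.

```python
fp16_gb = 31.81      # IXC2.5-lmdeploy
four_bit_gb = 23.21  # IXC2.5-lmdeploy-4bit

# Absolute and relative reduction in GPU memory.
saved_gb = round(fp16_gb - four_bit_gb, 2)
saved_percent = round((fp16_gb - four_bit_gb) / fp16_gb * 100, 1)

print(f"saved {saved_gb} GB ({saved_percent}%)")  # saved 8.6 GB (27.0%)
```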