Add Flex Attention Monkey Patch for LLAMA #540
Summary
We need flex attention to support custom attention variants and masks with better performance (for example, shared-prefix attention).
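For context, here is a minimal sketch of the kind of shared-prefix mask that flex attention makes cheap, using PyTorch's `torch.nn.attention.flex_attention` API (assumes PyTorch >= 2.5 and a CUDA device; the prefix length and tensor shapes are illustrative, not from this PR):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

PREFIX = 4  # illustrative shared-prefix length

def shared_prefix_causal(b, h, q_idx, kv_idx):
    # Every query may attend to the shared prefix; the rest is causal.
    return (kv_idx < PREFIX) | (q_idx >= kv_idx)

B, H, S, D = 1, 8, 128, 64  # illustrative shapes
# B=None / H=None broadcast the mask across batch and heads.
block_mask = create_block_mask(shared_prefix_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)
```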
Two ways to enable flex attention in Liger:

1. Change `attn_implementation` of the model config (for instance, `LlamaConfig`) from PyTorch's `sdpa`/`eager` to `flex_attention`. By doing so, we switch `config._attn_implementation` to the flex attention implementation (see the sketch after this list).
2. Keep the config set to `sdpa` (however, now it's `flex_attention` underneath instead).
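A minimal sketch of option 1, assuming a transformers version that ships the `flex_attention` backend (>= 4.48); the checkpoint name is only an example:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",             # any Llama checkpoint works the same way
    torch_dtype=torch.bfloat16,
    attn_implementation="flex_attention",  # instead of "sdpa" / "eager"
)
# The config now records the chosen backend:
assert model.config._attn_implementation == "flex_attention"
```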
Testing Done
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence