
[BUG]: Pytest with a specific config failed after PR #5868 #5949

GuangyaoZhang opened this issue Jul 29, 2024 · 0 comments
Labels: bug, shardformer


Is there an existing issue for this bug?

  • I have searched the existing issues

🐛 Describe the bug

On the main repo, `test_shard_llama` fails for these configs:

{'tp_size': 2,
 'pp_size': 2,
 'sp_size': 2,
 'num_microbatches': 2,
 'enable_sequence_parallelism': True,
 'sequence_parallelism_mode': 'ring',
 'enable_flash_attention': True,
 'zero_stage': 1,
 'precision': 'fp16',
 'initial_scale': 1}

{'tp_size': 2,
 'sp_size': 2,
 'pp_size': 2,
 'num_microbatches': 2,
 'enable_sequence_parallelism': True,
 'sequence_parallelism_mode': 'split_gather',
 'enable_flash_attention': False,
 'precision': 'fp16',
 'initial_scale': 1}

The failure message is:

E         File "/home/nvme-share/home/zhangguangyao/ColossalAI/colossalai/shardformer/modeling/llama.py", line 530, in forward
E           query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
E         File "/home/nvme-share/home/zhangguangyao/hf_transformers/src/transformers/models/llama/modeling_llama.py", line 206, in apply_rotary_pos_emb
E           q_embed = (q * cos) + (rotate_half(q) * sin)
E       RuntimeError: The size of tensor a (16) must match the size of tensor b (8) at non-singleton dimension 2
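The error suggests that `q` still carries the full sequence length (16) at dimension 2 while `cos`/`sin` cover only half of it (8), which is consistent with one of the tensors being split by `sp_size=2` and the other not. A minimal NumPy sketch of the same broadcasting failure (all shapes here are hypothetical, chosen only to illustrate the mismatch, not taken from the actual model):

```python
import numpy as np

# Hypothetical shapes: with sp_size=2, the rotary cos/sin tables cover only
# the local sequence shard (8 positions), while q still holds the full
# sequence length (16) at dimension 2 -- the same mismatch the traceback
# reports for (q * cos).
batch, n_heads, seq_len, head_dim = 1, 4, 16, 64
sp_size = 2

q = np.zeros((batch, n_heads, seq_len, head_dim))
cos = np.zeros((batch, 1, seq_len // sp_size, head_dim))

try:
    _ = q * cos  # broadcasting fails: 16 vs 8 on the sequence dimension
except ValueError as exc:
    print("broadcast error:", exc)
```

Whichever side of the multiplication is sharded, both operands must agree on the sequence dimension before `apply_rotary_pos_emb` is called.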

I have found that this failure was introduced when PR #5868 was merged. Please take a look.

Environment

No response
