multi-image support for llama3.2 #705
base: habana_main
Conversation
Force-pushed from 8477d17 to 08541ff
@kdamaszk @michalkuligowski @kzawora-intel, can you help review this PR? Already tested on 11B and 90B.
Force-pushed from c9bfe70 to 7223fb8
@yma11 I observed an accuracy regression on the MMMU val dataset with Llama 3.2 11B Vision Instruct. Let's sync offline.
Force-pushed from 69411f3 to 0c2759e
LGTM
@PatrykWo please review
@kdamaszk I see some errors in CI. Please rebase onto the base branch so you are up to date and we can confirm that the new changes do not influence the PR.
Signed-off-by: yan ma <[email protected]>
Signed-off-by: yan ma <[email protected]>
Signed-off-by: yan ma <[email protected]>
I've tested the multi-image support and it seems to work properly, but with Fused SDPA in the cross-attention layers accuracy is worse. Please revert the changes made to the SDPA computation.
is_causal=False)
output = output.permute(2, 0, 1, 3).reshape(
    q_len, self.num_local_heads * self.head_dim)
return output
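For context, below is a minimal sketch of the non-causal cross-attention path the reviewers ask to keep, built around torch's F.scaled_dot_product_attention. The function name, argument list, and the 4-D (kv_heads, queries_per_kv, seq_len, head_dim) layout are assumptions chosen to match the permute/reshape in the snippet above; this is not code taken verbatim from the PR.

```python
import torch
import torch.nn.functional as F


def cross_attention_sdpa(q: torch.Tensor,
                         k: torch.Tensor,
                         v: torch.Tensor,
                         attention_mask: torch.Tensor,
                         num_local_heads: int,
                         head_dim: int) -> torch.Tensor:
    """Illustrative cross-attention using F.scaled_dot_product_attention.

    Assumes q, k, v are 4-D tensors laid out as
    (kv_heads, queries_per_kv, seq_len, head_dim); these shapes are an
    assumption made to match the permute/reshape shown above.
    """
    q_len = q.shape[-2]
    # Cross-attention over image tokens is non-causal, hence is_causal=False.
    output = F.scaled_dot_product_attention(q, k, v,
                                            attn_mask=attention_mask,
                                            is_causal=False)
    # (kv_heads, queries_per_kv, q_len, head_dim)
    #   -> (q_len, kv_heads, queries_per_kv, head_dim),
    # then flatten the head dimensions into the hidden size.
    output = output.permute(2, 0, 1, 3).reshape(
        q_len, num_local_heads * head_dim)
    return output
```

Per the review comments, the fused SDPA kernel would stay out of these cross-attention layers until the accuracy gap is understood.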
@yma11 This PR works fine, but using Fused SDPA in cross attention results in worse accuracy. It would be better to keep F.scaled_dot_product_attention here.
No description provided.