Support llama3.2vl(WIP). #5555
base: main
Conversation
Force-pushed from a3a23ac to b0e6f98
```python
images = [Image.open(image) if isinstance(image, str) else image for image in images]
image_features = processor.image_processor(images)
_ = image_features.pop("num_tiles")
image_features = {k: v if isinstance(v, torch.Tensor) else torch.tensor(v) for k, v in image_features.items()}
```
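The last line of the diff can be exercised on its own: any non-tensor feature (e.g. nested Python lists returned by the image processor) is wrapped in `torch.tensor`, while values that are already tensors pass through untouched. A minimal standalone sketch, with made-up feature names for illustration:

```python
import torch

# Toy feature dict: one plain nested list, one value that is already a tensor.
image_features = {
    "pixel_values": [[[0.5, 0.5], [0.5, 0.5]]],  # plain nested lists
    "aspect_ratio_ids": torch.tensor([[1]]),      # already a tensor, left as-is
}

# Same conversion as in the diff above.
image_features = {
    k: v if isinstance(v, torch.Tensor) else torch.tensor(v)
    for k, v in image_features.items()
}
```

After the comprehension, every value in the dict is a `torch.Tensor`, so the features can be collated and moved to device uniformly.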
That is because we can't get the text at `get_mm_inputs`.
How do you think we should fix this? Add a new stage, or add a text input to `get_mm_inputs`?
yep, we should do some work here
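One possible shape for the "add a text input" option, sketched under assumptions: the `batch_text` parameter and the stub image processor below are hypothetical, not the repo's actual API — the point is only that the text becomes visible at the same stage where image features are built.

```python
from typing import Any, Dict, List


class StubImageProcessor:
    """Stand-in for processor.image_processor; returns list-valued features."""

    def __call__(self, images: List[Any]) -> Dict[str, Any]:
        return {
            "pixel_values": [[0.0, 0.0] for _ in images],
            "num_tiles": [1 for _ in images],
        }


def get_mm_inputs(
    images: List[Any],
    batch_text: List[str],
    image_processor: StubImageProcessor,
) -> Dict[str, Any]:
    features = image_processor(images)
    features.pop("num_tiles")  # dropped, as in the diff above
    features["text"] = batch_text  # text is now available at this stage
    return features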
What does this PR do?
Support Llama-3.2-11B-Vision.
Before submitting
Linked issues
#5549
bitsandbytes 8-bit quantization is not functional; 4-bit works, but 8-bit does not.
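Since only the 4-bit path works here, a config fragment for the working setup, assuming the standard `transformers` + `bitsandbytes` loading route (the model ID and dtype choice are illustrative, not mandated by this PR):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config; 8-bit (load_in_8bit=True) is the path
# reported as broken for this model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Pass as quantization_config=bnb_config to from_pretrained(...)
```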