Does it support llava-video? #5

Open
zjukop opened this issue Dec 11, 2024 · 3 comments

@zjukop

zjukop commented Dec 11, 2024

No description provided.

@Yangsenqiao
Collaborator

In my opinion, LLaVA-Video introduces a high-quality dataset, and the model is still based on the LLaVA architecture. Therefore, I think VisionZip can be applied to LLaVA-Video.

@zjukop
Author

zjukop commented Dec 13, 2024

I encountered an error. Could you help me?

  File "lmms-eval-main/VisionZip/visionzip/llava_arch.py", line 43, in encode_images_visionzip_multi
    image_features, keep_idx = self.get_model().get_vision_tower().forward(images)
  File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ClipEncoderLayer' object has no attribute 'metric'

@Yangsenqiao
Collaborator

Hi 🤗,

The metric is calculated in this line:

metric = self.vision_tower.vision_model.encoder.layers[-2].metric

Please check if the model correctly executed this line.
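One way to run that check is sketched below. It only tests whether the layer read by the line above exposes a `metric` attribute. The attribute path is taken from the quoted line; the helper `check_metric` and the stand-in classes `PatchedLayer` and `UnpatchedLayer` are hypothetical and are not part of VisionZip or LLaVA-Video. If `metric` is only set during a forward pass, the check would have to be run after one.

```python
from types import SimpleNamespace


def check_metric(vision_tower, layer_index=-2):
    """Report whether the chosen encoder layer exposes a `metric` attribute.

    Hypothetical diagnostic helper; the attribute path mirrors the line
    quoted in this thread.
    """
    layer = vision_tower.vision_model.encoder.layers[layer_index]
    layer_type = type(layer).__name__
    if hasattr(layer, "metric"):
        return True, f"{layer_type} at index {layer_index} exposes 'metric'"
    return False, f"{layer_type} at index {layer_index} has no 'metric'"


# Stand-in objects (assumptions) so the sketch runs without a real model.
class PatchedLayer:
    def __init__(self):
        self.metric = [0.1, 0.2]  # placeholder values


class UnpatchedLayer:
    pass


def make_tower(layer_cls, num_layers=3):
    """Build a fake vision tower with the nesting used in the quoted line."""
    layers = [layer_cls() for _ in range(num_layers)]
    encoder = SimpleNamespace(layers=layers)
    return SimpleNamespace(vision_model=SimpleNamespace(encoder=encoder))


print(check_metric(make_tower(PatchedLayer)))
print(check_metric(make_tower(UnpatchedLayer)))
```

With a real model, the object returned by `get_vision_tower()` would presumably be passed in place of the stand-in tower. A negative result names the layer class, which helps show whether the encoder layers are the ones expected to carry `metric`.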

If you encounter any further errors, feel free to ask me.

Best,
Senqiao
