Hi @haotian-liu, I am trying to pretrain and fine-tune the LLaVA model on my custom dataset. During fine-tuning, after loading projector.bin, the LLaVA model, and the image encoder, I run train.py with the following changes, and the printed parameter counts cover only the projector (no language model weights, no image encoder weights):
# Total number of parameters
total_params = sum(param.numel() for param in model.parameters())
# Total number of trainable parameters
trainable_params = sum(param.numel() for param in model.parameters() if param.requires_grad)
print(f"Total Parameters: {total_params}")
print(f"Trainable Parameters: {trainable_params}")
data_module = make_supervised_data_module(tokenizer=tokenizer,
                                          data_args=data_args)
trainer = LLaVATrainer(model=model,
                       tokenizer=tokenizer,
                       args=training_args,
                       **data_module)
The output is:
Total Parameters: 32000000
Trainable Parameters: 32000000
This is because DeepSpeed ZeRO-3 shards the model parameters across GPUs, so on each rank some parameters are replaced with empty placeholder tensors. param.numel() returns zero for those parameters, even though they are still trainable.
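One way to get meaningful counts under ZeRO-3 is to fall back to the size DeepSpeed records on each partitioned parameter. The following is a minimal sketch, not code from train.py: it assumes DeepSpeed has attached a `ds_numel` attribute (the full, unpartitioned element count) to every sharded parameter, and the helper name `count_parameters` is hypothetical.

```python
def count_parameters(model):
    """Return (total, trainable) parameter counts, tolerating ZeRO-3 sharding.

    `model` is anything with a .parameters() method, e.g. a torch.nn.Module.
    """
    total = 0
    trainable = 0
    for param in model.parameters():
        n = param.numel()
        # Assumption: under ZeRO-3 the local tensor may be an empty placeholder,
        # and DeepSpeed keeps the full size on the `ds_numel` attribute.
        if n == 0 and hasattr(param, "ds_numel"):
            n = param.ds_numel
        total += n
        if param.requires_grad:
            trainable += n
    return total, trainable
```

In train.py this could replace the two sum(...) lines, for example `total_params, trainable_params = count_parameters(model)`. Parameters that are not partitioned (no `ds_numel`, non-zero `numel()`) are counted exactly as before.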
Question
Below is my fine tuning script:
Why are the other weights not visible here?