I used SimPO on the Llama3.1-8B-Instruct model with the recommended settings for Llama3-8B-Instruct-V2: beta=10, gamma_beta_ratio=0.3, lr=1e-6
I trained the model on an instruction dataset (Long-Alpaca)
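For context, in SimPO the target margin is gamma = beta * gamma_beta_ratio, so these settings imply gamma = 3. A minimal sketch of the objective, assuming the average per-token (length-normalized) log-probabilities are computed elsewhere:

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,     # average per-token log-probs of chosen
    rejected_logps: torch.Tensor,   # and rejected responses, shape (batch,)
    beta: float = 10.0,             # reward scale from the V2 recipe
    gamma_beta_ratio: float = 0.3,  # target-margin ratio from the V2 recipe
) -> torch.Tensor:
    """Minimal sketch of the SimPO objective (no reference model)."""
    gamma = beta * gamma_beta_ratio
    logits = beta * (chosen_logps - rejected_logps) - gamma
    return -F.logsigmoid(logits).mean()
```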
During inference, I gave the model a randomly chosen prompt whose ground-truth answer is: Garden
Here is the inference result of the vanilla Llama3.1-8B-Instruct model: The milk is in the garden.
However, after fine-tuning with the SimPO loss, the inference results look like this:
"pred": "\n\nThe football is in the hallway. The apple is in the hallway. The apple is in the garden. The apple is in the hallway. The football is in the hallway. The apple is in the garden. The apple is in the hallway. The football is in the kitchen. The football is in the hallway. The apple is in the garden. The apple is in the hallway. The football is in the hallway. The football is in the kitchen. The football is in the"}
The output is stuck in heavy repetition loops.
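For anyone reproducing this, a minimal greedy-decoding sketch with Hugging Face transformers (the checkpoint path and prompt are placeholders; the commented-out repetition controls only mask the loops at decode time rather than fix the underlying training issue):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the SimPO-fine-tuned checkpoint.
model_path = "path/to/llama3.1-8b-instruct-simpo"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Where is the milk?"  # placeholder for the actual evaluation prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,            # greedy decoding reproduces the loops
    # repetition_penalty=1.2,   # these only suppress the repetition at
    # no_repeat_ngram_size=3,   # decode time; they do not fix the model
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```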
Based on your experience, how can I modify my hyperparameters to avoid this situation?
Hi, here are my training settings: