
After using SimPO, the model generation results contain many loops #80

Open
ZetangForward opened this issue Jan 13, 2025 · 0 comments
@ZetangForward

Hi, here are my training settings:

  • I used SimPO on the Llama-3.1-8B-Instruct model with the recommended Llama3-8B-Instruct-v2 settings: gamma=10, gamma_beta_ratio=0.3, lr=1e-6
  • I trained the model on an instruction dataset (Long-Alpaca)
  • During inference I gave the model a randomly chosen prompt; the ground-truth answer should be: Garden
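For reference, the objective these hyperparameters feed into can be sketched as follows. This is a minimal pure-Python sketch of the SimPO loss for a single preference pair, assuming `beta` is the reward-scaling coefficient and the target reward margin is `gamma = gamma_beta_ratio * beta`; the function and variable names are illustrative, not taken from the SimPO codebase:

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=10.0, gamma_beta_ratio=0.3):
    """Length-normalized SimPO preference loss for one (chosen, rejected) pair.

    logp_*: summed token log-probabilities of the chosen/rejected response
    len_*:  number of tokens in each response
    """
    gamma = gamma_beta_ratio * beta          # target reward margin
    # SimPO's implicit reward: length-normalized average log-probability.
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma
    # -log sigmoid(margin): small when chosen beats rejected by > gamma.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the reward is the length-normalized log-probability, a large `beta`/`gamma` can push the model toward high-likelihood, low-entropy continuations, which is one plausible route to the repetition seen below.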

Here is the inference result of the vanilla Llama-3.1-8B-Instruct model: The milk is in the garden.

However, after fine-tuning with the SimPO loss, the inference results look like this:

"pred": " \n\nThe football is in the hallway.  The apple is in the hallway.  The apple is in the garden. The apple is in the hallway. The football is in the hallway. The apple is in the garden.  The apple is in the hallway. The football is in the kitchen. The football is in the hallway. The apple is in the garden. The apple is in the hallway. The football is in the hallway. The football is in the kitchen. The football is in the"}

The output contains heavy repetition loops.

Based on your experience, how can I modify my hyperparameters to avoid this situation?
