LLM Alignment

- Survey
- LLM Alignment
- Projects
- Misc

Survey

LLM Alignment

Alignment faking in large language models, arXiv, 2412.14093, arxiv, pdf, citation: -1

Ryan Greenblatt, Carson Denison, Benjamin Wright, ..., Samuel R. Bowman, Evan Hubinger · (alignment_faking_public - redwoodresearch)
🌟 RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response, arXiv, 2412.14922, arxiv, pdf, citation: -1

Junyu Luo, Xiao Luo, Kaize Ding, ..., Zhiping Xiao, Ming Zhang · (RobustFT - luo-junyu)
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs, arXiv, 2412.08347, arxiv, pdf, citation: -1

Sultan Alrashed
Alignment faking in large language models

· (assets.anthropic)
[10 Dec 2024, NeurIPS // Infer] Post-training for applications

· (𝕏)
🌟 KTO: Model Alignment as Prospect Theoretic Optimization, arXiv, 2402.01306, arxiv, pdf, citation: -1

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, ..., Dan Jurafsky, Douwe Kiela
Does your data spark joy? Performance gains from domain upsampling at the end of training, arXiv, 2406.03476, arxiv, pdf, citation: -1

Cody Blakeney, Mansheej Paul, Brett W. Larsen, ..., Sean Owen, Jonathan Frankle
Meta’s Post-Training Pipeline for Llama 3.1
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena

· (arxiv)
Rewarding Chatbots for Real-World Engagement with Millions of Users, arXiv, 2303.06135, arxiv, pdf, citation: -1

Robert Irvine, Douglas Boubert, Vyas Raina, ..., Thomas Rialan, William Beauchamp
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization, arXiv, 2411.06208, arxiv, pdf, citation: -1

Xinghua Zhang, Haiyang Yu, Cheng Fu, ..., Fei Huang, Yongbin Li
Aligning Large Language Models via Self-Steering Optimization, arXiv, 2410.17131, arxiv, pdf, citation: -1

Hao Xiang, Bowen Yu, Hongyu Lin, ..., Jingren Zhou, Junyang Lin
LOGO -- Long cOntext aliGnment via efficient preference Optimization, arXiv, 2410.18533, arxiv, pdf, citation: -1

Zecheng Tang, Zechen Sun, Juntao Li, ..., Qiaoming Zhu, Min Zhang
Baichuan Alignment Technical Report, arXiv, 2410.14940, arxiv, pdf, citation: -1

Mingan Lin, Fan Yang, Yanjun Shen, ..., Zenan Zhou, Weipeng Chen · (huggingface)

Projects

Misc

How language model post-training is done today 🎬