-
Alignment faking in large language models,
arXiv, 2412.14093
, arxiv, pdf, citation: -1
Ryan Greenblatt, Carson Denison, Benjamin Wright, ..., Samuel R. Bowman, Evan Hubinger · (alignment_faking_public - redwoodresearch)
-
🌟 RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response,
arXiv, 2412.14922
, arxiv, pdf, citation: -1
Junyu Luo, Xiao Luo, Kaize Ding, ..., Zhiping Xiao, Ming Zhang · (RobustFT - luo-junyu)
-
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs,
arXiv, 2412.08347
, arxiv, pdf, citation: -1
Sultan Alrashed
-
Alignment faking in large language models
· (assets.anthropic)
-
[10 Dec 2024, NeurIPS // Infer] Post-training for applications
· (𝕏)
-
🌟 KTO: Model Alignment as Prospect Theoretic Optimization,
arXiv, 2402.01306
, arxiv, pdf, citation: -1
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, ..., Dan Jurafsky, Douwe Kiela
-
Does your data spark joy? Performance gains from domain upsampling at the end of training,
arXiv, 2406.03476
, arxiv, pdf, citation: -1
Cody Blakeney, Mansheej Paul, Brett W. Larsen, ..., Sean Owen, Jonathan Frankle
-
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
· (arxiv)
-
Rewarding Chatbots for Real-World Engagement with Millions of Users,
arXiv, 2303.06135
, arxiv, pdf, citation: -1
Robert Irvine, Douglas Boubert, Vyas Raina, ..., Thomas Rialan, William Beauchamp
-
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization,
arXiv, 2411.06208
, arxiv, pdf, citation: -1
Xinghua Zhang, Haiyang Yu, Cheng Fu, ..., Fei Huang, Yongbin Li
-
Aligning Large Language Models via Self-Steering Optimization,
arXiv, 2410.17131
, arxiv, pdf, citation: -1
Hao Xiang, Bowen Yu, Hongyu Lin, ..., Jingren Zhou, Junyang Lin
-
LOGO -- Long cOntext aliGnment via efficient preference Optimization,
arXiv, 2410.18533
, arxiv, pdf, citation: -1
Zecheng Tang, Zechen Sun, Juntao Li, ..., Qiaoming Zhu, Min Zhang
-
Baichuan Alignment Technical Report,
arXiv, 2410.14940
, arxiv, pdf, citation: -1
Mingan Lin, Fan Yang, Yanjun Shen, ..., Zenan Zhou, Weipeng Chen · (huggingface)