LLM Alignment

Survey

LLM Alignment

  • Alignment faking in large language models, arXiv, 2412.14093, arxiv, pdf, citation: -1

    Ryan Greenblatt, Carson Denison, Benjamin Wright, ..., Samuel R. Bowman, Evan Hubinger · (alignment_faking_public - redwoodresearch)

  • 🌟 RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response, arXiv, 2412.14922, arxiv, pdf, citation: -1

    Junyu Luo, Xiao Luo, Kaize Ding, ..., Zhiping Xiao, Ming Zhang · (RobustFT - luo-junyu)

  • SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs, arXiv, 2412.08347, arxiv, pdf, citation: -1

    Sultan Alrashed

  • Alignment faking in large language models

    · (assets.anthropic)

  • [10 Dec 2024, NeurIPS // Infer] Post-training for applications

    · (𝕏)

  • 🌟 KTO: Model Alignment as Prospect Theoretic Optimization, arXiv, 2402.01306, arxiv, pdf, citation: -1

    Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, ..., Dan Jurafsky, Douwe Kiela

  • Does your data spark joy? Performance gains from domain upsampling at the end of training, arXiv, 2406.03476, arxiv, pdf, citation: -1

    Cody Blakeney, Mansheej Paul, Brett W. Larsen, ..., Sean Owen, Jonathan Frankle

  • Meta’s Post-Training Pipeline for Llama 3.1

  • WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Aren

    · (arxiv)

  • Rewarding Chatbots for Real-World Engagement with Millions of Users, arXiv, 2303.06135, arxiv, pdf, citation: -1

    Robert Irvine, Douglas Boubert, Vyas Raina, ..., Thomas Rialan, William Beauchamp

  • IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization, arXiv, 2411.06208, arxiv, pdf, citation: -1

    Xinghua Zhang, Haiyang Yu, Cheng Fu, ..., Fei Huang, Yongbin Li

  • Aligning Large Language Models via Self-Steering Optimization, arXiv, 2410.17131, arxiv, pdf, citation: -1

    Hao Xiang, Bowen Yu, Hongyu Lin, ..., Jingren Zhou, Junyang Lin

  • LOGO -- Long cOntext aliGnment via efficient preference Optimization, arXiv, 2410.18533, arxiv, pdf, citation: -1

    Zecheng Tang, Zechen Sun, Juntao Li, ..., Qiaoming Zhu, Min Zhang

  • Baichuan Alignment Technical Report, arXiv, 2410.14940, arxiv, pdf, citation: -1

    Mingan Lin, Fan Yang, Yanjun Shen, ..., Zenan Zhou, Weipeng Chen · (huggingface)

Projects

Misc