- 2024 in Post-Transformer Architectures: State Space Models, RWKV [Latent Space LIVE! @ NeurIPS 2024] 🎬
- Bamba: Inference-Efficient Hybrid Mamba2 Model 🤗 · (bamba - foundation-model-stack)
- QRWKV6 32B Instruct Preview is one of the largest and strongest RWKV models to date. 🤗
- RWKV Flock of Finches 37B-A11B v0.1 Mixture of Experts Model 🤗
- Ultra-Sparse Memory Network, arXiv 2411.12364 · Zihao Huang, Qiyang Min, Hongzhi Huang, ..., Ran Guo, Xun Zhou
- Bi-Mamba: Towards Accurate 1-Bit State Space Models, arXiv 2411.11843 · Shengkun Tang, Liqun Ma, Haonan Li, ..., Mingjie Sun, Zhiqiang Shen · (𝕏)
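As a rough illustration of what 1-bit weights mean in practice, the sketch below applies a common binarization scheme (sign of the weight plus a per-tensor scale). This is a generic assumption in the style of XNOR-Net/BitNet, not Bi-Mamba's exact quantization recipe:

```python
import numpy as np

def binarize_weights(W):
    # Generic 1-bit weight quantization: sign(W) with a per-tensor scale
    # alpha = mean(|W|), the scale that minimizes ||W - alpha*sign(W)||_F.
    # Bi-Mamba's actual recipe may differ (e.g. per-channel scales).
    alpha = np.mean(np.abs(W))
    return alpha * np.sign(W), alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))      # a toy full-precision weight matrix
Wb, alpha = binarize_weights(W)  # every entry is +alpha or -alpha
```

Each binarized entry carries one bit of information plus a shared scalar, which is what makes matrix multiplies reducible to sign flips and additions.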
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States, arXiv 2407.04620, citations: 19 · Yu Sun, Xinhao Li, Karan Dalal, ..., Tatsunori Hashimoto, Carlos Guestrin · (yueatsprograms.github) · (ttt-lm-pytorch - test-time-training)
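The core idea of this paper, a hidden state that is itself a small model updated by a gradient step on a self-supervised loss at every token, can be sketched minimally. The reconstruction loss and learning rate below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ttt_linear_step(W, x, lr=0.01):
    # The hidden state W is a tiny linear model; processing one token
    # performs a gradient step on a self-supervised reconstruction loss
    # l(W; x) = ||W x - x||^2 (an illustrative choice of loss).
    grad = 2.0 * np.outer(W @ x - x, x)  # dl/dW
    W = W - lr * grad
    return W, W @ x                      # updated state, token output

rng = np.random.default_rng(0)
d = 4
W = 0.1 * rng.normal(size=(d, d))        # initial hidden state
x = rng.normal(size=d)
loss_before = float(np.sum((W @ x - x) ** 2))
for _ in range(20):                      # repeated exposure to one token
    W, y = ttt_linear_step(W, x)
loss_after = float(np.sum((W @ x - x) ** 2))
```

Because the state update is a learning step rather than a fixed recurrence, the reconstruction loss on a repeated input shrinks as the sequence is processed.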
- Wave Network: An Ultra-Small Language Model, arXiv 2411.02674 · Xin Zhang, Victor S. Sheng
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters, arXiv 2410.23168 · Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, ..., Federico Tombari, Bernt Schiele · (TokenFormer - Haiyang-W)
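TokenFormer's central move, replacing fixed linear projections with attention over learnable parameter tokens so capacity can grow by appending tokens, can be sketched as below. The plain softmax normalization is a simplification of the paper's formulation, and the names are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pattention(x, key_params, value_params):
    # Parameter attention: the input attends over learnable key/value
    # "parameter tokens" instead of multiplying by a fixed weight matrix.
    # (The paper uses a modified normalization; softmax is a stand-in.)
    scores = x @ key_params.T              # (seq, n_param_tokens)
    return softmax(scores) @ value_params  # (seq, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                # 5 input tokens, d_model = 8
K = rng.normal(size=(16, 8))               # 16 parameter tokens (keys)
V = rng.normal(size=(16, 8))               # 16 parameter tokens (values)
y = pattention(x, K, V)                    # same shape a linear layer gives
```

Scaling the model then means appending rows to `K` and `V` rather than reshaping a weight matrix, which is what lets training resume from the smaller model.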
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models, arXiv 2410.20771 · Julie Kallini, Shikhar Murty, Christopher D. Manning, ..., Christopher Potts, Róbert Csordás
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models, arXiv 2410.17891 · Shansan Gong, Shivam Agarwal, Yizhe Zhang, ..., Hao Peng, Lingpeng Kong · (arxiv) · (DiffuLLaMA - HKUNLP) · (huggingface)
- large_concept_model - facebookresearch