- 2024 in Post-Transformer Architectures: State Space Models, RWKV [Latent Space LIVE! @ NeurIPS 2024] 🎬
- Bamba: Inference-Efficient Hybrid Mamba2 Model 🤗 · (bamba - foundation-model-stack)
- QRWKV6 32B Instruct Preview is one of the largest and strongest RWKV models to date. 🤗
- RWKV Flock of Finches 37B-A11B v0.1 Mixture of Experts Model 🤗
- Ultra-Sparse Memory Network, arXiv 2411.12364 · Zihao Huang, Qiyang Min, Hongzhi Huang, ..., Ran Guo, Xun Zhou
- Bi-Mamba: Towards Accurate 1-Bit State Space Models, arXiv 2411.11843 · Shengkun Tang, Liqun Ma, Haonan Li, ..., Mingjie Sun, Zhiqiang Shen · (𝕏)
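As a rough illustration of what 1-bit weights mean in practice, the sketch below applies a common binarization scheme (sign of the weight plus a per-tensor scale). This is a generic assumption in the style of XNOR-Net/BitNet, not Bi-Mamba's exact quantization recipe:

```python
import numpy as np

def binarize_weights(W):
    # Generic 1-bit weight quantization: sign(W) with a per-tensor scale
    # alpha = mean(|W|), the scale that minimizes ||W - alpha*sign(W)||_F.
    # Bi-Mamba's actual recipe may differ (e.g. per-channel scales).
    alpha = np.mean(np.abs(W))
    return alpha * np.sign(W), alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))      # a toy full-precision weight matrix
Wb, alpha = binarize_weights(W)  # every entry is +alpha or -alpha
```

Each binarized entry carries one bit of information plus a shared scalar, which is what makes matrix multiplies reducible to sign flips and additions.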
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States, arXiv 2407.04620, citations: 19 · Yu Sun, Xinhao Li, Karan Dalal, ..., Tatsunori Hashimoto, Carlos Guestrin · (yueatsprograms.github) · (ttt-lm-pytorch - test-time-training)
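The core idea of this paper, a hidden state that is itself a small model updated by a gradient step on a self-supervised loss at every token, can be sketched minimally. The reconstruction loss and learning rate below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ttt_linear_step(W, x, lr=0.01):
    # The hidden state W is a tiny linear model; processing one token
    # performs a gradient step on a self-supervised reconstruction loss
    # l(W; x) = ||W x - x||^2 (an illustrative choice of loss).
    grad = 2.0 * np.outer(W @ x - x, x)  # dl/dW
    W = W - lr * grad
    return W, W @ x                      # updated state, token output

rng = np.random.default_rng(0)
d = 4
W = 0.1 * rng.normal(size=(d, d))        # initial hidden state
x = rng.normal(size=d)
loss_before = float(np.sum((W @ x - x) ** 2))
for _ in range(20):                      # repeated exposure to one token
    W, y = ttt_linear_step(W, x)
loss_after = float(np.sum((W @ x - x) ** 2))
```

Because the state update is a learning step rather than a fixed recurrence, the reconstruction loss on a repeated input shrinks as the sequence is processed.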
- Wave Network: An Ultra-Small Language Model, arXiv 2411.02674 · Xin Zhang, Victor S. Sheng
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters, arXiv 2410.23168 · Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, ..., Federico Tombari, Bernt Schiele · (TokenFormer - Haiyang-W)
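TokenFormer's central move, replacing fixed linear projections with attention over learnable parameter tokens so capacity can grow by appending tokens, can be sketched as below. The plain softmax normalization is a simplification of the paper's formulation, and the names are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pattention(x, key_params, value_params):
    # Parameter attention: the input attends over learnable key/value
    # "parameter tokens" instead of multiplying by a fixed weight matrix.
    # (The paper uses a modified normalization; softmax is a stand-in.)
    scores = x @ key_params.T              # (seq, n_param_tokens)
    return softmax(scores) @ value_params  # (seq, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                # 5 input tokens, d_model = 8
K = rng.normal(size=(16, 8))               # 16 parameter tokens (keys)
V = rng.normal(size=(16, 8))               # 16 parameter tokens (values)
y = pattention(x, K, V)                    # same shape a linear layer gives
```

Scaling the model then means appending rows to `K` and `V` rather than reshaping a weight matrix, which is what lets training resume from the smaller model.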
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models, arXiv 2410.20771 · Julie Kallini, Shikhar Murty, Christopher D. Manning, ..., Christopher Potts, Róbert Csordás
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models, arXiv 2410.17891 · Shansan Gong, Shivam Agarwal, Yizhe Zhang, ..., Hao Peng, Lingpeng Kong · (arxiv) · (DiffuLLaMA - HKUNLP) · (huggingface)
- large_concept_model - facebookresearch