A new architecture where LLMs serve as dynamic, evolving memory for other LLMs. / 以大语言模型作为另一个大语言模型的动态记忆。
Deep Seek really makes GPUs sing.
Model architecture, training methods, and performance evaluation.