Technical Articles
2025
Explained | Native Sparse Attention: A Hardware-Aligned Sparse Attention Mechanism
DeepSeek really makes GPUs sing.
2025
DeepSeek V3: Technical Report Explained.
Model architecture, training methods, and performance evaluation.
2025
DeepSeek R1: Technical Report Explained.
Model architecture, training methods, and performance evaluation.