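Summary (translated from Chinese)
Sina Weibo's VibeThinker-3B has only 3 billion parameters, yet on math and coding benchmarks it matches models such as DeepSeek V3.2 and Kimi K2.5, which are up to 333 times larger. The key is not scale but multi-stage post-training. Based on their findings, the researchers hypothesize that logical reasoning compresses well into small models, whereas broad world knowledge does not. The article Sina's open model...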
Original English excerpt
Sina Weibo's VibeThinker-3B has just three billion parameters but matches models like DeepSeek V3.2 and Kimi K2.5 on math and coding benchmarks. Those models are up to 333 times larger. The secret isn't size but multi-stage post-training. The researchers propose a hypothesis based on their findings: logical reasoning compresses well into small models, but broad world knowledge does not. The article Sina's open model…
Source attribution
Source: The Decoder. This site only compiles news and attributes sources; it does not provide outbound links on the page.