• 0 Posts
  • 6 Comments
Joined 11 months ago
Cake day: March 22nd, 2024

  • Qwen 2.5 is already amazing for a 14B, so I don’t see how DeepSeek can improve on it that much with a new base model, even if they continue training it.

    Perhaps we need to meet in the middle and have quad-channel APUs like Strix Halo become more common, and maybe release 40-80GB MoE models. Perhaps bitnet ones?
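
    The appeal of that combo is mostly bandwidth arithmetic: token generation is memory-bound, so what matters is how many bytes of active weights get read per token, not the total model size. Here’s a minimal sketch of that estimate, assuming a ~256 GB/s quad-channel APU and hypothetical model shapes; none of these numbers are measurements of Strix Halo or any real model:

    ```python
    # Rough decode-speed math for a memory-bandwidth-bound model on a
    # quad-channel APU. All numbers are illustrative assumptions, not
    # measurements: ~256 GB/s is roughly what a 256-bit LPDDR5X-8000
    # configuration provides, and the model shapes are hypothetical.

    def tokens_per_second(active_params_b, bytes_per_param, bandwidth_gb_s=256.0):
        """Decode roughly reads every active weight once per generated token."""
        bytes_per_token = active_params_b * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / bytes_per_token

    # Dense 70B at ~4-bit (0.5 bytes/param): all weights touched every token.
    print(f"dense 70B, 4-bit:       {tokens_per_second(70, 0.5):.1f} tok/s")

    # Hypothetical ~60GB-total MoE with ~8B active params at ~4-bit.
    print(f"MoE, 8B active, 4-bit:  {tokens_per_second(8, 0.5):.1f} tok/s")

    # Same MoE with bitnet-style ~1.6-bit weights (~0.2 bytes/param).
    print(f"MoE, 8B active, bitnet: {tokens_per_second(8, 0.2):.1f} tok/s")
    ```

    The dense 70B lands in single-digit tok/s on that bandwidth, while the sparse model stays interactive even though its total weights wouldn’t fit in a typical GPU, which is the whole point of pairing big-memory APUs with MoE.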

    Or design them for asynchronous inference.

    I just don’t see how 20B-ish models can perform like ones an order of magnitude bigger without a paradigm shift.