SamotsvetyVIA [any]

My address is not a house or a street, my address is the Soviet Union.

  • 4 Posts
  • 160 Comments
Joined 7 months ago
Cake day: November 21st, 2024

  • They’ve had distills before this; a more accurate title would be “Newest DeepSeek R1 distill runs on a single GPU like all the previous ones”.

    Also, it’s not accurate to say that a Qwen3 distill is the same as the DeepSeek R1 running in the datacenter. That one is still roughly 85x larger than the Qwen3 distill (671B parameters vs. 8B).

    “What stands out about DeepSeek-R1-0528-Qwen3-8B is that it only requires a GPU with 40GB to 80GB of RAM to run”

    This is just inaccurate. The weights fit in 16GB of VRAM, because, you know, 8B parameters x 2 bytes per parameter (16-bit weights) = 16x10^9 bytes = 16GB.
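
To make that arithmetic easy to check, here is a minimal sketch in Python. The helper name `weight_memory_gb` is hypothetical, the 8e9 figure is the nominal "8B" parameter count rather than an exact one, and the estimate covers the weights only. KV cache, activations, and framework overhead come on top and are not modeled here.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal parameter count for an "8B" model.
PARAMS_8B = 8e9

# 2 bytes per parameter corresponds to 16-bit weights (FP16/BF16).
fp16_gb = weight_memory_gb(PARAMS_8B, 2)
# 1 byte per parameter corresponds to 8-bit quantized weights.
int8_gb = weight_memory_gb(PARAMS_8B, 1)

print(f"16-bit weights: {fp16_gb} GB")  # 16-bit weights: 16.0 GB
print(f"8-bit weights:  {int8_gb} GB")
```

Quantized builds store fewer bytes per parameter, so the same function gives a smaller figure for them. Either way, the result is well below the 40GB to 80GB claimed in the quote.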