• mub@lemmy.ml
    5 months ago

    It is rare that I fail to get the gist of what is being said in these technical explanations, but this one has me actually wondering about the gist of the gist. Some of it made me feel like it was made-up nonsense.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      5 months ago

      It seemed pretty clear to me. If you have any clue on the subject then you presumably know about the interconnect bottleneck in traditional large models. Moving data between layers often consumes more energy and time than the actual compute operations, and the surface area for data communication explodes as models grow to billions of parameters. The mHC paper introduces a new way to link neural pathways by constraining hyper-connections to a low-dimensional manifold.

      In a standard transformer architecture, every neuron in layer N potentially connects to every neuron in layer N+1. This is mathematically exhaustive, which makes it computationally inefficient. Manifold-constrained connections operate on the premise that most of this high-dimensional space is noise. DeepSeek basically found a way to significantly reduce networking bandwidth for a model by using manifolds to route communication.

      Not really sure what you think the made up nonsense is. 🤷