• tequinhu@lemmy.world · 4 months ago

    It really depends on the machine that is running the code. Pandas will always have the entire thing loaded in memory, and while 600MB is not a concern for modern laptops running a single analysis at a time, it can get really messy if the person is not thinking about hardware limitations.

    • naught@sh.itjust.works · 4 months ago

      Pandas supports lazy loading and can read files in chunks. Hell, even regular ole Python doesn’t need to read the whole file at once with the csv module.
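      Roughly what that looks like in practice (the file name and column are made up, just to show the shape of both approaches):

      ```python
      import csv

      import pandas as pd

      # Pandas: chunksize makes read_csv return an iterator that yields
      # DataFrames of at most 100k rows, so only one chunk sits in memory
      # at a time instead of the whole file.
      total = 0
      for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
          total += chunk["amount"].sum()
      print(total)

      # Plain Python: csv.DictReader is itself an iterator over rows, so
      # memory use stays roughly constant no matter how big the file is.
      with open("big_file.csv", newline="") as f:
          row_count = sum(1 for _ in csv.DictReader(f))
      print(row_count)
      ```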

      • tequinhu@lemmy.world · 4 months ago (edited)

        I didn’t know about lazy loading, that’s cool!

        Then I guess the meme doesn’t apply anymore. Though I will say that (from my anecdotal experience) people who can use Pandas’ more advanced features* are also comfortable with other data processing frameworks (usually better suited to large datasets**).

        *Anything beyond the standard groupby - apply can be considered advanced, from the places I’ve been (rough sketch of that baseline below).

        **I feel the urge to note that 600MB isn’t a large dataset by any means, but I believe that’s beside the point.
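
        For reference, a toy sketch of what I mean by the standard groupby - apply baseline (the data is made up on the spot):

        ```python
        import pandas as pd

        # Made-up toy frame purely for illustration.
        df = pd.DataFrame({
            "store": ["A", "A", "B", "B"],
            "sales": [10, 20, 5, 30],
        })

        # The bread-and-butter pattern: group by a key, then apply a
        # per-group function (here, each row's share of its store's total).
        share = df.groupby("store")["sales"].apply(lambda s: s / s.sum())
        print(share)
        ```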