• 1 Post
  • 5 Comments
Joined 1 month ago
Cake day: August 16th, 2024

  • I’m a data engineer, I use Parquet all the time, and I absolutely love love love it as a format!

    Arrow (an in-memory columnar data format) + Parquet is particularly powerful, and lets you:

    • Only read the columns you need (Parquet stores data column by column, whereas with a CSV your computer has to parse all the data even if you afterwards discard all but one column)

    • Use metadata to only read relevant files. This is particularly cool and probably needs some unpacking. Say you’re reading 10 files, but only want data where “column-a” is greater than 5. Each Parquet file stores min/max statistics for its columns in its metadata (in the file footer), so at run time the reader can check whether a file has any column-a values over 5 at all. If it doesn’t, it never has to read that file’s data!

    • Have data in an unambiguous format that can be read by multiple programming languages. Since CSV is just text, anything reading it will look at a value like “2022-04-05” and say “oh, this text looks like dates, let’s see what happens if I read it as dates”. Parquet stores the actual data type of each column in its schema, so it will always be read consistently.
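The metadata trick in the second bullet boils down to min/max filtering. Here’s a minimal stdlib Python sketch of the idea, assuming each file’s footer has already been parsed into per-column min/max values. `ColumnStats`, `FileMetadata`, `files_to_read` and the file names are hypothetical, not a real library’s API; in practice a reader like pyarrow does this for you (e.g. `pyarrow.parquet.read_table(path, columns=[...], filters=[...])`), and real readers apply the same check per row group, not just per file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Min/max statistics for one column, like those a Parquet footer stores."""
    min_value: float
    max_value: float


@dataclass
class FileMetadata:
    """Hypothetical stand-in for a parsed Parquet footer: path + per-column stats."""
    path: str
    stats: dict[str, ColumnStats]


def files_to_read(files: list[FileMetadata], column: str, threshold: float) -> list[str]:
    """Return the paths that *might* contain rows where `column > threshold`.

    A file whose max for `column` is <= threshold cannot contain a matching
    row, so it is skipped without reading any of its data.
    """
    keep = []
    for f in files:
        col = f.stats.get(column)
        # No statistics recorded: we can't rule the file out, so read it.
        if col is None or col.max_value > threshold:
            keep.append(f.path)
    return keep


# Hypothetical example files with made-up statistics.
files = [
    FileMetadata("part-0.parquet", {"column-a": ColumnStats(0, 3)}),
    FileMetadata("part-1.parquet", {"column-a": ColumnStats(2, 9)}),
    FileMetadata("part-2.parquet", {"column-a": ColumnStats(-1, 5)}),
]
print(files_to_read(files, "column-a", 5))  # ['part-1.parquet']
```

Only the footer gets touched for the two skipped files, which is where the big wins come from when you have lots of files.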

    If you’re handling a lot of data, this kind of stuff can wind up making a huge difference.




  • I’m a data engineer, and I’ve seen an ungodly amount of 200-but-actually-no-stuff-is-broken errors, and they’re the bane of my life!

    We have generic code to handle pulling in API data and transforming it. The obvious thing is to check the status code, but any time an API returns a 200 for errors, we have to choose between:

    • having code fail weirdly further down the line because it can’t parse the response
    • adding in some kind of insane if not response.ok or "actually no there's an error really" in response.text logic
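One way to at least keep the second option in one place, instead of scattered through downstream code, is to push every response through a single check. A minimal Python sketch that uses no HTTP library so it runs standalone: `parse_response` and `APIError` are hypothetical names, and the assumption that errors show up as an `"error"` key in a JSON body is just an example shape, since every API invents its own.

```python
import json
from typing import Any


class APIError(Exception):
    """Raised when a response signals failure, via status code or body."""


def parse_response(status_code: int, body: str) -> Any:
    """Return the decoded JSON payload, or raise APIError.

    Assumes (hypothetically) that the misbehaving API reports failures as a
    JSON object with an "error" key, even when the status code is 200.
    """
    if not 200 <= status_code < 300:
        raise APIError(f"HTTP {status_code}: {body[:200]}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise APIError(f"HTTP {status_code} but body is not JSON") from exc
    # The "200-but-actually-broken" case: success status, error in the body.
    if isinstance(payload, dict) and payload.get("error"):
        raise APIError(f"HTTP {status_code} but body reports: {payload['error']}")
    return payload


print(parse_response(200, '{"data": [1, 2, 3]}'))  # {'data': [1, 2, 3]}
try:
    parse_response(200, '{"error": "rate limited"}')
except APIError as e:
    print(e)  # HTTP 200 but body reports: rate limited
```

It doesn’t make the API any less terrible, but downstream code only ever sees either a real payload or an exception.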

    Every time you ignore protocols and invent your own, you are making everyone sad.

    Will take recommendations for support groups I can join for victims of terrible APIs.



  • I thought this would be some kind of sci-fi future Venice type thing, and was pretty stoked. Even more exciting that it’s a real project!

    I surf, and it’s amazing just how many beaches aren’t always safe to swim at, let alone city rivers and lakes. I think we forget how surreal it is that so little lives in those waters.