Solarpunk media recommendations?

houseofleft@slrpnk.net · 15 days ago

I’m a data engineer, use parquet all the time and absolutely love love love it as a format!

Arrow (an in-memory data format) + Parquet is particularly powerful, and lets you:

Only read the columns you need (with a CSV, your computer has to parse all the data even if you discard all but one column afterwards)
Use metadata to only read relevant files. This is particularly cool and probably needs some unpacking. Say you’re reading 10 files, but only want data where “column-a” is greater than 5. Parquet can look at each file’s metadata (min/max statistics stored in the file footer) at run time, and figure out that a file doesn’t have any column-a values over five, so its data never has to be read!
Have data in an unambiguous format that can be read by multiple programming languages. Since CSV is text, anything reading it will look at a value like “2022-04-05” and say “oh, this text looks like dates, let’s see what happens if I read it as dates”. Parquet contains actual data type information, so it will always be read consistently.

If you’re handling a lot of data, this kind of stuff can wind up making a huge difference.
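
The file-skipping trick is easier to see with a toy model. This is only a sketch in plain Python, not how a Parquet reader is actually implemented: the `FileStats` class, the file names and the numbers are all made up for illustration. The idea is that each file carries a min and max for “column-a”, and a filter like “greater than 5” can rule files out from those two numbers alone.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for the per-file min/max statistics on column-a."""
    name: str
    min_value: int
    max_value: int


def files_to_read(stats, threshold):
    """Return the names of files that could hold a value greater than threshold.

    A file whose maximum is <= threshold cannot contain a matching row,
    so it can be skipped without reading any of its data.
    """
    return [s.name for s in stats if s.max_value > threshold]


# Made-up statistics for three files.
stats = [
    FileStats("part-0.parquet", min_value=0, max_value=4),
    FileStats("part-1.parquet", min_value=3, max_value=9),
    FileStats("part-2.parquet", min_value=1, max_value=5),
]

selected = files_to_read(stats, threshold=5)
print(selected)  # ['part-1.parquet']
```

Only one of the three files gets opened here. With real data the saving depends on how the values are spread across files: if every file contains at least one big value, nothing can be skipped.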

houseofleft@slrpnk.net · 19 days ago

Oh nice! I didn’t know about it, thanks for the link

houseofleft@slrpnk.net · 19 days ago

Solarpunk media recommendations?

houseofleft@slrpnk.net · edit-2 19 days ago

I’m a data engineer, and have seen an ungodly amount of 200-but-actually-no-stuff-is-broken errors and it’s the bane of my life!

We have generic code to handle pulling in API data and transforming it. It obviously checks the status code, but any time an API implements this we have to choose between:

having code fail weirdly further down the line because it can’t parse the response
adding in some kind of insane if not response.ok or "actually no there's an error really" in response.content logic
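
For anyone who hasn’t had the pleasure, the second option looks roughly like the sketch below. Everything in it is an assumption for illustration: `FakeResponse` is a hypothetical stand-in with the same `ok` and `content` attributes as a `requests` response, and `ERROR_MARKER` is a made-up byte string, since every API that does this invents its own way of saying “error”.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    content: bytes

    @property
    def ok(self) -> bool:
        # Same idea as requests: anything below 400 counts as success.
        return self.status_code < 400


# Made-up marker; a real API would need its own, per-endpoint check.
ERROR_MARKER = b'"error"'


def is_really_ok(response) -> bool:
    """True only if the status code AND the body both say it worked."""
    if not response.ok:
        return False
    # The part that should never be needed: a 200 can still hide a failure.
    return ERROR_MARKER not in response.content


good = FakeResponse(200, b'{"data": [1, 2, 3]}')
hidden_failure = FakeResponse(200, b'{"error": "something broke"}')
honest_failure = FakeResponse(500, b"")

print(is_really_ok(good), is_really_ok(hidden_failure), is_really_ok(honest_failure))
```

The body check is fragile by design: it breaks as soon as a valid payload happens to contain the marker, which is exactly why it doesn’t belong in generic code.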

Every time you ignore protocols and invent your own, you are making everyone sad.

Will take recommendations of support groups I can join for victims of terrible apis.

houseofleft@slrpnk.net · 25 days ago

I like this a lot. This reminds me a lot of KQL (a Microsoft query language that’s used for a bunch of Azure logging).

I use a lot of Python pandas/dask, and I’ve definitely got used to viewing a table as a series of operations to perform rather than the kind of declarative queries you get in SQL.
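
To show what I mean by “a series of operations”, here is the same question asked both ways. It’s a rough sketch using only the standard library (sqlite3 for the SQL side, plain Python for the step-by-step side, rather than pandas itself), and the `orders` table and its rows are made up.

```python
import sqlite3
from collections import defaultdict

# Made-up data: (city, amount)
rows = [
    ("Lisbon", 5), ("Lisbon", 20), ("Porto", 15), ("Porto", 30), ("Faro", 8),
]

# Declarative: describe the result and let the engine decide how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
declarative = conn.execute(
    "SELECT city, SUM(amount) FROM orders "
    "WHERE amount > 10 GROUP BY city ORDER BY city"
).fetchall()
conn.close()

# Pipeline: the same result as explicit steps, applied in order.
filtered = [(city, amount) for city, amount in rows if amount > 10]  # 1. filter
totals = defaultdict(int)
for city, amount in filtered:                                        # 2. group and sum
    totals[city] += amount
pipeline = sorted(totals.items())                                    # 3. sort

print(declarative)  # [('Lisbon', 20), ('Porto', 45)]
print(pipeline)     # [('Lisbon', 20), ('Porto', 45)]
```

Both give the same answer. The difference is that the second version reads top to bottom in the order things actually happen, which is the habit pandas/dask gets you into.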

At what point is it no longer SQL? If we’re changing fundamental stuff, I’d love a way of writing loops or if statements that isn’t painful too.

houseofleft@slrpnk.net · 28 days ago

I thought this would be some kind of sci-fi future Venice type thing, and was pretty stoked. Even more exciting that it’s a real project!

I surf and it’s amazing just how many beaches aren’t always safe to swim at, let alone city rivers and lakes. I think we forget how surreal it is how little lives in those waters.