Some bad code just broke a billion Windows machines

TimeSquirrel@kbin.melroy.org · 6 months ago

Some bad code just broke a billion Windows machines

ansiz@lemmy.world · 6 months ago

There is no learning; companies just move to a different antivirus, the new hotness, and the cycle repeats over and over until the new antivirus does the same shit. Look at McAfee in 2010; in fact, the CEO of Crowdstrike was the CTO of McAfee then. That one easily took down millions of Windows XP machines.

rottingleaf@lemmy.world · 6 months ago

in fact the CEO of Crowdstrike was the CTO of McAfee then

The hero of Linux adoption then. All hail - what’s the name of that guy?

Bruhh@lemmy.world · 6 months ago

This isn’t the Windows L you think it is. This can and has happened on Linux. It’s a Crowdstrike/Bad corp IT issue.

rottingleaf@lemmy.world · 6 months ago

I know, but the whole culture of using such things is Windows-centered.

dan@upvote.au · edit-2 6 months ago

Are there really a billion systems in the world that run Crowdstrike? That seems implausible. Is it just hyperbole?

MeekerThanBeaker@lemmy.world · 6 months ago

Probably includes a bunch of virtual machines.

Joelk111@lemmy.world · 6 months ago

Yeah, our VMs completely died at work. Had to set up temporary stuff on hardware we had lying around today. Was kinda fun, but stressful haha.

dan@upvote.au · 6 months ago

Could you just revert VMs to a snapshot before the update? Or do you not take periodic snapshots? You could probably also mount the VM’s drive on the host and delete the relevant file that way.
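
A minimal sketch of the "mount the drive and delete the file" idea, assuming the VM's disk is already mounted somewhere on the host. The thread does not name the file, so the function name `remove_matching_files`, the directory, and the pattern are all hypothetical placeholders. It defaults to a dry run so nothing is deleted until the matches have been checked.

```python
from pathlib import Path


def remove_matching_files(mounted_root, relative_dir, pattern, dry_run=True):
    """List (and optionally delete) files matching `pattern` on a mounted VM disk.

    Every argument is a placeholder: the real directory and file pattern
    would have to come from the vendor's own remediation guidance.
    """
    target_dir = Path(mounted_root) / relative_dir
    # Only regular files are considered; directories are left alone.
    matches = sorted(p for p in target_dir.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling it once with `dry_run=True` and reviewing the returned paths before repeating with `dry_run=False` keeps a typo in the pattern from removing the wrong files.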

Encrypt-Keeper@lemmy.world · 6 months ago

Yes, you can just go into safe mode on an affected machine and delete the offending file. The problem is that it took a couple of hours before that resolution was found, and it has to be done by hand on every VM. I can’t just run an Ansible playbook against hundreds of non-booted VMs. Then, in the case of servers, you have to consider that there might be a specific startup order: certain things might have to be started before other things, and further fixing might be required given that every VM hard crashed. At a minimum it took many companies 6-12 hours to get back up and running, and for many more it could take days.
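
The startup-order point can be sketched as a dependency sort. This is only an illustration, assuming you can write down which VM needs which other VM to be up first. The VM names and the `depends_on` map are invented for the example, and `boot_order` is a hypothetical helper built on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM lists the VMs that must be up before it.
depends_on = {
    "domain-controller": set(),
    "database": {"domain-controller"},
    "app-server": {"database"},
    "web-frontend": {"app-server"},
}


def boot_order(dependencies):
    """Return VM names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = boot_order(depends_on)
print(order)
```

If the map contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful signal that the documented startup order is inconsistent.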

dan@upvote.au · 6 months ago

Makes sense - thanks for the details.

Joelk111@lemmy.world · edit-2 6 months ago

Yeah, like the other person said, corporate IT is responsible for that stuff. I guess they’re working through the weekend to try to get it fixed.

biggerbogboy@sh.itjust.works · 6 months ago

I doubt it’s too much of a stretch, since even here in Australia we’ve had multiple airlines, news stations, banks, supermarkets and many others, including the aluminium extrusion business my father works at, all go down. Scale this to hundreds of countries with populations tenfold ours, and it puts into perspective that there may even be more than a billion machines affected.

Imgonnatrythis@sh.itjust.works · 6 months ago

Despite how it may seem on Lemmy, most people have not yet actually switched to Linux. This stat is legit.

dan@upvote.au · 6 months ago

I know that Windows is everywhere, I just don’t know the percentage of Windows computers that run Crowdstrike.

TheDarksteel94@sopuli.xyz · 6 months ago

Keep in mind, it’s not just clients, but servers too. A friend of mine works for a decently sized company that has about 1600 (virtual) servers internationally. And yes, all of them were affected.

hglman@lemmy.ml · edit-2 6 months ago

You do realize that Linux is something like 80% of servers? Servers also well outnumber personal machines. If you include Android, Linux is easily the most used OS on the planet.

Buelldozer@lemmy.today · 6 months ago

It’s 80% of web servers but not 80% of ALL servers.

corsicanguppy@lemmy.ca · edit-2 6 months ago

There is learning here.

As companies, we put faith in an external entity with goals not identical to our own: a lot of faith, and a lot of control.

That company had the power to destroy our businesses, cripple travel and medicine and our courts, and delay daily work that could include some timely and critical tasks.

This is not Crowdstrike’s fault: for the bad code, yes, but for the indirect effects of that, no. We knew - please tell me we had the brains god gave a gnat and we knew - that putting so much control in the hands of outsiders neither concerned with nor aware of our detailed needs and priorities was a negligent and foolish thing to do.

The lesson is to do our jobs: we need to ensure we have the ability to make the decisions with which we’re entrusted, and that the authority behind those decisions, once accepted, is not threatened by a negligent mistake so boneheaded it’s all but the whim of a simpleton. We cannot choose to manage our part of our organization effectively, no matter how (un)important that organization or part is, and then share control with a force that we’ve seen can run roughshod over it.

It’s exactly like the leopards eating our face, except people didn’t see they were leopards. No one blames the leopards, as they’re just conforming to their nature, eventually.

And no one should blame this company for a small mistake, just because we let the jaws get so close to our faces that we became complacent.

Yaztromo@lemmy.world · 6 months ago

That company had the power to destroy our businesses, cripple travel and medicine and our courts, and delay daily work that could include some timely and critical tasks.

Unless you have the ability and capacity to develop your own ISA/CPU architecture, firmware, OS, and every tool you use from the ground up, you will always be, at some point, “relying on others’ stuff” which can break on you at a moment’s notice.

That could be Intel, or Microsoft, or OpenSSH, or CrowdStrike^0. Very, very, very few organizations can exist in the modern computing world without relying on others’ code/hardware (the main two that come to mind that could, outside smaller embedded systems, being IBM and Apple).

I do wish that consumers had held Microsoft more to account over the last few decades to properly use the Intel Protection Rings (if the CrowdStrike driver were able to run in Ring 1, then it’s possible the OS could have isolated it and prevented a BSOD, but instead it runs in Ring 0 with the kernel and has access to damage anything and everything) — but that horse appears to be long out of the gate (enough so that X86S proposes only having Ring 0 and Ring 3 for future processors).

But back to my basic thesis: saying “it’s your fault for relying on other people’s code” is unhelpful and overly reductive, as in the modern day it’s virtually impossible not to. Even fully auditing your stacks is prohibitive. There is a good argument to be made about not living in a compute monoculture^1, and lots of good arguments against ever using Windows^2 (especially in the cloud), but those aren’t the arguments you’re making. Saying “this is your fault for relying on other people’s stuff” is unhelpful, and I somehow doubt you designed your own ISA, CPU architecture, firmware, OS, network stack, and application code to post your comment.

——- ^0 — Indeed, all four of these organizations/projects have let us down like this; Intel with Spectre/Meltdown, Microsoft with the 28 day 32-bit Windows reboot bug, and OpenSSH just announced regreSSHion.
^1 — My organization was hit by the Falcon Sensor outage — our app tier layers running on Linux and developer machines running on macOS were unaffected, but our DBMS is still a legacy MS SQL box, so the outage hammered our stack pretty badly. We’ve fortunately been well funded to remove our dependency on MS SQL (and Windows in general), but that’s a multi-year effort that won’t pay off for some time yet.
^2 — my Windows hate is well documented elsewhere.

BeardedGingerWonder@feddit.uk · 6 months ago

Have you never worked in corporate IT or something? Of course we should blame Crowdstrike, that way we don’t get a sev 1 on our scorecard.

stephen01king@lemmy.zip · 6 months ago

It’s funny that corporate IT will be one of the groups getting the blame in this case, despite it being, in most cases, not their decision that a company lacks separate test and production environments. The executives who decided that usually get off scot-free.

BeardedGingerWonder@feddit.uk · 6 months ago

Hahah, no doubt, while popping in and out of the outage call repeating the phrases “Can I get an update?”, “Is there an ETA on recovery?” and “We need to get this back online”

snownyte@kbin.run · 6 months ago

Combing over its Wikipedia article, this company already had a series of other issues.

Sucks for anyone who ever relied on them. Oh look at that, they’ve been acquiring other security startups and companies. Perhaps that should be looked into as well?

Phoenixz@lemmy.ca · 6 months ago

And for the 451855528th time: switch to Linux already. Why do people keep paying for this shit? Every time I get excuses. I switched to a Linux desktop 20 years ago. There were enough moments that I needed to tweak things to make it work, but for the last decade I haven’t had any issues.

If you’re dumb enough to use Windows for servers then you just deserve to burn; if you make that decision then it’s all on you.

LainTrain@lemmy.dbzer0.com · 6 months ago

You make Linux users look bad

NostraDavid@programming.dev · 6 months ago

You didn’t say he was wrong though.

stephen01king@lemmy.zip · 6 months ago

Sure, but damaging the sentiment of the position that he is arguing for makes him stupider than simply being wrong.

Phoenixz@lemmy.ca · 6 months ago

“stupider than simply wrong”

What are you? 5?

My sentiment is that it’s a crazy situation where people are defending a multi-billion-dollar company that we all continuously pay, that spies on us and serves ads despite said payments, and that time after time willfully neglects security, anything in the name of profits, over a free system that works better, is more reliable, and is open and dependable.

Your response: you’re stupid

stephen01king@lemmy.zip · 6 months ago

Yeah, if the only thing you can do to support your sentiment is to make it look unappealing to the majority of normal users, you would be pretty stupid. Or maybe you’re actually 5 and that was just a projection on your part, I wouldn’t know.

rottingleaf@lemmy.world · 6 months ago

They wouldn’t if they were consistent and had also left degenerate social media (which Lemmy is part of, despite being much better than corporate alternatives). But then they also wouldn’t because we wouldn’t read it here.

LainTrain@lemmy.dbzer0.com · 6 months ago

Very interesting. What are some other things that are degenerate?

rottingleaf@lemmy.world · edit-2 6 months ago

Production of computer hardware being centralized, the accepted amount of complexity and obscurity in that and customer software.

A desktop system should involve a lot of standardized coprocessors at least. Like in Amiga architecture.

It’s a bit sad that with RISC-V the seemingly accepted direction of development for desktops is replacing Intel\AMD with the same paradigm.

EDIT: I mean, a person asking this and apparently thinking that the word can only be used in fascist context, can be called degenerate in their education too =)

JeeBaiChow@lemmy.world · edit-2 6 months ago

Whoda thunk automatic updates to critical infrastructure was a good idea? Just hope healthcare life support was not affected.

Toribor@corndog.social · 6 months ago

Many compliance frameworks require security utilities to receive automatic updates. It’s pretty essential for effective endpoint protection considering how fast new threats spread.

The problem is not the automated update, it’s why it wasn’t caught in testing and how the update managed to break the entire OS.

Joe@discuss.tchncs.de · edit-2 6 months ago

It is pretty easy to imagine separate streams of updates that affect each other negatively.

CrowdStrike does its own 0-day updates, Microsoft does its own 0-day updates. There is probably limited if any testing at that critical intersection.

If Microsoft 100% controlled the release stream, otoh, there’d be a much better chance to have caught it. The responsibility would probably lie with MS in such a case.

(edit: not saying that this is what happened, hence the conditionals)

Toribor@corndog.social · 6 months ago

I don’t think that is what happened here in this situation though, I think the issue was caused exclusively by a Crowdstrike update but I haven’t read anything official that really breaks this down.

barsquid@lemmy.world · 6 months ago

Some comments yesterday were claiming the offending file was several kb of just 0s. All signs are pointing to a massive fuckup from an individual company.
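
For what it’s worth, a claim like “the file was just 0s” is cheap to check. This is a small sketch, not anything CrowdStrike or Microsoft ships: the function name `is_all_zero_bytes` is made up for the example, and it says nothing about whether the claim itself was accurate.

```python
def is_all_zero_bytes(path, chunk_size=65536):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := f.read(chunk_size):
            seen_data = True
            # any() over bytes is True as soon as one non-zero byte appears.
            if any(chunk):
                return False
    return seen_data
```

An empty file returns `False` here, on the assumption that “several kb of zeros” and “zero-length file” are different failure modes worth telling apart.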

Wiz@midwest.social · 6 months ago

Which makes me wonder, did the company even test it at all on their own machines first?

LainTrain@lemmy.dbzer0.com · 6 months ago

Nah EDR is pointless like all of cybersecurity. All these compliance frameworks are just a further grift to get a slice of B2B procurement budgets. The practice of cybersecurity has caused a more severe widespread outage than any malware ever could.

mriormro@lemmy.world · 6 months ago

lol, ok

fishpen0@lemmy.world · 6 months ago

OP is not entirely wrong. At least in Linux land you can now implement EDR-like functionality entirely with eBPF without installing a fucking rootkit. So traditional EDR products are a grift if you are on the bleeding edge.

jumjummy@lemmy.world · edit-2 6 months ago

Ok Russian comrade. Security in companies is terrible. You’re right. It’s just a giant grift.

Now, go buy some limited time offer fight fight fight shoes from agent orange.

Kairos@lemmy.today · 6 months ago

Hospital stuff was affected. Most engineers are smart enough to not connect critical equipment to the Internet, though.

Dr. Arun Wadhwa@lemmy.world · 6 months ago

I’m not in the US, but my medical peers who are there mentioned that Epic (the software most hospitals use to manage patient records) was not affected, but Dragon (the software by Nuance that we doctors use for dictation so we don’t have to type notes) was down. Someone I know complained that they had to “type notes like a medieval peasant.” But I’m glad that the critical infrastructure was up and running. At my former hospital, we always maintained physical records in parallel for all our current inpatients, accessible only to the medical team responsible for those specific patients, just to be on the safe side.

RunningInRVA@lemmy.world · 6 months ago

This is pretty much correct. I work in an Epic shop and we had about 150 servers to remediate and some number of workstations (I’m not sure how many). While Epic may not have been impacted, it is a highly integrated system, and when things are failing around it, that can have an impact on care delivery. For example, if a provider places a stat lab order in Epic, that lab order gets transmitted to an integration middleware which then routes it to the lab system. If the integration middleware or the lab system is down, then the provider has no idea the stat order went into a black hole.
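
One generic way to keep an order from silently vanishing is to track acknowledgements and flag anything that never gets one. The sketch below is not how Epic or any real integration engine works; `OrderTracker`, its methods, and the timeout are all hypothetical, and it only illustrates the “no ack within N seconds means tell a human” idea.

```python
import time


class OrderTracker:
    """Tracks sent orders and flags ones never acknowledged downstream."""

    def __init__(self, ack_timeout_s):
        self.ack_timeout_s = ack_timeout_s
        self.pending = {}  # order_id -> time the order was sent

    def sent(self, order_id, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        self.pending[order_id] = time.monotonic() if now is None else now

    def acknowledged(self, order_id):
        # Called when the downstream system confirms receipt.
        self.pending.pop(order_id, None)

    def overdue(self, now=None):
        """Return the IDs of orders still waiting past the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            oid for oid, sent_at in self.pending.items()
            if now - sent_at > self.ack_timeout_s
        )
```

Anything returned by `overdue()` would then be surfaced to the ordering provider or a downtime procedure instead of being assumed delivered.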

Phoenixz@lemmy.ca · 6 months ago

Also:

CrowdStrike should be held responsible, and with that I don’t mean the developers who were forced to do this shit, I mean the CEO, the CTO.

Jail them.

If you are so critical you better not fuck around, and I can guarantee you they were fucking around, pushing bad practices, etc. Why do I say that? Because it’s always like that.

That company should be dissolved, the C suite jailed.

Also, STOP USING WINDOWS FOR DESKTOP FOR FRACK SAKE. Switch to Linux already, I’m getting tired of having to read this shit.

If you’re using windows for servers then you deserve your place right next to those C suite guys and gals

Eggyhead@kbin.run · 6 months ago

How about holding an investigation first? You know, just to see where the wrongdoing happened and who actually perpetrated it. (It just might have been a bitter developer or something.)

Also, if people want to use windows, it’s their choice and their consequences. Government and corporate services might do well to consider Linux, but most people don’t even know what a command line is.

Phoenixz@lemmy.ca · edit-2 6 months ago

Because they’ve done that countless times before and it’s always the same. A few months ago there was a senate hearing on why Microsoft knowingly enabled the Chinese government’s hacking of the US government by deciding not to fix critical security bugs, to avoid losing contracts and thus money.

What is the result, every damn time?

Weeeeewwweee sowweeeyyyy, but the CEO is on it this time! THIS time we won’t fuck you over! That was what, a month ago?

Meanwhile I say, fuck Microsoft, stop paying for that corrupt badly built spyware shit, switch to Linux, and then I’m the bad guy.

Edit: judging from the downvotes here, it’s fair to say that a lot of people are perfectly fine with paying to get screwed over

werefreeatlast@lemmy.world · 6 months ago

It’s because Windows is crap software. Just stop using anything Microsoft makes.

MetaCubed@lemmy.world · 6 months ago

This was very much not caused by windows

General_Shenanigans@lemmy.world · 6 months ago

This happened because of a file that CrowdStrike pushed out with one of their updates; by their own processes it is not one that is signed, and it went out immediately. This update was pushed directly through CrowdStrike’s own method, not via Windows Update. CrowdStrike maintains this capability in order to quickly respond to and prevent security threats. The fact that they have .sys files that aren’t signed is crazy on its own, and a huge screwup by CrowdStrike. So many companies relied upon and trusted this company because, up until now, everybody considered it a great product, so it was extremely popular and prevalent. It’s been a huge wake-up call for everybody in I.T.

MetaCubed@lemmy.world · edit-2 6 months ago

I’m not sure if you intended to reply to me, but I am aware of this. Thanks for checking my understanding though :)

General_Shenanigans@lemmy.world · edit-2 6 months ago

lol you are correct. I meant to reply to the other guy. Low on sleep like many of us here

werefreeatlast@lemmy.world · 6 months ago

I don’t hear about billions of Linux or Mac computers going down all at the same time. I’m hearing that windows allows a simple text file change to bring down all of them at the same time.

MetaCubed@lemmy.world · 6 months ago

Calling a kernel mode driver a “simple text file” sure is interesting

werefreeatlast@lemmy.world · 6 months ago

Even if you write assembly code straight out like a total hacker, it’s still a text file. Literally jump 0x12345 is text. And if it’s just a few kilobits long, then it’s a simple text file, yes. Got anything else to add? Especially if the file actually doesn’t work and the system made to run it, “windows”, is such shit that every copy of it got halted.

Treczoks@lemmy.world · 6 months ago

In a way, it was. If Windows was not as crappy as it is, external solutions would not be needed.

stephen01king@lemmy.zip · 6 months ago

Linux machines also require Crowdstrike because of business requirements. Does that mean Linux is just as crap as Windows then?

Treczoks@lemmy.world · 6 months ago

Do they really require it, or is this just the usual security theatre?

MetaCubed@lemmy.world · 6 months ago

Not to jump at you in another comment thread, but any OS that is deployed in a business environment should have some form of endpoint protection installed unless it is fully airgapped + isolated.

Despite the myth that “Linux doesn’t get malware”, it absolutely does and should have protection installed. Even if the OS itself was immune to infection, any possible update can introduce a vulnerability to that.

Additionally, again, even if the OS (or kernel in the case of linux) couldn’t be infected or attacked, the packages or services installed can be attacked, infected, or otherwise messed with and should be protected.
