This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and it wouldn’t log in.
Apparently Visa Paywave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled as some airlines' systems have also gone down, gas stations and public transport systems are inoperable, and numerous Windows systems and Microsoft services are affected. (At least according to one of my local MSM outlets.)
Seems insane to me that one company's messed-up update could cause so much global disruption and take so many systems down :/ This is exactly why centralisation of services, and large corporations gobbling up smaller companies to become behemoth services, is so dangerous.
The annoying aspect, from somebody with decades of IT experience, is this: what should happen is that CrowdStrike gets sued into oblivion, and the people responsible for buying that shit have an epiphany and properly look at how they are doing their infra.
But what will happen is that they'll just buy a new CrowdStrike product that promises to mitigate the fallout of them fucking up again.
decades of IT experience
Do you run any changes - especially upgrades - through local test environments before applying them in production?
The scary bit is what most in the industry already know: critical systems are held together with duct tape and maintained by juniors 'cos they're the cheapest Big Money can find. And even when they're not, "There's no time" or "It's too expensive" are probably the most common answers a PowerPoint manager will give when a serious technical issue is raised.
The Earth will keep turning.
Some years back I was the 'Head' of systems stuff at a national telco that provided the national telco infra. Part of my job was to manage the national systems upgrades. I had the stop/go decision to deploy, and indeed pushed the 'enter' button to do it. I was a complete PowerPoint Manager and had no clue what I was doing, it was total Accidental Empires, and I should not have been there. Luckily I got away with it for a few years. It was horrifically stressful and not the way to mitigate national risk. I feel for the CrowdStrike engineers. I wonder if the latest embargo on Russian oil sales is in any way connected?
I wonder if the latest embargo on Russian oil sales is in any way connected?
Doubt it, but it’s ironic that this happens shortly after Kaspersky gets banned.
Unfortunately Falcon self-updates, and it will not work properly if you don't let it.
Also add “customer has rejected the maintenance window” to your list.
Turns out it doesn’t work properly if you do let it
Not OP, but that is how it used to be done. The issue is the attacks we have seen over the years, i.e. ransomware attacks etc. They have made corps feel they need to fix and update instantly to avoid attacks, so they depend on the corp they pay for the software to test the rollout.
Auto-update is a double-edged sword. Without it, attackers etc. will take advantage of delays. With it? Well, today.
I’d wager most ransomware relies on old vulnerabilities. Yes, keep your software updated but you don’t need the latest and greatest delivered right to production without any kind of test first.
Very much so. But the vulnerabilities do not tend to be discovered (by developers) until an attack happens, and auto-updates are generally how the spread of attacks is limited.
Open source can help slightly, because both good and bad actors unrelated to development can see the code, so it is more common for alerts to land before attacks. But it's far from a fix-all.
But generally, the time between discovery and fix is a worry for big corps, which is why auto-updates have been accepted with less manual intervention than was common in the past.
I would add that a lot of attacks are done after a fix has been released - i.e. compare the previous release with the patch and bingo - there's the vulnerability.
But agreed, patching should happen regularly, just with a few days' delay after the supplier releases it.
It isn't even a Linux vs Windows thing, but a "competent at your job" vs "don't know what the fuck you are doing" thing. Critical systems are immutable and isolated, or as close to that as reasonably possible. They don't do live updates of third-party software, and certainly not of software that runs privileged and can crash the operating system.
I couldn’t face working in corporate IT with this sort of bullshit going on.
This is just "what not to do in IT/dev/tech 101" right here. I've been in the industry for literally decades at this point, and ever since I was in school I was always told: never test in production, never roll anything out to production on a Friday, and if you're unsure, have someone senior code review. CrowdStrike failed to do all of the above. Even the most junior of junior devs should know better. So for this update to be allowed through... I mean, blame the juniors, the seniors, the PMs, the CTOs, everyone. If your shit is so critical that a couple of bad lines of poorly written code (which apparently is what it was) can cripple the majority of the world... yeah, CrowdStrike is done.
It’s incredible how an issue of this magnitude didn’t get discovered before they shipped it. It’s not exactly an issue that happens in some niche cases. It’s happening on all Windows computers!
This can only happen if they didn't test their product at all before releasing to production. Or worse: maybe they did test it, got the error, went "eh, it's probably just something wrong with the test systems", and shipped anyway.
This is just stupid.
I couldn’t face working in corporate IT with this sort of bullshit going on.
I'm taking it you don't work in IT anymore then?
There are state and government IT departments.
It's also a "don't allow third-party proprietary shit into your kernel" issue. If the driver were open source it would actually go through public code review and the issue would be more likely to get caught. Even if it did slip through, people would publicly have a fix by now with all the eyes on the code. It also wouldn't get pushed to everyone simultaneously under the control of a single company; it would get tested and packaged by distributions before making it to end users.
deleted by creator
More generally: delegate anything critical to a 3rd party and you’ve just put your business at the mercy of the quality (or lack thereof) of their own business processes which you do not control, which is especially dangerous in the current era of “cheapest as possible” hiring practices.
Having been in IT for almost 3 decades, a lesson I learned long ago, and which I've also been applying to my own things (such as having my own domain for my own e-mail address rather than using something like Google), is that you should avoid as much as possible having your mission-critical or hard-to-replace stuff depend on a 3rd party, especially if the dependency is live (i.e. actively connected, rather than just buying and installing their software).
I’ve managed to avoid quite a lot of the recent enshittification exactly because I’ve been playing it safe in this domain for 2 decades.
So it’s Linux vs Windows
No it’s Crowdstrike… we’re just seeing an issue with their Windows software, not their Linux software.
While I don't totally disagree with you, this has mostly nothing to do with Windows and everything to do with a piece of corporate spyware garbage that some IT manager decided to install. If tools like that existed for Linux, doing what they do to the OS, trust me, we would be seeing kernel panics as well.
Hate to break it to you, but CrowdStrike falcon is used on Linux too…
And if it was a kernel-level driver that failed, Linux machines would fail to boot too. The number of people seeing this and saying "MS Bad" (which is true, but has nothing to do with this) instead of "how does an 83 billion dollar IT security firm push an update this fucked" is hilarious.
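To make the failure class concrete, here's a hypothetical sketch (deliberately not CrowdStrike's actual code): a privileged component parses a content/definition update and trusts it blindly. As an ordinary program this just segfaults one process; the same bug inside a kernel-mode driver bluescreens or panics the whole box, Windows or Linux alike.

```c
/* Hypothetical sketch of a component that trusts an update file blindly. */
#include <stddef.h>
#include <stdio.h>

struct rule_table {
    size_t count;
    unsigned int *offsets;   /* supposedly parsed out of the update file */
};

static unsigned int apply_rules(const struct rule_table *t)
{
    unsigned int sum = 0;
    /* No validation of what the file actually contained: if parsing failed
     * and 'offsets' is NULL, this dereference faults. */
    for (size_t i = 0; i < t->count; i++)
        sum += t->offsets[i];
    return sum;
}

int main(void)
{
    /* A malformed "content update": claims four rules, carries none. */
    struct rule_table bad = { .count = 4, .offsets = NULL };
    printf("%u\n", apply_rules(&bad));   /* crashes here */
    return 0;
}
```

In user space that's one dead process and a core dump; at kernel level, with the driver marked boot-critical, it's a machine that can't even get to the desktop.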
Falcon uses eBPF on Linux nowadays. It's still an irritating piece of software, but it won't make your boxen fail to boot.
Even if it doesn’t kernel panic, a broken eBPF program can break all networking and I/O and effectively cripple a “running” system.
eBPF is better in a lot of aspects, but it won’t prevent software intended to block syscalls from breaking your machines if the code breaks.
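To illustrate the networking half of that, here's a toy sketch (a libbpf-style XDP program, assumed for illustration and nothing to do with Falcon's actual code): an eBPF filter whose logic bug drops everything leaves the box "running" but unreachable.

```c
/* Toy XDP filter: intended to pass legitimate traffic, but a logic bug
 * makes every path fall through to DROP, silently killing all networking.
 * Build with: clang -O2 -target bpf -c broken_filter.c -o broken_filter.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int broken_filter(struct xdp_md *ctx)
{
    /* Intended: inspect the packet and return XDP_PASS for good traffic.
     * Actual: everything gets dropped. */
    return XDP_DROP;
}

char _license[] SEC("license") = "GPL";
```

The verifier guarantees a program like this can't crash the kernel, but it can't tell that the verdicts it returns are the wrong ones.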
The solution posted everywhere (simply delete the broken driver files) isn't difficult or time-consuming, except in situations where tens of thousands of devices stop responding at once, or where every machine asks you for the encryption key because you've altered your boot parameters. Linux's saving grace here may be that Bitlocker-style encryption is a pain to set up, so Linux servers typically don't do the encryption at all, but the recovery process for enterprise customers would still be very manual and time-consuming.
It was panicking RHEL 9.4 boxes a month ago.
Were you using the kernel module? We’re using Flatcar which doesn’t support their .ko, and we haven’t been getting panics on any of our machines (of which there are many).
Nah it was specifically related to their usage of BPF with the Red Hat kernel, since fixed by Red Hat. Symptom was, you update your system and then it panics. Still usable if you selected a previous kernel at boot though.
You’re asking the wrong question: why does a security nightmare need a 90 billion dollar company to unfuck it?
What’s your solution to cyberattacks?
Linux in the hands of professionals. There’s a reason IIS isn’t used anymore.
That doesn’t solve anything. Linux is also subject to cyberattacks.
And Macs, we have it on all three OSs. But only Windows was affected by this.
Hate to break it to you, but most IT Managers don’t care about crowdstrike: they’re forced to choose some kind of EDR to complete audits. But yes things like crowdstrike, huntress, sentinelone, even Microsoft Defender all run on Linux too.
Yeah, you’re right.
I wouldn't call Crowdstrike corporate spyware garbage. I work as a Red Teamer in cybersecurity, and EDRs are the bane of my existence - they are useful, and pretty good at what they do. In the last few years, I've been struggling more and more with the engagements we do, because EDRs just get in the way and catch a lot of what would have passed undetected a month ago. Staying on top of them with our tooling is getting more and more difficult, and I would call that a good thing.
I've recently tested a company without EDR, and boy was it a treat. Not defending Crowdstrike - to call this a major fuckup is a great understatement - but calling it "corporate spyware garbage" feels a little bit unfair. EDRs do make a difference, and this wasn't an issue with their product in itself, but with the irresponsibility of their patch management.
How is it not a Windows problem?
The fault seems to be 90/10 CS, MS.
MS allegedly pushed a bad update. Ok, it happens. Crowdstrike’s initial statement seems to be blaming that.
CS software csagent.sys took exception to this and royally shit the bed, disabling the entire computer. I don’t think it should EVER do that, so the weight of blame must lie with them.
The really problematic part is, of course, the need to manually remediate these machines. I’ve just spent the morning of my day off doing just that. Thanks, Crowdstrike.
Why should it be? A faulty software update from a 3rd party crashes the operating system. The exact same thing could happen to Linux hosts as well, with how much access those security programs usually get.
But that patch is for Windows, not Linux. This isn't a hypothetical, it's happening.
You're fixated on the wrong part of the story. "Synchronized supply-chain update takes out global infrastructure" isn't a Windows problem; this happens on Linux too!
Just because a drunk driver crashes their BMW into a school doesn’t mean drunk driving is only a BMW vehicle problem.
I love how quickly everyone has forgotten about that xz attack.
I use and love Linux and have for over two decades now, but I’m not going to sit here and claim that something similar to the current Windows issue can’t happen to Linux.
xz attack
That has nothing to do with this. That was a security vulnerability, solved in record time, blame where it was due, and patched in hours.
You’re missing the point. That compromised xz made it into some production distributions. The point here is that shit can happen to Linux, too.
If BMW makes a car that has square wheels and needs everyone to install round wheels so the fucking thing works, you can't blame a company for making wheels.
It’s a Microsoft problem through and through.
Your counter to the BMW drunk-driver example didn't address drunk driving in Volvos, Toyotas, Fords… you just introduced a variable that you're upset with. BMWs having weird wheels has nothing to do with drunk-driving incidents.
Again, you're focused on the wrong thing; this story is a warning about supply-chain issues.
You're just memeing on the hate for Windows.
Have you never seen a DNS outage, an Ansible outage, a Terraform outage, a RADIUS outage, a database schema change outage, a router firmware update outage?
Again, you're talking about something I am not. I am talking about THIS problem, right here, which is categorically a Windows problem, in that it's not hitting the Linux kernel stack, or Mac. How is this NOT a Windows problem??
That’s hell of a strike to the crowd
I work in hospitality and our systems are completely down. No POS, no card processing, no reservations, we’re completely f’ked.
Our only saving grace is the fact that we are in a remote location and we have power outages frequently. So operating without a POS is semi-normal for us.
I was born too late to understand what the Y2K problem was; this (the result) might be what people thought could happen.
Kinda, I guess. It was about clocks rolling over from 1999 to 2000 and causing date errors that would supposedly crash all systems everywhere, causing the country to come to a halt.
And it was okay because a lot of people worked really really hard to make it be okay.
Most old systems used two digits for years, so the year would go from 99 to 00. Any software doing a date comparison would get a garbage result. If a task needs to run every 5 minutes, what will the software do when it thinks that task was last run 99 years in the future? It will not work properly.
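A minimal sketch of that comparison bug (hypothetical legacy-style code, not any particular system):

```c
#include <stdio.h>

/* Years stored as two digits, the way a lot of old systems did it. */
static int years_since(int last_yy, int now_yy)
{
    return now_yy - last_yy;   /* assumes time only moves forward */
}

int main(void)
{
    /* Task last ran in (19)99; the clock has just rolled over to (20)00. */
    printf("%d\n", years_since(99, 0));   /* prints -99: "last run 99 years in the future" */
    return 0;
}
```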
Governments and businesses spent lots of money and time patching critical systems to handle the date change. The media made a circus out of it, but when the year rolled over, everything was fine.
Also, a lot of people were "on call" to handle any problems when the year changed, so the few problems that had slipped past the fixes and did pop up when the year changed got solved a lot faster than they normally would.
We also got the worst version of Windows ever, ME. Tho maybe with all the BS they’ve done with 11 that might change.
I’m not sure I’d stick to calling it the worst version “ever” since MS is trying really hard to out do themselves.
Yep pretty much but on a larger scale.
First, please do not believe the bull that there was no problem. Many folks like me were paid to fix it before it became an issue. So, other than at a few companies, few saw the result - not because it did not exist, but because we were warned. People make jokes about the over-panic, but if that had not happened it would have taken years to fix, not days, because without the panic most corporations would have ignored it. Honestly, the panic scared shareholders, so boards of directors had to get experts to confirm the systems were compliant. And so much dependent crap was found running that it was insane.
But the exaggerations about planes falling out of the sky etc. were also bull. Most systems would have failed, but BSODs would have been rare: code would crash, some programs would error out and shut down cleanly, and some failures would go undiscovered until a short while later, as accounting or other errors showed up.
As others have said, the issue was that since the 1960s computers had been set up to treat years as 2 digits, so they had no way to handle 2000 other than to assume it was 1900. From the early 90s most systems were built with ways to adapt, but not all were, as many teams were only developing top-layer stuff, and many libraries etc. had not been checked for the issue. Huge amounts of the world's IT infrastructure ran on legacy systems, especially in the financial sector, where I worked at the time.
The internet was a fairly new thing, so often stuff had been running for decades with no one needing to change it, or having any real knowledge of how it was coded. So folks like me were forced to hunt through code, or often replace systems, that were badly documented or, more often, not documented at all.
A lot of modern software development practices grew out of discovering what a fucking mess can grow if people accept an “if it ain’t broke, don’t touch it” mentality.
I was there patching systems and testing that they survived the rollover, months before it happened.
One piece of software managed the rollover but failed the year after: they had quickly coded in an explicit exception for 00, then promptly forgot to fix it properly!
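Presumably something along these lines (a hypothetical reconstruction, not their actual code):

```c
#include <stdio.h>

/* The hasty patch: special-case 00 and forget everything after it. */
static int full_year(int yy)
{
    if (yy == 0)
        return 2000;      /* the quick Y2K fix: 00 means 2000 */
    return 1900 + yy;     /* ...so 01 becomes 1901 and breaks a year later */
}

int main(void)
{
    printf("%d %d %d\n", full_year(99), full_year(0), full_year(1));
    /* prints: 1999 2000 1901 */
    return 0;
}
```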
Y2K was going to be the end of civilisation. This was basically done by the time I woke up today.
Am on holiday this week - called in to help deal with this shit show :(
I hope you get overtime!
Don’t worry, George Kurtz (crowdstrike CEO) is unavailable today. He’s got racing to do #04 https://www.gt-world-challenge-america.com/event/95/virginia-international-raceway
It’s also reported in Danish news now: https://www.dr.dk/nyheder/udland/store-it-problemer-flere-steder-i-verden
Dutch media are reporting the same thing: https://nos.nl/l/2529468 (liveblog) https://nos.nl/l/2529464 (Normal article)
I just saw it on the Swedish national broadcaster’s website:
https://www.svt.se/nyheter/snabbkollen/it-storningar-varlden-over-e1l936
Even 911 is impacted
That’s potentially life threatening. I wonder if 112 in other countries is affected, it shouldn’t be but at this point I’m afraid it is.
In the Netherlands 112 is fine, most critical systems are. It’s mostly airports that are getting fucked by this it seems.
Banks and PSPs are fine here too.
In the US, 911 is decentralized, so widespread outages will always affect it in some places. The SolarWinds hack was another example.
Assuming the entire phone system isn't down, there are typically workarounds for CAD outages, even if they're very shitty to deal with.
US and UK flights are grounded because of the issue, banks, media and some businesses not fully functioning. Likely we’ll see more effects as the day goes on.
Is there a chance that this makes organisations move to Linux?
Windows usage isn’t the cause of dysfunction in corporate IT but a symptom of it. All you would get is badly managed Linux systems compromised by bloated insecure commercial security/management software.
Not really. This isn’t a Windows problem. This is a faulty software problem. People can write faulty software on Linux too.
I guess they would want some cybersecurity software like Crowdstrike in either case? If so, this could probably have happened on any system, as it’s a bug in third party software that crashes the computer.
Not that I know much about this, but if this leads to a push towards Linux it would be if companies already wanted to make the switch, but were unwilling because they thought they needed Crowdstrike specifically. This might lead them to consider alternative cybersecurity software.
You’d think maybe not being reliant on a 90 billion dollar company to un-fuck security would be a bigger deal than it is.
It’s proving that POSIX architecture is necessary even if it requires additional computer literacy on the part of users and admins.
The risk of hacking a monolithic system like Windows (which is essentially what Crowdstrike does to get so deeply embedded and be so effective at endpoint protection) is that if you screw up, the whole thing comes tumbling down.
As Nvidia proves regularly, a Linux kernel driver can make a system unbootable just as easily as a broken Windows driver can.
It happens on Linux too: https://access.redhat.com/solutions/7068083
That’s an old alert. We run CS on Linux as well and have not encountered this issue in the two years we’ve had it going.
It was affecting RHEL 9.4 users within the last two months.
This specific issue was triggered today by a Microsoft update - that's something else.
Agree it may be indicative of poor quality software control, but it’s not this.
This specific issue is different than the other specific issue, correct.
The point is, “this could only happen on windows” is wrong.
I've heard not all Windows versions are affected by Crowdstrike, depending on whether it was recently updated or not. It's not clear which versions are affected. One other thing: I thought Windows has a microkernel and Linux is monolithic.
We’re all going to be so smug.
For reference, this was the article I first read about this on: https://www.nzherald.co.nz/nz/bank-problems-reports-bnz-asb-kiwibank-anz-visa-paywave-services-down/R2EY42QKQBALXNF33G5PA6U3TQ/
Me too. Additionally, I use Guix, so if a system update ever broke my machine I could just roll back to a prior system version (either via the command line or the GRUB menu).
That’s assuming grub doesn’t get broken in the update…
Immutable systems sound like something desperately needed, tbh. It's just such an obvious solution, and I'm surprised it was invented so late.