GitHub seems to be down.

Edit: After I made this, their status page finally updated to indicate an issue.

Update - We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.

  • thesmokingman@programming.dev
    1 month ago

    This is a common problem. The same thing happens with AWS outages. Business people get to manually flip the switches here, and the status page is completely divorced from proper monitoring. An internal alert triggers, engineers start looking at it, and only when someone approves publishing the outage does it actually appear on the status page. Outages for places like GitHub and AWS are tied to SLAs, which are tied to payouts or discounts for huge customers, so there’s an immense incentive to not declare an outage even though everything is on fire. I have yelled at AWS, GitHub, Azure, and a few smaller vendors for this exact bullshit. One time we had a Textract outage for over six hours before AWS finally decided to declare one. We were fucking screaming at our TAM by the end because no one in our collective networks could use it, but they refused to declare an outage.

    • RegalPotoo@lemmy.world
      1 month ago

      Or, alternatively, comms management is important, and formally declaring an incident is an important part of outage response. Going from “hey Bob, something isn’t looking right, can you check when you get a sec” to “ok, shit’s broken, everyone put down what you are working on and help with this. Jim is in charge of coordinating the technical people so we don’t make things worse, and should feed updates to Mike, who is going to handle comms to non-technical internal people and to externals” takes management input.

      • thesmokingman@programming.dev
        1 month ago

        To be clear, usually there’s an approval gate. Something is generated automatically, but a product or business person has to actually approve the alert going out. Behind the scenes, everyone internal knows shit is on fire (unless they have shitty monitoring, metrics, and alerting, which is true for a lot of places but not major cloud or SaaS providers).
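As a rough illustration of the approval gate described above, here is a minimal Python sketch. Everything in it is hypothetical: `StatusPage`, `Incident`, `on_alert`, and `approve` are made-up names for illustration, not any vendor's actual tooling. It only shows the shape of the flow: monitoring creates a pending incident automatically, but nothing reaches the public list until a person signs off.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    DETECTED = "detected"    # internal alert fired; visible internally only
    PUBLISHED = "published"  # approved and visible on the public status page


@dataclass
class Incident:
    service: str
    summary: str
    state: State = State.DETECTED
    approved_by: Optional[str] = None


class StatusPage:
    """Hypothetical status page with a manual approval gate."""

    def __init__(self) -> None:
        self.pending: List[Incident] = []           # drafted, awaiting sign-off
        self.public_incidents: List[Incident] = []  # what customers can see

    def on_alert(self, service: str, summary: str) -> Incident:
        # Monitoring drafts the incident automatically, but does not publish it.
        incident = Incident(service, summary)
        self.pending.append(incident)
        return incident

    def approve(self, incident: Incident, approver: str) -> None:
        # Only an explicit human approval moves the incident to the public page.
        incident.approved_by = approver
        incident.state = State.PUBLISHED
        self.pending.remove(incident)
        self.public_incidents.append(incident)


page = StatusPage()
incident = page.on_alert("example-service", "elevated error rates")
# Internally the problem is known, but the public page is still empty here.
page.approve(incident, "product-owner")
```

The gap between `on_alert` and `approve` is the delay being complained about in this thread: the time between the two calls is entirely up to whoever holds the approval.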