It Took Just Four Days From Elon Gleefully Admitting He’d Unplugged A Server Rack For Twitter To Have A Major Outage

by Mike Masnick, from Techdirt (#679R3)

I know, I know. Some of the more angry commenters around here keep insisting that I should stop talking about Elon Musk and Twitter, and I want to do exactly that. I planned to do exactly that and not write another post about it all until next week. And then... Twitter crashed hard last night. Downdetector has the receipts:

[Image: Downdetector chart of Twitter outage reports]

Here's what happened when I went to visit Twitter:

[Image: screenshot of the Twitter error page]

I especially like that "it's not your fault" bit, because, well, yeah. It's not.

As I write this, there hasn't been anything official about what happened, but I'm assuming that Elon will show up at some point to blame the "woke mind virus" or the Federal Reserve or SBF or Anthony Fauci.

And, it may be a total coincidence, but it was just four days ago that he bragged about pulling the plug on an "important server rack."

[Image: screenshot of Musk's tweet about unplugging the server rack]

Separately, there have been reports that Musk decided (with little to no notice, and almost no planning) to shut down Twitter's Sacramento data center and massively downsize its Atlanta data center. Twitter only has one other data center in the US, in Portland, Oregon. Twitter's use of its own data centers rather than the cloud is something that's been discussed over the years, and two years ago the company did sign a deal to start using Amazon Web Services, though I don't think the company relies too heavily on it yet. The same reports note that Elon has been trying to renegotiate the AWS contract as well (which might mean he's also stopped paying the bills, as he seems to have done with many vendors as part of his "renegotiation" efforts).

I've also heard from three separate people that Elon more or less ordered the shutdown of an entire data center (presumably the Sacramento one) with basically one day's notice and no planning.

And, with that in mind, I'll remind people that one part of former Twitter security chief Peiter "Mudge" Zatko's whistleblower report noted that the company had a deep need for more redundancy, not less:

Insufficient data center redundancy, without a plan to cold-boot or recover from even minor overlapping data center failure, raising the risk of a brief outage to that of a catastrophic and existential risk for Twitter's survival

That report also presented a redacted version of the "threat matrix" Mudge claims he wanted to show the Board, though he was urged to give only a high-level overview, orally, rather than present a more complete written report. It again notes that a data center failure could be catastrophic.

[Image: redacted threat matrix from the whistleblower report]

Later in the report, Mudge notes that this almost happened in the past:

Cascading data center problems: In or around the spring of 2021, Twitter's primary data center began to experience problems from a runaway engineering process, requiring the company to move operations to other systems outside of this data center. But, the other systems could not handle these rapid changes and also began experiencing problems. Engineers flagged the catastrophic danger that all the data centers might go offline simultaneously. A couple months earlier in February, Mudge had flagged this precise risk to the Board because Twitter data centers were fragile, and Twitter lacked plans and processes to "cold boot." That meant that if all the centers went offline simultaneously, even briefly, Twitter was unsure if they could bring the service back up. Downtime estimates ranged from weeks of round-the-clock work, to permanent irreparable failure.

"Black Swan" existential threat: In fact, in or about Spring of 2021, just such an event was underway, and shutdown looked imminent. Hundreds of engineers nervously watched the data centers struggle to stay running. The senior executive who supervised the Head of Engineering, aware that the incident was on the verge of taking Twitter offline for weeks, months or permanently, insisted the Board of Directors be informed of an impending catastrophic "Black Swan" event. Board Member [REDACTED] responded with words to the effect of "Isn't this exactly what Mudge warned us about?" Mudge told [REDACTED] that he was correct. In the end, Twitter engineers working around the clock were narrowly able to stabilize the problem before the whole platform shut down.

That's not to say that this has anything to do with the outages last night, but at the very least there are strong arguments that Twitter's infrastructure is inherently fragile, and shutting down "sensitive" server racks or closing down entire data centers without careful planning seems like the sort of thing that could, well, backfire pretty badly.

Meanwhile, the only comment so far from Musk (it's tough to know because Twitter only loads intermittently) appears to be him responding "works for me" to someone asking about site problems. In context, though, Musk is replying to a joke about the site being down rather than a legitimate concern: someone asks if anyone can see or respond to their tweet, one of Musk's biggest fans tweets "I can't see or respond to it" (obviously making light of the whole thing), and then Musk responds with "works for me."

[Image: screenshot of the Twitter exchange ending in Musk's "works for me" reply]

So it's not entirely fair to say this is a comment directly about the widespread outages. Assuming Musk realizes Billy is joking, then... it could just be a weak attempt at playing along? But here's the actual funny part. The Guardian has an article about Musk's tweet saying stuff "works for me" except that stuff isn't working, because the Twitter embed is not showing properly. Instead it is showing in failover mode, where if the embed won't load it just shows the alt-text in as "tweet-like" a form as possible. This screenshot is just pure irony.

[Image: screenshot of The Guardian article showing the tweet embed in failover form]

I eagerly await the comments from folks who were insisting to me just yesterday that Twitter under Musk was functioning much better than before, and that this all proved he was right to get rid of approximately 75% of the workforce who obviously did nothing...

Oh, and just as this post was being completed, Elon has a new story, claiming that Twitter was just rolling out "significant backend server architecture changes" and that Twitter "should feel much faster" (it doesn't, unless you're talking about the difference from not working at all... to kinda working some of the time?).

[Image: screenshot of Musk's tweet about backend server architecture changes]

Even if that was the cause of the outage (and... I'm doubtful), that still raises all sorts of questions about how the company prepared for the switchover, if it caused such a massive disruption in the process. That's... not how any of this should work.
