T-Mobile’s Neville Ray explains what caused yesterday’s network outage

Yesterday T-Mobile suffered a pretty serious network outage, and now the carrier has shed a bit more light on exactly what happened.

Neville Ray, T-Mobile’s president of technology, explains that the event that triggered the outage was a leased fiber circuit failure from a third-party provider in the Southeast. That’s the part of the country where we first started seeing complaints of network issues yesterday.

Ray goes on to say that this happens with every network and that T-Mo built redundancy and resiliency to prevent the issue from affecting customers. However, that redundancy failed, too, causing an overload that was then compounded by other factors.

All of these issues caused an IP traffic storm that spread from the Southeast to create “significant capacity issues” across the IMS (IP Multimedia Subsystem) core network that supports VoLTE calls. T-Mobile said in Q1 2020 that 91% of the total voice calls on its network were VoLTE, so it’s no surprise that an issue involving VoLTE calling would have such a major effect on customers.
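
To get a rough feel for how a failed circuit plus a failed backup can snowball into a wider overload, here is a toy Python sketch. Everything in it is an assumption made for illustration: the link names, the capacities, the loads, and the rule that traffic from a failed link is split evenly across the survivors. It is not T-Mobile's topology or its actual failure mechanism, just a minimal model of a cascading overload.

```python
def simulate_cascade(capacity, load, initially_failed):
    """Toy cascade model (hypothetical): traffic from failed links is split
    evenly across the surviving links, and any link pushed above its
    capacity fails in the next round. Returns the sorted list of dead links."""
    load = dict(load)              # work on a copy so the caller's data is untouched
    down = set()
    to_fail = set(initially_failed)
    while to_fail:
        down |= to_fail
        alive = [name for name in capacity if name not in down]
        shed = sum(load[name] for name in to_fail)   # traffic that needs a new home
        for name in to_fail:
            load[name] = 0.0
        if not alive:
            break                  # nothing left to carry the traffic
        for name in alive:
            load[name] += shed / len(alive)
        # Links that are now over capacity fail in the next round.
        to_fail = {name for name in alive if load[name] > capacity[name]}
    return sorted(down)


# Made-up numbers: four links with 100 units of capacity each.
capacity = {"primary": 100, "backup": 100, "east": 100, "west": 100}
load = {"primary": 80, "backup": 0, "east": 70, "west": 70}

# Backup healthy: the failure stays contained to the primary link.
print(simulate_cascade(capacity, load, {"primary"}))

# Backup fails too: the remaining links are overloaded and everything goes down.
print(simulate_cascade(capacity, load, {"primary", "backup"}))
```

With these invented numbers, losing only the primary link is absorbed by the other three, while losing the primary and the backup together pushes the two remaining links past capacity and takes the whole toy network down.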

Unsurprisingly, T-Mobile has taken steps to prevent this issue from happening again, including adding permanent additional safeguards.

The outage began at around 9:00 am PT/12:00 pm ET yesterday and T-Mo says that the issues were resolved at 10:00 pm PT/1:00 am ET, so this outage knocked out calling and texting most of the day for many customers. Because of this, it’s good to see Ray and T-Mobile be transparent about what caused the outage.

Source: T-Mobile

  • Brenden Morris

    Notice how T-Mobile shifts the blame onto someone else and can’t own up & take the heat.
    T-Mobile sucks!!!

    • Trevnerdio

      Which part? The one where their redundant systems failed?

    • WeAreAllOne

      How many times have you taken blame for something you were not responsible for? I hope you do every day. Do you have proof that TM’s claim is wrong?

    • dcmanryan

      I look at it from a construction standpoint. If a subcontractor does something wrong, at the end of the day it’s the general contractor’s problem to fix it, and he accepts the blame and any fines, and it’s up to him to recoup his costs from the subcontractor that caused the issues. It’s no different here.

  • Willie D

    You know, I could believe a fiber optic cable being severed maybe once in a blue moon, but this has literally happened multiple times with TMo, not to mention other regional outages. Neville claims all carriers and networks have these issues, yet it seems TMobile specifically has either data and voice issues that happen 2-4x a year, or it’s our data being stolen and their system still hasn’t stopped that happening more than at any other carrier. They talk about redundancy, but with as many fiber lines crossing the country you would think TMo, let alone the 3rd party fiber optic provider, would have rerouted traffic nationally away from this outage and through non-severed optical lines. Sorry, I’m not buying what TMo is trying to sell. Too many times have issues come up with them and their network.

    • Mike Smith

      Not surprising as they’re the most advanced and most aggressive at upgrading and rolling out new tech? They’re leading in IPv6, they’re leading in percentage of calls over VoLTE, they’re leading in 5G, etc.

      • Shaun Michalak

        You forgot, they lead the way with new technology.. Neither AT&T nor Verizon is really using any kind of 5G technology on their towers.. Only T-Mobile is.. and let’s not forget that both of them also waited until MetroPCS worked out the details, and came out with 4G LTE and VoLTE, before Verizon and AT&T used it.. Those 2 companies just sit back and wait for everyone else to fix, design, and debug technologies, before they use them.. T-Mobile was different then, as they are now with 5G..

    • agent_smith

      Look at the careful phrasing of the message too, deflect to a third-party provider but don’t name or shame. Phrasing is loose too. Did the circuit fail because of the third party? A dig gone wrong? A T-Mobile engineer misconfigured the link?

      Then as you mentioned, their lack of redundancy is clearly apparent here. One fiber link failing can not, and should not cause an entire nationwide network to collapse. They’ve always run a lean network, minimal deployment in new markets, minimal amounts of backup, if any. Even the backup 3G voice only runs on band 2 or band 4 and 2G voice on band 2, so if IMS/VoLTE goes down their voice footprint falls through the floor.

      This behavior was an acceptable trade-off when they were a cheap provider. With their pricing where it is now, customers deserve better quality.

      I was a customer of 10 years but feared this exact unreliability would play out with the Sprint merger and their high speed drive to roll out 5GNR. They were unreliable frequently enough to be an annoyance when they were first deploying LTE. Did not want to experience that again.

    • Mike Thaler

      Never have experienced an outage that lasted more than 5 or 10 min. until this. Have 8 lines around the country.
      The 3 in Miami were out from noon or 1PM til sometime after 11PM bedtime. (Oakland numbers)
      2 Miami numbers out same amount of time.
      1 Seattle number worked
      2 lines temporarily in Vt. may have been out for a while.

      • Shaun Michalak

        Same here.. First time for me too, in over 10 years..

  • Daniel Cross

    From my understanding most of the mobile networks went down. Not just tmobile.

    • resource

      Nope, calls to T-Mobile from other networks failed, only T-Mobile had an issue

    • Jason Caprio

      It was just T-Mobile. My wife and her sister have T-Mobile and I have Verizon. Their phones were able to call mine, but when I tried to call them, it would say the number could not be completed as dialed. People who didn’t know any better probably thought it was on their end if they were on another carrier. T-Mobile phones could not call each other. They would drop from LTE to 3G/HSPA and then just hang on the dialer.

      I understand carriers can have outages, but 13 hours straight is unacceptable in my opinion. T-Mobile is going to lose a lot of customers over this.

      • Mike Smith

        Nah. Most customers didn’t even notice.

        • Shaun Michalak

          From what I know, It was already out for a few hours before I even noticed.. I was trying to tell my mom something and she was at the other end of the house.. I had to walk all the way across the house to tell her it.. Oh, the horrors.. lol

        • dcmanryan

          Lol, how many millions of customers did you poll to come up with that answer? Most means more than 50%. You’re obviously speculating. Stop it.

      • DominiMMIV

        Not going anywhere…

      • riverhorse

        ¿And when the carriers they switch to suffer an outage, move back to TMO?

      • Steve

        I’ve been a TMo subscriber since the beginning and they’ve been a good choice for me… But I always tell friends who are married, especially with kids, to have the parents on 2 different carriers (if possible), just in case something affects their signal, whether they’re traveling and hit a bad patch of cell service or a carrier goes down for some reason.

  • Carlos

    Things happen – All carriers have issues. They acknowledged it and have fixed it. I’ve been with every single carrier and T-Mobile has treated me the best. They have always been transparent. Back when I lived in Wyoming I had Verizon and the towers in our area were out for over a month. No credit no reason given nothing. So for this only being a day… Not a big deal. Thanks TMO for continuing to shine. I’m sure I’ll get some anti TMO replies I really don’t care.

    • Shaun Michalak

      Someone accidentally hit a Verizon cable that was buried, and it took months for them to fix it here..

      I can not say that I have been with all carriers.. I have had AT&T, T-Mobile, MetroPCS (after T-Mobile bought them), Cellular One.. and I would have to agree with you on how I have been treated..

      My sister has Verizon cell service, and at least once a month, almost every month, she would have problems with her phone service where she lives..

  • Frankwhitess

    This is Madness !!!! I need a $500 credit to my account please .. also, a free line will smooth the stresses that I had to deal with due to the loss of service .

    • Shaun Michalak

      $500.. I want that ocean front property in Arizona that everyone says that they have for sale..

  • Mike Smith

    As an iPhone user I was completely unaffected. The Apple Messages app worked fine as did Facetime audio for calls. Didn’t even notice.

    • dcmanryan

      So you were affected as you had to use FaceTime and rely on iMessages.

      • Shaun Michalak

        That is what I was thinking.. It is like saying, hey, I used google duo instead of T-Mobile for service for talking, but hey, I was not affected because I use an android instead of an apple phone.. Kind of does not make sense..

        • Mike Smith

          Sure it does. I don’t know what “Google Duos” is but Messages is the default app on all iPhones. Most iPhone users wouldn’t even notice the outage unless they make a lot of calls. For those who just use data and text like me, no impact at all.

        • Shaun Michalak

          Google Duo is a way of calling each other, with video and sound, to communicate back and forth.. It is very similar to Apple’s Facetime app.. Since it does not use the actual phone companies talk abilities, only network data, it is a way to talk when data works, but talk does not.. You can use Google Duo on tablets, laptops, apple or android phones, so it is not specific to any certain type of device or brand like Facetime is with apple only products..

        • Mike Smith

          Well as it’s not a default app I am not sure it matters.

        • Shaun Michalak

          It comes preinstalled on the phones I got, but it is not mandatory to use it.. When you load it up, it will ask you to agree to their terms of service, and that is all that you need to do to use it.. Some of the doctors were using this to do video conferencing for face to face doctor visits during the COVID shutdown.. If you are just using it on your phone, you do not even have to have a google account to use it.. I tried with no google account connected to my phone, and it worked just fine..

        • Mike Smith

          That’s awesome. It’s great that non Apple users are getting options too!

        • Shaun Michalak

          Getting?? It has been out for 4 years now.. Yea, it may not have been out for 8 years like facetime has been, but at the same time, it is not a new release either. and to be perfectly honest, Duo is, in some ways, better than facetime.. To use facetime, you have to have an apple product.. There is no compatibility outside of that.. Duo on the other hand, works on computers, tablets, and it does not matter if you have an apple or an android phone, it works with both, which makes Duo much more compatible with all people than facetime is.. that is unless all the people you ever talk to, or know, only have apple phones..

      • Mike Smith

        Not at all, that’s what I would use anyway. There’s no way not to use Messages, and I prefer Facetime for audio calls anyway.

        • dcmanryan

          You used iMessages and obviously texted other iPhone users or it would not have worked. If you had texted an Android user it would not have worked. Also you can 100% turn off iMessages, so you’re not forced to use it as you say. You can send regular SMS/MMS messages. FaceTime would not have worked to call Android users either, obviously.

        • Steve

          “completely unaffected” is a stretch… As an iPhone user there were plenty of people that I could still communicate with. But as a smart phone user I communicate with a big mix of iOS & Android users.

        • Mike Smith

          I don’t, I don’t know any. And I have an S20 Ultra too.

        • Mike Smith

          That’s my point, most Apple users didn’t notice and/or were unaffected. I don’t really know any Android users. I know there are a lot of Android users but not in my area or social circles.

  • KMB877

    The damaged fiber optic in the Southeast shut down phone service in Vancouver WA?! And in New York City!? And in New Hampshire, and in Vermont… I’m not buying this explanation.
    However, yesterday morning I heard on radio that all 4 were under a serious cyber attack (but nothing at the evening news). I’m inclined to believe this.

    • riverhorse

      I would think 24/7 they and all US targets are under attack by right states and groups.

    • Joe

      Umm… It’s been proven that there was no DDoS attack and that the entire problem stemmed from T-Mobile’s network and did not affect Verizon, not too sure about ATT (there are mixed reports on whether this affected ATT or not), and did not affect Sprint. Those reports were just early theories.

    • Steve

      Your question is definitely something to think about! It shows that possibly trouble in WA could affect the whole country too… Why would there be something so important in the southeast anyway?? With all the weather etc… there.

  • Joss Edbrards

    Having worked in Network Operation Centers for years, this root cause analysis does not surprise me. ISPs and telcos rely on third parties to provide redundancy and sometimes they fail. This last weekend AT&T had a major Internet failure in the South and there was no general outcry. Out for 24 hours. T-Mobile’s was 12. It’s messed up, but it’s more common than you think.

    • Shaun Michalak

      What are you talking about.. All the nay sayers all swear that T-Mobile was the “only” one that had any kind of problems.. So I guess that must have just been your imagination.. lol

  • Chuan Ren

    I barely use calls normally but Monday was the day I really needed it when the service was totally off. I first thought it’s my phone’s problem until I couldn’t solve it after restarting it multiple times while the data indeed was working fine. I was travelling in the southern states and none of the states worked. Fortunately the data still worked so did my GPS.

    I don’t think the explanation makes much sense, especially this part: redundancy also fails — that’s what we set up redundancy for, and for both to fail at the same time, the chance is nearly 0 — unless there is NO redundancy or no testing at all. As my data worked fine, why VoLTE couldn’t work? I thought it is based on data only. I also tried google voice, it failed as well.

    So really I seriously doubt this part too:

    “Unsurprisingly, T-Mobile has taken steps to prevent this issue from happening again, including adding permanent additional safeguards.”

    • Shaun Michalak

      Actually, it could make perfect sense.. Say they put the redundancy up 4 years ago.. and it has not been checked to work, or used, in that 4 years.. It is very possible that something has happened in that length of time.. Look at the power outage in the North East 15?? years ago.. One breaker failed (I think in Ohio), the next one was supposed to stop the surge, and that one failed to do so, all the way up the line until it was up in Canada.. That protection was there, and was “supposed” to work, but they had been there so long, without being re-verified that they will work, that they all just failed, one after another.. You can not tell me that this type of thing is capable of happening on the electric grid, but not with phone service..

  • Dharharr

    Customers and observers (looking at you Jason Caprio, Willie D, Brenden Morris) can be so unforgiving at times. It’s too bad if people this reliant on cell phone connectivity can’t survive without it. Here’s a tip, get a HAM radio license and build an AREDN node. Decentralized VoIP calling, data, email, etc. It’s cheap to buy and operate. Should something ever happen to 1 carrier or all, people could still communicate.

  • Shaun Michalak

    To all those people blaming T-Mobile for screwing something up while doing the merger.. Did you read that.. Third party.. It was not them like all you blamed them for.. HA HA HA.. Sucks to be wrong, doesn’t it.. lol

  • Shaun Michalak

    IF I remember, I think this same thing happened down in Florida a few years ago too.. But that was with either verizon, or the cable company..

  • Notpoliticalyet

    Done with their excuses over the years. Monday was the last straw. I’m still having issues with some messages today. They’re full of it. And their support team with their canned opening responses, “we know that your internet connection is very important” blah blah blah. 14 year customer going to Verizon. It was an okay ride but now I’m looking for a better ride. You guys keep drinking the T-Mobile Kool-Aid.

  • Mike

    They must not have a good redundant system in place if the cause was a single fiber line. It’s really hard to believe a line in the southeast affected the whole nation. Now if they mentioned a bad fiber line going to NSA, that could make more sense. This single area problem don’t add up to much. (This also showed why analog is better than digital).

    • Shaun Michalak

      here is my question.. they said a “fiber line”, but how many lines in that fiber line?? You make it sound like a single fiber connection, which would be inaccurate.. When they run fiber, they usually do not run a single fiber connection.. It is multiple lines run at once, in one cluster.. to quote

      “Fiber count in cables ranges from 6 to 24 near residences and individual businesses to more than 1,000 on backbone routes.”

      “For the first time ever, data has been transmitted through a fiber optic network at a speed of 500 gigabits per second over a single wavelength channel.”

      Take that into consideration, then say it was all 6 or more lines that went down.. at 500 gigabits per second each, that is equivalent to up to 3 Tb per second of capacity down.. But they said the real problem came from IP traffic congestion..

      You say this shows why analog is better?? I have no idea where you are going with that, because you can not run analog over fiber, it has to be digital.. Analog signals can only be run over copper lines, and you do know how many copper lines you would have to have to even compare to one single fiber strand to get the same amount of speed?

    • Rod R Knighten Jr

      It’s hard to believe because you have no idea what you’re talking about.

      1. Analog loses to digital in every single metric. Analog cables are highly susceptible to interference from magnetic fields, electricity, other cables, the weather, the radio, and just about anything that comes near the line. Think when you run your fingers up and down a cheap headphone cable and you hear feedback or when the house phone cuts out when the microwave is on. Interference leads to packet degradation and eventually packet loss which means that you have to build out extra capacity for error correction (essentially sending every packet multiple times) or lost packets will cause the entire connection to fail. Digital has none of these issues and it allows for scaling in real time via compression.

      2. The fiber line going down isn’t actually what caused the outage. Imagine the fiber line is a freeway. Under normal conditions with all lanes open traffic flows smoothly, and occasionally under heavy load it slows down, but there’s a frontage road that can accommodate some overflow traffic. If there’s an accident on the frontage road and 3 lanes on the freeway are closed due to construction, traffic will effectively come to a standstill as the freeway is overloaded. So even though the exits 3 miles up the road are clear, no cars will be able to utilize them because so much traffic is flowing toward the bottleneck that traffic backs up in the entire city. This is what happened to the network. One of their primary routes failed and the redundancy went down with it and it caused a cascading outage. There are 100M people in the SE US, and due to the pandemic the network was already under heavier than normal load, so as the network scrambled to balance the load a fail safe probably kicked in that shut it down and directed it to prioritize traffic deemed essential and kick everything else

      3. Redundancies are not designed to be used in place of the thing they are a redundancy for. Going back to the freeway example: the frontage road is a redundancy for the freeway, but the frontage road can not and should not replace the freeway.