On the ground with AT&T's Network Disaster Recovery team

AT&T was running a disaster simulation here in Chicago a few weekends ago, and the company invited us to its site to see how it practices dealing with total network failures during disasters. AT&T says with some confidence that it's the only telecom provider in the world with this kind of capability, and it was pretty impressive to watch the Network Disaster Recovery team deploy, set up, and manage just a small subset of the $500M worth of emergency gear the company has stashed all over the world.

AT&T's NDR crew had set up in one of Soldier Field's parking lots for the exercise, which simulated a disaster in Peoria, Illinois, that totally obliterated the local central office. (To be sensitive, AT&T doesn't come up with a specific disaster scenario; it just tells its team that a certain office has been wiped out.) When a local office goes down, AT&T loses basically everything in the area -- landlines, network trunks, cell service, you name it -- so the NDR team's goal is to go in and restore normal service within 168 hours of the call going out. As one of the engineers on-site told us, things should be up "by the time people have made sure their families are safe and the cash registers are ringing again," which usually takes about a week but can be much shorter: after 9/11, the NDR team restored affected services in 53 hours. If everything goes according to plan, AT&T's network sees the NDR gear as a complete replacement for the destroyed office, and the gear can stay in the field until the brick-and-mortar building is repaired and functional again.

As you can imagine, bringing an entire network office back online, from the trunk to consumer wireless services, requires a fair bit of equipment, and this exercise was no exception: it featured 17 semi-trailers full of gear, 10 support trailers, a couple of specialized flatbeds, and one light-truck-based mobile cell station, pictured above. Here's a video of the team setting up for a similar exercise in Dallas:

Funnily enough, one of the most important vehicles was just an ordinary Suburban packed to the gills with satellite gear that provides the field team with communications. It's one of the most flexible rigs we've ever seen: over a 4Mbps down / 2Mbps up satellite connection, it does everything from providing landline dial tones to offering secure VPN connectivity to patching phone calls over UHF and VHF radios provided by government agencies. (We asked if the lag was low enough to support Xbox Live, but that just drew a laugh. We were serious!)

That truck gets patched into the command center trailer pictured above, which looks just like a normal office inside. Once the ground team has communications, the next step is to patch into the local fiber trunks. Depending on the scenario, that can be as easy as parking next to the offline office or can involve backhoes and digging equipment to get the fiber out of the ground. Several members of the NDR team are hazmat-certified, so they can even don special suits to go into dangerous areas and begin the process if necessary.

Since the team's goal is to completely replace a destroyed office, the equipment on hand has to be equivalent to AT&T's largest and most state-of-the-art CO. Router and switch configurations are stored in offsite backups, and once the site is set up, the team flashes each node with copies of those config files, effectively cloning the destroyed office's nodes onto the new hardware.

Power is a major issue, so the team brings along generators that plug into large battery bays, which feed the equipment through line filters. If power from the grid goes out, the batteries keep the network equipment running unaffected while the generators are brought online. Similarly, all the trailers are heavily climate-controlled -- it was a fairly warm day when we toured the site, but inside the trailers it was positively brisk.

One of the cooler pieces of gear we saw was what the AT&T tech charmingly called a "POP in a Box": a specialized cargo container that holds enough gear to do the work of the entire site yet still fits into a cargo plane and on the back of a flatbed. Designed for AT&T's worldwide enterprise customers, it's the sort of capability the reps were eager to show off, since most companies can't simply load up standardized gear and deploy disaster services worldwide. We didn't understand half the acronyms the tech threw at us while we were checking it out, but suffice it to say it's one densely packed little box -- it can take over for an entire remote office if necessary.

We tried our best to get these guys to slip up and drop some details about LTE deployment, but sadly, it wasn't going to happen. Even so, we came away impressed with how seriously AT&T takes this kind of capability: the company runs four simulations a year (the next one is in Seattle), and the NDR team is made up entirely of volunteers from within the company who self-identify as NDR. Although the focus during the simulation was more on restoring backbone services, this is the gear that trunks consumer services like cell phones, landlines, and internet service together, and it was fairly amazing to see it all rolled into a parking lot so casually for the exercise. Let's just hope AT&T doesn't have much call to deploy it for real anytime soon.