Today has been one of those days. I’m aware that most of my readers probably aren’t interested in the nuts and bolts of server hardware technology, but I have a need to vent.

The BladeCenter to which I am referring is made by IBM. It is a chassis for running blade servers: thin server computers without the typical peripheral capabilities of regular servers. These blade servers share peripheral hardware, i.e. disk storage and input/output (I/O) modules such as network ports, which are hosted on the chassis itself. Our particular model, the BladeCenter S, holds six blades and 8TB of storage. And as might be imagined, a fully loaded chassis such as ours has quite a power requirement.

Today was supposed to be a very big day in the life of the BladeCenter. It currently hosts about two-thirds of the developer environment infrastructure on a Hyper-V cluster, the place business groups go to test new products and services or work through support issues without the overhead of trying to do so in production. Over the past few months I have been preparing to move it from a small computer lab to one of the main corporate datacenters, where I could rebuild it with new versions of software. Today was the day I was to start migrating the developer environment's virtual servers to its sister chassis so that this one could be shut down and placed on the truck for the move across town. But fate decided to intervene.

This morning, a gentleman who works in the area, and who shall remain nameless, took a trip to the lab where the chassis lives. I haven't quite gotten the whole story, but the gist of it is that he was helping the IP telephony folks with their equipment and pulled two batteries out of the running backup power supply for the BladeCenter. Of course the BladeCenter, which needs a certain amount of constant wattage to run, sensed that it did not have enough power and began shutting itself down, and I arrived at work to a scene of technical support mayhem.

The story does not end there, however. Once we found an alternate source of power, we discovered that the network I/O module was fried. The replacement will not arrive until tomorrow morning, and I will not know the full extent of the damage until I can power up the entire chassis and have a look. I do not have high hopes: it is a delicate piece of technology with highly interdependent parts, including SAS storage controllers that tend not to play well with disorderly dismounts. It could mean data loss.

Once I get through this, I will be thoroughly relieved to have the chassis safe in a secure datacenter where stray lab personnel can't just walk in and pull the plug.

[Update]

I am very relieved that this happened during the decommissioning and preparation to move and rebuild the chassis, and not under normal conditions. We dodged the bullet: the BladeCenter came back up, and replacing the network I/O module allowed me to get most of the virtual machines off of it on time. As it was, the cluster quorum was damaged, which would have been a riskier fix under normal conditions. But the incident also came with a sort of blessing in disguise: it has been an engineering challenge to make Windows Server 2012 R2 work with the older network I/O module, and the new one has more modern capabilities than the last. My hope is that it will be much easier to deal with than its sister BladeCenter.
