grouper.ieee.org (ieee802.org) experienced an outage today, 3 April 2013, that extended well beyond
the scheduled downtime. The scheduled work involved replicating the
virtual machine to the offsite datacenter and bringing it back. IEEE-SA
is implementing a plan for disaster recovery to an offsite datacenter
(in another region of the country) in the event that the IEEE datacenter
is forced to shut down, as could happen during an event such as Hurricane Sandy.
This exact process had been tested by IT on a non-critical SA machine with
flawless results. That VM was replicated offsite, brought live at the
remote datacenter, tested to ensure all applications were working while
it was fully available to users, and then migrated back to NJ. Total
downtime for that exercise was about 10-15 minutes, spread over a one-hour period.
Confidence was high that the migration process could be reproduced with
all other SA VMs.
Unfortunately, today, after the machine was replicated offsite, problems
arose with VMware and IT had to contact the vendor for support. The need to
involve external resources is what extended the outage.
A plan will be established to mitigate this potential risk.
_____________________________________
Luigi Napoli
Sr. Technology Community Specialist
IEEE Standards Association
IEEE. Advancing Technology for Humanity.
_____________________________________