This week I’m trying to automate some pieces of our disaster recovery test, which happens in less than two weeks. Previous tests have taken 4-6 hours to bring the BCP environment online, and in the event of a true disaster, the faster we can bring it online the better. So in the next two weeks, whatever can get scripted… gets scripted.
Yesterday I wrote and tested a series of scripts to stop/disable and enable/start the IIS services on the web servers. There are multiple services that all need to be running. Some of them take 5-10 seconds to start up, and some of them depend on others. In my testing, I was getting errors when one service was asked to start before the previous one had finished starting. I needed to insert a pause.
So I inserted a ping in the startup_iis.bat script.
sc \\Web1 config IISADMIN start= auto
sc \\Web1 start IISADMIN
ping 1.1.1.1
sc \\Web1 config HTTPFilter start= auto
sc \\Web1 start HTTPFilter
ping 1.1.1.1
sc \\Web1 config W3SVC start= auto
sc \\Web1 start W3SVC
The IP address 1.1.1.1 was not reachable from our network, but the ping command will try anyway: by default it sends four echo requests and waits for each one to time out. That created a pause of about 20 seconds, enough time for one service to finish starting before I attempted to enable and start the next service.
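A fixed pause is a guess, and it only works as long as the address being pinged stays unreachable. One alternative is to poll the service state with sc query and move on as soon as it reports RUNNING. Here is a minimal sketch of that idea, not what I actually ran in the test. The variable names, the 30-attempt limit, and the one-second delay are my own assumptions for illustration.

```batch
@echo off
rem Sketch: wait until a service on a remote server reports RUNNING.
rem SERVER, SERVICE and MAX_TRIES are illustrative values, not from the original script.
set SERVER=\\Web1
set SERVICE=IISADMIN
set MAX_TRIES=30
set TRIES=0

sc %SERVER% config %SERVICE% start= auto
sc %SERVER% start %SERVICE%

:wait
rem "sc query" prints a STATE line; find sets errorlevel 0 when RUNNING appears.
sc %SERVER% query %SERVICE% | find "RUNNING" >nul
if not errorlevel 1 goto running
set /a TRIES+=1
if %TRIES% geq %MAX_TRIES% goto timeout
rem Two echo requests to localhost take roughly one second.
ping 127.0.0.1 -n 2 >nul
goto wait

:running
echo %SERVICE% is running on %SERVER%.
goto :eof

:timeout
echo Timed out waiting for %SERVICE% on %SERVER%.
exit /b 1
```

The same loop could be repeated for HTTPFilter and W3SVC, so each service starts only after the one before it is confirmed running.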
As an Application Administrator, you have the religion of Change Management and the love of uptime ingrained into you. So I cannot even begin to describe how counter-intuitive it is to forcibly take down the Production environment. How awkward it is to write scripts specifically designed to create a disastrous outage. It’s the closest I will come to being a Mad Scientist.
UPDATE: a friend on Facebook pointed out some of the parameters accepted as options in ping.exe. His example:
ping 1.1.1.1 -n 1 -w 300000>nul
This makes one ping request with a timeout of 300,000 milliseconds (5 minutes). Nice if you are waiting for a server to reboot.
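For the reboot case, another option is to ping the server itself in a loop and continue as soon as it answers, instead of always waiting the full timeout. This is a rough sketch under my own assumptions: the server name, the attempt limit, and the delays are placeholders, and it assumes IPv4 replies, which contain "TTL=" in the ping output.

```batch
@echo off
rem Sketch: block until a server answers ping again after a reboot.
rem SERVER and MAX_TRIES are illustrative values.
set SERVER=Web1
set MAX_TRIES=60
set TRIES=0

:check
rem One echo request with a 5000 ms timeout; an IPv4 reply line contains "TTL=".
ping %SERVER% -n 1 -w 5000 | find "TTL=" >nul
if not errorlevel 1 goto up
set /a TRIES+=1
if %TRIES% geq %MAX_TRIES% goto giveup
rem Short pause so fast failures (e.g. name not resolved) do not spin the loop.
ping 127.0.0.1 -n 6 >nul
goto check

:up
echo %SERVER% is answering ping.
goto :eof

:giveup
echo %SERVER% did not answer after %MAX_TRIES% attempts.
exit /b 1
```

A ping reply only shows that the network stack is up. The services on the box may still need more time before they are usable.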