Why we monitor.
Apr. 3rd, 2009 01:22 pm
At each physical location of my employer, we have a machine that, among other things, runs monitoring software. The monitors consist of little pokes across the network: Are you there? Really? Are you doing anything? Do you know what time it is? Do you have to go to the bathroom?
Yeah, computers are like small children.
Early this morning, a monitor found a problem. It tapped one of our phones on the shoulder and didn't get a response.
After this happened a few times in a row, the monitor sent me email.
So I looked up the phone, saw it was assigned to a particular office, walked over, and picked up the handset. No dial tone.
To compress a long diagnostic process into the result: the office's local switch had gotten terminally confused and wasn't passing any traffic at all, for the desktop machines or the phones. Power-cycling it fixed it.
All this happened before either of the developers in the office showed up, and so neither one knows that they had a problem.
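The monitor in this story only sent email after the phone failed to answer "a few times in a row," which keeps a single dropped packet from waking anyone up. That alerting rule can be sketched roughly as below. This is a hypothetical illustration, not the actual monitoring software described here (the post doesn't name it): the class and function names are made up, the threshold of 3 is an assumed stand-in for "a few," and a TCP connect is only one possible kind of poke.

```python
import socket
from dataclasses import dataclass


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical 'are you there?' poke: True if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "no response".
        return False


@dataclass
class ConsecutiveFailureAlert:
    """Hypothetical rule: alert once after `threshold` failed probes in a row."""

    threshold: int = 3       # assumed value for "a few times in a row"
    failures: int = 0        # length of the current run of failures
    alerted: bool = False    # suppress repeat emails until the device recovers

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True only when an email should go out."""
        if probe_ok:
            # Any successful poke resets the run and re-arms the alert.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return True
        return False


if __name__ == "__main__":
    rule = ConsecutiveFailureAlert(threshold=3)
    results = [True, False, False, False, False, True, False]
    print([rule.record(r) for r in results])
    # [False, False, False, True, False, False, False]
```

The `alerted` flag is a design choice: without it, a dead device would generate an email on every poke until someone power-cycled the switch.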
Date: 2009-04-03 10:42 pm (UTC)
Almost no attention is given to reliability and to recovering from unexpected events. Just yesterday the scroll wheel on my mouse stopped working properly: it worked in some windows and not others, and I had to reboot twice before it regained its sanity.
As more and more vital things depend on this level of quality, we can expect bigger and bigger disasters. The last two widespread power blackouts were essentially software failures.
Vendors see no reason to build in reliability; it's not a selling point. Of course, the same goes for other areas: pistachios, spinach, or peanuts, anyone?