I made my first trip to Israel this past week. It was a great trip. Through our acquisitions, we now have several local development teams. I was impressed with the innovation coming not only from our own teams but from the startup environment in general. There is a real energy here for innovation.

On the trip out, we were leaving Newark on a non-stop to Tel Aviv. We had started our takeoff roll and were going about 150 mph; everything seemed just like the thousands of flights in the past. Except, just as we were about to lift off, we heard a “boom.” It sounded like a tire blowing. A second or two went by and then the pilot slammed on the brakes, hard. Hard enough that stuff came out from under places that had clearly not been cleaned in a while. Stuff like magazines from 1998 and long-lost cell phones. Being the analytic person that I am, I was wondering whether the aircraft designers had sized the brakes to stop a fully fueled and loaded plane with maybe 1/3 of the runway left. The answer to that question was clearly yes, as we came to a stop at the end of the runway. Impressive.

After a few minutes the pilot came on and said that we had had a “catastrophic engine failure.” Being the curious person that I am, I wanted to know what constituted “catastrophic.” The pilot’s reply was “that’s when you leave pieces of your engine back on the runway.”

The simple lesson is that no matter how good we make anything, it still can and will fail from time to time. I have been flying for decades, have flown millions of miles and, thankfully, had never experienced a situation like this one. Ironically, this event actually increased my confidence in our air travel systems. As an engineer, I know things will fail. Failures are not, in themselves, a reason to be more concerned. What is critical is that the systems have been properly designed to deal with failures, a quality I will simply describe as “resiliency.”

For IT leaders, I believe the bigger problem is when we rely too much on “reliability” and not enough on “resiliency.” Sure, we don’t want components failing all of the time, but we had also better not bet our business on the assumption that hardware won’t fail.

The tolerance for downtime and “catastrophic” failures in IT is evaporating. Here we should take a lesson from the airline industry and make resiliency the prime objective.

Mark…
