Project Wipe-out: Big Failures

Architects help manage risk. We learn from mistakes, so the best mistakes to learn from are someone else's! Here are some big failures to learn from.

System faults and failures--some examples that made the news

Software and systems project failure:

Software system failure:

Classic examples of system failures

Bugs, Slugs and Work-arounds

Learning from failure:

Research papers

Learning from failure

Funny fail

From Ruth Malan's Journal

By analogy

What to do

Gems and Keepers

The most precious gem found on this mining expedition:

"...a club that began in 1945 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark II system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: 'first actual case of a bug being found.'" from History's Worst Software Bugs, Simson Garfinkel, Wired, Nov 08, 2005.

I might keep this one around for when my kids are teenagers:

"Good judgment comes from experience, and experience comes from bad judgment." Barry LePatner

And this is a good one for a chuckle in workshops:

"None of us is as dumb as all of us." Jeff Atwood

These are neat too:

"The nicest thing about not planning is that failure comes as a complete surprise and is not preceded by a period of worry and depression." - Sir John Harvey-Jones 

"Not only are a system’s desired operating modes influenced by its architecture, but so are some of its failure modes. Thus an architecture that permits only one path between elements may fail if a leg of any path breaks. All of a tree below a broken node is isolated from the rest of the tree."

-- Edward Crawley, Olivier de Weck, Steven Eppinger, Christopher Magee, Joel Moses, Warren Seering, Joel Schindall, David Wallace and Daniel Whitney, "The Influence of Architecture in Engineering Systems," MIT ESD, March 2004

"Success breeds complacency. Complacency breeds failure. Only the paranoid survive." -- Andrew Grove

“If automobiles had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside” -- Robert Cringely

Editorial Comment on Software System Failures

These horror stories (I can't let them enter my mind when I'm driving my car!) neglect several points of note:

  • software is everywhere, and most of the time, we don't even notice it. That means it is doing its job well—most of the time.
  • small software failures do not make the news. They may add up to big losses for our economy and for our businesses, but they are under the radar.

Even when these failures cost nothing more than hours, they are a big drain on productivity and a heavy stress burden. But the gain from software is high and the pain is sporadic; it usually falls below our pain-versus-gain tolerance threshold, so we grumble but put up with it.

Still, ignoring the pain is not a very resourceful way to approach our future health!

Software development is a complex, human endeavor. Even the very smartest, best people make mistakes. It is a hard truth: bugs are hard to eliminate entirely. Some of the approach to bug control is architectural:

  • reduce complexity: separation of concerns, partitioning the problem, encapsulation, ...
  • bug elimination: use proven parts where available; insist parts come with test suites and bug reports; design tests for the interactions among parts; ...
  • bug damage control: identify failure modes and failure consequences and create strategies to reduce or at least contain damage. We're getting better at doing this for security. We need to do it with all kinds of failure modes. We need to explore scenarios like "what happens if the Web server goes down?" so we don't have this situation: "One day, one of the credit bureaus' Web servers went down for hours. When Lydian Trust's 'get credit' service tried to make the call, there was no answer. Because the connection to the server was loosely linked, the system didn't know what to do. 'Get credit' hadn't been built to make more than one call. So while it waited for a response, hundreds of loan applications stalled." (Koch, Christopher, "A New Blueprint For The Enterprise," CIO Magazine, May 1, 2005.)
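
To make the "what happens if the Web server goes down?" scenario concrete, here is a minimal sketch of one damage-control strategy: try more than one source, and if none answers, set the request aside for later instead of letting it stall. Everything named here (get_credit, BureauUnavailable, the bureau functions, the retry queue) is hypothetical and for illustration only; it is not how Lydian Trust's service was built or fixed. It also assumes each bureau call enforces its own timeout and raises an exception when there is no answer.

```python
from collections import deque


class BureauUnavailable(Exception):
    """Hypothetical error a bureau client raises when a call fails or times out."""


def get_credit(applicant_id, bureaus, retry_queue):
    """Try each bureau in turn; if all fail, defer the application instead of stalling.

    bureaus is a list of (name, fetch) pairs, where fetch(applicant_id) returns a
    report or raises. Assumption: each fetch applies its own timeout.
    """
    for name, fetch in bureaus:
        try:
            return {"status": "ok", "source": name, "report": fetch(applicant_id)}
        except (BureauUnavailable, TimeoutError, ConnectionError):
            continue  # contain the failure: move on to the next bureau
    # No bureau answered: record the application for a later retry and return
    # a definite answer, so the caller is never left waiting.
    retry_queue.append(applicant_id)
    return {"status": "deferred", "source": None, "report": None}


# Stand-in bureaus for the demonstration; real clients would make network calls.
def bureau_down(applicant_id):
    raise BureauUnavailable("no answer")


def bureau_up(applicant_id):
    return f"report for {applicant_id}"


pending = deque()
first = get_credit("A-1", [("first", bureau_down), ("second", bureau_up)], pending)
second = get_credit("A-2", [("first", bureau_down)], pending)
print(first["status"], first["source"])   # ok second
print(second["status"], list(pending))    # deferred ['A-2']
```

The point is less the particular fallback than the habit: the failure mode was named up front, and the code has a deliberate answer for it.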

And some of the approach to bug control is through process (as poisonous as that might be to some):

  • have another pair of eyes help detect bugs (e.g., pair programming in XP, or design and code reviews)
  • start testing before any code is written, and test every day from then on!
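
As a small illustration of testing before the code is written, the sketch below puts the tests first and the implementation second. The function parse_port and its rules are hypothetical, chosen only because they are easy to state; the tests say what the code must do, including how it must fail, before any of it exists.

```python
import unittest


class ParsePortTest(unittest.TestCase):
    """Written first: these tests state what parse_port must do, including how it fails."""

    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_port("http")

    def test_rejects_out_of_range_port(self):
        with self.assertRaises(ValueError):
            parse_port("70000")


def parse_port(text):
    """Written second: the simplest code that makes the tests above pass."""
    port = int(text)  # int() raises ValueError for non-numeric text
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Saved as a module, this can be run every day with Python's standard test runner (python -m unittest), which is the "test every day from then on" half of the advice.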

And some of the approach to bug control is through management discipline:

  • set realistic schedules and don't overestimate what can be done
  • set realistic expectations (up and down the organization)
  • don't press to move to the next iteration (feature, storyline, etc.) if quality is already slipping; call a halt to decide on the severity of the situation and the approach to moving forward. Eric Sink's approach (in "Why we all sell code with bugs") to the fix-and-delay versus release-and-face-the-music decision is pragmatic. But we need to have fewer of those bugs to decide upon! Because if there are bugs we know about, how many more are lurking behind the cladding?

 

 


 References

Restrictions on use: All material that is copyrighted Bredemeyer Consulting and published on this page and other pages of our site, may be downloaded and printed for personal use. If you wish to quote or paraphrase fragments of our work in another publication or web site, please properly acknowledge us as the source, with appropriate reference to the article or web page used. If you wish to republish any of our work, in any medium, you must get written permission from the lead author. Also, any commercial use must be authorized in writing by Bredemeyer Consulting.

Copyright © 2006-2010 Bredemeyer Consulting
URL: http://www.bredemeyer.com
Page Author: Ruth Malan
Page Created: June 2, 2006
Last Modified: August 04, 2010