Project Wipe-out: Big Failures
Architects help manage risk. We
learn from mistakes, so the best mistakes to learn from are someone
else's! Here are some big failures to learn from.
System faults and failures: some examples that made the news
Software and systems project failure:
- Standish Group's Chaos Report
- Standish: Project Success Rates Improved Over 10 Years, SoftwareMag.com, Jan 15, 2004.
- 2010 IT Project Success Rates, Scott Ambler, August 02, 2010 [added 8/4/10]
- Imagination, process failures doom software projects, by David Worthington, November 2, 2009
- Why projects fail? It's all the business' fault, Matt Deacon
- Failure of imagination, Wikipedia
- Software Project Failure: The Reasons, The Costs, Carmine Mangione
- Software Hall of Shame, Robert N. Charette, IEEE Spectrum, September 2005.
- Failure Rate, summarizing statistics on IT project failures, by IT Cortex.
- Failure Examples, collected by IT Cortex.
- Kmart: Code Blue by David F. Carr and Edward Cone, Baseline, November/December 2001.
- McDonald's: McBusted by Larry Barrett and Sean Gallagher, Baseline, July 2, 2003.
- Oops! Ford and Oracle mega-software project crumbles by Patricia Keefe, ADTMag, November 11, 2004.
- FBI: Who Killed the Virtual Case File?, IEEE Spectrum, September 2005.
Software system failure:
- Who Needs Hackers, John Schwartz, The New York Times (recommended!)
- Top 25 programming errors in software security, Ellen Messmer
- Why Software Fails, Robert Charette
- Big IT Projects Fail Worldwide, podcast by Robert Charette
- System Resonance and the Stock Market, Grady Booch
- Intermittent Behavior and Intermittent Behavior Revisited, Grady Booch
- Television for Software Engineers, Dan Pritchett
- The RISKS Digest: ACM Forum on Risks to the Public in Computers and Related Systems. Volume 24, Issue 29, May 26, 2006.
- History's Worst Software Bugs, Simson Garfinkel, Wired, November 08, 2005.
- Software Horror Stories, Nachum Dershowitz, Tel Aviv University
- Software failure cited in August blackout investigation, Computerworld, November 20, 2004.
- Oxford Health Plans Case Study, Robert N. Charette, IEEE Spectrum, September 2005.
- Teched-out Cars Bug Drivers, by Julia Scheeres, Wired, June 29, 2004.
- Glitch in iTunes deletes drives, by Farhad Manjoo, Wired, November 05, 2001.
Classic examples of system failures
Bugs, Slugs and Work-arounds
Learning from failure:
- Learning from Software Failure, IEEE Spectrum, September 2005.
- Why Software Fails, Robert N. Charette, IEEE Spectrum, September 2005.
- Failure Causes, summarizing statistics on causes of IT project failures, by IT Cortex.
- To Engineer is Human: The Role of Failure in Successful Design, by Henry Petroski, 1992.
- Success through Failure: The Paradox of Design, by Henry Petroski, Princeton University Press, 2006.
- Fail Early, Fail Often, Jeff Atwood, May 1, 2006
Research papers
Medical Devices: Robert Wears and Richard Cook, Automation, Interaction, Complexity and Failure
NASA database: D.R. Kuhn, D.R. Wallace, A.J. Gallo, Jr., Software Fault Interactions and Implications for Software Testing, IEEE Trans. on Software Engineering, vol. 30, no. 6, June 2004.
Learning from failure
Funny fail
From Ruth Malan's Journal
By analogy
What to do
- Steve McConnell's "Classic Mistakes Enumerated" is a well-researched treatise on mistakes to avoid on software projects.
- 2010 CWE/SANS Top 25 Most Dangerous Software Errors (recommended!)
- Common Weakness Enumeration
- Assessing the Odds of Catastrophe; see also unknown unknowns, Wikipedia; see also Dilbert 6/26/10
- The Rugged Software Manifesto. See also: 'Rugged Manifesto' promotes secure coding, NetworkWorld, 2/28/10, and The Rugged Software Manifesto, Vikas Hazrati, InfoQ, June 22, 2010
- Architecture Reviews, Grady Booch, IEEE Software On Architecture #25, July 2010
Gems and Keepers
The most precious gem found on this
mining expedition:
"...a club that began in 1945 when engineers
found a moth in Panel F, Relay #70 of the Harvard Mark II system. The
computer was running a test of its multiplier and adder when the
engineers noticed something was wrong. The moth was trapped, removed and
taped into the computer's logbook with the words: 'first actual case of
a bug being found.'" from
History's Worst Software Bugs, Simson Garfinkel, Wired,
Nov 08, 2005.
I might keep this one around for when
my kids are teenagers:
"Good judgment comes from
experience, and experience comes from bad judgment." Barry
LePatner
And this is a good one for a
chuckle in workshops:
"None of us is as dumb as all of
us."
Jeff Atwood
These are neat too:
"The nicest thing about not
planning is that failure comes as a complete surprise and is
not preceded by a period of worry and depression." - Sir
John Harvey-Jones
"Not only are a system’s
desired operating modes influenced by its architecture, but
so are some of its failure modes. Thus an architecture that
permits only one path between elements may fail if a leg of
any path breaks. All of a tree below a broken node is
isolated from the rest of the tree."
-- Edward Crawley, Olivier de Weck,
Steven Eppinger, Christopher Magee, Joel Moses, Warren
Seering, Joel Schindall, David Wallace and Daniel Whitney,
"The
Influence of Architecture in Engineering Systems,"
MIT ESD, March 2004
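To make the Crawley et al. point concrete, here is a minimal Python sketch that computes which elements are cut off from a root when a single node fails. The topology and node names (`hub`, `a`, `a1`, ...) are invented for illustration; the sketch compares a pure tree with the same elements plus one redundant link.

```python
from collections import deque


def reachable(edges, root, broken):
    """Nodes still reachable from `root` once the `broken` node has failed.

    `edges` is an undirected list of (a, b) links between elements.
    """
    if root == broken:
        return set()
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first search that refuses to pass through `broken`
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor != broken and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def isolated(edges, root, broken):
    """Healthy nodes cut off from `root` by the failure of `broken`."""
    nodes = {n for edge in edges for n in edge}
    return nodes - reachable(edges, root, broken) - {broken}


# Hypothetical topology: a tree has exactly one path between any two elements.
tree = [("hub", "a"), ("hub", "b"), ("a", "a1"), ("a", "a2")]
# The same elements with one redundant link added from a1 to b.
meshed = tree + [("a1", "b")]

print(sorted(isolated(tree, "hub", "a")))    # ['a1', 'a2']
print(sorted(isolated(meshed, "hub", "a")))  # ['a2']
```

In the tree, losing `a` isolates everything below it, just as the quote says. The one extra link keeps `a1` connected, while `a2`, which still has only one path, remains isolated.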
"Success breeds complacency. Complacency breeds failure.
Only the paranoid survive." -- Andrew Grove
“If automobiles had followed the same development cycle as
the computer, a Rolls-Royce would today cost $100, get a
million miles per gallon, and explode once a year, killing
everyone inside” -- Robert Cringely
Editorial Comment on Software
System Failures
These horror stories (I can't let them enter my mind when I'm driving my car!) neglect several points of note:
- software is everywhere, and most of
the time, we don't even notice it. That means it is doing its job
well—most of the time.
- small software failures do not make
the news. They may add up to big losses for our economy and for our
businesses, but they are under the radar.
Even when these failures cost nothing more than hours, they are a big drain on productivity and a high stress burden. The gain from software is high; the pain is sporadic and usually stays below our pain-versus-gain tolerance threshold, so we grumble but put up with it.
Still, ignoring the pain is not a very
resourceful way to approach our future health!
Software development is a complex, human endeavor. Even the very smartest, best people make mistakes. It is a hard truth that bugs are hard to eliminate entirely. Some of the approach to bug control is architectural:
- reduce complexity: separation of
concerns, partitioning the problem, encapsulation, ...
- bug elimination: use proven parts
where available; insist parts come with test suites and bug reports;
design tests for the interactions among parts; ...
- bug damage control: identify failure
modes and failure consequences and create strategies to reduce or at
least contain damage. We're getting better at doing this for
security. We need to do it with all kinds of failure modes. We need
to explore scenarios like "what happens if the Web server goes
down?" so we don't have this situation:
"One day, one of
the credit bureaus' Web servers went down for hours. When Lydian
Trust's 'get credit' service tried to make the call, there was no
answer. Because the connection to the server was loosely linked, the
system didn't know what to do. 'Get credit' hadn't been built to
make more than one call. So while it waited for a response, hundreds
of loan applications stalled." (Koch, Christopher, "A
New Blueprint For The Enterprise," CIO Magazine, May 1, 2005.)
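One way to avoid the stall described in the quote is to bound every remote call with a timeout and to have a fallback when no bureau answers. The sketch below is hypothetical: `get_credit`, the bureau stubs, and the 0.2-second timeout are invented for illustration and are not taken from Lydian Trust's system or the CIO article. It tries each bureau in turn and, if none responds, marks the application "deferred" instead of letting it wait indefinitely.

```python
import concurrent.futures as cf
import time

# Shared worker pool; a hung call ties up one worker but never the caller.
_pool = cf.ThreadPoolExecutor(max_workers=4)


def get_credit(applicant_id, bureaus, timeout_s=0.2):
    """Hypothetical 'get credit' service that never waits indefinitely.

    `bureaus` is an ordered list of (name, query_function) pairs. Each bureau
    gets at most `timeout_s` seconds; on timeout or connection failure the next
    one is tried. Returns ("ok", bureau_name, score) from the first bureau that
    answers, or ("deferred", None, None) so the application can be queued for
    later instead of stalling.
    """
    for name, query in bureaus:
        future = _pool.submit(query, applicant_id)
        try:
            return ("ok", name, future.result(timeout=timeout_s))
        except cf.TimeoutError:
            continue  # stop waiting; the worker finishes (or hangs) on its own
        except OSError:
            continue  # ConnectionError and friends are subclasses of OSError
    return ("deferred", None, None)


# Simulated bureaus (invented for this example).
def hung_bureau(applicant_id):
    time.sleep(1.0)  # like the Web server in the quote: no answer in time
    return 700


def refusing_bureau(applicant_id):
    raise ConnectionError("connection refused")


def healthy_bureau(applicant_id):
    return 712


print(get_credit("app-42", [("A", hung_bureau), ("B", healthy_bureau)]))
# ('ok', 'B', 712)
print(get_credit("app-43", [("A", hung_bureau), ("C", refusing_bureau)]))
# ('deferred', None, None)
```

The design choice here is to bound the caller's wait, not the remote work: a timed-out worker thread keeps running until the stub returns, so a production client would also set timeouts at the socket or HTTP-client level. Returning "deferred" rather than raising lets the loan workflow keep moving when every bureau is down.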
And some of the approach to bug control
is through process (as poisonous as that might be to some):
- have another pair of eyes help
detect bugs (e.g., pair programming in XP, design and code reviews,
etc.)
- start testing before any code is
written, and test every day from then on!
And some of the approach to bug control
is through management discipline:
- set realistic schedules and don't
overestimate what can be done
- set realistic expectations (up and
down the organization)
- don't press to move to the next iteration (feature, storyline, etc.) if quality is already slipping; call a halt to decide on the severity of the situation and the approach to moving forward. Eric Sink's approach to the fix-and-delay versus release-and-face-the-music decision (Why we all sell code with bugs) is pragmatic. But we need to have fewer of those bugs to decide upon! Because if there are bugs we know about, how many more are lurking behind the cladding?