
Computing Thoughts
Seven Steps to Disaster
by Bruce Eckel
April 17, 2009
Summary
Retrospectives allow you to analyze what worked and what didn't about a project so you can do better in the future. How often, though, do people have either the resources or the wherewithal to do a retrospective on a complete disaster -- arguably where we'll learn the most?

I'm reading Outliers: The Story of Success by Malcolm Gladwell, which is yet another of his the-world-is-not-how-you've-always-thought-it-was books. I love these kinds of insights not just because they make the world fresher with possibility, but because it's possible that, by destroying some of my assumptions, I might discover a new way of doing things.

This isn't how I came upon Open Spaces conferences, for example. The structure was already there, and we (Martin Fowler and I -- stimulated primarily, of course, by Martin's bigger way of thinking about things) only had to do the experiment. But this was a scary thing to try -- as anyone who has organized an Open Spaces event knows, trusting that everyone is just going to show up, put interesting discussions on the board, and that all will go well ... it's an act of faith.

If you take the leap, the result of Open Spaces is nothing short of magical. It goes against everything you know, and yet it's better than any conference you've ever experienced.

After doing something like this, you begin to wonder "what other assumptions am I carrying around, thinking they are hard facts, that might be disassembled to produce equally astounding (and satisfying) results?" Which is why I find Gladwell and similar new-thinkers to be so intriguing. The high concept of Outliers is buried in a paragraph towards the back of the book: "The world could be so much richer than the one we have settled for."

In chapter seven (page 184), he makes a fascinating observation about disasters, in the context of airliner crashes:

The typical accident involves seven consecutive human errors. One of the pilots does something wrong that by itself is not a problem. Then one of them makes another error on top of that, which combined with the first error still does not amount to catastrophe. But then they make a third error on top of that, and then another and another and another and another, and it is the combination of all those errors that leads to disaster.

These seven errors, furthermore, are rarely problems of knowledge or flying skill. It's not that the pilot has to negotiate some critical technical maneuver and fails. The kinds of errors that cause plane crashes are invariably errors of teamwork and communication. One pilot knows something important and somehow doesn't tell the other pilot. One pilot does something wrong, and the other pilot doesn't catch the error. A tricky situation needs to be resolved through a complex series of steps -- and somehow the pilots fail to coordinate and miss one of them.

Doesn't this sound like software development disasters? Indeed, one of the maxims in Gerald Weinberg's Secrets of Consulting is Things are the way they are because they got that way ... one logical step at a time. By itself, each individual decision seems logical and reasonable within the small context in which it was made. No single decision causes the disaster; it's the accumulation of errors that does it.
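
To make the compounding concrete, here's a toy simulation -- my own sketch, not anything from Gladwell or Weinberg, and every probability in it is invented. Each step of a project carries a small chance of a human error; a review occasionally catches and clears whatever has piled up; disaster strikes only when seven errors accumulate uncaught:

    import random

    def run_project(steps=100, p_error=0.05, fatal_errors=7, p_review=0.0):
        """Return True if this run of the project ends in disaster."""
        uncaught = 0
        for _ in range(steps):
            if random.random() < p_error:    # a small, survivable mistake
                uncaught += 1
            if p_review and random.random() < p_review:
                uncaught = 0                 # a review clears what's piled up
            if uncaught >= fatal_errors:     # the seven-error chain completes
                return True
        return False

    def disaster_rate(trials=10_000, **kwargs):
        return sum(run_project(**kwargs) for _ in range(trials)) / trials

    if __name__ == "__main__":
        print("no reviews:   ", disaster_rate())
        print("light reviews:", disaster_rate(p_review=0.1))

With these made-up numbers, roughly a quarter of the no-review runs end in disaster, while even a light review habit drives the rate to nearly zero. The specific numbers don't matter; the shape does: individually survivable errors compound unless something regularly clears them.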

Not just the accumulation of errors, but the lack of review that might notice that there's a problem. This is a different kind of faith than the one required to try Open Spaces. This kind of faith says:

Let's not waste time looking for problems (that's negative thinking, after all). Let's just plow forward and hope that, this time, everything will come out OK all by itself. Even though we have no historical experience that this can happen, let's just hope that it will. After all, if it does magically go that way, we'll have saved all that time and money we would have otherwise wasted doing reviews and pair programming and any of the other processes designed to evaluate our work.

When I began writing, I had this same kind of ideal in mind. I would sit down and the words would just magically flow from my fingertips and lay themselves onto the page in wonderful order. I believed, hard, in the idea of inspired creativity. More than that, I felt that anything else was rather corrupt -- if you weren't in the creative flow, then there was something wrong with the result.

There are certainly periods of creative flow, and the results are inspired. Or at least, while you're writing, it seems that way. After it's rested for a while and you look at it again, you find places where it can be improved. With experience, you discover that everything can always be improved, and that it's never very good until you've gone over it a few times. As many writers are fond of saying, "writing is rewriting."

Yet the myth persists of the inspired flow of words: Jack Kerouac writing "On the Road" in three weeks on a single scroll of paper. We seem drawn to apparently magical overnight successes -- everyone fixates on Bill Gates doing a quick deal to produce MS-DOS and not on all the time and effort he put in before and after that moment.

That's a fundamental point that Gladwell makes in Outliers: People who are really good at something, who seem to have "overnight success," in fact have put in about 10,000 hours of deep immersion in the topic that has apparently produced this miracle. It's not a miracle at all, but rather something that has come as a result of a tremendous amount of focused effort.

I think the problem is that software is infused with magical properties. Look at what we can create with it -- magical worlds (games) that make more money than the movie industry (the previous king of magical worlds). Killer apps that make the creators fortunes, apparently overnight. This is our mythology, so when we start trying to build a piece of software, it's natural that we want to assume that such magic will apply to us (even if, as Gladwell shows, this magic doesn't exist), and that we don't need to use any silly review practices. Such (costly, time-consuming) things are for negative thinkers, not success-oriented go-getters.

An especially damaging aspect of this approach is that review never becomes part of the organization's culture. In the worst cases, everyone is taught to believe that each piece of code "belongs to" the person who wrote it, and that it's not OK to make comments about other people's work. This is a perfect setup for disaster, because you'll never know until after the fact that anything is wrong. At least airline cockpit crews are supposed to make suggestions when things are going wrong -- if you have a corporate culture that says "each programmer is an isolated silo," you guarantee there will be no warning before the project topples.

I've talked before about the negative characteristics that a team member can have. Indeed, it's tricky just to put together a neutral team, one without any toxic characteristics. But even harder is creating a positive team, and I would say the one feature that determines whether a team is "positive" is that vague, overused word "communication." This doesn't mean documentation (although that, done right, is important), or having meetings or sending out memos. It means the way that people are able to talk to each other. In short, does pointing out a potential flaw constitute a disastrous breach of protocol, or is it just part of a conversation? Gladwell introduces the concept of the Power-Distance Index (PDI) when talking about plane crashes -- if the cockpit crew comes from a culture where there is a high PDI, it means that it is very hard for a subordinate to question the pilot or suggest that something is wrong. This causes accidents.

The knife-edge in the software world is intelligence. In a negative software culture, the designers and architects are gods who cannot be questioned -- here we have a high PDI, and questions are seen as challenges to authority and power, and are thus actively discouraged. The assumption is that the "leaders" are so smart they don't make mistakes (an idea promoted by those same "leaders"), and so we don't need to look for mistakes -- indeed, it's a waste of time and resources to do so.

In a positive software culture with a low PDI, everyone's input is valid. From the most experienced to the least, everyone makes mistakes, so the question is not whether mistakes exist, but how we discover and minimize them. We know disasters happen, so we are on the lookout to capture and record risk factors. And the leaders put their effort not into showing how infallible they are, but into appreciating input that helps them continue to learn and understand. This culture not only produces software more reliably, it develops better software engineers and better leaders.

"Communication" in a positive software culture means that everyone knows that anyone can make a mistake (so it's not a terrible thing) and anyone kind find a mistake (so it's not a big deal). Most importantly, there's no way to prevent mistakes; they are just part of the world of software development. So we can't just hope they don't happen, we have to actively work with the knowledge that they are always there. We must track and test against them, and every time we learn something, everyone on the team learns that thing and we all get better because of it.

This brings up two important questions:

  1. Is it possible to convert a team with a negative software culture into one with a positive software culture?

  2. If so, how?

Seven weeks ago I broke my leg skiing. In prior weeks, I had been thinking that it was time to take another few lessons, to learn to ski in better control -- that is, to get some review and direction on my skiing style. I knew I was skiing out of control, but it always seemed like I could go a bit faster, and then a bit faster still. I had enough control to fly down the mountain, and I seemed to be able to do that and get away with it.

The failure, when it happened, was a series of minor incidents, ones that I had survived unscathed many times before. But this time, combined with the fact that I had become used to skiing slightly out of control, these incidents caused a small disaster that I shall continue recovering from in the weeks to come. Although I was a functional skier, on my path of slightly-out-of-control behavior it was statistically likely that I would eventually combine the wrong mistakes and have a disaster. I "only" broke a bone (didn't tear anything), so I can learn from my mistake without suffering too much for it.

Do people only change after trauma? The answer seems to be "yes, mostly." It's also possible to change through inspiration, but that appears to be the rare exception. Usually you have to get bashed about before you'll think about changing.

I've had two types of consulting clients. The first, and most common, appears when something is starting to go wrong, or has already gone wrong. Project failures seem to go through the "stages of grief" (hilariously depicted here), usually getting stuck at "denial." Bringing a consultant into a failing project is typically either a last-ditch attempt to save the project (by imagining that the consultant has magical powers) or the beginning of the process of assigning blame. Neither case is much fun.

But sometimes a client will be smart enough to understand how bad the odds of success are, and attempt to inject goodness into the project early on, while there's still a chance of it making a positive difference. These are the good consulting jobs.

And the very best jobs are when the client has already taken steps towards success (for example, by choosing a powerful, dynamic language like Python) and you're brought in to amplify those steps (rather than trying to motivate them in the first place). This is where the fun happens. This is where I want to live.

All these questions I keep asking are really the same: "Is change possible?" The answer, of course, is "Yes... but it's unlikely." Even the traumatized would prefer to think of it as a bad dream, and to go back to the way things were, blissfully thinking that everything is working OK. To quote Weinberg again: "People hate change. They really, really hate change."

Look at the structure of stories. The main character is pushed out of their comfort zone by the "inciting incident" (whatever happens to get the story rolling). But the only logical thing for that character to do is to try to return to their original comfort zone. Otherwise the character is unbelievable -- obviously they weren't comfortable if they didn't try to return to that zone, so why weren't they already trying to change? Doesn't make sense.

Except in the world, it does. You may hate your cubicle, but you keep going back to it. If you get kicked out, you try to get back to it or to another cubicle somewhere else. Cubicles are the devil you know, and that culture collects other people who are equally willing to resist change to keep out the uncertainty of the unknown. It's our nature to keep to certainty, and put up with rather a lot of pain to do so.

In the face of human propensity towards sameness, how do we introduce change?

  1. The change must fit the ability of the team to deal with change. A consultant who shows up and says "you need to adopt (my favorite flavor of) Agile" does not fit my definition of a consultant. That's really more what a trainer does, since the client has already made the hard decision. But a good trainer figures out how much of the training the team is ready for. Trying to shove a one-size-fits-all seminar into a team regardless of what they can handle is a recipe for unhappiness (I learned this the hard way). It is better to introduce something small that makes a difference.

  2. The change must be introduced subtly, so as not to engage the limbic system. Years ago, I took a workshop on Kaizen with Robert Maurer. This is "personal kaizen" rather than what you'll normally find written about industrial process improvement, but because of that it seems a better fit for software teams. My biggest takeaway from the workshop was this: your "primitive brain" is easily frightened and always scanning for changes (Maurer used the example of monkeys fleeing for the trees when a leopard jumps out). The main trick of "personal kaizen" is to introduce changes so un-threatening that your primitive brain laughs. For example, I was having trouble flossing. Saying "I'm going to floss three times a day" was a big, frightening change, so the primitive brain went into flight mode. But I could safely commit to "I'm going to floss once a month" without causing any reaction (the dentist in our group was thrilled). And each time I did floss, it turned out not to be such a big deal, so over time I became a much more frequent flosser.

  3. The force for change must be steady, to overcome the natural tendency to return to the original resting state. As noted earlier, our impulse when disturbed is to return to our comfort zone (even if it's not really comfortable, only familiar). So if a consultant shows up, introduces a new technique, and then leaves, it seems interesting in theory "but we have work to do," and it's just too easy to return to what you always do. Even if the consultant helps you restructure your project for the new technique, the minute there's some pressure and you don't know what to do, you'll fall back into your old ways. Some consulting firms solve this by providing consultants nonstop for the duration of the project -- while probably effective, this seems like expensive overkill (and a common pattern seems to be starting with primary consultants who are later substituted with secondary consultants at the same fee). A more moderate and economical solution is repeated consulting visits that provide regular course corrections. This has the added benefit that you're more likely to be able to work with the same primary consultant on each visit, which provides better continuity.

About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998; 2nd Edition, 2000; 3rd Edition, 2003; 4th Edition, 2005), the Hands-On Java Seminar CD-ROM (available on the Web site), Thinking in C++ (Prentice-Hall, 1995; 2nd Edition, 2000; Volume 2, with Chuck Allison, 2003), and C++ Inside & Out (Osborne/McGraw-Hill, 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee, and speaks regularly at conferences.

This weblog entry is Copyright © 2009 Bruce Eckel. All rights reserved.
