Define the Problem, then Resolve It

August 12, 2011

When you or your organization encounters a problem, does everyone start generating solutions? How do you evaluate which one to choose?

Choosing can be difficult, particularly in software development, where problems are inherently complex and dynamic, involving both people and technology. But there is a way around this.

Albert Einstein is credited with an interesting answer to the following question: “You have one hour to solve a problem; how do you proceed?” His answer was reportedly, “I would spend fifty-five minutes defining the problem, and five minutes finding the solution.”

I looked like a hero a few months ago by doing just what Einstein suggested. I admit that I had a great leg up: those involved had already narrowed the problem down to one of two possibilities. But they had burned a lot of time throwing solutions at the issue before getting more specific about the nature of the problem.

My first order of business was to determine exactly what was wrong. We were experiencing an issue with one of our web services, and a lot of options had been tried in an effort to "resolve the problem quickly."

To give you some additional background, one of our web services that had been operational for some time suddenly stopped working. No one had performed any upgrades or changes to either our software or the underlying infrastructure. The web service should have been working, but it wasn't.

I suggested that we exercise the database that the web services were writing to, using older client/server software that we had available. The code was different from the web services code, but I pointed out that if there was a configuration problem at the database level, it might reveal itself through the tried-and-true error handling that we had built into the older software over the years. (Definition of legacy software: "Code from the past, maintained because it works." Michael Feathers)

This is precisely what happened. We received a very clear error message showing where the problem originated, and I did recognize one aspect that was not as well known to others (due to my years of experience with the code base). However, if I hadn't been in the room at that point, someone else could have looked at the legacy code and understood what was going on.

The solution was easy once the problem was identified, and the lesson is simple: It’s better to know precisely what is broken before throwing fixes at the problem. The time invested in clearly identifying the problem always pays off.

The same goes for process problems. Understand the situation fully and as deeply as possible first and a solution will present itself. And don’t make multiple changes at once! The dynamics will shift with one or two changes. Making too many changes at once is wasteful in two ways:
  1. It is unproductive because multiple changes waste time and effort.
  2. Fixing a problem by throwing multiple changes at it masks the true nature of the issue and hides which change resolved it. There won’t be any learning.
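
As a rough illustration of the one-change-at-a-time idea, here is a minimal Python sketch. It is not taken from the incident described above: the `config` dictionary, the `set_key` helper, and the `is_healthy` check are all hypothetical stand-ins. Each candidate change is applied on its own, the system is re-checked, and the change is reverted if it did not help, so you know exactly which change resolved the issue.

```python
from typing import Callable, List, Optional, Tuple

# A candidate change: (name, apply function, revert function). All hypothetical.
Change = Tuple[str, Callable[[], None], Callable[[], None]]


def find_fixing_change(changes: List[Change],
                       is_healthy: Callable[[], bool]) -> Optional[str]:
    """Apply candidate changes one at a time and report which one fixes things."""
    for name, apply, revert in changes:
        apply()
        if is_healthy():
            return name  # this single change resolved the issue
        revert()         # it did not help, so undo it before trying the next
    return None          # nothing on the list fixed it; keep defining the problem


# Hypothetical system state: a pool size of 0 is the real fault here.
config = {"pool_size": 0, "timeout_s": 30}


def set_key(key: str, value: int) -> Tuple[Callable[[], None], Callable[[], None]]:
    """Build an apply/revert pair for changing one configuration key."""
    previous = config[key]

    def apply() -> None:
        config[key] = value

    def revert() -> None:
        config[key] = previous

    return apply, revert


def is_healthy() -> bool:
    # Stand-in for a real check, such as calling the service and inspecting the result.
    return config["pool_size"] > 0


changes: List[Change] = [
    ("raise timeout", *set_key("timeout_s", 60)),
    ("restore pool size", *set_key("pool_size", 10)),
]

fix = find_fixing_change(changes, is_healthy)
print(fix)
```

Because the unhelpful timeout change is reverted before the next one is tried, the system ends up with only the change that mattered, and the team learns what was actually broken.
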

I did violate a general principle: it is better for a leader to avoid providing solutions to problems. I chose to handle the problem directly because this was a production issue and the outage had already lasted an entire day. Had I been involved earlier, I would have had the opportunity to coach rather than take the direct-and-solve approach that I took.

And even though I directed people towards a resolution, I was completely open and articulated my thought process to everyone involved. I'm confident that those involved filed my approach away for future reference.