Triaging bugs - what's that all about?

Making difficult decisions is (prepare yourself)… difficult. Duh! I’d prefer all my decisions to be easy ones, but if I must make a difficult one, I’m the kind of guy who believes there is wisdom in numbers. Of course, the potential for diminishing returns does exist if you include too many people, so it’s not just a numbers thing. It’s really a perspective thing. The more unique perspectives I can consider, the more likely I am to make a good decision. I prefer to make good decisions over bad ones. My dad always said that I had an astounding grasp of the obvious - so I’ve got that going for me, which is nice…

In my previous post, I discussed the nature of the difficult decisions we must make when it comes to shipping the next version of Office. If you were wondering how we make those decisions - this post is for you. As you might have guessed from my intro, we believe that the best way is via a group that includes a wide range of perspectives. We call that group process triage.

Personally, I was first exposed to the term triage via the TV show M*A*S*H. I know that I’m dating myself when I say that, but I have to be honest – I learned a lot about life from that show. As you probably already know, triage is typically a medical term referring to “the assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients” (that’s the dictionary talking, not me). I would define it as answering the question “what the heck should we do?”

The concept of triage doesn’t necessarily imply a group decision but for us that’s what it’s all about. Again, it’s not just about sheer numbers. It’s about bringing people together who have the kind of experience that brings passion to their perspective. Here’s a look at who’s included in that group.

First, there’s the feature development team – designers, developers and testers. This is the group that has put the most effort into the feature. Their feature is their baby. They want it to be perfect because no one wants to hear that their kid is ugly. It can be especially hard for this group to be objective, but it’s also the group that has the most extensive knowledge of the code and the potential trade-offs associated with any code changes. These aren’t necessarily the most senior members of triage, but in the Office culture it’s important for the new guy to have a prominent voice to prevent us from becoming stale. Kind of like the chip-clips of the process if you will.

Next, there are our Customer Support Specialists – people who work directly with our customers and have their fingers on the pulse of what they need from our products. They understand enterprise software like no one else in the industry and they make sure we always keep customer needs at the top of the list. Other specialists may also be called in for issues particular to things like international markets, content, security, accessibility, and sustaining just to name a few.

Finally, there are the group managers who have extensive experience with shipping software. They have lived through shipping a quality product in a timely manner many times over, so they bring a wealth of experience when it comes to understanding tradeoffs. The triage team is composed of managers and senior engineers from all of the products in Office, some of whom have been working on Office for over 20 years. They bring a maturity of perspective that is invaluable to the process.

Make no mistake about it - different perspectives mean differing opinions. I can assure you that these triage meetings are not like a big group hug between friends gathering around the campfire to sing a rousing rendition of Kumbaya. It’s a little more like a verbal wrestling match involving passionate, knowledgeable people who (brace yourself again) believe that their perspective is the right one. Not like you and I would fall into such a trap as that... As a result, it is all but guaranteed that there will be disagreement. Disagreements result in discussions (some might call them heated arguments) that can be very intense and time-consuming, making you wonder if the whole process is worth it. Time and time again, we are reminded just how worth it the triage process is. Multiple perspectives are required to make good decisions. Good decisions are required to finish the product well. More often than not, that process will take a good argument or two (hundred…), and at the end of the process, everyone recognizes how valuable those discussions are to the resulting product quality.

There’s no magic formula to triage decisions. There are important details to consider, like the severity of the issue and how likely a customer is to run across the scenario, but reducing it to a formula would never work. Every issue has particular customer scenarios behind it, involving levels of decision making that require humans to evaluate the issue and make a decision. While triage is not formulaic, we do know this: making the best decision will always boil down to asking good questions before jumping to hasty responses. As we get close to shipping, the triage group will look at every issue and the proposed change and ask the most basic and important of questions, “does this make sense?” I wish I could get my 9-year-old son to ask that question more often…

When the dust settles after each triage discussion/argument, the result should always be the same. When answering the question “what should we do?” the group should believe that the final decision is the one that makes the most sense. It may mean that one triage member will have to trust that another member’s perspective is the more significant one for the given issue. Trust me, that’s really hard to do, but the group helps us to see beyond ourselves and our own perspective. Not necessarily a fun exercise, but undoubtedly a necessary one.

Has the triage process guaranteed that we always make the right decision? Nope. But I guarantee you that it dramatically increases our odds of doing so. We’ll spend thousands of man-hours over the course of any one release in triage because we know that each decision made impacts hundreds of millions of customers. It’s definitely worth it, but I’m glad we don’t have to do it all the time. Oh, but wait…

Once we ship a product, triage is a way of life. In fact, we have to be even more diligent about our decision-making since changes will be introduced not into a pre-release product that customers will be evaluating, but rather an in-market product that customers are using. We’ve even seen cases where fixing a problem causes a customer solution to break because their implemented solution expected the bug to happen, and when it doesn’t, something fails. For reasons such as these, we are committed to being a world-class servicing organization. How and why we do that will be the subject of my next blog post. Stay tuned…