Troubleshooting Series - Part 1 - First steps

I had the privilege of attending a session at the MS SharePoint 2008 Conference covering how MS Support troubleshoots MOSS issues. The session really drove home that, depending on their experience and confidence, people tend either to make lots of changes in the hope that something fixes the problem, or to do so much analysis that they take too long to find a solution.

Below is the initial process that I follow when someone asks me for help in solving a problem. Since each issue is unique, this first post of the Troubleshooting Series covers only the first steps, which are the same regardless of the issue. Later posts in the series will focus on specific problems or areas, chosen based on the results of this first analysis stage.

hope this helps

wayne

Preliminary Analysis

1. Before starting anything, understand the problem at hand. This includes:

a. What exactly is the issue? The more detailed the info, the more quickly and effectively you will be able to solve it.

b. What is the urgency of the problem? For example, is it a:

   i. Break/fix (an urgent issue where you need to drop everything to fix it)

   ii. Roadblock (it is causing a work stoppage, but doesn't need to be solved right away)

   iii. Nice to have (someone is trying to do something new or is looking for consulting-type advice)

c. Who exactly is affected or impacted by this issue?

   i. How many people are having this problem or are affected by the issue?

   ii. Where are they located? Locally or somewhere remote?

d. What is the timeline of the issue?

   i. When did the issue start happening?

   ii. What has been done so far to troubleshoot it?

   iii. How can I repro the issue?

2. Once you have the facts, put them in perspective. In other words, make sure everyone involved is on the same page:

a. Confirm everyone is clear on what the problem is

b. Confirm everyone is clear on what the desired end result is and when it is needed

c. Don't feel like you have to know all the answers. Get help if needed (e.g. from other end users, coworkers, management, or MS Support)

3. Make a plan of action. At this stage, you know what the issue is, you’ve put it in perspective, and you can start working on resolving it. It is important to factor in your personal life here so that you set realistic, achievable expectations.
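If it helps to see the questions above in one place, here is a rough sketch of an intake record in Python. This is only an illustration: the class name, the field names and the urgency labels are my own invention for this example and are not part of any SharePoint or MS Support tool.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional


class Urgency(Enum):
    # Labels mirror the three urgency levels listed above.
    BREAK_FIX = "break/fix"
    ROADBLOCK = "roadblock"
    NICE_TO_HAVE = "nice to have"


@dataclass
class IssueIntake:
    """Hypothetical record of the preliminary analysis answers."""
    description: str = ""                      # what exactly is the issue
    urgency: Optional[Urgency] = None          # how urgent is it
    users_affected: Optional[int] = None       # how many people are affected
    location: str = ""                         # local or remote
    started: str = ""                          # when did it start happening
    tried_so_far: List[str] = field(default_factory=list)
    repro_steps: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of the questions that are still unanswered."""
        empty_values = ("", None, [])
        return [f.name for f in fields(self)
                if getattr(self, f.name) in empty_values]


if __name__ == "__main__":
    intake = IssueIntake(
        description="Users cannot open the team site",
        urgency=Urgency.BREAK_FIX,
    )
    # Lists every question that still needs an answer before planning.
    print(intake.missing())
```

The point of the `missing()` helper is simply to stop me from making a plan of action while basic facts are still unknown. Note that it treats an empty `tried_so_far` list as unanswered, so if nothing has been tried yet, say so explicitly.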

Preliminary Investigation

1. Try to repro the issue or watch the user repro the issue. With tools such as XP, Vista, Live Meeting, and Communicator, you can share a session with anyone in the world, anytime.

2. Even if you can’t repro the issue, you can start ruling things out. I’ve noticed that everything usually ends up in one of these categories:

a. Knowledge/Perception

b. Client computer

c. Front End Server

d. SQL

e. Network

f. Active Directory

g. SharePoint

h. IIS

i. Farm Topology

3. Keep track of what you see and do, and at what times. The better your notes on the work you’ve done, the easier it is to get help from others, write a post mortem, and document the issue for future reference. One thing I learned from the MS Support session at the conference was to document everything, even the steps that didn’t work.
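To make the note-taking and ruling-out steps above a bit more concrete, here is a minimal sketch of a timestamped troubleshooting log in Python. It is my own illustration, not something shown in the MS Support session: the class and method names are hypothetical, and the category names are just the list from step 2.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set

# The categories from step 2 above.
CATEGORIES = [
    "Knowledge/Perception", "Client computer", "Front End Server", "SQL",
    "Network", "Active Directory", "SharePoint", "IIS", "Farm Topology",
]


@dataclass
class LogEntry:
    when: datetime
    action: str    # what was done
    result: str    # what was seen
    worked: bool   # record the steps that did not work, too


@dataclass
class TroubleshootingLog:
    """Hypothetical log of what was tried, when, and what it ruled out."""
    entries: List[LogEntry] = field(default_factory=list)
    ruled_out: Set[str] = field(default_factory=set)

    def record(self, action: str, result: str, worked: bool,
               rules_out: Optional[str] = None,
               when: Optional[datetime] = None) -> None:
        """Add a timestamped entry and optionally rule out one category."""
        if rules_out is not None:
            if rules_out not in CATEGORIES:
                raise ValueError(f"Unknown category: {rules_out}")
            self.ruled_out.add(rules_out)
        self.entries.append(
            LogEntry(when or datetime.now(), action, result, worked))

    def remaining(self) -> List[str]:
        """Categories that have not been ruled out yet."""
        return [c for c in CATEGORIES if c not in self.ruled_out]


if __name__ == "__main__":
    log = TroubleshootingLog()
    log.record("Reproduced from a second workstation", "Same error",
               worked=False, rules_out="Client computer")
    for entry in log.entries:
        print(entry.when.isoformat(timespec="seconds"), entry.action,
              "->", entry.result)
    print("Still to check:", log.remaining())
```

A plain notebook or text file works just as well. What matters is that every entry has a time, an action and a result, including the attempts that failed.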

Resolving the Issue

Most people like to jump right to this step, but they end up wasting time looking at the wrong things or in the wrong places. In the next parts of the series, I'll break down how I tell which category the problem falls into and how I drill down to get to the root cause.