TMG services hang at startup due to a third-party service

 

This post is, once again, about an issue I worked on a few days ago. Before I discuss the issue and how I resolved it, I would like to outline the objective of this post.

The objective of this post is to make TMG administrators aware of issues like this and of what can be done to resolve them. Discovering the root cause of this issue required a User Mode Dump analysis.

Performing a User Mode Dump analysis requires “Symbol” files (which are private). My goal is not to provide specific instructions on User Mode Dump analysis, but instead to show what kind of information can be gathered, and how it can be used, to help troubleshoot boot-time “service issues” on a TMG server.

For those who are not familiar with dump analysis terms such as processes, threads, and call stacks, I will elaborate further as I explain the steps.

Issue:

The TMG server administrator was rebooting the server, and during the reboot the TMG services hung and did not start. A similar issue was reported before TMG SP2, but it was fixed after SP2. In this scenario, TMG had been updated to the latest build, i.e. TMG SP2 RU2.

Troubleshooting:

Some background: It should be noted that quite a bit of troubleshooting had taken place prior to my involvement in the case. This included the steps in the following Knowledge Base article:

Forefront Threat Management Gateway 2010 services do not start as expected when the FTMG 2010 servers are in a workgroup array

During startup, the following System Event was logged…

_____________________________________________________________________________________________

Log Name:      System
Source:        Service Control Manager
Date:          09/11/2012 17:42:30
Event ID:      7022
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      server1
Description:
The Microsoft Forefront TMG Firewall service hung on starting.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
    <EventID Qualifiers="49152">7022</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2012-11-09T17:42:30.378163900Z" />
    <EventRecordID>344470</EventRecordID>
    <Correlation />
    <Execution ProcessID="716" ThreadID="720" />
    <Channel>System</Channel>
    <Computer>server1</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="param1">Microsoft Forefront TMG Firewall</Data>
  </EventData>
</Event>

_____________________________________________________________________________________________
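When several servers or reboots are involved, it can help to pull the key fields out of the event XML programmatically instead of reading each event by hand. The following is only a minimal sketch in Python, assuming the event has already been exported as XML text; the function names `summarize_event` and `is_service_hang` are hypothetical helpers introduced for illustration, and the embedded XML is a trimmed copy of the event shown above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Windows event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event above (only the fields read below).
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Service Control Manager" />
    <EventID Qualifiers="49152">7022</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2012-11-09T17:42:30.378163900Z" />
    <Computer>server1</Computer>
  </System>
  <EventData>
    <Data Name="param1">Microsoft Forefront TMG Firewall</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text):
    """Return the fields that matter for a service startup problem."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.find("e:Computer", NS).text,
        # param1 carries the display name of the affected service.
        "service": root.find("e:EventData/e:Data[@Name='param1']", NS).text,
    }


def is_service_hang(summary):
    """True for Service Control Manager event 7022 (service hung on starting)."""
    return (
        summary["provider"] == "Service Control Manager"
        and summary["event_id"] == 7022
    )


if __name__ == "__main__":
    info = summarize_event(EVENT_XML)
    if is_service_hang(info):
        print(info["computer"], info["time"], info["service"])
```

The check is deliberately narrow: it only recognizes the event ID and provider seen in this case, so other startup failures would need their own conditions.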

Data collection:

During the course of troubleshooting we collected a User Mode Dump while reproducing the issue.

User mode dumps collection reference: http://msdn.microsoft.com/en-us/library/ff420662.aspx

Data analysis:

Note: The approach taken in this post closely follows the guidelines in the following link on debugging a deadlock, because our scenario was similar to a deadlock: http://msdn.microsoft.com/en-us/library/windows/hardware/ff540592(v=vs.85).aspx

In the dump, I found that the following critical section was locked:

[Screenshot: debugger output showing the locked critical section]

Note: For more information about critical section and locked critical section, please refer to: http://msdn.microsoft.com/en-us/library/windows/hardware/ff541979(v=vs.85).aspx

Then I located the thread that owns this locked critical section. The following snapshot shows the stack of this thread. The stack is read from bottom to top. From this call stack, it appears that wspsrv (the Firewall service) is trying to load a filter called XSISAPI. It appears that TMG has deferred its filters’ startup until this filter (i.e. XSISAPI) is loaded.

[Screenshot: call stack of the thread that owns the locked critical section]

I then checked the module for this filter (i.e. XSISAPI) and found that it’s a filter called “Afaria” from Sybase.

[Screenshot: module information for XSISAPI]

Solution:

We configured the XSISAPI filter service to use a delayed start. After this change, the TMG services started normally after a reboot.
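A startup type change like this can be made in the Services console, or scripted with the built-in `sc.exe` tool using `start= delayed-auto`. The following is a minimal sketch in Python, not the exact procedure used in this case. The service name `ExampleFilterService` is a placeholder, because the actual service name of the third-party filter is not given in this post, and the helper names are hypothetical.

```python
import subprocess


def delayed_start_command(service_name):
    """Build the sc.exe command line that sets a service to delayed automatic start.

    sc.exe expects the value as a separate token after "start=",
    so the two are kept as separate list items here.
    """
    return ["sc.exe", "config", service_name, "start=", "delayed-auto"]


def set_delayed_start(service_name):
    """Apply the change. Must be run from an elevated prompt on the server."""
    subprocess.run(delayed_start_command(service_name), check=True)


if __name__ == "__main__":
    # Placeholder name: substitute the real service name of the filter service.
    print(" ".join(delayed_start_command("ExampleFilterService")))
```

Printing the command first, rather than running it, makes it easy to review exactly what will change before touching a production server.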

Author:

Suraj Singh

Security Support Escalation Engineer - MSD Security Team

Reviewer:

Richard Barker

Sr. Security Support Escalation Engineer - MSD Security Team