Microsoft Message Queuing (MSMQ) plays an important role in the Microsoft Lync Server 2010 Monitoring/Archiving server infrastructure: in a distributed network environment, MSMQ is used to transmit data from agents located on other servers (such as Front End Servers) to Monitoring/Archiving servers. The purpose of this article is to help you discover the root cause of any MSMQ problems that you might encounter, and to provide suggested ways to fix those problems.
Authors: Weiming Shen and Xu Liu
Publication date: April 2011
Product version: Lync Server 2010
Note. If you are not familiar with how MSMQ works with the Lync Server Monitoring/Archiving servers, see the article Lync Server 2010 Monitoring/Archiving MSMQ Reliability Enhancements for more information.
Based on customer feedback, we know that people occasionally encounter MSMQ-related problems when deploying, operating, or maintaining their Monitoring/Archiving servers. The purpose of this article is to help you discover the root cause of any MSMQ problems that you might encounter, and to provide suggested ways to fix those problems. This article addresses the following issues:
- Monitoring/Archiving server deployment failures
- Messages accumulating in the outgoing queue on Front End Servers
- Messages accumulating in the dead letter queue
- Messages accumulating in the target queue
- Problems recorded in the Windows event logs
Monitoring/Archiving server deployment failures
If you encounter failures when deploying Monitoring/Archiving server, information in the deployment guide for these features might suggest that the failure was caused by problems with MSMQ. If so, first check the following items to see whether the problems are due to an improper installation of MSMQ or to issues with MSMQ permission settings.
1. Ensure that MSMQ is installed and running.
Microsoft Message Queuing must be installed and ready for use before you deploy Monitoring/Archiving server. To verify that MSMQ has been installed, do the following:
a. Click Start and then click Run. In the Run dialog box, type ServerManager.msc and then press ENTER.
b. In Server Manager, expand the Features node. Verify that Message Queuing is listed as one of the installed features. If it is not, click Add Features, then, in the Add Features Wizard, select Message Queuing and click Next to begin the installation of MSMQ.
2. Ensure that MSMQ is configured for Directory Services integration.
Another common issue that occurs during the deployment of Monitoring/Archiving server is that MSMQ is installed but has not been configured to use Directory Services integration. (If you simply clicked the Next buttons when installing MSMQ, then Microsoft Message Queuing will be installed in workgroup mode.) Monitoring/Archiving server requires Directory Services integration mode, because the service uses public queues rather than private queues for message delivery. (This is a major change from Microsoft Office Communications Server 2007 R2.) To verify that you are running in Directory Services integration mode, click the Features node in Server Manager. In the Features Summary list, you should see Directory Service Integration listed under Message Queuing. If you do not, click Add Features. In the Add Features Wizard, expand Message Queuing, expand Message Queuing Services, and then select Directory Service Integration. Click Next to finish configuring MSMQ.
3. Verify that MSMQ permissions have been set correctly.
Permission issues are most likely to occur if you are trying to deploy Monitoring/Archiving server with an account that does not have Administrator permissions. For example, when Monitoring/Archiving server is deployed, Setup must create a public queue on the local computer (a process that requires access to Active Directory). If you do not have the required permissions to create the queue on the local computer, or if you do not have the permissions needed to access Active Directory, setup will fail. We won't discuss MSMQ permission settings in any detail in this article; for more information see the article Security Considerations for Message Queuing. However, as a quick check we suggest that you do the following, using the same account you used when you tried to deploy Monitoring/Archiving server:
a. Using the procedures outlined in steps 1 and 2, verify that MSMQ is running and that it has been configured for Directory Services integration.
b. Verify that you can view the properties of the MSMQ messaging queues. In Server Manager, expand the Features node and then expand the Message Queuing node. Click on each of the nodes listed under Message Queuing and verify that you can view the properties of those nodes.
c. Verify that you can create a public queue. To do this, follow the procedure in step b, right-click the Public Queues node, point to New, and then click Public Queue. In the New Object – Public Queue dialog box, enter a name for your new queue, select the Transaction checkbox, and then click OK.
If you are able to complete all three of these steps then your inability to deploy Monitoring/Archiving server is probably not due to MSMQ permission issues.
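The installation checks in steps 1 and 2 can also be sketched as a small script. This is only an illustration and is not part of Lync Server Setup. It assumes you have already collected the names of the installed features on the server (for example, from the Server Manager Get-WindowsFeature cmdlet), and it assumes that the feature names MSMQ-Server and MSMQ-Directory correspond to the Message Queuing Server and Directory Service Integration entries on your system. The function name is hypothetical.

```python
# Assumed feature names; confirm them against your own server before relying on them.
REQUIRED_FEATURES = {
    "MSMQ-Server": "Message Queuing Server",
    "MSMQ-Directory": "Directory Service Integration",
}


def missing_msmq_features(installed):
    """Return the display names of required MSMQ features that are not installed.

    `installed` is any iterable of feature names; matching ignores case.
    """
    installed_lower = {name.lower() for name in installed}
    return [
        display
        for name, display in REQUIRED_FEATURES.items()
        if name.lower() not in installed_lower
    ]


if __name__ == "__main__":
    # Example: MSMQ was installed with the default options only (workgroup mode).
    for display in missing_msmq_features(["MSMQ-Server"]):
        print("Missing feature:", display)
```

An empty result means both features are present. It says nothing about permissions, which still need to be checked as described in step 3.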
Messages accumulating in the outgoing queue on Front End Servers
MSMQ uses the outgoing queue on local computers (such as Front End Servers) as a place to temporarily store messages before those messages are sent to the target queue. From time to time a small number of messages might accumulate in the outgoing queue; typically that is not a problem. However, if a large number of messages sit in the queue for a long time, you have a delivery problem. In turn, that usually means a problem with the target queue or with the network connection.
If messages are accumulating in the outgoing queue the first thing you should do is verify that the target queue exists. In Server Manager, under the Message Queuing node, click the Outgoing Queues node. In the Outgoing Queues pane, you will see information about all of your MSMQ outgoing queues. Verify that your target queue appears in that list of queues.
If the target queue does exist, the next step is to verify that the appropriate permissions have been added to the target queue. If correctly configured, the deployment process should have assigned the local group RTC Server Local Group the following permissions on each data queue:
- Get Properties
- Get Permissions
- Send Message
To verify that the correct permissions have been applied to the data queues, start Server Manager on the Monitoring/Archiving server. Under the Outgoing Queues node, right-click each individual node and then click Properties. In the Properties dialog box, on the Security tab, verify that the appropriate permissions have been given to RTC Server Local Group.
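If you record the permissions shown on the Security tab, the comparison against the required list can be sketched as follows. This is a minimal illustration: the permission names are the three listed above for RTC Server Local Group, while the function name and the idea of typing the granted permissions in by hand are assumptions made for the example.

```python
# Permissions that RTC Server Local Group needs on each data queue (from the list above).
REQUIRED_SEND_PERMISSIONS = frozenset(
    {"Get Properties", "Get Permissions", "Send Message"}
)


def missing_permissions(granted, required=REQUIRED_SEND_PERMISSIONS):
    """Return the required permissions that are absent from `granted`, sorted by name."""
    return sorted(set(required) - set(granted))


if __name__ == "__main__":
    # Example: permissions copied by hand from the Security tab of one queue.
    granted_on_queue = ["Get Properties", "Send Message"]
    print(missing_permissions(granted_on_queue))
```

An empty list means the group has everything it needs on that queue. Repeat the comparison for each data queue.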
If the permissions are correct then your problem might lie in the network connection between the Front End Server and the target queue. If you have a WAN (wide area network) connection then connection issues are likely to occur from time to time. If the WAN connection is down for a relatively short amount of time, messages will automatically be sent from the outgoing queue to the target queue as soon as the connection is restored. If the connection is down for a longer time (for example, 3 to 4 hours), messages will automatically be moved to the dead letter queue. After the connection is restored, those messages will then be sent to the target queue.
Messages accumulating in the dead letter queue
When the Front End service is initially started on a computer, a dead letter queue for archiving and Call Detail Recording is automatically created on that Front End Server. Dead letter queues are used to temporarily store messages that cannot be delivered or processed in a timely fashion. If everything is working properly all messages in the dead letter queue will eventually be delivered to and processed by the Monitoring/Archiving server.
If a large number of messages have accumulated in the dead letter queue, that could be an indication of one of the following problems:
- The network connection between the Front End Server and the Monitoring/Archiving server is down.
- The Archiving service or the Call Detail Recording service is down, or the Monitoring/Archiving server is unavailable.
- The Front End Server does not have the permissions needed to send messages to the target queue.
- Authentication between the Front End Server and the Monitoring/Archiving server has failed.
After the problem is resolved, the MSMQ agent running on the Front End Server should automatically begin sending letters from the dead letter queue to the target queue.
Messages accumulating in the target queue
If the MSMQ service is working correctly you typically will not see messages accumulating in the target queue. If you do see a large number of messages in the target queue then you should:
- Make sure the required services are running.
- Check to see how many messages are being processed each second.
- Verify that messages are being logged into the correct data store.
Problems with the SQL store can prevent the target queue from being able to receive and process messages. You can use the performance counter LS:CDR Service – 01 – READ to see how many messages are being picked up from the queue and how many operations have been aborted.
You can also use the LS:CDR Service – 02 – WRITE performance counter to determine how many messages have actually been written to the SQL store.
You should also check permissions to verify that the Archiving/CDR service is able to read from the queues. In order for that to happen, the local group RTC Component Local Group must have the following permissions on the target queue:
- Receive Message
- Peek Message
- Get Properties
- Set Properties
- Get Permissions
It is also possible that a heavy workload is preventing the Monitoring/Archiving server from processing all the messages it receives. If so, you will see that messages are being taken from the queue and written to the database, but the target queue is still accumulating messages; that's because the queue is receiving new messages faster than it can process the old messages. In this case, we suggest you try to improve database performance either by upgrading the computer hardware or by tuning the database to achieve better throughput. In the future, we'll write an article that discusses ways to tune the Monitoring/Archiving databases to achieve better throughput.
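The heavy-workload case comes down to simple arithmetic: the backlog grows whenever messages arrive faster than they are processed. The sketch below assumes you have estimated both rates yourself (for example, from the performance counters mentioned earlier in this section). The function names and the sample numbers are hypothetical.

```python
def backlog_growth_per_second(arrival_rate, processing_rate):
    """Net change in queue length per second (positive means the queue is growing)."""
    return arrival_rate - processing_rate


def seconds_to_drain(backlog, arrival_rate, processing_rate):
    """Return the seconds needed to empty the queue, or None if it will never drain."""
    if backlog <= 0:
        return 0.0
    net_drain = processing_rate - arrival_rate
    if net_drain <= 0:
        # Messages arrive at least as fast as they are processed.
        return None
    return backlog / net_drain


if __name__ == "__main__":
    # Made-up numbers: 6000 queued messages, 40 arriving and 50 processed per second.
    print(seconds_to_drain(6000, 40, 50))
```

If the function returns None, waiting will not help. That is the situation in which better hardware or database tuning is needed.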
Problems recorded in the Windows event logs
In the following sections we'll discuss the actions you might need to take based on events recorded in the Windows event logs.
1. Create queue failure events.
At runtime, Monitoring/Archiving server creates two kinds of queues: ACK queues (also known as administration queues) and dead letter queues. A Create queue failure event will be recorded if either of these queue types cannot be created.
Create queue failures are typically due to an improper MSMQ installation or to permission issues on the computer running the MSMQ agent (for example, the Front End Server). To resolve either of these issues, follow the steps outlined in the Monitoring/Archiving server deployment failures section of this document. That will help ensure that MSMQ and the required MSMQ permissions have been correctly configured on the Front End Server.
2. Open queue failure events.
Monitoring/Archiving server relies on MSMQ to transmit data between agents and services. Because of this, it is critical that all MSMQ queues are properly opened on both the Front End Servers and the Monitoring/Archiving server. In the case of an Open queue failure error, the first thing you should do is verify that the MSMQ service is running on all your Front End Servers as well as on your Monitoring/Archiving server. If this is the first time you have encountered an Open queue failure event, you should also double-check that the MSMQ service has been configured for Active Directory integration.
If you continue to experience Open queue failure events, verify that the account the MSMQ service is running under has the permissions needed to open the queue.
3. Queue path not set events.
The Queue path not set event typically shows up if you have enabled a feature (Archiving, Call Detail Recording, or QoE) but have not deployed the corresponding server in your Lync Server topology. In a case like that, the MSMQ agents running on the Front End Server do not know where to deliver messages. As a result, all MSMQ messages will be dropped until the appropriate server has been deployed.
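The condition behind this event can be expressed as a simple cross-check between enabled features and deployed servers. The sketch below assumes that Archiving is served by an Archiving Server and that Call Detail Recording and QoE are both served by a Monitoring Server. The role labels and the function name are illustrative only, so adjust them to match your own topology.

```python
# Assumed mapping from each feature to the server role that receives its messages.
FEATURE_SERVER = {
    "Archiving": "Archiving Server",
    "Call Detail Recording": "Monitoring Server",
    "QoE": "Monitoring Server",
}


def features_without_server(enabled_features, deployed_servers):
    """Return the enabled features whose corresponding server role is not deployed."""
    deployed = set(deployed_servers)
    return [
        feature
        for feature in enabled_features
        if FEATURE_SERVER[feature] not in deployed
    ]


if __name__ == "__main__":
    # Example: Archiving and QoE are enabled, but only a Monitoring Server exists.
    print(features_without_server(["Archiving", "QoE"], ["Monitoring Server"]))
```

Any feature the function returns is one for which the Queue path not set event would be expected.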
4. Message not queued events.
The Message not queued event occurs when NACK (negative acknowledgement) messages are received on the queue. Typically, the event description will include the name of the acknowledgement class received as well as detailed information about the problem.
NACK messages can be triggered by certain administrative actions (for example, purging messages from the target queue). Alternatively, NACK messages can also be sent in response to connection or authentication failures, or problems with message encryption. For more information on the acknowledgement class, see the article Acknowledgement Message Classes.
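As a rough triage aid, the acknowledgement class named in the event description can be mapped to one of the causes mentioned above. The class names below are MSMQ acknowledgement class constants. Grouping them into these four categories is our own simplification, and the sketch assumes that the event description shows the class by its constant name, which might not be the case on your system.

```python
# Simplified, assumed grouping of a few MSMQ negative acknowledgement classes.
NACK_CAUSES = {
    "MQMSG_CLASS_NACK_PURGED": "administrative action",
    "MQMSG_CLASS_NACK_Q_PURGED": "administrative action",
    "MQMSG_CLASS_NACK_Q_DELETED": "administrative action",
    "MQMSG_CLASS_NACK_REACH_QUEUE_TIMEOUT": "connection problem",
    "MQMSG_CLASS_NACK_ACCESS_DENIED": "authentication or permission problem",
    "MQMSG_CLASS_NACK_BAD_SIGNATURE": "authentication or permission problem",
    "MQMSG_CLASS_NACK_BAD_ENCRYPTION": "encryption problem",
    "MQMSG_CLASS_NACK_COULD_NOT_ENCRYPT": "encryption problem",
}


def classify_nack(class_name):
    """Return a coarse cause for an acknowledgement class name, or 'unknown'."""
    return NACK_CAUSES.get(class_name.strip().upper(), "unknown")


if __name__ == "__main__":
    print(classify_nack("MQMSG_CLASS_NACK_Q_PURGED"))
```

Classes that are not in the table return "unknown". In that case, consult the acknowledgement class documentation directly.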
5. MSMQ quota warning events.
MSMQ quota warning events are designed to alert administrators to the fact that the Front End Server has used up most of the disk space allotted for MSMQ messages. If disk space usage reaches 100%, the system will not be able to process new messages until sufficient disk space has been cleared.
By default, 1 GB of disk space is reserved for MSMQ messages. If you need additional quota space for MSMQ, right-click the Message Queuing node in Server Manager and then click Properties. In the Message Queuing Properties dialog box, on the General tab, type the desired quota size into the Limit message storage to (KB) box. The default value is 1048576 kilobytes.
You can also change the warning level threshold. By default, a warning is issued when you have used 95% of your allotted disk space. To allow additional time to address the issue, you can change this to a lower value (for example, 60%). To change the warning level threshold, modify the registry value HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\RTCSRV\Parameters\QuotaHighThreshold.
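Because the quota is entered in kilobytes and the warning level is expressed as a percentage, a little arithmetic helps when choosing new values. The sketch below is illustrative only: the function names are hypothetical, and it assumes the warning is raised as soon as usage reaches the threshold.

```python
KB_PER_GB = 1024 * 1024
DEFAULT_QUOTA_KB = 1 * KB_PER_GB      # 1048576 KB, the default quota
DEFAULT_WARNING_PERCENT = 95          # default warning level


def quota_kb(gigabytes):
    """Convert a quota expressed in gigabytes to the kilobyte value to type in."""
    return int(gigabytes * KB_PER_GB)


def quota_warning(used_kb, quota_limit_kb=DEFAULT_QUOTA_KB,
                  threshold_percent=DEFAULT_WARNING_PERCENT):
    """Return True if usage has reached the warning threshold (assumed inclusive)."""
    used_percent = 100.0 * used_kb / quota_limit_kb
    return used_percent >= threshold_percent


if __name__ == "__main__":
    print(quota_kb(2))                                      # kilobytes for a 2 GB quota
    print(quota_warning(700000, threshold_percent=60))      # about 67% used of 1 GB
```

For example, a 2 GB quota is entered as 2097152 KB. With the threshold lowered to 60%, a server using 700000 KB of the default 1 GB quota would already have triggered the warning.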
6. Message expiration for delivery or processing events.
Message expiration events occur any time a server or network is unavailable; these events simply inform the administrator that message delivery and/or processing is not in a healthy state. After the problem has been fixed (for example, after the network connection has been restored), the MSMQ agent will report that the healthy state has been recovered. Note, however, that it might take some time before the healthy state recovered message is recorded.
7. Processing dead letter failure events.
Processing dead letter failure events report on failures that occurred when the system attempted to resend messages from the dead letter queue; this event indicates that something has blocked the resending of messages found in the dead letter queue.
Dead letter failures can be due to a number of different problems, most of which will be reported in other event log events. For example, it's possible that the dead letter queue could not be opened; if so, there should be a previous event recorded in the event log that tells you this. The Error Code reported in the dead letter failure event can also be useful in pinpointing the root cause of the failure.
One issue that will not be reported by dead letter failures is this one: no healthy queue can be found on the Front End Server (perhaps because of maintenance, a network outage, or a similar problem). The fact that this problem is not reported is by design; the assumption is that the dead letters will be sent as soon as the computer and network are available. Note, too, that dead letters are not lost if this event occurs. If the dead letters cannot be sent those letters will remain in the queue, and will be resent when the underlying problem has been fixed.