Error joining IM/LiveMeeting conferences in Office Communications Server 2007

You may find that all users within an OCS Pool are unable to join a three-way IM conference or a Live Meeting conference in Office Communications Server 2007. When you examine the event logs on the OCS 2007 Front End server, you may see the following related events:

Event Type: Error
Event Source: OCS MCU Infrastructure
Event ID: 61030
User: N/A
Computer: OCS1
Description:
The process RtcHost(5432) did not receive a certificate from the client.

Event Type: Error
Event Source: OCS MCU Infrastructure
Event ID: 61013
User: N/A
Computer: OCS1
Description:
The process DataMCUSvc(2596) failed to send health notifications to the MCU factory at https://OCS1.contoso.com:444/LiveServer/MCUFactory/.
Failure occurrences: 3491, since 3/24/2009 10:05:18 PM.

Event Type: Error
Event Source: OCS MCU Infrastructure
Event ID: 61013
User: N/A
Computer: OCS1
Description:
The process IMMcuSvc(1404) failed to send health notifications to the MCU factory at https://OCS1.contoso.com:444/LiveServer/MCUFactory/.
Failure occurrences: 3491, since 3/24/2009 10:05:03 PM.

Event Type: Error
Event Source: OCS User Services
Event ID: 30988
User: N/A
Computer: OCS1
Description:
Sending C3P request failed. Conferencing functionality will be affected if C3P messages are failing consistently.  Sending the message to https://OCS1.contoso.com:444/LiveServer/MCUFactory/ failed. Error code is 2EFE.
Resolution: Check the destination server to see that it is listening on the same URI and it has certificate configured for MTLS. Other reasons might be network connectivity issues between the two servers.
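
The error code in event 30988 is written in hexadecimal, which makes it awkward to look up. The short sketch below converts it to decimal. The result, 12030, is the value that winhttp.h assigns to ERROR_WINHTTP_CONNECTION_ERROR; the event text does not say which error table OCS uses, so treat that mapping as an assumption. The function name is made up for illustration.

```python
def decode_error_code(hex_text):
    """Convert a hexadecimal error code such as '2EFE' to a decimal integer."""
    return int(hex_text, 16)


code = decode_error_code("2EFE")
print(code)  # 12030
```

A connection-level error like this is consistent with the remote end dropping the connection, which fits the ReceiveFailure that the validation wizard reports for the same URL.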

If you run the Web Conferencing validation wizard from the OCS Pool, you may find the following error in the output log:

MCU Type: meeting
URL: https://OCS1.contoso.com:444/LiveServer/MCUFactory/
HTTP Connectivity Error : ReceiveFailure
HTTP Connectivity Error : Receive failure typically indicates that the connection was closed by
the remote host. This can happen if the remote server does not trust the certificate presented by the
Local Server.

HTTP Connectivity Error : Ensure that the certificate of the local server and remote server are both
valid, have not expired, and contain valid subject name. In addition, ensure that the certificate chain
of both Server(s) are valid. Ensure that the certificate chain of the local server is installed
on the remote server and vice-versa. The most up-to date certificate chain that was used to issue
the server certificate must be present.

 

Errors like these usually indicate a certificate-related authentication problem with the OCS Pool (or with a particular OCS Front End server). Most of the time, the problem lies with the certificate of an issuing Certification Authority. To troubleshoot this issue, you would typically perform the following steps:

 

1. Log on to the affected OCS 2007 Front End server, either locally or remotely using Remote Desktop.

2. Click Start > Run and type in MMC.exe.  Press Enter.  This should launch the Microsoft Management Console.

3. From the menu bar at the top, click File > Add/Remove Snap-in…

      If you are running Windows 2003:

    • Click Add
    • Scroll down the list of available snap-ins and choose Certificates, then click Add
    • Choose Computer Account
    • Choose Local Computer: (the computer this console is running on)
    • Click Finish
    • Click Close
    • Click OK

      image 

      If you are running Windows 2008:

    • Scroll down the list of available snap-ins and choose Certificates, then click Add
    • Choose Computer Account
    • Choose Local Computer: (the computer this console is running on)
    • Click Finish
    • Click OK

      image

4. Within the Certificates console, expand Personal > Certificates

5. On the right, double-click on the certificate that is assigned to your OCS Pool or Standard Edition server

6. Click on the Certification Path tab

7. Note the name of the issuing Certification Authority (the entry directly above your certificate in the certification path, as highlighted) and the certificate status

      image

8a. If the issuing CA is a Root CA (the top of the list), expand Trusted Root Certification Authorities > Certificates

8b. If the issuing CA is an Intermediate CA (not the top of the list), expand Intermediate Certification Authorities > Certificates

9. From the list of CA certificates, right-click the issuing CA certificate that you noted in step 7 (highlighted in yellow above) and choose Properties

10. Under the General tab, verify that Enable all purposes for this certificate is selected (or, if Enable only the following purposes is selected, verify that both Server Authentication and Client Authentication are enabled)

11a. Click OK to close the properties of the CA certificate.

11b. If this was an Intermediate CA certificate, repeat steps 6 through 10 for the next CA up the chain, until the settings on every certificate in the certification path have been verified

12.  Close the Certificates Management Console (be sure to restart services if you made any changes)

      image
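
If Python happens to be available on the server (an assumption; nothing in OCS requires it), the purpose check from steps 9 and 10 can be roughly sketched in code. Python's ssl.enum_certificates, which exists only on Windows, returns each certificate in a system store along with either True (trusted for all purposes) or a set of purpose OIDs. The helper names below are made up for illustration, and this is a sketch of the idea rather than a replacement for the console steps.

```python
import hashlib
import ssl

SERVER_AUTH = "1.3.6.1.5.5.7.3.1"  # Server Authentication purpose OID
CLIENT_AUTH = "1.3.6.1.5.5.7.3.2"  # Client Authentication purpose OID
REQUIRED = {
    SERVER_AUTH: "Server Authentication",
    CLIENT_AUTH: "Client Authentication",
}


def missing_purposes(trust):
    """Return the names of required purposes that 'trust' does not allow.

    'trust' is True when all purposes are enabled, otherwise a set of OIDs.
    """
    if trust is True:
        return []
    return [name for oid, name in REQUIRED.items() if oid not in trust]


def audit_store(store_name):
    """List (thumbprint, missing purposes) for certificates in a Windows store.

    store_name is "ROOT" (Trusted Root Certification Authorities) or
    "CA" (Intermediate Certification Authorities). Windows only.
    """
    findings = []
    for der, encoding, trust in ssl.enum_certificates(store_name):
        if encoding != "x509_asn":
            continue  # skip entries that are not plain X.509 certificates
        missing = missing_purposes(trust)
        if missing:
            # SHA-1 of the DER bytes is the thumbprint shown on the Details tab
            thumbprint = hashlib.sha1(der).hexdigest().upper()
            findings.append((thumbprint, missing))
    return findings
```

Calling audit_store("ROOT") and audit_store("CA") will likely flag plenty of certificates that have nothing to do with OCS, so compare the thumbprints against the CA certificates in your own certification path before changing anything.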

 

While these steps will resolve this issue 99% of the time, there are no guarantees.  Sometimes you just get burned…

 

See, support engineers in Microsoft CSS are generally at a disadvantage at the outset of every call.  We hone our investigative skills with each call, probing for answers to questions that help us define the scope of a given problem, and digging through piles of debug logs for clues as to why a particular component may be failing to work per specification.  There are three questions that should be asked by a Microsoft engineer during every support call:

  1. Did it ever work?
  2. When did it stop working?
  3. What changes were made in your environment?

Sometimes changes are made within a customer’s environment that result in problems with our products. Often these turn out to be undocumented changes that lead to hours of investigative work and troubleshooting engagements in an effort to resolve the problem. This is why proper change control and change documentation are critical for the success of an IT infrastructure. Otherwise, you sometimes can (and will) get burned.

 

I was burned while troubleshooting this exact issue on March 25, 2009.

 

After troubleshooting this issue for almost two hours (and getting nowhere), I had to hand off the call to one of my colleagues, Martin Barron, because of a previously scheduled appointment. I just knew it had to be certificate-related, due to the following errors logged by the OCS Front End server:

TL_ERROR(TF_COMPONENT) [0]0AB0.0F80::03/25/2009-21:33:33.808.00000058 (MCUInfra,ExceptionTracer.WriteLine:1243.idx(26))
An exception 'System.ObjectDisposedException' was thrown while processing '15794481' at '
at System.Net.Security.SslState.ValidateCreateContext(... X509Certificate serverCertificate, X509CertificateCollection clientCertificates...)
at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
at System.Net.TlsStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback asyncCallback, Object asyncState)
at System.Net.PooledStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
at System.Net.ConnectStream.InternalWrite(Boolean async, Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
at System.Net.ConnectStream.BeginWrite(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
at Microsoft.Rtc.Server.McuInfrastructure.HttpRequestContext.BeginWrite(Stream stream, AsyncCallback callback, Object state)
at Microsoft.Rtc.Server.McuInfrastructure.HttpTransport.GetStreamCallback(IAsyncResult asyncResult)'.
Message 'Cannot access a disposed object. Object name: 'SslStream'.'

Upon receiving the call, the first thing Martin did was browse to the web site logged in the error output from the Validation Wizard: 

https://OCS1.contoso.com:444

Although I had previously opened telnet.exe and verified that port 444 was listening, I had not thought to browse to it using a web browser.  Guess what answered?

image

Without the customer’s knowledge, someone had installed the Communicator Web Access component directly on the OCS Front End server, bound to port 444 in IIS (the same port used by the MCU Factory).  This caused all conference escalations to fail with the errors listed above.  Once the CWA component was removed and services were restarted, the issue was resolved.
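
The lesson here is that "the port is listening" and "the right service is listening" are two different checks. As a rough sketch of the difference, the Python below does both: a plain TCP connect (what telnet told me) and an HTTPS request that reports what actually answered (what the browser told Martin). The function names are made up for illustration, and certificate validation is deliberately switched off because the goal is only to identify the responder, not to trust it.

```python
import re
import socket
import ssl
import urllib.error
import urllib.request


def port_is_listening(host, port, timeout=5.0):
    """Telnet-style check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def page_title(html):
    """Return the text of the first <title> element, or None if absent."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return " ".join(match.group(1).split()) if match else None


def identify_responder(url, timeout=5.0):
    """Fetch url over HTTPS and report the status, Server header, and title."""
    context = ssl.create_default_context()
    context.check_hostname = False       # identification only, not trust
    context.verify_mode = ssl.CERT_NONE
    try:
        response = urllib.request.urlopen(url, timeout=timeout, context=context)
        status = response.status
    except urllib.error.HTTPError as err:
        response, status = err, err.code  # error pages still identify the site
    try:
        body = response.read(65536).decode("utf-8", errors="replace")
        return {
            "status": status,
            "server": response.headers.get("Server"),
            "title": page_title(body),
        }
    finally:
        response.close()


# Example (not run here): identify_responder("https://OCS1.contoso.com:444")
```

If the request comes back with an ordinary web page, something other than the MCU Factory may be bound to the port. If the connection is simply dropped, urlopen raises URLError instead, which tells you less; in that case the IIS bindings on the server are the next place to look.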

 

I will not soon forget this one… :-/

 

-- Dave