Analyzing Connectivity for Office Communications Server 2007 R2 Audio/Video Sessions: Part 1

This article is the first part of a two-part series. This part provides an overview of how to analyze and troubleshoot problems with audio/video sessions when using Microsoft Office Communications Server 2007 R2, Microsoft Office Communicator 2007 R2, and Microsoft Office Live Meeting 2007. It covers the following:

  • How Session Initiation Protocol (SIP) signaling and Session Description Protocol (SDP) are used to provision the components needed for media communications
  • How audio and video codecs use media information
  • How the Real-time Transport Protocol (RTP) manages audio/video communications on your network

Part 2 of Analyzing Connectivity for Office Communications Server 2007 R2 Audio/Video Sessions introduces the Pre Call Diagnostic Tool and shows how to use it to analyze problems with audio/video communications.

Author: Mike Adkins

Publication date: October 2010

Product versions: Microsoft Office Communications Server 2007 R2, Microsoft Office Communicator 2007 R2, and Microsoft Office Live Meeting 2007

The Office Communications Server 2007 R2 clients, Office Communicator 2007 R2 and the Live Meeting 2007 client, both let their users take part in audio and video conferences that are hosted by an Office Communications Server 2007 R2 Audio/Video (A/V) server. However, users sometimes experience degraded A/V quality. Poor A/V quality may be caused by network conditions that impair the delivery of the A/V data streams that are shared between the peers. Understanding the basics of how audio and video sessions are managed for the peers can make these issues more straightforward to resolve.

This series describes the network connectivity that is required to establish an A/V session between unified communications (UC) clients, how RTP uses adaptive measures to help ensure the quality of A/V playback, and how to use the Communications Server 2007 R2 Resource Kit Pre Call Diagnostic Tool to analyze network connectivity issues that may affect the client A/V experience.

SIP Signaling

The initiation of an A/V conference by a client begins with a series of SIP requests and responses that provide the exchange of security, media port, and supported A/V codec information that is used by the clients and the Communications Server A/V server during an A/V conference. This initial communication is known as SIP signaling, which requires Transmission Control Protocol/Internet Protocol (TCP/IP) connectivity between the clients, the Communications Server A/V server, and the internal edge of the Communications Server A/V Edge Server.

Securing the Audio/Video Session

The SIP signaling procedure that is used to initiate an A/V conference provides the parameters that are needed to secure the communication between the peers that join the conference. These SIP communications require that the Communications Server 2007 R2 A/V server (for multi-party audio/video conferencing) and the clients (for peer-to-peer audio/video communication) can make the necessary Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) connections to the Media Relay Service (RTCMEDIARELAY) and the Media Relay Authentication Service (MRAS), both of which are hosted on the Communications Server 2007 R2 A/V Edge Server. Figure 1 and Figure 2 show what the SIP signaling information should look like.

Note   The following information was taken from the Communicator 2007 R2 client (communicator-uccapi.uccplog) that initiated the A/V session with another internal Communicator 2007 R2 peer. For brevity, only the Session Description Protocol (SDP) information from the SIP SERVICE Request packet is shown.

The SIP SERVICE request for MRAS that is routed to the Communications Server 2007 R2 A/V server is shown in Figure 1.

Figure 1. SIP SERVICE Request for MRAS

The SIP SERVICE Response for MRAS that is routed to the Communications Server 2007 R2 A/V server is shown in Figure 2.

Figure 2. SIP SERVICE Response for MRAS

The information in green in Figure 2 (<mediaRelayList></mediaRelayList>) shows that specific TCP and UDP port connectivity is needed between the internal Communicator 2007 R2 clients and the Communications Server 2007 R2 A/V Edge Server.

For more information about the following, see Firewalls for Office Communications Server 2007 R2:

  • Internal firewall configuration that is needed to support Communications Server 2007 R2 A/V communications
  • Deploying the Edge Server in your perimeter network

Establishing the Audio/Video Session

After the security requirement for the A/V session has been fulfilled, each peer provides the other with its media connectivity address information and its list of available audio and video codecs. The Interactive Connectivity Establishment (ICE) client prioritizes the media connection candidates by making UDP the preferred transport for the client media connection. Because RTP can run over TCP as well as UDP, the ICE client also adds TCP candidates to the list, but with a lower priority than the UDP candidates. Offering both TCP and UDP transports helps ensure a media connection between Communications Server 2007 R2 audio and video endpoints on networks where TCP is the only routable transport protocol. The ICE client advertises separate preferred candidate pairs and codec information for the audio and video sessions because audio and video codecs use different technologies.

Note   The following information was taken from the Communicator 2007 R2 client (communicator-uccapi.uccplog) that initiated the A/V session with another internal Communicator 2007 R2 peer. For brevity, only a limited amount of the SDP information from the SIP INVITE Request packet that creates the A/V session is shown.

If the SIP INVITE is successful, the answering Communicator 2007 R2 peer acknowledges it with a 200 OK response for the A/V session. In that response, the answering peer provides the requesting peer with the media connectivity information that it prefers to use for its media connection. As a result, each client knows the following about its peer(s), as shown in Figure 3 and Figure 4:

  • Preferred IP address and UDP or TCP port information for the media connections
  • Audio codec and video codec information

Figure 3. SIP INVITE request for audio

Figure 4. SIP INVITE request for video

For more information about using Office Communicator logging to help analyze SIP signaling issues with A/V sessions, see Client Logging in Communicator.

How RTP Manages Media Communications

Audio and video communications for the clients require the support of RTP. RTP uses a dynamic feature set that complies with audio and video codec definitions to maintain a consistent media stream between all endpoints in an A/V conference on a computer network. This section defines the RTP header fields that are used to coordinate each audio/video session.

RTP Sequence

As the sender prepares and sends packets, it labels each one with a sequence number so that the receiver can identify the packet order and determine whether a packet was lost or received out of order. The sender assigns the numbers sequentially, but the starting value is chosen at random.
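The RTP sequence number is a 16-bit field (RFC 3550), so it wraps from 65535 back to 0. The following Python sketch shows one way a receiver could count lost and out-of-order packets from the sequence numbers alone. It is a simplified illustration under that assumption; the function names are hypothetical and this is not the logic that Communicator itself uses.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit values


def seq_delta(prev: int, cur: int) -> int:
    """Signed distance from prev to cur, accounting for 16-bit wraparound."""
    d = (cur - prev) % SEQ_MOD
    # Distances in the upper half of the range are treated as older packets.
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def analyze_sequence(seqs: list[int]) -> dict:
    """Count lost and out-of-order packets; seqs is in arrival order."""
    if not seqs:
        return {"lost": 0, "out_of_order": 0}
    highest_seq = seqs[0]
    highest_off = 0          # unwrapped offset of highest_seq from seqs[0]
    seen = {0}               # unwrapped offsets received so far
    out_of_order = 0
    for s in seqs[1:]:
        off = highest_off + seq_delta(highest_seq, s)
        if off > highest_off:
            highest_seq, highest_off = s, off   # newest packet so far
        elif off not in seen:
            out_of_order += 1                   # older than a packet already received
        seen.add(off)                           # duplicates are ignored
    expected = highest_off + 1
    received = sum(1 for o in seen if 0 <= o <= highest_off)
    return {"lost": expected - received, "out_of_order": out_of_order}
```

For example, the arrival order 25779, 25780, 25782, 25781, 25783 reports no loss and one out-of-order packet, while 65534, 65535, 1 reports one lost packet (sequence number 0) across the wraparound.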

RTP Time Stamp

When an audio payload is added to an RTP packet, the packet is given a time stamp so that the receiver can calculate the delay between packets. The sender cannot know how long the transit latency will be, or in what order the receiver will receive the packets. The receiving device uses the time-stamp values to build a buffer that replays the audio stream consistently and in the order in which it was sent.

The starting time-stamp value for RTP packets that deliver an audio payload is a randomly generated number. In each subsequent RTP packet of a specific audio session, the time stamp is incremented by the number of audio samples that the payload carries. For more information, see the “RTP Audio” section of this article.
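The RTP time stamp is a 32-bit field, so it wraps modulo 2^32. As a minimal sketch, assuming a 16 kHz codec and 20 ms packets like the RT Audio example in the “RTP Audio” section, the time stamps that a sender would place in consecutive audio packets can be generated as follows. The function name is hypothetical.

```python
import random

TS_MOD = 1 << 32  # RTP time stamps are 32-bit values


def audio_timestamps(start: int, clock_rate_hz: int, ptime_ms: int, count: int) -> list[int]:
    """Return the RTP time stamps for `count` consecutive audio packets.

    Each packet advances the time stamp by the number of samples it carries:
    clock_rate_hz * ptime_ms / 1000.
    """
    step = clock_rate_hz * ptime_ms // 1000
    return [(start + i * step) % TS_MOD for i in range(count)]


# The starting value is chosen at random for each session.
first = random.getrandbits(32)
print(audio_timestamps(first, 16000, 20, 3))
```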

The starting time-stamp value for RTP packets that deliver a video payload is also a randomly generated number. However, the time-stamp values that are used to manage the playback of the video payload are parameterized to meet the requirements of the RTP video stream. For more detailed information, see the “RTP Video” section of this article.

SSRC Synchronization Source Identifier

The Synchronization Source (SSRC) ID is a randomly generated number that is carried in the header of each RTP packet to identify each independent media session that a client or a server generates. Because the same client can have multiple media sessions, the SSRC value in the RTP packets is used to tell these separate media sessions apart.
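Every RTP packet starts with a 12-byte fixed header (RFC 3550, section 5.1) that carries the sequence number, time stamp, and SSRC fields described above. The following Python sketch parses that fixed header from a UDP payload and groups packets by SSRC to separate media sessions. It reads only the fixed header (any CSRC list or header extension is ignored), and the type and function names are hypothetical.

```python
import struct
from collections import defaultdict
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte RTP fixed header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,            # top two bits; 2 for current RTP
        marker=bool(b1 & 0x80),     # M (marker) bit
        payload_type=b1 & 0x7F,     # PT field
        seq=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def group_by_ssrc(packets: list[bytes]) -> dict[int, list[RtpHeader]]:
    """Separate packets into media sessions keyed by their SSRC value."""
    sessions: dict[int, list[RtpHeader]] = defaultdict(list)
    for p in packets:
        h = parse_rtp_header(p)
        sessions[h.ssrc].append(h)
    return dict(sessions)
```

Values such as SSRC = 1502982158 and Seq = 25779, which appear in the audio capture later in this article, are the fields that a parser like this would extract.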

Real-Time Control Protocol

The RTP Control Protocol (RTCP) provides per-session management for each audio or video session that uses RTP. RTCP gives all the endpoints that are joined in the media session information that allows them to take adaptive measures to correct the flow of RTP packets to endpoints on the network. This feedback mechanism is carried out through RTCP Sender and Receiver reports, which are delivered to all the endpoints that are joined in a media session to help ensure consistent delivery of the media stream.

RTCP Sender Report

The RTCP Sender report provides detailed information about the following:

  • Round-trip time
  • Network latency
  • Packet counts
  • Network Time Protocol (NTP) time stamps
  • RTP time stamps that are used for calculating dynamic jitter buffer settings
  • SSRC value that identifies each separate media session

While a client or server is sending audio or video in a session, it periodically sends a Sender report to all the peers that are receiving streaming media from it.
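RTCP reports do not carry a round-trip time field directly. RFC 3550 (section 6.4.1) describes how the sender derives it from the “last SR” (LSR) and “delay since last SR” (DLSR) fields that a receiver returns in its report block, together with the time the report arrives. A minimal sketch of that calculation follows. All values are in units of 1/65536 second (the middle 32 bits of an NTP time stamp), and the function name is hypothetical.

```python
NTP_FRACTION = 65536  # LSR/DLSR use units of 1/65536 second


def rtcp_round_trip_seconds(arrival_ntp32: int, lsr: int, dlsr: int) -> float:
    """Round-trip time as described in RFC 3550, section 6.4.1.

    arrival_ntp32: middle 32 bits of the NTP time when the report arrived.
    lsr:  middle 32 bits of the NTP time stamp from the last Sender report.
    dlsr: delay between receiving that Sender report and sending this report.
    """
    if lsr == 0:
        raise ValueError("no Sender report received yet (LSR is 0)")
    rtt_units = (arrival_ntp32 - lsr - dlsr) % (1 << 32)  # 32-bit wraparound
    return rtt_units / NTP_FRACTION
```

With the worked values from RFC 3550 (arrival 0xB7108000, LSR 0xB7052000, DLSR 0x00054000), this returns 6.125 seconds.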

RTCP Receiver Report

The RTCP Receiver report includes information that is similar to the Sender report. It contains information such as the fraction of packets lost, interarrival jitter, the NTP time stamp of the last Sender report received, and the SSRC value that is specific to the media session. The sender can use this information to adjust the way it shapes and sends packets to the receivers on the network. Each client that receives streaming media from its peers periodically sends a Receiver report to those peers.
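The jitter value in a Receiver report is an estimate of interarrival jitter, which RFC 3550 (section 6.4.1) defines as a running average. For each packet, D is the change in relative transit time compared with the previous packet, and the estimate J is updated as J = J + (|D| - J) / 16. The following Python sketch assumes that arrival times have already been converted to RTP time-stamp units (for example, 16,000 units per second for a 16 kHz audio codec). It ignores time-stamp wraparound, and the function name is hypothetical.

```python
def interarrival_jitter(packets: list[tuple[int, float]]) -> float:
    """Estimate RTP interarrival jitter (RFC 3550, section 6.4.1).

    packets: (rtp_timestamp, arrival_time) pairs in arrival order, where
    arrival_time is expressed in the same clock units as the time stamp.
    Returns the jitter estimate in time-stamp units.
    """
    jitter = 0.0
    prev_transit = None
    for ts, arrival in packets:
        transit = arrival - ts                  # relative transit time
        if prev_transit is not None:
            d = abs(transit - prev_transit)     # |D(i-1, i)|
            jitter += (d - jitter) / 16.0       # smoothing with gain 1/16
        prev_transit = transit
    return jitter
```

Packets that arrive exactly one ptime apart produce a jitter of 0. A single packet that arrives 160 units (10 ms at 16 kHz) late raises the estimate to 10 units, because the 1/16 gain smooths out isolated spikes.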

RTP Video

Figure 5 shows the contents of an Ethernet frame that contains the RTP information that defines the video portion of the A/V session. Analyzing this RTP packet shows how the video session is being managed and how RTP helps provide consistent delivery of the RTP video packets.

Figure 5. Ethernet frame containing RTP information that defines the video portion of the A/V session

The encoding and decoding of audio and video streams differ substantially because of the design of the individual codecs that are used, and these differences are reflected in the RTP traffic for each stream. Audio streams and video streams use the same RTP headers, but an RTP video stream manages its time stamps differently than an RTP audio stream. Video codecs typically use an RTP time-stamp frequency of 90,000 Hz, as follows:

  • The RTP time stamp encodes the sampling instant of the video image that is contained in the RTP data packet. If a video image occupies more than one packet, the time stamp is the same on all those packets. Packets from different video images have different time stamps.
  • The last packet of an RTP video frame should have the marker bit set to one (1). All other packets in the frame have the marker bit set to zero (0) by default. A marker bit of one (1) is a clear indicator that the frame is complete, so there is no reason to wait for another packet before processing and displaying the frame that is in the buffer.
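With a 90,000 Hz clock, the time stamp advances by 90,000 divided by the frame rate for each new video image, for example 3,000 at 30 frames per second. The following sketch groups RTP video packets into frames using the two rules above: packets with the same time stamp belong to the same image, and a marker bit of 1 closes the frame. It is a simplified illustration that assumes in-order delivery with no loss handling. The (timestamp, marker, payload) tuple layout and the function names are hypothetical.

```python
VIDEO_CLOCK_HZ = 90_000  # typical RTP clock rate for video


def timestamp_step(frames_per_second: float) -> int:
    """Time-stamp increment between consecutive video images."""
    return round(VIDEO_CLOCK_HZ / frames_per_second)


def assemble_frames(packets: list[tuple[int, bool, bytes]]) -> list[tuple[int, bytes]]:
    """Group (timestamp, marker, payload) packets into complete frames.

    Packets are assumed to arrive in sequence order. A frame is emitted
    when a packet with the marker bit set arrives.
    """
    frames = []
    current_ts = None
    parts: list[bytes] = []
    for ts, marker, payload in packets:
        if current_ts is not None and ts != current_ts:
            parts = []  # a new image started before the marker bit; drop the partial frame
        current_ts = ts
        parts.append(payload)
        if marker:
            frames.append((ts, b"".join(parts)))
            parts = []
            current_ts = None
    return frames
```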

RTP Audio

Figure 6 shows the contents of an Ethernet frame that contains the RTP information that defines the audio session negotiated in the SIP INVITE request for audio (Figure 3). Analyzing this RTP packet gives more information about how the audio session is being handled, and should further clarify how RTP provides consistent delivery of audio packets to the peers that are involved in the A/V session.

Figure 6. Ethernet frame that contains RTP information that defines the audio session

Many audio codecs are available, each designed for specific audio applications. As the SIP INVITE request for audio in Figure 3 shows, the Windows client operating systems that host the client offer a list of approximately eight audio codecs. This variety of audio codecs helps ensure that the Windows client operating systems can participate in audio conferences that are hosted by different audio-enabled clients and servers.

Audio codecs use one of two methods for determining the RTP time-stamp interval: sample-based or frame-based. The following example uses the RT Audio codec, which uses the sample-based method. The codec in this example runs at a sample rate of 16 kHz, and the playback duration (ptime) of one packet is 20 ms. With this combination, the time-stamp value increases by 320 (16,000 samples per second × 0.02 seconds) in each packet.

The following shows a simple way to calculate the time-stamp increment and the number of packets per second for RTP traffic that has a ptime value of 20 ms.

Hz = 16000 (sample rate, in samples per second)
R = ptime (20 ms, or 0.02 seconds)
Y = Time-stamp increment per packet (samples per packet)
X = Packets per second

Y = (Hz * R) or (16000 * 0.02) = 320 samples
X = (Hz / Y) or (16000 / 320) = 50 packets per second
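The same arithmetic applies to each ptime value that the RT Audio codec advertises (20, 40, and 60 ms in the capture that follows). This Python sketch computes the time-stamp increment per packet (Y) and the packets per second (X) for a 16 kHz clock. The function name is hypothetical.

```python
def packetization(clock_rate_hz: int, ptime_ms: int) -> tuple[int, float]:
    """Return (time-stamp increment per packet, packets per second)."""
    samples_per_packet = clock_rate_hz * ptime_ms // 1000    # Y = Hz * R
    packets_per_second = clock_rate_hz / samples_per_packet  # X = Hz / Y
    return samples_per_packet, packets_per_second


for ptime in (20, 40, 60):
    y, x = packetization(16000, ptime)
    print(f"ptime {ptime} ms: increment {y}, {x:.2f} packets per second")
```

Longer ptime values send fewer, larger packets. That trade-off lowers per-packet header overhead but adds more delay per packet.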

Now let’s dig deeper into the RTP information that we have available through our network capture as we saw in Figure 6.

Frame: Number = 1294
– Rtp: PayloadType = Audio, Codec: RT Audio, ClockRate: 16000, P-Times: 20,40,60, Channels: 1, SSRC = 1502982158, Seq = 25779, TimeStamp = 2803428209

Locating the next RTP packet by its sequence number lets us calculate the difference between the two packets' time-stamp values, which tells us the ptime that is currently being used for the audio session.

Frame: Number = 1296
Udp: SrcPort = 13561, DstPort = 17707, Length = 102
– Rtp: PayloadType = Audio, Codec: RT Audio, ClockRate: 16000, P-Times: 20,40,60, Channels: 1, SSRC = 1502982158, Seq = 25780, TimeStamp = 2803428529

Taking the difference between the two time stamps, 2803428529 – 2803428209 = 320, shows that the ptime is 20 ms:

X = ptime
X = 320 / 16000
X = 0.02 seconds (20 milliseconds)
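The reverse calculation, from two captured packets to a ptime, can also be scripted. This sketch uses the sequence numbers and time stamps from Frames 1294 and 1296. It divides the time-stamp difference by the number of packets between them and by the 16,000 Hz clock rate, and it handles 16-bit and 32-bit wraparound. The function name is hypothetical.

```python
def ptime_ms(seq1: int, ts1: int, seq2: int, ts2: int, clock_rate_hz: int) -> float:
    """Infer the packet duration in ms from two RTP packets of the same SSRC."""
    seq_gap = (seq2 - seq1) % (1 << 16)      # packets between the two samples
    if seq_gap == 0:
        raise ValueError("packets must have different sequence numbers")
    ts_diff = (ts2 - ts1) % (1 << 32)        # 32-bit time-stamp wraparound
    return ts_diff * 1000 / (seq_gap * clock_rate_hz)


# Values from Frames 1294 and 1296 in the capture above.
print(ptime_ms(25779, 2803428209, 25780, 2803428529, 16000))  # prints 20.0
```

Dividing by the sequence gap keeps the result correct even if a packet between the two samples was lost.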

The PayloadType field in Frame 1296 shows that we have P-Times 20,40,60 available for dynamic use with the RT Audio codec. The use of multiple ptime values throughout an audio session provides RTP with the flexibility to adjust the payload of the audio RTP packets to help ensure consistency in buffered playback.


Jitter is a variation in packet transit delay. The typical causes of jitter are queuing, contention, and serialization effects on the path through a network. Higher bandwidth networks tend to have less jitter; slower networks tend to have more congestion and therefore more jitter.

Advanced Time-Warping Jitter Buffer

The advanced time-warping jitter buffer dynamically adjusts the audio play-out speed, as a function of the actual jitter conditions, to optimize both quality and latency. Its dynamic capabilities minimize the buffering impact on latency in low-jitter conditions and let it transition smoothly to and from high-jitter conditions. It does this by varying the buffer length and the playing speed in a way that is barely noticeable to the listener.

Legacy Jitter Buffer

To compensate for delays in RTP packet delivery and to ensure a smooth reconstruction of the audio stream, a legacy jitter buffer is constructed. Its size is based on the average packet delivery delay. In this simple example, the size of the jitter buffer is represented by the difference between the time-stamp values of two frames, X and Y, as follows.

Buffer Size = Frame Y timestamp – Frame X timestamp

Our previous packet capture example gave two time stamps that resulted in a ptime value: 2803428529 – 2803428209 = 320 time-stamp units, or 20 ms at 16 kHz.

Unfortunately, legacy jitter buffers introduce an incremental delay, which can negatively impact the audio playback experience. Legacy jitter buffers typically contain about 20 to 40 ms of voice. Values of jitter in excess of the buffer length result in packets being discarded.
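The discard behavior of a fixed (legacy) jitter buffer can be illustrated with a small simulation. This sketch assumes a fixed playout delay, with all times in milliseconds. Each packet is scheduled for playback at the first packet's arrival time plus the buffer length plus its own media offset, and a packet that arrives after its playback time is discarded. The function and parameter names are hypothetical, and real codecs may also conceal or stretch audio instead of simply dropping it.

```python
def fixed_buffer_discards(arrivals_ms: list[float], ptime_ms: float, buffer_ms: float) -> list[int]:
    """Return indices of packets that arrive too late for a fixed jitter buffer.

    arrivals_ms[i] is the arrival time of the i-th packet in sequence order.
    Playback of packet i is scheduled at arrivals_ms[0] + buffer_ms + i * ptime_ms.
    """
    if not arrivals_ms:
        return []
    base = arrivals_ms[0] + buffer_ms
    late = []
    for i, arrival in enumerate(arrivals_ms):
        deadline = base + i * ptime_ms
        if arrival > deadline:
            late.append(i)  # jitter exceeded the buffer length; the packet is discarded
    return late
```

With 20 ms packets arriving at 0, 25, 95, and 60 ms, a 40 ms buffer discards the third packet (index 2), whereas a 60 ms buffer plays all four at the cost of 20 ms of extra delay.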

Note   The jitter buffer is a separate function that is introduced by the audio codec, which is designed to process an RTP audio stream. The jitter buffer itself is not defined by RTP.

For more information about analyzing RTP traffic on your network, and about tools that can help resolve poor A/V performance, see Troubleshooting Network-Related Voice Quality Issues.


Successfully troubleshooting A/V communications between peers is based on the following:

  • How the clients use the components that build an A/V session between peers on a network
  • How to use the available tools to help define the different types of connectivity failures that lead to A/V communications issues

This article is intended to give you an understanding of how to use the tools that Microsoft provides to identify some of the causes of A/V communication failures on an Office Communications Server 2007 R2 network.

Additional Information

To learn more, check out the following information:

Note: To review communicator-uccpapi.uccplog on the local Windows client, install the Office Communications Server 2007 R2 Resource Kit by using the instructions in the Snooper Tool article.

Communications Server Resources
