Issue with A/V Session with Remote or Federated User

I've run into this with a couple of customers recently. When a user tries to establish a media session (audio, video, or desktop sharing) with a remote or federated user, the session fails. With an internal user, everything works fine. We verified that the server settings were correct: Global Settings had the A/V Edge server defined, and it was assigned as the A/V Authentication Service in the pool properties. The easiest way to troubleshoot this is to look at the client and Edge Server logs, so we enabled logging in the Communicator client and SIPStack logging on the Edge Server, reproduced the issue, and took a look at the logs.

The first entry below is the SDP from the initial INVITE sent from User A to User B. These two users are federated with each other. In this example the 172.16.1.x and 10.1.2.x networks are "public". As you can see, User A is sending the full candidate list to User B. It includes the host IP address as well as relay candidates with the Edge Server's A/V Edge IP address and port:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 172.16.3.122
s=session
c=IN IP4 172.16.3.122
b=CT:99980
t=0 0
m=applicationsharing 1632 TCP/RTP/AVP 127
a=ice-ufrag:R3Ex
a=ice-pwd:F5a5snX+msw7pVmNbtcjA6NL
a=candidate:1 1 TCP-PASS 2120613887 172.16.3.122 11945 typ host
a=candidate:1 2 TCP-PASS 2120613374 172.16.3.122 11945 typ host
a=candidate:2 1 TCP-ACT 2121006591 172.16.3.122 1632 typ host
a=candidate:2 2 TCP-ACT 2121006078 172.16.3.122 1632 typ host
a=candidate:3 1 TCP-PASS 6556159 172.16.1.5 50970 typ relay raddr 172.16.1.5 rport 50970
a=candidate:3 2 TCP-PASS 6556158 172.16.1.5 50970 typ relay raddr 172.16.1.5 rport 50970
a=candidate:4 1 TCP-ACT 7076607 172.16.1.5 50970 typ relay raddr 172.16.1.5 rport 50970
a=candidate:4 2 TCP-ACT 7076094 172.16.1.5 50970 typ relay raddr 172.16.1.5 rport 50970
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:KDP+1zxsJWw98yQj3F2acfd7f0Qj6ds/QkScrGDa|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:vysmXMPsDCAtpR0wGthkfYANxnY2mxzhyCzbF/pK|2^31|1:1
a=setup:active
a=connection:new
a=rtcp:1632
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:sharer
a=x-applicationsharing-media-type:rdp

Next is the 200 OK that is sent from User B to User A.  It should include the full candidate list, but it doesn't.  It only includes the host IP address.  This tells me that there might be something wrong with MRAS:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 10.1.1.6
s=session
c=IN IP4 10.1.1.6
b=CT:99980
t=0 0
m=applicationsharing 32644 TCP/RTP/SAVP 127
a=ice-ufrag:Y64Q
a=ice-pwd:PPB1tjWybp3ENa/G2eR8HH+9
a=candidate:1 1 TCP-PASS 2120613887 10.1.1.6 19228 typ host
a=candidate:1 2 TCP-PASS 2120613374 10.1.1.6 19228 typ host
a=candidate:2 1 TCP-ACT 2121006591 10.1.1.6 32644 typ host
a=candidate:2 2 TCP-ACT 2121006078 10.1.1.6 32644 typ host
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:/UG/qUFgVT2x9cF4XhxaeM/Djsojrf2UGNzyk6V+|2^31|1:1
a=setup:active
a=connection:existing
a=rtcp:32644
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:viewer
a=x-applicationsharing-media-type:rdp
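
When comparing SDP bodies like the two above, it can help to count the candidates by type instead of reading them line by line. The following is a minimal sketch, assuming you have copied the SDP body out of the log into a string; the function names are my own and are not part of any OCS tool.

```python
from collections import Counter

def candidate_types(sdp_text):
    """Count ICE candidates in an SDP body by their 'typ' value (host, relay, ...)."""
    counts = Counter()
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # The candidate type is the token that follows the literal 'typ'.
        if "typ" in fields:
            idx = fields.index("typ")
            if idx + 1 < len(fields):
                counts[fields[idx + 1]] += 1
    return counts

def has_relay_candidates(sdp_text):
    """Return True if the SDP body offers at least one relay candidate."""
    return candidate_types(sdp_text)["relay"] > 0
```

For User B's 200 OK above, has_relay_candidates would return False, which is the symptom to look for.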

Since we have the trace from the client, we can see if the client got a response when requesting MRAS credentials.  The client sends this request at logon, so we should be able to see the request and the response.  As you can see in the following entry, User B sends a SERVICE request to the MRAS server that it got through in-band provisioning:

Content-Type: application/msrtc-media-relay-auth+xml
<request requestID="56457248" version="2.0" to="sip:ocsedgeint.dmz.fabrikam.com@fabrikam.com;gruu;opaque=srvr:MRAS:bMcGRlVKg0qaV-CeQYX05gAA" from="sip:wabbott@fabrikam.com" xmlns="https://schemas.microsoft.com/2006/09/sip/mrasp" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"><credentialsRequest credentialsRequestID="56457248"><identity>sip:wabbott@fabrikam.com</identity><location>intranet</location><duration>480</duration></credentialsRequest></request>

In the 200 OK the Edge Server responds with the MRAS credentials for the user. This includes the username and password as well as how long the credentials are valid. By default this is 8 hours, which matches the duration of 480 (minutes) in the response. More importantly for our troubleshooting, it includes whether the client is internal or external, the FQDN of the Edge Server, and the ports to use:

CONTENT-TYPE: application/msrtc-media-relay-auth+xml
SERVER: RTCC/3.5.0.0 MRAS/2.0
ms-edge-proxy-message-trust: ms-source-type=EdgeProxyGenerated;ms-ep-fqdn=ocsedgeint.dmz.fabrikam.com;ms-source-verified-user=verified

<?xml version="1.0"?>
<response xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="https://www.w3.org/2001/XMLSchema" requestID="56457248" version="2.0" serverVersion="2.0" to="sip:ocsedgeint.dmz.fabrikam.com@fabrikam.com;gruu;opaque=srvr:MRAS:bMcGRlVKg0qaV-CeQYX05gAA" from="sip:wabbott@fabrikam.com" reasonPhrase="OK" xmlns="https://schemas.microsoft.com/2006/09/sip/mrasp">
<credentialsResponse credentialsRequestID="56457248">
<credentials>
<username>AgAAJHa6+wYBy3Se+6JSSBjv5tOAs3S3dFg8uFItJygAAAAAAqV+swgBroa3jbcDuP7OfPuvUEI=</username>
<password>sVrf1kbPcIJsCX725PyBux3/gcM=</password>
<duration>480</duration>
</credentials>
<mediaRelayList>
<mediaRelay>
<location>intranet</location>
<hostName>ocsedgeint.dmz.fabrikam.com</hostName>
<udpPort>3478</udpPort>
<tcpPort>443</tcpPort>
</mediaRelay>
</mediaRelayList>
</credentialsResponse>
</response>
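
If you want to pull the relay host and ports out of a response like this without reading the XML by eye, something along these lines should work. This is only a sketch: it assumes the XML body (everything from the XML declaration onward, without the SIP headers) has been copied into a string, and media_relays is a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def media_relays(mras_xml):
    """Return (location, hostName, udpPort, tcpPort) tuples from an MRAS response body."""
    root = ET.fromstring(mras_xml.strip())
    relays = []
    for elem in root.iter():
        # Match on the local name so the namespace URI is not hard-coded.
        if local_name(elem.tag) != "mediaRelay":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        relays.append((
            fields.get("location"),
            fields.get("hostName"),
            int(fields["udpPort"]),
            int(fields["tcpPort"]),
        ))
    return relays
```

The host name and ports it returns are the ones the client has to be able to reach.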

So we're getting the response that we expect, but User B's candidate list doesn't include all the candidates. In my experience, this usually comes down to one thing: the client can't contact and receive a response from the Edge Server. That can be caused by a couple of things. The first and most likely cause is that the ports we got back in the MRAS response aren't open. 443/TCP and 3478/UDP need to be open from ANY internal client to the Edge Server's internal interface. In my experience customers often open these ports only from the Front-End Servers to the Edge Server, not from the internal clients. In that case the client knows where to contact the Edge Server to request relay ports for the media session, but can't reach it, so the client can only include its host IP in the candidate list.
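
A quick way to check the TCP half of this from a client machine is to try opening a connection to the Edge Server's internal FQDN on port 443. The sketch below only covers TCP. UDP is connectionless, so a simple connect test cannot confirm that 3478/UDP is open, and that port still has to be verified in the firewall rules or with a network trace. The function name is my own.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from an internal client, tcp_port_open("ocsedgeint.dmz.fabrikam.com", 443) returning False would point at the firewall or routing problems described here.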

The other possibility is that the ports are open in the firewall for any internal client to contact the Edge Server, but the Edge Server doesn't know how to route the response back to the client. In this case the Edge Server's internal interface was configured to use static routes. While there was a static route for the Edge Server to reach the Front-End Servers, there were no entries for the client subnets. Adding the client subnets as static routes allowed the Edge Server to route the response back to the client.
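
If you have the list of static routes from the Edge Server and the list of client subnets, checking for gaps is easy to script. This is a rough sketch using Python's ipaddress module. The subnets in the usage note are hypothetical (the logs above only show individual addresses, not prefix lengths), and uncovered_subnets is a name I chose for illustration.

```python
import ipaddress

def uncovered_subnets(client_subnets, static_routes):
    """Return the client subnets that no static route destination fully contains."""
    routes = [ipaddress.ip_network(r) for r in static_routes]
    missing = []
    for subnet in client_subnets:
        net = ipaddress.ip_network(subnet)
        # A subnet is covered when it sits entirely inside some route's destination.
        if not any(net.subnet_of(route) for route in routes):
            missing.append(subnet)
    return missing
```

For example, with a hypothetical client subnet of 10.1.1.0/24 and a single static route to a hypothetical Front-End subnet of 10.1.5.0/24, the function returns ["10.1.1.0/24"], meaning the Edge Server has no route back to those clients.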

After fixing the issue, you can see in both clients' SDP that all of the possible candidates are listed.

SDP for User A:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 172.16.3.122
s=session
c=IN IP4 172.16.3.122
b=CT:99980
t=0 0
m=applicationsharing 31834 TCP/RTP/AVP 127
a=ice-ufrag:Mh3y
a=ice-pwd:Gbc6/jKt6GWJZCkzDgGnX+uH
a=candidate:1 1 TCP-PASS 2120613887 172.16.3.122 16292 typ host
a=candidate:1 2 TCP-PASS 2120613374 172.16.3.122 16292 typ host
a=candidate:2 1 TCP-ACT 2121006591 172.16.3.122 31834 typ host
a=candidate:2 2 TCP-ACT 2121006078 172.16.3.122 31834 typ host
a=candidate:3 1 TCP-PASS 6556159 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=candidate:3 2 TCP-PASS 6556158 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=candidate:4 1 TCP-ACT 7076607 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=candidate:4 2 TCP-ACT 7076094 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:eZYclvNadnEO93Y2unhluWyY0a4pdFjMbqIHSft4|2^31|1:1
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:UwCu2XbXOqVSqQyTEshQ+zedJ2hIm9rkfl0R9lI9|2^31|1:1
a=setup:active
a=connection:new
a=rtcp:31834
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:sharer
a=x-applicationsharing-media-type:rdp

SDP for User B:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 10.1.1.6
s=session
c=IN IP4 10.1.1.6
b=CT:99980
t=0 0
m=applicationsharing 23624 TCP/RTP/SAVP 127
a=ice-ufrag:PZzq
a=ice-pwd:Ob1pB/PSA7ZAvE0E9ZoMeq8y
a=candidate:1 1 TCP-PASS 2120613887 10.1.1.6 23077 typ host
a=candidate:1 2 TCP-PASS 2120613374 10.1.1.6 23077 typ host
a=candidate:2 1 TCP-ACT 2121006591 10.1.1.6 23624 typ host
a=candidate:2 2 TCP-ACT 2121006078 10.1.1.6 23624 typ host
a=candidate:3 1 TCP-PASS 6556159 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=candidate:3 2 TCP-PASS 6556158 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=candidate:4 1 TCP-ACT 7076607 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=candidate:4 2 TCP-ACT 7076094 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:3N1pOzr1dd58l6pkcLpn+tJReG9DOnsHDhXMHmK7|2^31|1:1
a=setup:active
a=connection:existing
a=rtcp:23624
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:viewer
a=x-applicationsharing-media-type:rdp

This time, instead of the session failing because the media connectivity checks failed, you can see that User A sends another INVITE with the best candidate chosen:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 172.16.1.5
s=session
c=IN IP4 172.16.1.5
b=CT:99980
t=0 0
m=applicationsharing 59049 TCP/RTP/SAVP 127
a=ice-ufrag:R3Ex
a=ice-pwd:F5a5snX+msw7pVmNbtcjA6NL
a=candidate:4 1 TCP-ACT 7076607 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=candidate:4 2 TCP-ACT 7076094 172.16.1.5 59049 typ relay raddr 172.16.1.5 rport 59049
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:vysmXMPsDCAtpR0wGthkfYANxnY2mxzhyCzbF/pK|2^31|1:1
a=remote-candidates:1 10.1.2.4 59020 2 10.1.2.4 59020
a=setup:active
a=connection:existing
a=rtcp:59049
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:sharer
a=x-applicationsharing-media-type:rdp

User B will do the same in the 200 OK:

Content-Type: application/sdp
Message-Body: v=0
o=- 0 0 IN IP4 10.1.2.4
s=session
c=IN IP4 10.1.2.4
b=CT:99980
t=0 0
m=applicationsharing 59020 TCP/RTP/SAVP 127
a=ice-ufrag:PZzq
a=ice-pwd:Ob1pB/PSA7ZAvE0E9ZoMeq8y
a=candidate:3 1 TCP-PASS 6556159 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=candidate:3 2 TCP-PASS 6556158 10.1.2.4 59020 typ relay raddr 10.1.2.4 rport 59020
a=crypto:2 AES_CM_128_HMAC_SHA1_80 inline:3N1pOzr1dd58l6pkcLpn+tJReG9DOnsHDhXMHmK7|2^31|1:1
a=remote-candidates:1 172.16.1.5 59049 2 172.16.1.5 59049
a=setup:passive
a=connection:existing
a=rtcp:59020
a=mid:1
a=rtpmap:127 x-data/90000
a=x-applicationsharing-session-id:1
a=x-applicationsharing-role:viewer
a=x-applicationsharing-media-type:rdp

At this point the session is established and working.

The important thing to remember when troubleshooting these issues is to make sure that the required ports are open and that you can actually connect to them. As always, logging and Snooper are your best friends!