UAG DA Teredo clients not able to connect to UAG DA during heavy load

02/27/2014

Once again, this one comes from a case that was escalated to me, and it was a very interesting one. I am including my probing questions here because they show how we narrowed down the issue. The problem was somewhat misunderstood in the beginning, and troubleshooting had been revolving around name resolution not working for Teredo clients.

Below is the probing discussion with the UAG admin that narrowed down the problem he was facing.

Questions and respective answers
1. What is the issue with DA (DirectAccess)?

Answer: Client DA connectivity does not work at the time of the issue, when clients are using Teredo.

2. Does the problem happen when all the users use IP-HTTPS?

Answer: No.

3. Does the problem happen only with Teredo?

Answer: Yes, but only under extreme or heavy load, not under usual load.

After this discussion, it was clear that even Teredo works under usual load, but for some reason it breaks under heavy load.

The benefit of putting these questions in this post is simply to point out that effective probing can help you narrow down an issue. Before I was engaged, the team had been troubleshooting name resolution and the DNS proxy on UAG, and that direction of troubleshooting was incorrect.

So, to dig deeper, I collected scenario tracing from the client as follows.

Run the following two commands at the command prompt

Netsh trace start scenario=directaccess capture=yes report=yes tracefile=C:\client.etl
Netsh wfp capture start

Then restart the IP Helper service (stop it, then start it) so that the client initiates DA connectivity again

net stop iphlpsvc
net start iphlpsvc

Then stop the traces by running the following two commands at the command prompt

Netsh wfp capture stop
Netsh trace stop

The same steps (minus restarting the IP Helper service) were taken on the server side to collect DA scenario tracing.

Then I checked the server-side captures. In \CabFolder\config\neighbors.txt, I looked at the Teredo interface (since we knew that the problem happens during high load) and checked the number of Teredo neighbors.
***************************
Internet Address Physical Address Type
-------------------------------------------- ----------------- -----------
x x x x x x xx x Reachable
x x x x x x xx x Unreachable
x x x x x x xx x Probe

I found the number to be greater than 3000, far above the default neighbor cache limit of 256 described at https://technet.microsoft.com/en-us/library/ee844188(v=WS.10).aspx, which says:

"Note the number in the Neighbor Cache Limit field, which by default is 256."
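Counting thousands of rows in neighbors.txt by hand is error-prone, so here is a minimal Python sketch of how the count could be automated. It is not part of the original troubleshooting. It assumes that each interface section starts with a line beginning with "Interface ", and that each neighbor row ends with a state word such as Reachable, Unreachable or Probe. The function name, the state list and the sample rows (documentation-style addresses) are illustrative only.

```python
from collections import Counter

# States a neighbor row may end with; this set is an assumption, adjust as needed.
KNOWN_STATES = {"Reachable", "Unreachable", "Probe", "Stale",
                "Delay", "Incomplete", "Permanent"}

def count_neighbors(text):
    """Return {interface name: Counter(state -> count)} for neighbor-table text."""
    per_interface = {}
    current = "unknown"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Interface "):
            # Assumed header shape: "Interface 14: Teredo Tunneling Pseudo-Interface"
            current = stripped.split(":", 1)[-1].strip() or stripped
            continue
        fields = stripped.split()
        # A neighbor row has at least an address and a state; headers and rules are skipped.
        if len(fields) >= 2 and fields[-1] in KNOWN_STATES:
            per_interface.setdefault(current, Counter())[fields[-1]] += 1
    return per_interface

# Made-up sample rows for illustration, not data from the case.
SAMPLE = """\
Interface 14: Teredo Tunneling Pseudo-Interface

Internet Address                              Physical Address   Type
--------------------------------------------  -----------------  -----------
2001:0:0a0b:0c0d::1                           203.0.113.10:50000 Reachable
2001:0:0a0b:0c0d::2                           203.0.113.11:50001 Unreachable
2001:0:0a0b:0c0d::3                           203.0.113.12:50002 Probe
"""

if __name__ == "__main__":
    for name, states in count_neighbors(SAMPLE).items():
        print(name, sum(states.values()), dict(states))
```

To use it on a real capture, read the contents of neighbors.txt into a string and pass it to count_neighbors, then compare the total for the Teredo interface with the neighbor cache limit.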

So we checked this value on the server, on all the nodes, using the command

netsh interface ipv6 show global

As expected, it was 256, i.e. the default.
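When the check has to be repeated on several nodes, the saved command output can be scanned with a small script. The Python sketch below is only an illustration. It assumes the output contains a line of the form "Neighbor Cache Limit : 256 entries per interface"; the exact layout may differ between Windows versions. The node names and sample text are hypothetical.

```python
import re

def neighbor_cache_limit(show_global_output):
    """Return the Neighbor Cache Limit as an int, or None if the field is absent."""
    match = re.search(r"Neighbor Cache Limit\s*:\s*(\d+)", show_global_output)
    return int(match.group(1)) if match else None

def nodes_at_default(outputs_by_node, default=256):
    """List the nodes whose limit still equals the default value."""
    return sorted(node for node, text in outputs_by_node.items()
                  if neighbor_cache_limit(text) == default)

# Illustrative text only; the layout of real `show global` output is assumed.
SAMPLE = """\
General Global Parameters
---------------------------------------------
Default Hop Limit                     : 128 hops
Neighbor Cache Limit                  : 256 entries per interface
"""

if __name__ == "__main__":
    # Hypothetical node names; the second one simulates an already raised limit.
    outputs = {"uag-node1": SAMPLE, "uag-node2": SAMPLE.replace("256", "6000")}
    print(nodes_at_default(outputs))
```

Any node reported by nodes_at_default is still running with the default limit.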

Then we raised the limit using the following command

netsh interface ipv6 set global neighborcachelimit=Maximum

where Maximum is a value chosen as per the requirement, e.g. 6000. After we increased this value, the issue never recurred.
