UAG DA: Teredo clients unable to connect to UAG DirectAccess under heavy load

Once again, this one comes from a case that was escalated to me, and it was a very interesting one. I am including my probing questions here because they show how we narrowed down the issue: it was somewhat misunderstood in the beginning, and troubleshooting had been focused on name resolution not working for Teredo clients.

Here is the probing discussion with the UAG admin to narrow down the problem he was facing.

Questions and respective answers

1. What is the issue with DA (DirectAccess)?

Answer: Client DA connectivity does not work at the time of the issue, when we are using Teredo.

2. Does the problem happen when all the users use IP-HTTPS?

Answer: No.

3. Does the problem happen only with Teredo?

Answer: Yes, BUT only under extreme or heavy load, not during usual load.

After this discussion, it was clear that Teredo itself works, but for some reason it breaks under heavy load.

The benefit of putting these questions in my post is to point out that effective probing can help you narrow down an issue. Before I was engaged, they were troubleshooting name resolution and the DNS proxy on UAG, and that direction of troubleshooting was incorrect.

 

So, to dig deeper, I collected scenario tracing from the client as below.

Run the following two commands in the command prompt:

  •  Netsh trace start scenario=directaccess capture=yes report=yes tracefile=C:\client.etl
  •  Netsh wfp capture start

 

Then restart the IP Helper service to initiate DA connectivity again:

  • net stop iphlpsvc (to stop the IP Helper service)
  • net start iphlpsvc (to start the IP Helper service)

 

Then stop the traces by running the following two commands in the command prompt:

  •  Netsh wfp capture stop
  •  Netsh trace stop

 

I followed the same steps (minus restarting the IP Helper service) on the server side to collect DA scenario tracing.

Then I checked the server-side captures and, since we knew that the problem happens under high load, counted the Teredo neighbors in \CabFolder\config\neighbors.txt.
***************************
Internet Address                              Physical Address   Type
--------------------------------------------  -----------------  -----------
x x x x x x xx x Reachable
x x x x x x xx x Unreachable

x x x x x x xx x  Probe

I found the number to be greater than 3000, while we know that the default limit is 256 as per https://technet.microsoft.com/en-us/library/ee844188(v=WS.10).aspx, which says:

"Note the number in the Neighbor Cache Limit field, which by default is 256."
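Counting a few thousand neighbor rows by hand is tedious, so here is a minimal Python sketch of how the count could be automated. It is not part of the original troubleshooting. It assumes neighbors.txt follows the usual `netsh interface ipv6 show neighbors` layout, where each interface section starts with a line like "Interface 14: Teredo Tunneling Pseudo-Interface" and each entry has its state in the Type column; the function name `count_neighbors` is my own.

```python
from collections import Counter

# Neighbor states expected in the Type column (assumed list, extend if needed).
KNOWN_STATES = {"Reachable", "Unreachable", "Probe", "Stale",
                "Delay", "Incomplete", "Permanent"}


def count_neighbors(text, interface_keyword="Teredo"):
    """Count neighbor entries per state for interface sections whose
    header line contains interface_keyword (case-insensitive)."""
    counts = Counter()
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        # A new interface section starts; decide whether we care about it.
        if stripped.startswith("Interface "):
            in_section = interface_keyword.lower() in stripped.lower()
            continue
        if not in_section or not stripped:
            continue
        # Take the last token that is a known state, so that a trailing
        # qualifier such as "(Router)" does not hide the state.
        fields = stripped.split()
        state = next((f for f in reversed(fields) if f in KNOWN_STATES), None)
        if state is not None:
            counts[state] += 1
    return counts
```

Pass the text of neighbors.txt to `count_neighbors` and sum the values of the returned counter to get the total number of Teredo neighbors. Header and separator lines are skipped because they do not contain a known state.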

So we checked this value on the server (on all the nodes) using the command:

netsh interface ipv6 show global

As expected, it was 256, i.e. the default.
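If you have several nodes to check, the comparison between the configured limit and the observed neighbor count can be scripted. The sketch below is only an illustration: it assumes the `netsh interface ipv6 show global` output contains a line of the form "Neighbor Cache Limit : 256 entries per interface", and the helper names `parse_neighbor_cache_limit` and `exceeds_limit` are hypothetical.

```python
import re


def parse_neighbor_cache_limit(show_global_output):
    """Return the Neighbor Cache Limit as an int, or None if the
    field is not present in the given command output."""
    match = re.search(r"Neighbor Cache Limit\s*:\s*(\d+)", show_global_output)
    return int(match.group(1)) if match else None


def exceeds_limit(neighbor_count, limit):
    """Rough check: True when the observed number of neighbors
    is above the configured limit."""
    return limit is not None and neighbor_count > limit
```

In this case the limit parsed from the server would be 256 and the neighbor count was above 3000, so `exceeds_limit` would return True and flag the node for a higher limit.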

Then we raised it using the following command:

netsh interface ipv6 set global neighborcachelimit=Maximum

where Maximum is whatever the requirement calls for, e.g. 6000. After we increased this to a higher value, the issue never recurred.