IAG 2007 drops connectivity intermittently due to network pool limit

I recently came across an interesting case on IAG 2007 where the symptoms were a bit different from one customer to the next, but the root cause appears to be the same.

The symptoms I came across are as follows:

In this scenario around 250 or more users connect to the IAG server, and IAG appears to drop connections intermittently. Users who connected earlier in the day, before the 250-connection limit was reached, stay on the server. Once the count goes above roughly 250 users, IAG starts dropping connections. This is the behaviour I saw on a portal trunk.

Another scenario I came across was on an ActiveSync trunk where the number of users was around 1000. Once this trunk went above roughly 1000 users, IAG started dropping random connections intermittently.

Intermittent issues are tricky to reproduce and debug, and they are very time consuming to identify because the problem is not always present and you have to wait for the right opportunity. I looked through different layers of the product to understand whether this was a performance limit. Network Monitor and Wireshark did not give much information, and even ISA-level tracing was to no avail. Rebooting the server rectifies the issue temporarily, but it comes back when these trunks reach those numbers.

After some more in-depth investigation, and after taking a debug trace of the IAG filter, I noticed a major IAG filter construct throwing the following error:

CExtECBQueue: ERROR: Unable to allocate an additional 1 entities, this would max out the pool. Current status: Total = 501, Available = 0, Used = 501

After some debugging around this error and a review of the source code, I figured out what this construct was grumbling about.

Here is the interface on the IAG console that controls this setting:

So this could explain the number 501 from the pool. My second customer had this set to 1000 for his ActiveSync trunk, and he was crossing the 1000 limit. The interesting bit is that, on average, a browser (User-Agent) usually establishes two TCP connections, so 250 users multiplied by 2 comes to about 500 connections, which lines up with the pool total in the error.

To fix this behaviour I changed the value to double the number of users expected to connect to this trunk. The right figure can vary depending on the User-Agent. You also need to do Start --> Run --> cmd --> iisreset <ENTER> and then apply the change to the IAG configuration by hitting the Apply button on the console.
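As a rough sketch of that sizing rule, the arithmetic looks like the following. The function name and the default of two connections per client are my own illustration, based only on the observation above that a browser usually opens two TCP connections; other User-Agents may behave differently, so treat the multiplier as something to measure rather than a fixed value.

```python
def suggested_connection_limit(expected_users, connections_per_client=2):
    """Estimate a trunk connection limit from the expected user count.

    connections_per_client defaults to 2 because a browser usually opens
    two TCP connections; this is an assumption and varies by User-Agent.
    """
    if expected_users < 0 or connections_per_client < 1:
        raise ValueError("expected_users must be >= 0 and connections_per_client >= 1")
    return expected_users * connections_per_client


# The two trunks described above, sized for the user counts that were seen.
portal_limit = suggested_connection_limit(250)
activesync_limit = suggested_connection_limit(1000)
print(portal_limit, activesync_limit)
```

This only gives a starting point. The value that actually works should come from watching the trunk under real load.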

To me, these sorts of issues pop up due to a lack of planning and testing before the product is deployed. Defaults are never acceptable as they stand, and settings like this should always be kept under review as traffic grows. In UAG I have noticed that this setting has moved from this interface to the Advanced Trunk <General> tab, and it is set to 10000 by default.

You don't have to accept the limits I refer to here, as they are unique to each environment. But if you run into a similar issue, make sure you see this exception in the filter trace before you tweak the limits I mentioned, as they are very specific.
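If you have the filter trace as plain text, a small script can confirm whether the pool exception is actually there before you change anything. This is only a sketch: it assumes the trace has been exported with one message per line, and the function name is made up for illustration. The pattern is built from the error message shown earlier in this post.

```python
import re

# Pattern built from the CExtECBQueue error quoted earlier in this post.
POOL_ERROR = re.compile(
    r"CExtECBQueue: ERROR: Unable to allocate an additional (?P<requested>\d+) entities"
    r".*?Total = (?P<total>\d+), Available = (?P<available>\d+), Used = (?P<used>\d+)"
)


def find_pool_exhaustion(trace_lines):
    """Return the pool counters from every line that reports pool exhaustion."""
    hits = []
    for line in trace_lines:
        match = POOL_ERROR.search(line)
        if match:
            hits.append({name: int(value) for name, value in match.groupdict().items()})
    return hits


# Sample input: the error line from the trace above, plus an unrelated line.
sample_trace = [
    "some unrelated trace message",
    "CExtECBQueue: ERROR: Unable to allocate an additional 1 entities, "
    "this would max out the pool. Current status: Total = 501, Available = 0, Used = 501",
]
hits = find_pool_exhaustion(sample_trace)
print(hits)
```

An empty result means the trace does not show this particular exception, in which case the connection limit is probably not your problem.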