Recently, I came across a scenario where Get-CsPoolUpgradeReadinessState reported the pool as READY and the Front-End services were started on every Front-End, yet TotalActiveFrontEnds showed a number that did not match the actual number of active Front-Ends in the pool.
You will notice that UpgradeDomain3 has 1 Front-End Server associated with it, but its Total Active Front-Ends is zero. You will also notice that the pool summary shows only 2 active Front-End Servers.
Interestingly, Get-CsPoolFabricState was not throwing any errors or warnings!
To troubleshoot the issue, we first checked whether the Front-End Server had been failed over, and tried to fail it back. To our surprise, the server was not in a failed-over state, so the failback did not work (as expected).
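The mismatch described above comes down to simple arithmetic, which the following minimal Python sketch illustrates. The data is hypothetical: it assumes one Front-End in each of three upgrade domains, and the field names (`name`, `total`, `active`) are made up for illustration and are not the cmdlet's real property names.

```python
# Hypothetical data mirroring the situation described above; the field
# names are illustrative and are NOT the cmdlet's real property names.
upgrade_domains = [
    {"name": "UpgradeDomain1", "total": 1, "active": 1},
    {"name": "UpgradeDomain2", "total": 1, "active": 1},
    {"name": "UpgradeDomain3", "total": 1, "active": 0},
]


def domains_with_inactive_front_ends(domains):
    """Return names of upgrade domains with fewer active than total Front-Ends."""
    return [d["name"] for d in domains if d["active"] < d["total"]]


total = sum(d["total"] for d in upgrade_domains)
active = sum(d["active"] for d in upgrade_domains)

print(f"{active} of {total} Front-Ends active")
print(domains_with_inactive_front_ends(upgrade_domains))
```

A check like this would have flagged UpgradeDomain3 even though the overall pool state was READY.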
Next, we checked the Windows Fabric logs under C:\ProgramData\Windows Fabric\Logs and then ran CLS logging using a scenario called PowerShell.
In the plain-text log, we noticed the following:
TL_WARN(TF_HADR) [LYNCPOOL01\LYNCENT03]8554.13B2C::06/18/2018-23:57:49.112.0000200D (PowerShell,FrontEndState.ReadPerfCounters:poolupgradereadinessstate.cs(568)) (000000000261B13F)FE LYNCENT03.contoso.com is not connected to Fabric Pool Manager according to perf counter.
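When the exported plain-text log is large, a short script can pull out every Front-End flagged this way. The following is a minimal Python sketch that assumes the warning text always has the same wording as the line above; the function name is hypothetical, and the pattern may need adjusting for other builds or log formats.

```python
import re

# Matches the warning wording seen in the log excerpt above and captures
# the server FQDN that follows "FE ".
NOT_CONNECTED = re.compile(
    r"FE (?P<fqdn>\S+) is not connected to Fabric Pool Manager"
)


def find_disconnected_front_ends(log_lines):
    """Return the sorted, de-duplicated FQDNs flagged as not connected."""
    found = set()
    for line in log_lines:
        match = NOT_CONNECTED.search(line)
        if match:
            found.add(match.group("fqdn"))
    return sorted(found)


sample = [
    "TL_WARN(TF_HADR) [LYNCPOOL01\\LYNCENT03]8554.13B2C::06/18/2018-23:57:49.112.0000200D "
    "(PowerShell,FrontEndState.ReadPerfCounters:poolupgradereadinessstate.cs(568)) "
    "(000000000261B13F)FE LYNCENT03.contoso.com is not connected to Fabric Pool Manager "
    "according to perf counter.",
]

print(find_disconnected_front_ends(sample))
```

To scan a real export, pass the open file object (or its list of lines) to the function instead of the sample list.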
Based on this, we followed a blog entry titled "Get-CsPoolUpgradeReadinessState showing NOT READY or BUSY" and found that the server LYNCENT03.contoso.com was indeed missing the permissions for the RTC Server Local Group.
So we first added the local group.
We then updated its permissions to Full Control and rebooted the server. Once the server was back online and its services were running, the output of Get-CsPoolUpgradeReadinessState showed Total Active Front-Ends as 3.
Attention to detail matters when patching a pool with multiple servers: a pool can report as healthy when, in fact, one or more of its servers has a problem reporting its state.