The case of (un)explained – ConfigMgr Console Slowness After moving to SQL Always ON — Multi Subnet Configuration


Hi there!
Back to you with another installment of the unexplained series.

If you ever get into an issue where things got slow after moving to SQL Always ON like 1 minute delay etc. while doing basic things in console THEN

  1. From the normal route of troubleshooting, checking the provider queries and running it on SQL. You might find that the queries run on a fly when directly run on SQL.
  2. Don’t try to alter the CE LEVEL to 110 or to higher value. I say this because normal operations won’ be affected it would be some queries. But if you wish you can do it and find that it did not help.
So what is happening then? So what do we do here ?

You will be able to reproduce this when you go thru provider. A handy way rather than using WBEMTEST or any other thing when you want to know the time is MEASURE-Command from PS.

vs
In your LAB where it returns almost at 0 seconds

Then you can go thru the normal route of getting the WMI ETL and Dumps around that time to find that the wait time is in connecting to SQL.

We try to use an IP to connect to Listener which may not be in the same SUBNET as the current ACTIVE node.

So how are they configured? And why does it do that way?

They might be configured in a way which is called Multi-Subnet Failover in the SQL Always ON.

Read about it here:

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/create-or-configure-an-availability-group-listener-sql-server?view=sql-server-2017#MultiSubnetFailover

MultiSubnetFailover Keyword and Associated Features
MultiSubnetFailover is a new connection string keyword used to enable faster failover with Always On Availability Groups and Always On Failover Cluster 
 Instances in SQL Server 2012. The following three sub-features are enabled when MultiSubnetFailover=True is set in connection string:
•	Faster multi-subnet failover to a multi-subnet listener for an Always On Availability Group or Failover Cluster Instances.
•	Faster single subnet failover to a single subnet listener for an Always On Availability Group or Failover Cluster Instances.
o	This feature is used when connecting to a listener that has a single IP in a single subnet. This performs more aggressive TCP connection retries to 
speed up single subnet failovers.
•	Named instance resolution to a multi-subnet Always On Failover Cluster Instance.
o	This is to add named instance resolution support for an Always On Failover Cluster Instances with multiple subnet endpoints

If you open the Listener properties, you will see multi-Subnet right there.

For ConfigMgr – We do not support using MultiSubnetFailover=True  (as of 1810) in the SQL connectivity string, so we cannot take the benefits for this as of now. So provider tries to simply connect to the ListenerName and if it is getting resolved to the *.*.*.88 IP which is in other subnet (which could be blocked from this for SQL port if they have meant it for DR)
we will see the delay until the Provider tries to the correct current Active Node IP.

Is there a way to detect this configuration from ConfigMgr ?

Yes it is indeed.

-- Find the Listener IP associated with the Listener of ConfigMgr AG

select Distinct(ip_address) from sys.availability_group_listener_ip_addresses LIP
INNER JOIN sys.availability_group_listeners LIS ON LIP.listener_id = LIS.listener_id
INNER JOIN sys.availability_replicas REP on LIS.group_id = REP.group_id
INNER JOIN sys.Databases DB ON REP.replica_id = DB.replica_id
INNER JOIN SC_SiteDefinition SSD ON DB.Name = SSD.SQLDatabaseName
INNER JOIN SMSData SD ON SSD.SiteCode = SD.ThisSiteCode

Sample output:

Workarounds

1. Upgrade to ConfigMgr 1902

From ConfigMgr 1902 onwards we have added a registry hook,

On the site system that talks to database,

Under registry key[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SMS\Identification], set [MSF Enabled] to 1,
with that code will specify 
MultiSubnetFailover=TRUE in the connection string when connecting to SQL.


Note:

In scenarios with Synchronous replica it can cause latency as well depending on the location of the other subnet.
And if we go with asynchronous replica then there is a chance of data loss.

So there is a trade off between this on what exactly works for you. And also to put this on record  Site server HA+ SQL AO as DR strategy is not tested or officially supported per se.


2. Change your Listener configuration to detect the ACTIVE Node IP

Connection strings that do not set MultiSubnetFailover to true

When RegisterAllProvidersIP = 1, any clients whose connection strings do not use MultiSubnetFailover = True, will experience high latency connections. This occurs because these clients attempt connections to all IPs sequentially. In contrast, if RegisterAllProvidersIP is changed to 0, the active IP address is registered in the Client Access Point in the WSFC cluster, reducing latency for legacy clients. Therefore, if you have legacy clients that need to connect to an availability group listener and cannot use the MultiSubnetFailover property, we recommend that you change RegisterAllProvidersIP to 0.

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/create-or-configure-an-availability-group-listener-sql-server?view=sql-server-2017#RegisterAllProvidersIP

You can query it like this:

get-clusterresource -name AgListnerNameHere | get-clusterparameter

“When you use SQL Server Management Studio, Transact-SQL, or PowerShell to create an availability group listener, the Client Access Point is created in WSFC with the RegisterAllProvidersIP property set to 1 (true).”

So, you can try and see if you set this to 0 if the connection goes through much quicker. I guess using

https://docs.microsoft.com/en-us/powershell/module/failoverclusters/set-clusterparameter?view=win10-ps

Here is an example

get-clusterresource -name AgListnerNameHere | get-clusterparameter


Object         Name                   Value                            Type
------            ----                   -----                            ----
ag2017_listener17 Name                   LISTENER17                       String
ag2017_listener17 DnsName                Listener17                       String
ag2017_listener17 Aliases                                                 String
ag2017_listener17 RemapPipeNames         1                                UInt32
ag2017_listener17 HostRecordTTL          1200                             UInt32
ag2017_listener17 RegisterAllProvidersIP 1                                UInt32
ag2017_listener17 PublishPTRRecords      0                                UInt32
ag2017_listener17 ResourceData           {1, 0, 0, 0...}                  ByteArray
ag2017_listener17 StatusNetBIOS          0                                UInt32
ag2017_listener17 StatusDNS              9002                             UInt32
ag2017_listener17 StatusKerberos         0                                UInt32
ag2017_listener17 CreatingDC             \\PrimaryDC.SQLRepro.edu         String
ag2017_listener17 LastDNSUpdateTime      4/19/2019 8:52:21 PM             DateTime
ag2017_listener17 ObjectGUID             5fde31de2bb9f5448977251512f10ac8 String
ag2017_listener17 DnsSuffix              SQLRepro.edu                     String
ag2017_listener17 ADAware                1                                UInt32

 

PS C:\windows\system32> get-clusterresource -name ag2017_listener17 | set-clusterparameter -name RegisterAllProvidersIP -value 0

WARNING: The properties were stored, but not all changes will take effect until ag2017_listener17 is taken offline and then online again.

 

Hope it helps!

Umair Khan

Support Escalation Engineer | Microsoft System Center Configuration Manager 
Disclaimer: This posting is provided "AS IS" with no warranties and confers no rights.

Skip to main content