[UPDATE] Microsoft has officially released a new version of AAD Connect; see the Resolution section.
This time I was involved in troubleshooting an AAD Connect (Azure Active Directory Connect) environment that, for some strange reason, stopped syncing data from the on-premises environment to Azure AD.
The customer has two Azure AD Connect servers on Windows Server 2012 R2: one is the active server that syncs data between the two environments, and the other is in staging mode.
Each Azure AD Connect server is connected to a SQL Server named instance on a failover cluster.
Let me show you a diagram of the environment:
The customer noticed that the Azure AD Connect servers no longer sync with Azure Active Directory.
If the customer reboots the AAD Connect servers, the problem disappears, but he wants to identify the root cause.
By reading the Application event log on the two Azure AD Connect servers, I was able to identify the cause immediately, because the event log is full of these events:
Log Name: Application
Event ID: 6322
Task Category: Server
The server encountered an error because the connection to SQL Server failed.
Log Name: Application
Source: Directory Synchronization
Event ID: 905
Task Category: None
Scheduler::SchedulerThreadMain : An error occured and scheduler run failed to perform all operation.
System.Management.Automation.CmdletInvocationException: A connection to SQL Server could not be established
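To see how far back these errors go, the Application log can also be queried from PowerShell. This is only a sketch: the 30-day lookback window is an arbitrary value chosen for illustration, and it assumes the events have not yet been overwritten in the log.

```powershell
# Sketch: list the SQL connection errors logged on an AAD Connect server.
# Assumption: a 30-day lookback window; adjust as needed.
$since = (Get-Date).AddDays(-30)

Get-WinEvent -FilterHashtable @{
    LogName   = 'Application'
    Id        = 6322, 905
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-Table -AutoSize -Wrap
```

Sorting by TimeCreated makes it easy to spot the date of the first failure and compare it with events on the SQL side.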
So the explanation is easy, I thought: SQL Server was down, or there was a network connectivity issue.
But these error events started 20 days ago, and they have been repeating ever since.
By reading the event log on the SQL Server cluster, I was able to identify this event:
Log Name: System
Event ID: 43
Task Category: Windows Update Agent
Installation Started: Windows has started installing the following update: SQL Server 2012 Service Pack XXX
So, 20 days ago the customer updated the SQL cluster. This is a normal operation and a good thing, but on that date the customer also failed over the SQL instances after installing the service pack, just to verify that everything was working well.
But why, 20 days after the failover, had Azure AD Connect still not reestablished the connection with the SQL instances? That is the point!
By default, Azure AD Connect tries to sync the on-premises environment with Azure AD every 30 minutes, so every 30 minutes the service needs a connection to the SQL instance. How many 30-minute intervals are there in 20 days? 20 × 48 = 960 sync cycles. Far too many!
REPRODUCE THE CUSTOMER PROBLEM
I built a lab with exactly the same version of AAD Connect as the customer, 1.1.654.0, and at the end of all the tests, these are my conclusions:
If I fail over the SQL instances, AAD Connect never reestablishes the connection with SQL, with these exceptions:
- Restarting the "Microsoft Azure AD Sync" service reopens the connection with SQL.
- Opening the "Synchronization Service Manager" console reopens the connection with SQL.
- Executing the PowerShell cmdlet "Get-ADSyncRule" reopens the connection with SQL.
To check whether the connection had been reestablished, I used the simple NETSTAT command.
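A check like this can be narrowed down to the sync engine's own connections. The sketch below assumes the sync service runs as the miiserver.exe process (the process named in the release notes quoted in this post) and that it reaches SQL over TCP; the remote addresses in the output still have to be compared by hand with the SQL cluster's address.

```powershell
# Sketch: show established TCP connections owned by the sync engine.
# Assumption: the "Microsoft Azure AD Sync" service runs as miiserver.exe.
$sync = Get-Process -Name miiserver -ErrorAction SilentlyContinue

if ($sync) {
    # "netstat -ano" prints the owning PID in the last column of each line.
    netstat -ano |
        Select-String 'ESTABLISHED' |
        Where-Object { [int](($_.Line.Trim() -split '\s+')[-1]) -in $sync.Id }
} else {
    Write-Warning 'miiserver process not found; is the ADSync service running?'
}
```

If no line points at the SQL instance after a failover, the connection has not been reopened.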
So, this is an issue related to Azure AD Connect version 1.1.654.0.
RESOLUTION
Recently, Microsoft released a new public version of Azure AD Connect that fixes several issues:
- Fix timing window on background tasks for Partition Filtering page when switching to next page.
- Fixed a bug that caused Access violation during the ConfigDB custom action
- Fixed a bug to recover from SQL connection timeout.
- Fixed a bug where certificates with SAN wildcards failed a prerequisite check
- Fixed a bug which causes miiserver.exe to crash during an Azure AD connector export.
- Fixed a bug that caused a bad password attempt to be logged on the DC when running the Azure AD Connect wizard to change configuration.
I installed this new version in my lab. If I fail over the SQL instance, AAD Connect loses the connection with SQL and I can see the previous event 6322 in the event log, but if I wait for the next sync interval (30 minutes by default), the connection is automatically reestablished.
If you are using version 1.1.654.0 of AAD Connect and, for whatever reason, you are not able to update to the new version, you can implement one of these workarounds:
You can attach a task to event 6322 that restarts the "Microsoft Azure AD Sync" service or executes the reconnect-sql.ps1 script.
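One possible way to attach such a task from the command line is schtasks.exe with an event trigger. This is a sketch, not a tested deployment: the task name is made up for illustration, running as SYSTEM is an assumption, and the filter matches any event with ID 6322 in the Application log regardless of its source.

```powershell
# Sketch: restart the sync service whenever event 6322 is logged.
# Assumptions: hypothetical task name; task runs under the SYSTEM account;
# "ADSync" is the service name used by the script in this post.
schtasks.exe /Create /TN "AADConnect-Restart-On-6322" `
    /SC ONEVENT /EC Application /MO "*[System[(EventID=6322)]]" `
    /TR "powershell.exe -NoProfile -Command Restart-Service -Name ADSync -Force" `
    /RU SYSTEM
```

Because event 6322 can be logged several times in a row, it is worth confirming in a lab that repeated restarts do not interfere with a sync cycle before using this in production.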
Or implement a scheduled task on the AAD Connect servers that runs the PowerShell cmdlet "Get-ADSyncRule" once a day, twice a day, or as often as you want, provided the service is running.
This is the command that you need to put in the scheduled task:
powershell.exe -executionpolicy remotesigned -file c:\Scripts\reconnect-sql.ps1
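As an illustration, that command could be registered as a daily task with schtasks.exe. The task name, the 06:00 start time, and the SYSTEM account are all assumptions; the account that runs the task must be allowed to execute the ADSync cmdlets, so a different account may be needed in your environment.

```powershell
# Sketch: run the reconnect script once a day.
# Assumptions: hypothetical task name, arbitrary start time, SYSTEM account.
schtasks.exe /Create /TN "AADConnect-Reconnect-SQL" `
    /SC DAILY /ST 06:00 `
    /TR "powershell.exe -executionpolicy remotesigned -file c:\Scripts\reconnect-sql.ps1" `
    /RU SYSTEM
```

For a shorter interval, /SC HOURLY with /MO set to the number of hours can be used instead of /SC DAILY.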
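This is the "reconnect-sql.ps1" script:
# Reconnect SQL connection: "reconnect-sql.ps1"
If ((Get-Service ADSync).Status -eq "Running") {
    Get-ADSyncRule | Out-Null   # running this cmdlet reopens the SQL connection
} elseif ((Get-Service ADSync).Status -eq "Stopped") {
    Start-Service ADSync        # assumption: the original body of this branch was not preserved
}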