The Case of the Unexplained Windows Server 2012 Replica Kerberos Error : 0x8009030C 0x00002EFE

Windows Server 2012 Hyper-V Replica is a new feature that replicates all changes on a virtual machine to a counterpart virtual machine hosted by a different server.  Replica overview and deployment considerations can be found here.  This feature is well documented, so I'll skip setup and installation and focus strictly on troubleshooting. I received a call from one of my customers who was trying to evaluate this feature for a disaster recovery scenario in test.  Unfortunately, any time he attempted to enable this feature an exception was raised.  I'd like to share some of the troubleshooting steps we worked through to resolve this case.  You may not have this exact issue, but the troubleshooting steps should still be applicable.

The customer environment was fairly straightforward: each Hyper-V host was a domain member, and for testing purposes the physical machines were located in the same datacenter with gigabit connectivity.  Since this feature was new to me, I started by setting it up from scratch in a test environment on commodity hardware, consisting of a DC and two member servers. Armed with this new knowledge and having walked through the deployment guide above, I felt more confident continuing the investigation.

Domain - REPLICADOMAIN

DC - REPLICADC

Hyper-V Hosts - REPLICAHOST1 (Primary), REPLICAHOST2 (Replica)

In my customer's environment, every attempt to enable replication for a VM returned the following error (100% repro).  Replica uses mutual authentication based on Kerberos or mutual authentication based on certificates for the server (not for the user).

The first place to start is reviewing the Windows event logs for items of interest.  I reviewed the following logs on each machine:

System, Application, Security and Microsoft-Windows-Hyper-V-VMMS.  In the latter, I found:

02/04/2013 01:12:06 PM   Error         ITSUSRALAB039.df 32000   Microsoft-Windows-Hyper-V-VMMS      N/A                NT AUTHORITY\SYSTEM               Hyper-V failed to enable replication for virtual machine 'YOURVMNAME': The connection with the server was terminated abnormally (0x00002EFE).

02/01/2013 03:35:17 PM   Error         ITSUSRALAB039.df 29212   Microsoft-Windows-Hyper-V-VMMS      N/A                NT AUTHORITY\SYSTEM               Hyper-V failed to authenticate the primary server using Kerberos authentication. Error: The logon attempt failed (0x8009030C)

You can use the Microsoft Exchange Server Error Code Look-up tool called ERR.exe to help convert the error codes to text.

err 0x00002EFE
# for hex 0x2efe / decimal 12030
  ERROR_INTERNET_CONNECTION_ABORTED                              inetmsg.h
  ERROR_WINHTTP_CONNECTION_ERROR                                    winhttp.h
  ERROR_INTERNET_CONNECTION_ABORTED                              wininet.h
# 3 matches found for "0x00002EFE"

err 0x8009030C
# for hex 0x8009030c / decimal -2146893044
  SEC_E_LOGON_DENIED                                                                  winerror.h
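
If you want to see where the numbers in the err.exe output come from, here is a minimal Python sketch that takes the two codes apart.  It assumes the standard HRESULT layout (failure bit, facility field, 16-bit code); the function name decode_hresult is my own and not part of any tool.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT-style value into its main fields."""
    value &= 0xFFFFFFFF
    # Values with the top bit set are shown as negative signed decimals.
    signed = value - (1 << 32) if value & 0x80000000 else value
    return {
        "failure": bool(value & 0x80000000),   # severity bit
        "facility": (value >> 16) & 0x7FF,     # which component raised it
        "code": value & 0xFFFF,                # component-specific code
        "signed_decimal": signed,
    }


kerberos_error = decode_hresult(0x8009030C)
connection_error = decode_hresult(0x00002EFE)

print(kerberos_error)
print(connection_error)
```

The first value decodes to a failure in facility 9 (the security/SSPI facility) with signed decimal -2146893044, while 0x2EFE has no failure bit at all and is simply the plain number 12030, which is why err.exe matches it against the WinHTTP and WinINet headers.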

 

The second one was a great tip, but unfortunately I didn't put it all together, so we did more troubleshooting.

The deployment guide provides specific guidance on what is required for Kerberos delegation, and this seemed like a reasonable place to start, as well as verifying that all the SPNs are registered.  There are a couple of ways to verify this information.  One would be to use ldifde.exe to dump all the configuration settings of the computer object in Active Directory.  The command using ldifde.exe would be this:

[Each command would need to be executed against each Hyper-V host's computer account]

C:\>ldifde -d "CN=REPLICAHOST1,CN=Computers,DC=REPLICADOMAIN,DC=com" -f REPLICAHOST1.log
Connecting to "REPLICADC.REPLICADOMAIN.com"
Logging in as current user using SSPI
Exporting directory to file output.log
Searching for entries...
Writing out entries..
2 entries exported

The command has completed successfully

 

To retrieve the SPNs for each computer object we can leverage setspn.exe.  The command would be:

C:\>setspn -L REPLICADOMAIN\REPLICAHOST1

Registered ServicePrincipalNames for CN=REPLICAHOST1,CN=Computers,DC=REPLICADOMAIN,DC=com:
 
 Hyper-V Replica Service/REPLICAHOST1
 Hyper-V Replica Service/REPLICAHOST1.REPLICADOMAIN.com
 Microsoft Virtual System Migration Service/REPLICAHOST1
 Microsoft Virtual System Migration Service/REPLICAHOST1.REPLICADOMAIN.com
 Microsoft Virtual Console Service/REPLICAHOST1
 Microsoft Virtual Console Service/REPLICAHOST1.REPLICADOMAIN.com
 WSMAN/REPLICAHOST1
 WSMAN/REPLICAHOST1.REPLICADOMAIN.com
 TERMSRV/REPLICAHOST1
 TERMSRV/REPLICAHOST1.REPLICADOMAIN.com
 HOST/REPLICAHOST1
 HOST/REPLICAHOST1.REPLICADOMAIN.com
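
With more than a couple of hosts, eyeballing this list gets tedious.  Here is a rough Python sketch that checks saved setspn -L output for the short-name and FQDN forms of a given service class.  The helper names (parse_spns, missing_spns) are my own, and the sample text is a trimmed copy of the output above.

```python
SETSPN_OUTPUT = """\
Registered ServicePrincipalNames for CN=REPLICAHOST1,CN=Computers,DC=REPLICADOMAIN,DC=com:

 Hyper-V Replica Service/REPLICAHOST1
 Hyper-V Replica Service/REPLICAHOST1.REPLICADOMAIN.com
 Microsoft Virtual System Migration Service/REPLICAHOST1
 Microsoft Virtual System Migration Service/REPLICAHOST1.REPLICADOMAIN.com
 HOST/REPLICAHOST1
 HOST/REPLICAHOST1.REPLICADOMAIN.com
"""


def parse_spns(text):
    """Return (service class, host) pairs from `setspn -L` output."""
    spns = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header line that names the computer object.
        if not line or line.startswith("Registered ServicePrincipalNames"):
            continue
        service, _, host = line.rpartition("/")
        if service:
            spns.append((service, host))
    return spns


def missing_spns(text, service, short_name, fqdn):
    """List the expected SPNs for `service` that are not registered."""
    registered = {(s.lower(), h.lower()) for s, h in parse_spns(text)}
    wanted = [(service, short_name), (service, fqdn)]
    return [f"{s}/{h}" for s, h in wanted
            if (s.lower(), h.lower()) not in registered]


print(missing_spns(SETSPN_OUTPUT, "Hyper-V Replica Service",
                   "REPLICAHOST1", "REPLICAHOST1.REPLICADOMAIN.com"))
```

An empty list means both forms of the SPN are present, which was the case for both hosts in this investigation.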

Next we need to verify that the computer objects have Kerberos delegation configured correctly:

 

 Verify each Hyper-V host can successfully communicate with Active Directory:

C:\>nltest /SC_VERIFY:REPLICADOMAIN
Flags: b0 HAS_IP  HAS_TIMESERV
Trusted DC Name \\REPLICADC.REPLICADOMAIN.com
Trusted DC Connection Status Status = 0 0x0 NERR_Success
Trust Verification Status = 0 0x0 NERR_Success
The command completed successfully

 Verify that KDC is assigning Kerberos tickets:

C:\>klist

Current LogonId is 0:0x3364e

Cached Tickets: (7)

#0> Client: Administrator @ REPLICADOMAIN.COM
 Server: krbtgt/REPLICADOMAIN.COM @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x60a10000 -> forwardable forwarded renewable pre_authent name_canonicalize
 Start Time: 2/6/2013 15:38:20 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0x2 -> DELEGATION
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#1> Client: Administrator @ REPLICADOMAIN.COM
 Server: krbtgt/REPLICADOMAIN.COM @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40e10000 -> forwardable renewable initial pre_authent name_canonicalize
 Start Time: 2/6/2013 14:36:21 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0x1 -> PRIMARY
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#2> Client: Administrator @ REPLICADOMAIN.COM
 Server: cifs/REPLICADC.REPLICADOMAIN.com @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40a50000 -> forwardable renewable pre_authent ok_as_delegate name_canonicalize
 Start Time: 2/6/2013 15:38:20 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#3> Client: Administrator @ REPLICADOMAIN.COM
 Server: ldap/REPLICADC.REPLICADOMAIN.com @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40a50000 -> forwardable renewable pre_authent ok_as_delegate name_canonicalize
 Start Time: 2/6/2013 15:38:20 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#4> Client: Administrator @ REPLICADOMAIN.COM
 Server: RPCSS/REPLICAHOST2 @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40a10000 -> forwardable renewable pre_authent name_canonicalize
 Start Time: 2/6/2013 14:39:01 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#5> Client: Administrator @ REPLICADOMAIN.COM
 Server: LDAP/REPLICADC.REPLICADOMAIN.com/REPLICADOMAIN.com @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40a50000 -> forwardable renewable pre_authent ok_as_delegate name_canonicalize
 Start Time: 2/6/2013 14:36:53 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0
 Kdc Called: REPLICADC.REPLICADOMAIN.com

#6> Client: Administrator @ REPLICADOMAIN.COM
 Server: RPCSS/REPLICAHOST2.REPLICADOMAIN.com @ REPLICADOMAIN.COM
 KerbTicket Encryption Type: AES-256-CTS-HMAC-SHA1-96
 Ticket Flags 0x40a10000 -> forwardable renewable pre_authent name_canonicalize
 Start Time: 2/6/2013 14:36:21 (local)
 End Time:   2/7/2013 0:36:21 (local)
 Renew Time: 2/13/2013 14:36:21 (local)
 Session Key Type: AES-256-CTS-HMAC-SHA1-96
 Cache Flags: 0
 Kdc Called: REPLICADC.REPLICADOMAIN.com
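
The "Ticket Flags" values are bitmasks, and klist prints the decoded names next to them.  As a sanity check, here is a small Python sketch that decodes them the same way.  The table only contains the flags that actually appear in the output above, so treat it as partial rather than a complete list of Kerberos ticket flags; decode_ticket_flags is my own name.

```python
# Bit values for the flag names that klist printed in the listing above.
TICKET_FLAGS = {
    0x40000000: "forwardable",
    0x20000000: "forwarded",
    0x00800000: "renewable",
    0x00400000: "initial",
    0x00200000: "pre_authent",
    0x00040000: "ok_as_delegate",
    0x00010000: "name_canonicalize",
}


def decode_ticket_flags(value):
    """Return (flag names, leftover bits not covered by the table)."""
    names = [name for bit, name in TICKET_FLAGS.items() if value & bit]
    known_mask = sum(TICKET_FLAGS)  # sum of the dict keys
    return names, value & ~known_mask


for flags in (0x60A10000, 0x40E10000, 0x40A50000, 0x40A10000):
    names, leftover = decode_ticket_flags(flags)
    print(f"{flags:#010x} -> {' '.join(names)} (unknown bits: {leftover:#x})")
```

All four flag values from the listing decode with no leftover bits.  Note that the tickets issued for services on the DC carry ok_as_delegate, while the RPCSS tickets for REPLICAHOST2 do not.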

 

So far everything looks good, with no indications of a Kerberos problem (darn it!!).  At this point, we moved on to running a network trace from the Primary and Replica servers simultaneously. In previous versions of Windows, we would need to install Netmon or Wireshark to collect this information.  A really great trick is leveraging ETW providers to collect the trace.

[Commands to start trace]

ipconfig /flushdns
klist purge
netsh trace start capture=yes

*repro issue*

[Commands to stop trace]

netsh trace stop

 

You can then copy over the ETL trace files to your local machine and view with Netmon.  Unfortunately, the network trace revealed no clues on where to look next. 

 

When in doubt, always turn to Process Monitor and collect a trace on both sides.  I didn't find any Access_Denied results or anything else of interest.

 

At this point, we decided to refocus on the error itself, Kerberos.  I enabled Kerberos debug logging on each Hyper-V machine, reproduced the issue and then stopped the trace. Here is the command:

[Start the trace]
Logman.exe start kerb -p "Security: Kerberos Authentication" 0x40043 -o .\kerb.etl -ets

Run through the Hyper-V wizard and stop the traces immediately when the exception is raised.

[Stop the trace]
Logman.exe stop kerb -ets

Rename Kerb.etl to include the respective server name from both servers.  Unfortunately, in order to convert this trace you will need to engage CTS Support.  My trace revealed a clue which really helped:

Failed to create token: 0xc000015b.  Let's convert that:

err.exe 0xc000015b

STATUS_LOGON_TYPE_NOT_GRANTED

To me this has more to do with user rights security rather than Kerberos security.  In comparing user right assignments between two environments, I noticed a difference which would lead to a break in the case.  From the Local Group Policy Editor, here are the differences:

[Working - Windows Default]

 

 

[Broken - Customer environment]

 

Now we are getting somewhere!  So if this is a security issue, shouldn't an event have been written to the Security log, which I reviewed in the beginning?  Sigh... Yes.  I simply missed it because I only looked at the Security log on the Primary machine, not the Replica.  Going back to the Security log, I found the following event on the Replica server:

Log Name:      Security
Source:        Microsoft-Windows-Security-Auditing
Date:          2/6/2013 12:02:01 PM
Event ID:      4625
Task Category: Logon
Level:         Information
Keywords:      Audit Failure
User:          N/A
Computer:      REPLICAHOST2.REPLICADOMAIN.com
Description:
An account failed to log on.

Subject:
 Security ID:  NULL SID
 Account Name:  -
 Account Domain:  -
 Logon ID:  0x0

Logon Type:   3

Account For Which Logon Failed:
 Security ID:  NULL SID
 Account Name:  REPLICAHOST1$
 Account Domain:  REPLICADOMAIN.COM

Failure Information:
 Failure Reason:  The user has not been granted the requested logon type at this machine.
 Status:   0xC000015B
 Sub Status:  0x0

Process Information:
 Caller Process ID: 0x0
 Caller Process Name: -

Network Information:
 Workstation Name: -
 Source Network Address: -
 Source Port:  -

Detailed Authentication Information:
 Logon Process:  Kerberos
 Authentication Package: Kerberos
 Transited Services: -
 Package Name (NTLM only): -
 Key Length:  0
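
Since I missed this event the first time around, here is a rough Python sketch of the check I should have done: pull the failing account, logon type, and status out of a 4625 description and flag the case where a machine account is refused the network logon type.  The sample text is an excerpt of the event above, and the function and key names are my own.

```python
import re

EVENT_TEXT = """\
Subject:
 Security ID:  NULL SID
 Account Name:  -

Logon Type:   3

Account For Which Logon Failed:
 Security ID:  NULL SID
 Account Name:  REPLICAHOST1$
 Account Domain:  REPLICADOMAIN.COM

Failure Information:
 Failure Reason:  The user has not been granted the requested logon type at this machine.
 Status:   0xC000015B
 Sub Status:  0x0
"""

STATUS_LOGON_TYPE_NOT_GRANTED = 0xC000015B


def field(label, source):
    """Return the value of the first 'label: value' line in source."""
    pattern = rf"^[ \t]*{re.escape(label)}:[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, source, re.MULTILINE)
    return match.group(1) if match else None


def summarize_failed_logon(text):
    # The Subject block also has an "Account Name", so only search the part
    # of the event that describes the account whose logon failed.
    failed = text[text.index("Account For Which Logon Failed:"):]
    account = field("Account Name", failed)
    status = int(field("Status", failed), 16)
    return {
        "account": account,
        "logon_type": int(field("Logon Type", text)),
        "status": status,
        "machine_account": account.endswith("$"),
        "logon_right_missing": status == STATUS_LOGON_TYPE_NOT_GRANTED,
    }


summary = summarize_failed_logon(EVENT_TEXT)
print(summary)
```

For this event the summary shows the Primary's machine account (REPLICAHOST1$) failing a type 3 (network) logon with the same 0xC000015B status that showed up in the Kerberos trace.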

 

This particular customer has a requirement to harden Windows security where possible, so instead of simply restoring all the Windows defaults, we granted the "Authenticated Users" group the "Access this computer from the network" user right on both machines, which resolved the issue!
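
To compare this user right across machines without clicking through the policy editor, one option is to export the user rights with secedit (secedit /export /cfg rights.inf /areas USER_RIGHTS) and inspect the SeNetworkLogonRight entry, which is the internal name for "Access this computer from the network."  Here is a minimal Python sketch of that check.  The sample export text is made up for illustration and is not the customer's actual policy, and the function names are my own.  It only looks for the well-known SIDs for Authenticated Users (S-1-5-11) and Everyone (S-1-1-0), so a right granted through some other group would not be detected.

```python
AUTHENTICATED_USERS = "S-1-5-11"
EVERYONE = "S-1-1-0"

# Hypothetical excerpt of a secedit export from a hardened machine.
HARDENED_EXPORT = """\
[Unicode]
Unicode=yes
[Privilege Rights]
SeNetworkLogonRight = *S-1-5-32-544
SeBackupPrivilege = *S-1-5-32-544,*S-1-5-32-551
"""

# The same excerpt after Authenticated Users has been added to the right.
FIXED_EXPORT = HARDENED_EXPORT.replace(
    "SeNetworkLogonRight = *S-1-5-32-544",
    "SeNetworkLogonRight = *S-1-5-11,*S-1-5-32-544",
)


def principals_for_right(inf_text, right):
    """Return the SIDs or names assigned to a user right in the export."""
    in_section = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_section = line.lower() == "[privilege rights]"
            continue
        if in_section and "=" in line:
            name, _, value = line.partition("=")
            if name.strip().lower() == right.lower():
                # SIDs are written with a leading asterisk in the export.
                return [p.strip().lstrip("*")
                        for p in value.split(",") if p.strip()]
    return []


def grants_network_logon_broadly(inf_text):
    """True if Authenticated Users or Everyone holds SeNetworkLogonRight."""
    principals = principals_for_right(inf_text, "SeNetworkLogonRight")
    return AUTHENTICATED_USERS in principals or EVERYONE in principals


print(grants_network_logon_broadly(HARDENED_EXPORT))
print(grants_network_logon_broadly(FIXED_EXPORT))
```

Keep in mind that the real export file is written as Unicode, so when reading it from disk you would likely need to open it with a UTF-16 encoding.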

Summary:

This was a great reminder for me to always take the necessary time to review all of the event logs before engaging in advanced troubleshooting.  I hope some of the troubleshooting steps we used may be helpful for your future investigations.