Issue no 1: DHCP Failover server issues reserved IP address to a client with a different MAC address
This issue pertains to the case when a DHCP Scope which is a part of a failover relationship has an exclusion range and a reservation exists in the scope for one of the IP addresses within the exclusion range. The sequence of events leading to this issue is as follows:
- The reserved IP address is leased out to the reserved client.
- That client releases the IP address (DHCP RELEASE message) by sending a DHCP release messages. This causes the DHCP server to mark the IP addresses as available.
- A different client sends a DISCOVER packet to the DHCP Server. The DHCP server OFFERs the reserved IP to this client though the MAC address of the client does not match the one in the reservation..
- Now the client sends a REQUEST packet to the server.
- DHCP Server now sends back a NAK message taking into account that it is a reserved IP Address.
- Client starts the DORA (DISCOVER OFFER REQUEST ACK) sequence again leading to the same consequences.
- The client is perpetually stuck into a DISCOVER-OFFER-REQUEST-NAK cycle and never gets an IP Address.
With time this may lead to many clients stuck in DISCOVER-OFFER-REQUEST-NAK cycle and losing network connectivity.
Issue no 2: Some IP addresses are perpetually stuck in BAD ADDRESS state on one of the DHCP failover servers while in Active state on the other server. DHCP Server admin channel contains BINDING ACK reject events 20291 and 20292 for these IP addresses. The sequence of events which leads to this issue is as follows:
- A DHCP Server is migrated to a DHCP Server 2012 or DHCP Server 2012 R2 without migrating the lease records. The new DHCP server does not have any leases in it’s database.
- Server is configured for failover leading to a state where none of the failover partners have any lease records.
- A new client requests an IP address. One of the failover partners leases out the first free IP address in it’s database. This IP address is already in use by another client who had obtained it from the DHCP server before migration.
- The client performs a duplicate address detection test which fails. The client declines the lease (DHCP DECLINE) and hence the address is marked as BAD_ADDRESS on the DHCP server.
- The same update is sent to the partner server and that server also marks that IP address as a BAD_ADDRESS.
- When the client to whom the IP Address was issued originally sends a RENEW request to one of the failover partners the server sends an ACK and marks the IP Address as Active.
Now when the update of BAD_ADDRESS to Active state is sent to the partner server, the BINDING update is rejected leading to inconsistency between the 2 DHCP servers.
Both these issues have been fixed in the rollup update for Windows Server 2012 in KB 2919393 and for Windows Server 2012 R2 is KB 2919355.
DHCP failover relation need not be recreated to apply this patch. DHCP server restart is required.
UPDATE: Some customers reported events 20291 getting logged even after applying KB 2919355. KB 2955135 (http://support.microsoft.com/kb/2955135 ) has been released to address this issue with DHCP failover. Customers who have DHCP failover deployments with Windows Server 2012 R2 and experiencing events 20291 should apply this patch.
We are aware of a remote scenario that is not addressed by the KB 2919355. It occurs when a client sends multiple request packets within the same second and the second client request is a release request. You will notice the following pattern in the DHCP audit logs.
In DHCP failover, whenever one of the server responds with an ACK to a client request, the same is updated in the database of the DHCP server with a timestamp and is sent to the partner for synchronization. This timestamp is at granularity of a second. If the partner server receives two synchronization updates from the other DHCP server with the same timestamp, it will reject the second request. This is as per the DHCP failover IETF document (https://tools.ietf.org/html/draft-ietf-dhc-failover-12). If the second request from the client is a release request then this causes the failover servers to go out of sync as the partner server will reject the synchronization update but the first server (which received the client request) will apply the update to its database.
This will cause the failover servers to generate 20291 and 20292 events.
This is considered a remote scenario as a client sending two requests within the same second seems unlikely. However, we would like to understand if customers are seeing this scenario in their deployment. Customers seeing such an issue should contact Microsoft support which will enable collecting of required logs etc to diagnose the issue further.