Active Directory Replication fails due to a network device tricking RPC traffic

Hello, this is Daniel Mauser from the Windows Networking Support team. Recently I worked on a support case where the customer complained about Active Directory replication issues, and the underlying issue was caused by a device in the middle that was doing something odd with TCP/IP.

In this scenario, a basic connectivity test with tools such as psping or telnet against TCP port 135 (RPC), the primary port AD relies on, showed the port as open. However, things get unusual once the TCP/IP communication is initiated. I will walk you through the troubleshooting process and how it led to the root cause of the issue.
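As a rough illustration of what such a basic port test does, here is a minimal Python sketch. The function name `tcp_port_open` is hypothetical and not part of psping or telnet. It only reports whether a TCP three-way handshake completed, so it cannot tell whether the real server or a device in the middle answered, which may be why this kind of test looked fine in this case.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to host:port completes.

    A True result only proves that *something* answered the SYN with a
    SYN-ACK; it says nothing about who answered or whether data will flow.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (addresses taken from the scenario in this article):
# tcp_port_open("10.1.1.10", 135)
```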

Scenario:

This is a very simple scenario with the following relevant information.

  • Domain controllers involved:
    • DC01 has a NIC configured with IP 10.1.1.10/24;
    • DC04 has a NIC configured with IP 10.4.1.40/24.
      Note that the two DCs are in different subnets.
  • Network captures were taken on both sides at the same time (that is key to understanding what is going on).
  • DC04 initiates the communication to DC01.

Data Collection:

You can use netsh trace to start the network captures. This command is built in starting with Windows 7 and Windows Server 2008 R2. For this troubleshooting, we do not need to capture the whole TCP payload, only the first 512 bytes (packettruncatebytes=512). This reduces the size of the capture file and the chance of packet drops when production servers are very busy talking to multiple clients and other services. This article gives good insight on that topic: How to Enable a Circular Network Capture with Nmcap or Netsh

netsh trace start capture=yes packettruncatebytes=512 tracefile=c:\%computername%_nettrace.etl maxsize=200 filemode=circular overwrite=yes report=no

After reproducing the issue, we stop the trace on both DC01 and DC04 with netsh trace stop.

Analysis

The ETL file produced by the command above can be opened in either Network Monitor 3.4 or Message Analyzer (recommended). You can also convert the ETL file to CAP and review it in Wireshark. Check this article from my PFE friend Yong Rhee on how to do that: So you want to use Wireshark to read the netsh trace output .etl?

Here is the analysis of the network captures taken on both sides; see the comments for each frame for more details.

Review of Capture Taken on DC04

Frame 133 | Src: DC04 | Dst: DC01 | Proto: TCP

Description:
TCP:Flags=......S., SrcPort=57865, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2468100402, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192
- Ipv4: Src = 10.4.1.40, Dest = 10.1.1.10, Next Protocol = TCP, Packet ID = 16375
Identification: 16375 (0x3FF7)
TimeToLive: 128 (0x80)

Comments:
DC04 initiates the TCP three-way handshake with a SYN. In the same conversation captured on DC01, this packet arrives as frame 1998 (see the table below). The IP Identification is 16375 on both sides, which means the packet that left DC04 is the same one that arrived on DC01.

Frame 134 | Src: DC01 | Dst: DC04 | Proto: TCP

Description:
TCP:Flags=...A..S., SrcPort=DCE endpoint resolution(135), DstPort=57865, PayloadLen=0, Seq=1441552686, Ack=2468100403, Win=0 ( Negotiated scale factor 0x8 ) = 0
- Ipv4: Src = 10.1.1.10, Dest = 10.4.1.40, Next Protocol = TCP, Packet ID = 65534, Total IP Length = 52
Identification: 65534 (0xFFFE)
TimeToLive: 54 (0x36) → ROOT CAUSE OF THE ISSUE

Comments:
Here is where the issue starts. Something that is not DC01 replies to the TCP SYN with a SYN-ACK carrying TCP Window Size=0, which means no buffer is available to receive data. How do we know this is not really DC01 but a device answering on behalf of DC01?

The evidence is the IP Identification (65534) and the TTL (54). The genuine packet from DC01 is frame 136 (TCP flags ACK and SYN), and it has TTL 121.

Note: The Windows default TTL is 128, and each hop (router) decrements it. A TTL of 54, as in this frame, is usually attributed to a non-Windows device.

Frame 135 | Src: DC04 | Dst: DC01 | Proto: TCP

Description:
TCP:Flags=...A...., SrcPort=57865, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2468100403, Ack=1441552687, Win=256 (scale factor 0x8) = 65536
- Ipv4: Src = 10.4.1.40, Dest = 10.1.1.10, Next Protocol = TCP, Packet ID = 16376
Identification: 16376 (0x3FF8)
TimeToLive: 128 (0x80)

Comments:
DC04 sends its ACK, which also arrives fine on DC01 as frame 2000 (see the table below); the IP Identification is 16376 in both captures. The issue is that this ACK acknowledges Seq + 1 of the invalid SYN-ACK from frame 134 (1441552686 + 1 = 1441552687).

Frame 136 | Src: DC01 | Dst: DC04 | Proto: TCP

Description:
TCP:Flags=...A..S., SrcPort=DCE endpoint resolution(135), DstPort=57865, PayloadLen=0, Seq=2228284491, Ack=2468100403, Win=8192 ( Negotiated scale factor 0x8 ) = 2097152
- Ipv4: Src = 10.1.1.10, Dest = 10.4.1.40, Next Protocol = TCP, Packet ID = 20767, TotalLength: 52 (0x34)
Identification: 20767 (0x511F)
TimeToLive: 121 (0x79)

Comments:
This is the real SYN-ACK from DC01. The point here is that DC01 sends Seq=2228284491, but in the next frame DC04 sticks to the ACK for the spoofed frame 134, which has Seq=1441552686.

Frame 137 | Src: DC04 | Dst: DC01 | Proto: TCP

Description:
TCP:[Dup Ack #135] Flags=...A...., SrcPort=57865, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2468100403, Ack=1441552687, Win=256 (scale factor 0x8) = 65536
- Ipv4: Src = 10.4.1.40, Dest = 10.1.1.10, Next Protocol = TCP, Packet ID = 16377, Total IP Length = 40
Identification: 16377 (0x3FF9)
TimeToLive: 128 (0x80)

Comments:
Here is a duplicate ACK, which repeats the ACK for the invalid frame 134 (Ack=1441552687).

Note that in the TCP three-way handshake the ACK number is the peer's sequence number incremented by 1 (see: https://wiki.wireshark.org/TCP_3_way_handshaking).

Frame 138 | Src: DC01 | Dst: DC04 | Proto: TCP

Description:
TCP:Flags=.....R.., SrcPort=DCE endpoint resolution(135), DstPort=57865, PayloadLen=0, Seq=1441552687, Ack=1441552687, Win=0 (scale factor 0x8) = 0
- Ipv4: Src = 10.1.1.10, Dest = 10.4.1.40, Next Protocol = TCP, Packet ID = 20768, Total IP Length = 40
Identification: 20768 (0x5120)
TimeToLive: 121 (0x79)

Comments:
DC01 sends a RESET because that ACK does not match the Seq of the real SYN-ACK in frame 136. DC01 expects DC04 to acknowledge Seq=2228284491, that is, to send Ack=2228284492.
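The sequence-number reasoning in the comments above can be checked with a short Python sketch. The `Segment` class and the function names are hypothetical helpers written for this illustration; the numbers are copied from frames 134, 135 and 136 of the DC04 capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    frame: int   # frame number in the DC04 capture
    flags: str   # "SA" = SYN-ACK, "A" = ACK
    seq: int
    ack: int
    ttl: int


def expected_ack(syn_ack: Segment) -> int:
    """ACK number that completes the handshake: the SYN-ACK's Seq + 1 (mod 2**32)."""
    return (syn_ack.seq + 1) % 2**32


def acked_syn_acks(ack: Segment, candidates: List[Segment]) -> List[int]:
    """Return the frame numbers of the SYN-ACKs that this ACK actually acknowledges."""
    return [c.frame for c in candidates if expected_ack(c) == ack.ack]


# Values copied from the DC04 capture above.
f134 = Segment(134, "SA", seq=1441552686, ack=2468100403, ttl=54)   # spoofed SYN-ACK
f136 = Segment(136, "SA", seq=2228284491, ack=2468100403, ttl=121)  # real SYN-ACK
f135 = Segment(135, "A", seq=2468100403, ack=1441552687, ttl=128)   # ACK from DC04

print(acked_syn_acks(f135, [f134, f136]))  # [134]
print(expected_ack(f136))                  # 2228284492, the ACK DC01 was waiting for
```

The ACK in frame 135 matches only the spoofed SYN-ACK, which is consistent with DC01 resetting the connection.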

 

Review of Capture Taken on DC01 

Frame 1998 | Src: DC04 | Dst: DC01 | Proto: TCP

Description:
TCP:Flags=......S., SrcPort=57865, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2468100402, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192
- Ipv4: Src = 10.4.1.40, Dest = 10.1.1.10, Next Protocol = TCP, Packet ID = 16375
Identification: 16375 (0x3FF7)
TimeToLive: 122 (0x7A)

Comments:
This is the valid frame from DC04, the same as frame 133 in the table above. The Identification is the same, and the TTL has been decremented by the network hops between both DCs.

Frame 1999 | Src: DC01 | Dst: DC04 | Proto: TCP

Description:
TCP:Flags=...A..S., SrcPort=DCE endpoint resolution(135), DstPort=57865, PayloadLen=0, Seq=2228284491, Ack=2468100403, Win=8192 ( Negotiated scale factor 0x8 ) = 2097152
- Ipv4: Src = 10.1.1.10, Dest = 10.4.1.40, Next Protocol = TCP, Packet ID = 20767, Total IP Length = 52
Identification: 20767 (0x511F)
TimeToLive: 128 (0x80)

Comments:
This is the real SYN-ACK, the same frame as frame 136 in the network capture taken on DC04 (see the table above).

Frame 2000 | Src: DC04 | Dst: DC01 | Proto: TCP

Description:
TCP:Flags=...A...., SrcPort=57865, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2468100403, Ack=1441552687, Win=256 (scale factor 0x8) = 65536
- Ipv4: Src = 10.4.1.40, Dest = 10.1.1.10, Next Protocol = TCP, Packet ID = 16376, Total IP Length = 40
Identification: 16376 (0x3FF8)
TimeToLive: 122 (0x7A)

Comments:
This is the ACK from DC04 (frame 135), but it contains the invalid ACK number Ack=1441552687. Again, this is because an unknown device (frame 134 of the DC04 table) started the invalid conversation.

Frame 2001 | Src: DC01 | Dst: DC04 | Proto: TCP

Description:
TCP:Flags=.....R.., SrcPort=DCE endpoint resolution(135), DstPort=57865, PayloadLen=0, Seq=1441552687, Ack=1441552687, Win=0 (scale factor 0x8) = 0
- Ipv4: Src = 10.1.1.10, Dest = 10.4.1.40, Next Protocol = TCP, Packet ID = 20768, Total IP Length = 40
Identification: 20768 (0x5120)
TimeToLive: 128 (0x80)

Comments:
This is the TCP RESET sent from DC01, which does not recognize the ACK sent by DC04. Again, DC04 was misled by the invalid SYN-ACK sent by an unknown device in the middle of the communication.
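Comparing the two captures by IP Identification can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuples below are typed in by hand from the two tables (not parsed from the ETL files), and `unmatched_frames` is a hypothetical helper. In a long capture the 16-bit IP Identification can repeat, so a real comparison would also look at ports and timestamps.

```python
from typing import List, Tuple

# (frame number, source IP, IP Identification), copied from the tables above.
Packet = Tuple[int, str, int]

dc04_capture: List[Packet] = [
    (133, "10.4.1.40", 16375),
    (134, "10.1.1.10", 65534),
    (135, "10.4.1.40", 16376),
    (136, "10.1.1.10", 20767),
    (137, "10.4.1.40", 16377),
    (138, "10.1.1.10", 20768),
]
dc01_capture: List[Packet] = [
    (1998, "10.4.1.40", 16375),
    (1999, "10.1.1.10", 20767),
    (2000, "10.4.1.40", 16376),
    (2001, "10.1.1.10", 20768),
]


def unmatched_frames(receiver: List[Packet], sender: List[Packet],
                     sender_ip: str) -> List[int]:
    """Frames seen at the receiver that claim to come from sender_ip but whose
    IP Identification never appears in the sender's own capture."""
    sent_ids = {ip_id for _, src, ip_id in sender if src == sender_ip}
    return [frame for frame, src, ip_id in receiver
            if src == sender_ip and ip_id not in sent_ids]


# Which frames "from DC01" did DC04 receive that DC01 never sent?
print(unmatched_frames(dc04_capture, dc01_capture, "10.1.1.10"))  # [134]
```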

 

Let's go over the points from this analysis that are helpful when reviewing this kind of issue:

  1. Check the IP Identification numbers in both captures and see whether they match on both sides. Above, frame 134 has IP Identification 65534, which was never sent by DC01.
  2. Check the SEQ and ACK numbers. This looks hard at first, but you get used to it. Practice makes perfect :-)
  3. There is evidence in frame 134 that another device is intercepting the communication. The default initial TTL in Windows is 128. The TTL in frame 134 is 54, which tells us the sender is not a Windows machine. In addition, the TTL of the genuine frame is 128 leaving DC01 and 121 arriving on DC04, which means 7 hops (routers) in the middle decremented it (we can also explore this in another article).
  4. TCP Zero Window is another important sign: the device is basically telling Windows not to send traffic, as seen in the Win=0 of frame 134.
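The TTL reasoning in point 3 can be written down as a small heuristic. This Python sketch assumes that senders start from one of a few common initial TTL values (64, 128 or 255); that list is an assumption made for illustration, not something taken from the captures, and a device can set any TTL it likes, so treat the result as a hint rather than proof.

```python
from typing import Tuple

# Assumption: common initial TTLs. 128 is the Windows default mentioned above;
# 64 and 255 are typical of many non-Windows systems and network devices.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_origin(observed_ttl: int) -> Tuple[int, int]:
    """Return (assumed initial TTL, estimated hop count) for an observed TTL,
    using the smallest common initial value that is >= the observed one."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("TTL must be between 0 and 255")


# TTLs observed on DC04 for the two SYN-ACKs.
for frame, ttl in ((134, 54), (136, 121)):
    initial, hops = estimate_origin(ttl)
    print(f"frame {frame}: TTL {ttl} -> initial {initial}, about {hops} hops")
# frame 134: TTL 54 -> initial 64, about 10 hops
# frame 136: TTL 121 -> initial 128, about 7 hops
```

Under these assumptions, frame 136 fits a Windows sender 7 hops away, while frame 134 fits a sender with a different initial TTL, which matches the conclusion that it did not come from DC01.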

Conclusion

The captures clearly show that an unknown network device between DC01 and DC04 is causing the issue. The conversation breaks because the unknown device sends a SYN-ACK on behalf of DC01, which misleads DC04 into sending invalid ACK numbers. DC01 does not recognize them, so it ends the communication abruptly with a TCP Reset. In this scenario it is clear that TCP port 135 (RPC Endpoint Mapper) is not blocked, and the TTL review shows that another device is involved.

As a side note, reviewing the whole capture we also found that this behavior occurred only on TCP port 135 (RPC) and not on other TCP ports in conversations between DC01 and DC04. It is very common for security devices to perform RPC inspection in the network.

I hope you enjoyed this article. Let us know in the comments what you think, and stay tuned for more troubleshooting articles like this.
Special thanks to Support Escalation Engineer Daniel Pires (another networking geek) for his insights on this article.