Windows 2008 / Multi-subnet clusters and using static routes.


===========================================================

***SEE UPDATE***

===========================================================

With the enhancements in Windows 2008 that allow multi-subnet clustering, it is becoming more common to see this utilized with Exchange 2007 SP1 installations.

When implementing a clustered solution, it is a requirement that there be a minimum of two interfaces on each node, and that each node can maintain communications across those interfaces.  I see administrators implement this requirement in two different fashions with multi-subnet clusters:

  • The “public” interface of each node resides in different subnets with the “private” interfaces residing in a stretched subnet.
  • The “public” interface of each node resides in different subnets with the “private” interfaces also residing in different subnets.

If you are the second bullet, you’ll want to continue reading this blog.  (If you are the first bullet you’ll probably want to read it anyway since you’ve made it this far…)

For configurations where both the “public” and “private” interfaces reside in different subnets on each node, routing between those subnets is generally required.  A common misconfiguration that I see in this design is the use of default gateways on both of these network interfaces.

When a user attempts to configure two network interfaces each with a default gateway, the following error is noted from the operating system:

image

The text of this message is important: it states up front that this configuration will not produce the desired results.

The most likely cluster configuration where Exchange is used with this type of clustering is cluster continuous replication (CCR).  When multiple default gateways are defined, users may see inconsistent performance and an inconsistent ability to replicate logs between the nodes.  These replication issues are exacerbated when continuous replication hostnames are configured on the secondary networks that have a default gateway assigned.  These issues are in addition to any issues that the cluster service may have maintaining communications between the nodes and any communications issues clients may have connecting to the nodes.

If the default gateways are removed from the “private” adapters, reliable routed communications can only occur over the “public” interface.  So…if two default gateways cannot be used, how should we ensure proper communications over both the “public” and “private” interfaces when both reside in different routed subnets?

The first part of this solution is to ensure that the binding order of the network interfaces is set correctly in the operating system.  To confirm the binding order:

  • Open the network connections control panel.
  • Choose the advanced menu (if menu is disabled, enable it by selecting Organize –> Layout –> Menu Bar).
  • Select advanced settings from the advanced menu.
  • On the adapters and bindings tab, ensure that the “public” interface is first in the list, with all secondary interfaces following after.

image

 

The second part of the solution is to maintain the default gateway on the “public” interface.

The third part of the solution is to enable persistent static routes on the “private” interfaces.  In terms of the routes, we simply need to configure routes to the other “private” networks using gateway addresses that can route between those “private” networks.  All other traffic not matching these routes should be handled by the default gateway of the “public” adapter.
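
To make the routing behavior concrete, here is a minimal Python sketch of how a host chooses between a static route and the default gateway by longest prefix match.  It is an illustration only, not how Windows is implemented: it ignores on-link routes and metrics, the route table uses the NodeA values from the example in this post, and the function name pick_gateway is hypothetical.

```python
import ipaddress

# Assumed route table for NodeA: (destination network, gateway).
# 0.0.0.0/0 is the default route supplied by the "public" default gateway;
# 10.0.1.0/24 is the static route to the other node's "private" network.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.0.254"),
    (ipaddress.ip_network("10.0.1.0/24"), "10.0.0.254"),
]


def pick_gateway(destination, routes=ROUTES):
    """Return the gateway of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in routes if addr in net]
    # The longest prefix (most specific network) wins over the default route.
    _, gateway = max(matches, key=lambda item: item[0].prefixlen)
    return gateway


print(pick_gateway("10.0.1.1"))     # remote "private" network -> 10.0.0.254
print(pick_gateway("172.16.5.20"))  # anything else -> 192.168.0.254
```

Only traffic for the remote “private” subnet uses the private gateway; every other destination falls through to the single default gateway on the “public” adapter.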

Let’s take a look at an example. 

I desire to have a two node Exchange 2007 SP1 CCR cluster on Windows 2008 with each node residing in a different subnet.

NodeA:

Public

  • IP Address 192.168.0.100
  • Subnet Mask 255.255.255.0
  • Default Gateway 192.168.0.254

Private

  • IP Address 10.0.0.1
  • Subnet Mask 255.255.255.0
  • Gateway on network 10.0.0.254

NodeB:

Public

  • IP Address 192.168.1.100
  • Subnet Mask 255.255.255.0
  • Default Gateway 192.168.1.254

Private

  • IP Address 10.0.1.1
  • Subnet Mask 255.255.255.0
  • Gateway on network 10.0.1.254

(Note that gateway on network is not the default gateway setting but is the gateway on the private interface network that can route packets to the private network on the other nodes.)

In this case I would want to establish the necessary persistent static routes on each node.  In order to accomplish this, I can use the route add command.  The structure of the route command:

NodeA:  route add 10.0.1.0 mask 255.255.255.0 10.0.0.254 -p

NodeB:  route add 10.0.0.0 mask 255.255.255.0 10.0.1.254 -p
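
With more than two nodes, the same pattern repeats: each node needs one route per remote “private” network, sent through its own local private gateway.  The following Python sketch derives the route add commands from the example values above.  The NODES dictionary layout and the function names are assumptions made for illustration; the script only prints the commands and does not change any routing.

```python
import ipaddress

# Example values from this post; the dictionary layout itself is an assumption.
NODES = {
    "NodeA": {"private_ip": "10.0.0.1", "mask": "255.255.255.0", "gateway": "10.0.0.254"},
    "NodeB": {"private_ip": "10.0.1.1", "mask": "255.255.255.0", "gateway": "10.0.1.254"},
}


def private_network(node):
    """Return the network that the node's "private" interface belongs to."""
    iface = ipaddress.ip_interface(f"{node['private_ip']}/{node['mask']}")
    return iface.network


def route_add_commands(nodes):
    """Build, per node, one persistent route to every other node's private network."""
    commands = {}
    for name, node in nodes.items():
        lines = []
        for other_name, other in nodes.items():
            if other_name == name:
                continue  # a node does not need a route to its own subnet
            net = private_network(other)
            lines.append(
                f"route add {net.network_address} mask {net.netmask} {node['gateway']} -p"
            )
        commands[name] = lines
    return commands


for node_name, lines in route_add_commands(NODES).items():
    for line in lines:
        print(f"{node_name}:  {line}")
```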

The -p switch makes the routes persistent so that they survive a reboot.  Without -p, the routes are removed at the next reboot.

You can verify that the routes are correct by running route print and reviewing the persistent route information.

image

image

By utilizing only a default gateway on the “public” adapter, and static routes on the “private” adapters, you can ensure safe routed paths for client communications, cluster communications, and replication service log shipping.

========================================================

Update – 1-18-2010

========================================================

With Windows 2008 and Windows 2008 R2, the recommendation for managing static routes has changed.  Although route add should work, the management of routes has technically been replaced with functionality in netsh.  Therefore, it is recommended that the netsh commands be used to implement and manage static routes.

I will leave the previous information un-edited in the blog since many people have used it.

The first step in implementing static routes with the netsh command is to determine the interface names.  The interface name is the logical name assigned to the network connection, for example Local Area Connection 1.  It is recommended that these connections be renamed to something more descriptive, for example LAN-Replication-A.  The logical network names may be the same on all nodes.

image

You can also determine the adapter name from ipconfig /all.  (Note the name that follows “Ethernet adapter” in each section header below, for example LAN-Replication-A.)

Windows IP Configuration

   Host Name . . . . . . . . . . . . : DAG-1
   Primary Dns Suffix  . . . . . . . : exchange.msft
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : exchange.msft

Ethernet adapter LAN:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Virtual Machine Bus Network Adapter
   Physical Address. . . . . . . . . : 00-15-5D-00-02-07
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::dd27:d7f6:549f:6b9b%11(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.0.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   IPv4 Address. . . . . . . . . . . : 192.168.0.2(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.0.254
   DHCPv6 IAID . . . . . . . . . . . : 234886493
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-12-45-7C-F8-00-15-5D-00-02-07
   DNS Servers . . . . . . . . . . . : 192.168.0.253
                                       192.168.0.252
                                       192.168.0.251
   Primary WINS Server . . . . . . . : 192.168.0.253
   Secondary WINS Server . . . . . . : 192.168.0.252
                                       192.168.0.251
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter LAN-Replication-A:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Virtual Machine Bus Network Adapter #2
   Physical Address. . . . . . . . . : 00-15-5D-00-02-08
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.0.0.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :
   NetBIOS over Tcpip. . . . . . . . : Disabled

The netsh command format to add static routes looks like:

netsh interface ipv4 add route <IP/Mask> "InterfaceName" Gateway

Using the information from the above example, the following netsh commands would be utilized in place of route add:

NodeA:  netsh interface ipv4 add route 10.0.1.0/24 "LAN-Replication-A" 10.0.0.254

NodeB:  netsh interface ipv4 add route 10.0.0.0/24 "LAN-Replication-A" 10.0.1.254
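
Note that route add takes a dotted subnet mask while netsh takes a prefix length.  As a hedged illustration of the conversion, this Python sketch turns a network address and mask into the prefix form and assembles the netsh command text.  The helper names to_prefix and netsh_add_route are hypothetical, and the script only builds the command string; it does not run netsh.

```python
import ipaddress


def to_prefix(network_address, mask):
    """Convert a network address and dotted mask to CIDR notation.

    ip_network() raises ValueError if the address has host bits set,
    which catches a host address passed where a network is expected.
    """
    return str(ipaddress.ip_network(f"{network_address}/{mask}"))


def netsh_add_route(network_address, mask, interface_name, gateway):
    """Return the netsh command text in the format shown in this post."""
    prefix = to_prefix(network_address, mask)
    return f'netsh interface ipv4 add route {prefix} "{interface_name}" {gateway}'


# NodeA example from this post.
print(netsh_add_route("10.0.1.0", "255.255.255.0", "LAN-Replication-A", "10.0.0.254"))
```

A mask of 255.255.255.0 becomes /24, and 255.255.255.255 becomes /32.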

The netsh command automatically assumes – unless otherwise specified in the command – that the route added is persistent.

If the command completes successfully the route addition can be verified by running:

netsh interface ipv4 show route

The following is sample output showing the added route (output truncated to the line that includes the prefix and gateway):

C:\>netsh interface ip show route

Prefix                    Idx  Gateway/Interface Name
------------------------  ---  ------------------------
10.0.1.0/24                11  LAN-Replication-A

This is how the netsh command can be used to accomplish what would have previously been done with route add.

========================================================

 

========================================================

Update 9/18/2012:

Updated the netsh verification command to show correct syntax.

========================================================


Comments (23)

  1. TIMMCMIC says:

    @Syl

    I have a blog post on how I recommend you utilize network ports.  I never personally recommend attempting to use metrics.  When you are attempting to provide redundancy for the primary network interface where the default gateway is, I recommend that a team be utilized and that the team be configured for fault tolerance only.

  2. R-o-b-e-r-t says:

    Hi TIM,
    I configured 2 NICs on each of 2 DAG servers deployed in two physically separated subnets… a typical scenario with two physical, well-connected sites (in a single AD site):
    site 1 (192.168.6.x for MAPI, 10.0.6.y for replication), site 2 (192.168.2.x for MAPI, 10.0.2.y for replication).
    All NICs are configured per the Microsoft Exchange Server DAG guidelines (including static routing…).
    Why can I not ping the replication NIC IPs?
    Thanks, ciao
    ROBERTO (ITALY)

  3. TIMMCMIC says:

    @JBC:

    This blog’s for you…

    TIMMCMIC

  4. TIMMCMIC says:

    @Sunny:

    This is a default route that Windows added.

    TIMMCMIC

  5. TIMMCMIC says:

    @JPMayery:

    Although you are correct, route.exe should no longer be used with Windows 2008.  Our guidance with Windows 2008 and newer is the use of netsh.

    TIMMCMIC

  6. TIMMCMIC says:

    @Frederick:

    I believe you are correct, although I generally shy away from doing this.  When specifying a /32 mask you are essentially saying that this static route applies only to that single IP address.  I find that eventually administrators will make a change, etc., and then the route essentially gets broken.  If that network is routed by that particular gateway, I'd recommend using the /24, as it then covers the entire subnet.

    TIMMCMIC

  8. Anonymous says:

    This is useful information, I had a similar configuration.

    I added static routes with the "Route Add" command. After a failover of a cluster group the static routes are not working. With the Netsh command the routes disappear after a failover while with Route command they are still visible but not working.

    I removed the static routes and add them again with the Netsh command and everything is working fine.

    Should "Route" command not be removed as its not working or debug the Route command?

  9. TIMMCMIC says:

    @stretched subnet…

    Let’s say that I have datacenter A and datacenter B, and that my primary or public network is 10.0.0.X/24. The subnet, through the magic of networking devices, would be valid as primary in both DCA and DCB. That is, regardless of where a server was installed, its public NIC would have an address in 10.0.0.X/24.

    Same would go for a stretched secondary network.

    TIMMCMIC

  10. Anonymous says:

    Is there a max # of route statements that Windows 2008 server can handle? I haven’t seen this mentioned anywhere.

  11. Syl says:

    Very good info, and makes sense. However, what if you want to provide default gateway redundancy in case primary network card(s) would fail? Could we use metrics to control which default is used by default, but would switch to the other default gateway if the primary network interface fails?

    Thanks

  12. Frederick says:

    I have a question about the /Mask piece. In the cmd shell with the route add internal command you could specify 255.255.255.255 as the mask. In netsh that would equate out to /32 correct and would that work?

  13. Jbc says:

    This information could be the answer to my problem. I have several geo-clusters running node majority and they are set up with multiple gateways. My problem is that when our WAN takes an outage or is flapping, nodes in our majority site will be trimmed and go into singleton mode. As you can imagine, this isn't great for our production servers. We will test the single gateway with statics in our lab and find out if this fixes our issue.

    Thanks

  14. Chad says:

    Thanks for the entry regarding Windows 2008 and the switch to netsh.  We were seeing very odd behavior using the route add method and moving to netsh resolved it.

  15. JPMayery says:

    You can also retrieve the number of your network card and when you use the command Route.exe it works:

    example:

    route add 192.168.35.0 MASK 255.255.255.0-p 18 192.168.35.254 if

    Here we add the number of the card is identified as 18 (if 18).

    Where normally there is no problem.

  16. JPMayery says:

    You can also retrieve the number of your network card and when you use the command Route.exe it works:

    example:

    route add -p 192.168.35.0 MASK 255.255.255.0 192.168.35.254 if 18

    Here we add the number of the card is identified as 18 (if 18).

    Where normally there is no problem.

  17. Sunny Nair says:

    I see an interesting entry in your route table — is this accurate? Will this work given the situation?

    10.0.1.0/32 gateway 10.0.0.254 Interface 192.168.1.4

  18. Tim, many thanks for the article this is the same situation as I'm in now.

    I'm currently planning to change the Primary CCR Node1 with different IP address due to the Data center relocation. Is it possible to break the CCR setup and then set the CCR Node1 with new IP address ?

  19. TIMMCMIC says:

    @Server Support Specialist…

    Yes – you can simply just move the nodes and re-IP them once they are in their new location.  Remember to add an additional IP address to the cluster core resource group and Exchange group valid for that subnet.

    TIMMCMIC

  20. Thanks for the clarification Tim,

    So the following steps are as follows:

    Server A – Prod

    Server B – DR

    1. Failover the cluster mailbox resource from ServerA into ServerB

    2. Change the Primary Interface IP address (MAPI), Secondary Interface IP address (CCR and heart beat) from the Failover Cluster Administration console.

    Replace the necessary routing using Netsh command as per the article above.

    3. Reboot Server A

    4. Failover back cluster mailbox resource from Server B into Server A

    is that the correct steps which should not cause any outage apart from the cluster failover ?

  21. stretched subnet says:

    can you please tell what do you exactly mean by the term "stretched subnet" ? thank you. great post !

  22. Virendra Tingroya says:

    Dear Tim,

    We have Gateway scripts in our project to manage network redundancy for windows xp and windows server 2003.

    The script is as below:

    ######################################################################

    @echo off

    ::Split Current Date
    FOR /F "TOKENS=2-4 DELIMS=/ " %%A IN ('ECHO.%DATE%') DO (
    SET YY=%%C
    SET MM=%%A
    SET DD=%%B
    )

    set currentdir=%cd%
    echo ……………………….. >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    echo Script started on %computername% %date% %time% >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt

    :: Initial parameter set to -1. This parameter is incremented each time a ping is lost.

    SET /a y=-1

    :: Initial parameter set to -1. This parameter is incremented each time a ping is lost.

    SET /a x=-1

    ipconfig /all >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    route print >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    route delete 0.0.0.0
    echo route delete 0.0.0.0 >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    route add 0.0.0.0 mask 0.0.0.0 172.16.48.2
    echo route add 0.0.0.0 mask 0.0.0.0 172.16.48.2 >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt

    :Check1
    PING -w 100 172.16.48.2 >nul
    IF "%ERRORLEVEL%"=="1" GOTO ERROR1
    set /a y=0
    GOTO Check1

    :Check2
    PING -w 100 172.17.48.2 >nul
    IF "%ERRORLEVEL%"=="1" GOTO ERROR2
    set /a x=0
    GOTO Check2

    ::This loop will help to increase the parameter x by 1, when there is a ping loss.

    :ERROR2
    SET /a x+=1
    echo LAN2 172.17.x.x error %x% time >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    IF %x% == 3 GOTO ERROR4
    GOTO Check2

    :ERROR4
    echo I am in error4: network 172.17.x.x failed on %date% %time% >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    SET /a x=0
    route delete 0.0.0.0
    route add 0.0.0.0 mask 0.0.0.0 172.16.48.2
    echo route add 0.0.0.0 mask 0.0.0.0 172.16.48.2 >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    GOTO Check1

    ::This loop will help to increase the parameter y by 1, when there is a ping loss.

    :ERROR1
    SET /a y+=1
    echo LAN1 172.16.x.x error %y% time >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    IF %y% == 3 GOTO ERROR3
    GOTO Check1

    :ERROR3
    echo I am in error3: network 172.16.x.x failed on %date% %time% >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    SET /a y=0
    route delete 0.0.0.0
    route add 0.0.0.0 mask 0.0.0.0 172.17.48.2
    echo route add 0.0.0.0 mask 0.0.0.0 172.17.48.2 >> %currentdir%GatewayLogs_%computername%_%YY%%MM%%DD%.txt
    GOTO Check2

    ######################################################################

    As i understand that route add command does not support Windows 7 and windows server 2008, How Netsh can be used to achieve same functionality as route add in my script??

  23. TIMMCMIC says:

    @Virendra…

    See the instructions on the equivalent netsh commands in this blog. I’d assume you’d replace the route add with the equivalent netsh command in your script.

    TIMMCMIC