SMB Transparent Failover – making file shares continuously available

Note: this post originally appeared on https://aka.ms/clausjor by Claus Joergensen.

SMB Transparent Failover is one of the key features introduced in Server Message Block (SMB) 3.0, which is new in Windows Server 2012 and Windows 8. I am the program manager for SMB Transparent Failover, and in this blog post I will give an overview of this new feature.

In Windows Server 2012 the file server introduces support for storing server application data, which means that server applications, like Hyper-V and SQL Server, can store their data files, such as virtual machine files or SQL databases, on Windows file shares. These server applications expect their storage to be reliable and always available, and they do not generally handle IO errors or unexpected closures of handles very well. If the server application cannot access its storage, this often leads to databases going offline or virtual machines stopping or crashing because they can no longer write to their disks.

SMB Transparent Failover enables administrators to configure Windows file shares, in Windows Failover Clustering configurations, to be continuously available. Using continuously available file shares enables administrators to perform hardware or software maintenance on any cluster node without interrupting the server applications that are storing their data files on these file shares. Also, in case of a hardware or software failure, the server application nodes will transparently reconnect to another cluster node without interruption of the server applications. In the case of an SMB scale-out file share (more on Scale-Out File Server in a following blog post), SMB Transparent Failover allows the administrator to redirect a server application node to a different file server cluster node to facilitate better load balancing.

For more information on storing server application data on SMB file shares and other features that support this scenario, see Windows Server “8” – Taking Server Application Storage to Windows File Shares.

 

Installation and configuration

SMB Transparent Failover has the following requirements:

  • A failover cluster running Windows Server 2012 with at least two nodes. The configuration of servers, storage and networking must pass all the tests performed in the Validate a Configuration wizard.
  • File Server role is installed on all cluster nodes.
  • Clustered file server configured with one or more file shares created with the continuously available property. This is the default setting.
  • SMB client computers running Windows 8 or Windows Server 2012.
  • To realize SMB Transparent Failover, both the SMB client computer and the SMB server computer must support SMB 3.0, which is introduced in Windows 8 and Windows Server 2012. Computers running down-level SMB versions, such as 1.0, 2.0 or 2.1 can connect and access data on a file share that has the continuously available property set, but will not be able to realize the benefits of the SMB Transparent Failover feature.
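To confirm that a client has actually negotiated SMB 3.0 with the file server, you can inspect the active connections on the SMB client. Here is a quick check (a sketch; run it on the client after connecting to a share):

```powershell
# Run on the SMB client after connecting to a share.
# A Dialect of 3.00 means the connection can benefit from SMB Transparent Failover.
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect
```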

    Installing and creating a Failover Cluster

    For information about how to install the Failover Clustering feature and how to create and troubleshoot a Windows Server 2012 Failover Cluster, see these blog posts:

    Installing the File Server role

    Once the Failover Cluster is up and running, we can install the File Server role. Do the following for each node in the Failover Cluster:

    Graphical User Interface
    • Start Server Manager
    • Click Add roles and features
    • In the Add Roles and Features Wizard, do the following:
      • In Before you begin, click Next
      • In Select installation type, click Next
      • In Select destination server, choose the server where you want to install the File Server role, and click Next
      • In Select server roles, expand File And Storage Services, expand File and iSCSI Services, and check the check box for File Server and click Next
      • In Select features, click Next
      • In Confirm installation selections, click Install

    image

    Figure 1 – Installing File Server role

    PowerShell

    In an elevated PowerShell shell, do the following:

    Add-WindowsFeature -Name FS-FileServer

    Create clustered File Server

    Once the File Server role is installed on all cluster nodes, we can create a clustered file server. In this example we will create a clustered file server of type “File Server for general use” and name it SMBFS. I will provide more information on “Scale-Out File Server for application data” in a follow-up blog post.

    Do the following to create a clustered file server.

    Graphical User Interface
    • Start Server Manager
    • Click Tools and select Failover Cluster Manager
    • In the console tree, select and expand the cluster you are managing
    • Select Roles
  • In the Actions pane, click Configure Role
  • In Before You Begin, click Next
  • In Select Role, select File Server and click Next
  • In File Server Type, select the type of clustered file server you want to use
  • In Client Access Point, enter the name of the clustered file server
  • In Client Access Point, complete the Network Address for static IP addresses as needed and click Next
  • In Select Storage, select the disks that you want to assign to this clustered file server and click Next
  • In Confirmation, review your selections and when ready click Next
    clip_image004

    Figure 2 – Select File Server Type

    clip_image006

    Figure 3 – Configure Client Access Point

    clip_image008

    Figure 4 – Select Storage

    PowerShell

    In an elevated PowerShell shell, do the following:

    Add-ClusterFileServerRole -Name SMBFS -Storage "Cluster Disk 1" -StaticAddress 192.168.9.99/24

    Create a file share that is continuously available

    Now that we have created the clustered file server, we can create file shares that are continuously available. In this example we will create a file share named “appstorage” on the clustered file server we created previously.

    Do the following to create a file share that is continuously available:

    Graphical User Interface
    • Start Server Manager
    • Click Tools and select Failover Cluster Manager
    • In the console tree, select and expand the cluster you are managing
    • Select Roles
  • In the Results pane, select the file server where you want to create the file share and in the Actions pane click Add File Share. This will start the New Share Wizard
  • In the New Share Wizard, do the following
    • In Select Profile, select the appropriate profile (SMB Share – Applications in this example) and click Next
  • In Share Location, select the volume where you want to create the share and click Next
  • In Share Name, enter the share name and click Next
  • In Configure Share Settings, verify that Enable continuous availability is checked and click Next
  • In Specify permissions and control access, modify the permissions as needed to enable access and click Next
  • In Confirmation, review your selections and when ready click Create
  • Click Close
    clip_image010

    Figure 5 – Select Profile

    clip_image012

    Figure 6 – Select server and path

    clip_image014

    Figure 7 – Share Name

    clip_image016

    Figure 8 – Configure Share Settings

    To verify a share has the continuously available property set, do the following:

    • Start Server Manager
    • Click Tools and select Failover Cluster Manager
    • In the console tree, select and expand the cluster you are managing
    • Select Roles
  • In the Results pane, select the file server you want to examine
  • In the bottom window, click the Shares tab
  • Locate the share of interest and examine the Continuous Availability property

    PowerShell

    These steps assume the folder for the share has already been created. If this is not the case, create the folder before continuing.

    In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to create a file share with continuous availability property set:

    New-SmbShare -Name AppStorage -Path f:\appstorage -Scope smbfs -FullControl smbtest\administrator

    In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to verify a file share has continuous availability property set.

    Get-SmbShare -Name AppStorage | Select *

    PresetPathAcl         : System.Security.AccessControl.DirectorySecurity
    ShareState            : Online
    AvailabilityType      : Clustered
    ShareType             : FileSystemDirectory
    FolderEnumerationMode : Unrestricted
    CachingMode           : None
    CATimeout             : 0
    ConcurrentUserLimit   : 0
    ContinuouslyAvailable : True
    CurrentUsers          : 0
    Description           :
    EncryptData           : False
    Name                  : appstorage
    Path                  : F:\Shares\appstorage
    Scoped                : True
    ScopeName             : SMBFS
    SecurityDescriptor    : O:BAG:DUD:(A;OICI;FA;;;WD)
    ShadowCopy            : False
    Special               : False
    Temporary             : False
    Volume                : \\?\Volume{266f94b0-9640-4e1f-b056-6a3e999e6ecf}\

    Note that we didn’t request the continuous availability property to be set. This is because the property is set by default. If you want to create a file share without the property set, do the following:

    New-SmbShare -Name AppStorage -Path f:\appstorage -Scope smbfs -FullControl smbtest\administrator -ContinuouslyAvailable:$false
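Either way, you can confirm what you ended up with by querying just the relevant property:

```powershell
# Shows whether the share carries the continuous availability property.
Get-SmbShare -Name AppStorage | Select-Object Name, ContinuouslyAvailable
```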

    Using a file share that is continuously available

    Now that we have created a clustered file server with a file share that is continuously available, let’s go ahead and use it.

    The below diagram illustrates the setup that I will be using in this section.

    clip_image018

    Figure 9 – Clustered File Server

    On the file share is a 10GB data file (testfile.dat) that is being accessed by an application on the SMB client computer (FSF-260403-10). The below screenshot shows the SMB Client Shares performance counters for the \\smbfs\appstorage share as seen from the SMB client. As you can see, the application is doing 8KB reads and writes.

    clip_image020

    Figure 10 – Data Access

    Zeroing in on data requests/sec in graph form, we see the following:

    clip_image022

    In an elevated PowerShell shell on the cluster node where the clustered file server is online, do the following to examine the open files:

    Get-SmbOpenFile | Select *

    ClientComputerName    : [2001:4898:e0:32af:890b:6268:df3b:bf8]
    ClientUserName        : SMBTEST\Administrator
    ClusterNodeName       :
    ContinuouslyAvailable : True
    Encrypted             : False
    FileId                : 4415226380557
    Locks                 : 0
    Path                  : F:\Shares\appstorage\testfile.dat
    Permissions           : 1180059
    ScopeName             : SMBFS
    SessionId             : 4415226380341
    ShareRelativePath     : testfile.dat

    Planned move of the cluster group

    With assurance that the file handle is indeed continuously available, let’s go ahead and move the cluster group to another cluster node. In an elevated PowerShell shell on one of the cluster nodes, do the following to move the cluster group:

    Move-ClusterGroup -Name smbfs -Node FSF-260403-08

    Name      OwnerNode       State
    ----      ---------       -----
    smbfs     FSF-260403-08   Online

    Looking at Data Requests/sec in Performance Monitor, we see a short brown-out of a few seconds where IO is stalled while the cluster group is moved, but IO continues uninterrupted once the move has completed.

    The teardown and setup of SMB sessions, connections and active handles between the SMB client and the SMB server on the cluster nodes are handled completely transparently to the application. The application does not see any errors during this transition, only a brief stall in IO.
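If you want to observe this brown-out yourself, a simple probe (a sketch using the share from this example; probe.txt is a hypothetical file name) is to append timestamps to a file on the share in a loop and look for gaps while the cluster group moves:

```powershell
# Append a timestamp to a file on the continuously available share every 100 ms.
# During a failover the writes stall briefly but do not fail; a gap between
# consecutive timestamps shows how long IO was paused.
while ($true) {
    Add-Content -Path \\smbfs\appstorage\probe.txt -Value (Get-Date -Format o)
    Start-Sleep -Milliseconds 100
}
```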

    clip_image024

    Figure 11 – Move Cluster Group

    Let’s take a look at the operational log for SMB Client in Event Viewer (Applications and Services Log – Microsoft – Windows – SMB Client – Operational) on the SMB Client computer.

    In the event log we see a series of warning events around 9:36:01 PM. These warning events signal the teardown of SMB connections, sessions and shares. There is also a series of information events around 9:36:07 PM. These information events signal the recovery of SMB sessions, connections and shares. These events are very useful for understanding the activities during the recovery and confirming that the recovery was successful.

    clip_image026

    Figure 12 – Events for planned move

    So how does SMB Transparent Failover actually work? When the SMB client initially connects to the file share, the client determines whether the file share has the continuous availability property set. If it does, this means the file share is a clustered file share and supports SMB transparent failover. When the SMB client subsequently opens a file on the file share on behalf of the application, it requests a persistent file handle. When the SMB server receives a request to open a file with a persistent handle, the SMB server interacts with the Resume Key filter to persist sufficient information about the file handle, along with a unique key (resume key) supplied by the SMB client, to stable storage.

    If a planned move or failure occurs on the file server cluster node to which the SMB client is connected, the SMB client attempts to reconnect to another file server cluster node. Once the SMB client successfully reconnects to another node in the cluster, the SMB client starts the resume operation using the resume key. When the SMB server receives the resume key, it interacts with the Resume Key filter to recover the handle state to the same state it was prior to the failure with end-to-end support (SMB client, SMB server and Resume Key filter) for operations that can be replayed, as well as operations that cannot be replayed. Resume Key filter also protects the handle state after failover to ensure namespace consistency and that the client can reconnect. The application running on the SMB client computer does not experience any failures or errors during this operation. From an application perspective, it appears the I/O operations are stalled for a small amount of time.

    To protect against data loss from writing data into an unstable cache, persistent file handles are always opened with write-through.

    Unplanned failure of the active cluster node

    Now, let’s introduce an unplanned failure. The cluster group was moved to FSF-260403-08. Since all these machines are running as virtual machines in a Hyper-V setup, I can use Hyper-V manager to reset FSF-260403-08.
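If you prefer to script the failure from the Hyper-V host, a hard power-off of the VM works as well (a sketch; Stop-VM with -TurnOff is the scripted equivalent of pulling the power):

```powershell
# Run on the Hyper-V host: hard power-off to simulate an unplanned node failure,
# then bring the node back up.
Stop-VM -Name FSF-260403-08 -TurnOff
Start-VM -Name FSF-260403-08
```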

    Looking at Data Requests/sec in Performance Monitor, we see that there is a slightly longer brown-out where IO is stalled. In this time period the cluster detects that FSF-260403-08 has failed and starts the cluster group on another node. Once the cluster group is started, SMB can perform the transparent recovery.

    clip_image028

    Figure 13 – Unplanned Failure

    And again the SMBClient event log shows events related to the failure:

    clip_image030

    Figure 14 – Events for unplanned failure

    Now you will probably ask yourself: “Wait a minute. SMB runs over TCP, the TCP timeout is typically 20 seconds, and SMB uses a couple of timeouts before determining that the cluster node has failed. So how come the recovery takes ~10 seconds and not 40 or 60 seconds?”

    Enter Witness service.

    The Witness service was created to enable faster recovery from unplanned failures, so the SMB client does not have to wait for TCP timeouts. Witness is a new service that is installed automatically with the Failover Clustering feature. When the SMB client initially connects to a cluster node, it notifies the Witness client running on the same computer. The Witness client obtains a list of cluster nodes from the Witness service running on the cluster node it is connected to, picks a different cluster node, and issues a registration request to the Witness service on that node. The Witness service then listens for cluster events related to the clustered file server the SMB client is connected to.

    If an unplanned failure occurs on the file server cluster node the SMB client is connected to, the Witness service on the other cluster node receives a notification from the cluster service. The Witness service notifies the Witness client, which in turn notifies the SMB client that the cluster node has failed. Upon receiving the Witness notification, the SMB client immediately starts reconnecting to a different file server cluster node, which significantly speeds up recovery from unplanned failures.

    You can examine the state of the Witness service across the cluster using the Get-SmbWitnessClient command. Notice that Get-SmbWitnessClient can be run on any cluster node and provides a cluster-aggregate view of the Witness service, similar to Get-SmbOpenFile and Get-SmbSession. In an elevated PowerShell shell on one of the cluster nodes, do the following to examine the Witness registrations:

    Get-SmbWitnessClient | select *

    State                  : RequestedNotifications
    ClientName             : FSF-260403-10
    FileServerNodeName     : FSF-260403-08
    IPAddress              : 2001:4898:E0:32AF:3256:8C83:59E5:BDB5
    NetworkName            : SMBFS
    NotificationsCancelled : 0
    NotificationsSent      : 0
    QueuedNotifications    : 0
    ResourcesMonitored     : 1
    WitnessNodeName        : FSF-260403-07

    Examining the above output (run before the unplanned failure), we can see the SMB client (FSF-260403-10) is currently connected to cluster node FSF-260403-08 (SMB connection) and has registered for witness notification for SMBFS with Witness service on FSF-260403-07.

    Looking at Event Viewer (Applications and Services Log – Microsoft – Windows – SMBWitnessClient – Operational) on the SMB Client computer, we see that the Witness client received notification for SMBFS. Since the cluster group was moved to FSF-260403-07, which is also the Witness node for the Witness client, the following event shows the Witness client unregistering from FSF-260403-07 and registering with FSF-260403-09.

    clip_image032

    Figure 15 – Witness event log

     

    Tips and Tricks

    Protecting file server services

    LanmanServer and LanmanWorkstation run in service hosts together with other services. In extreme cases, other services running in the same service host can affect the availability of LanmanServer and LanmanWorkstation. You can configure these services to run in their own service hosts using the following commands:

    sc config lanmanserver type= own

    sc config lanmanworkstation type= own

    The computer needs to be restarted for this change to take effect.
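After the restart, you can verify the change from a command prompt with sc qc; the TYPE field should now report WIN32_OWN_PROCESS:

```
sc qc lanmanserver
sc qc lanmanworkstation
```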

    Loopback configurations

    Accessing a file share that has the continuously available property set as a loopback share is not supported.

    For example, SQL Server or Hyper-V storing their data files on SMB file shares must run on computers that are not members of the file server cluster hosting those file shares.

    Using legacy tools

    When creating file shares, the continuous availability property is set by default by tools introduced in Windows Server 2012, including the new file share creation wizard and the New-SmbShare command. If you have automation built around older tools, such as NET SHARE, Explorer or the NET APIs, the continuous availability property will not be set, and these tools do not support setting it. To work around this, you can set the following registry value, which causes all shares to be created with the property set, regardless of whether the creating tool supports it:

    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" -Name EnableCaAlways -Value 1 -Force

    Witness service

    By default, the network traffic between the Witness client and the Witness service requires mutual authentication and is signed. However, the traffic is not encrypted, as it doesn’t contain any user data. It is possible to enable encryption of the Witness network traffic.

    To configure the Witness client to send traffic encrypted, set the following registry key on each client:

    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name WitnessFlags -Value 1 -Force

    To configure the Witness Service to not accept unencrypted traffic, set the following registry key on each cluster node:

    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\SMBWitness\Parameters" -Name Flags -Value 1 -Force
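To double-check the values you have set, you can read them back (Get-ItemProperty reports an error if the value has never been created, which also tells you the default is in effect):

```powershell
# Run on the SMB client and on each cluster node, respectively.
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name WitnessFlags
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\SMBWitness\Parameters" -Name Flags
```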

    Disabling NetBios over TCP/IP

    I have seen disabling NetBios over TCP/IP speed up failover times. To disable NetBios over TCP/IP for an interface, do the following in Network Connections:

    • Select the interface you want to modify, right-click and select Properties
    • In interface properties, select Internet Protocol Version 4 (TCP/IPv4) and click Properties
    • In Internet Protocol Version 4 (TCP/IPv4) Properties, click Advanced
    • In Advanced TCP/IP Settings, click the WINS tab
    • On the WINS tab, select the Disable NetBIOS over TCP/IP radio button

    When disabling NetBIOS over TCP/IP, configure it for all network interfaces on all cluster nodes.

    clip_image034

    Figure 16 – Disable NetBIOS over TCP/IP
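If you want to script this instead of clicking through each interface, one option (a sketch using WMI; the value 2 corresponds to the Disable NetBIOS over TCP/IP radio button) is:

```powershell
# Disable NetBIOS over TCP/IP (TcpipNetbiosOptions = 2) on all IP-enabled
# network adapters. Run on each cluster node.
Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = true" |
    ForEach-Object { $_.SetTcpipNetbios(2) | Out-Null }
```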

    Disable 8.3 name generation

    SMB Transparent Failover does not support cluster disks with 8.3 name generation enabled. In Windows Server 2012, 8.3 name generation is disabled by default on newly created data volumes. However, if you import volumes created on down-level versions of Windows or accidentally create a volume with 8.3 name generation enabled, SMB Transparent Failover will not work. An event will be logged in (Applications and Services Log – Microsoft – Windows – ResumeKeyFilter – Operational) noting that the filter failed to attach to the volume because 8.3 name generation is enabled.

    You can use fsutil to query and set the state of 8.3 name generation system-wide and on individual volumes. You can also use fsutil to remove previously generated short names from a volume.
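For example, from an elevated command prompt you can query the current state, disable 8.3 name generation on the F: volume (1 = disabled), and strip previously generated short names recursively (verify the exact flags with fsutil 8dot3name /? on your version of Windows):

```
fsutil 8dot3name query f:
fsutil 8dot3name set f: 1
fsutil 8dot3name strip /s f:\
```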

    Conclusion

    I hope you enjoyed this introduction to SMB Transparent Failover and agree that this feature is useful for providing continued access to file shares when servers must occasionally be restarted for software or hardware maintenance, or in the unfortunate event that a cluster node fails. Providing continued access to file shares during these events is extremely important, especially for workloads such as Microsoft Hyper-V and Microsoft SQL Server.

    I am looking forward to diving into Scale-Out File Server in a future post.

    Claus Joergensen

    Principal Program Manager

    Windows File Server Team