File Server Tip: How to rebalance a Scale-Out File Server using a little PowerShell

A Scale-Out File Server is a cluster role that offers the same SMB file share (or set of shares) on every node of the cluster. As clients come in, they are spread across the nodes using a round-robin mechanism based on DNS. In the common case of having many clients and just a few file server cluster nodes, things even out quite nicely.

You can check which clients are accessing which nodes using the SMB Witness facility in SMB 3.0:

Get-SmbWitnessClient

This cmdlet will show which clients are currently connected to the cluster along with the physical nodes they selected for data access and the other node that acts as an SMB Witness.
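
If you just want a quick view of how balanced the cluster is, you can group that output by node. Here is a minimal sketch. Get-ClientCountPerNode is a helper name made up for illustration, and it only assumes that each input object carries the FileServerNodeName property that Get-SmbWitnessClient returns. Nodes with no witness clients will not show up in the result.

```powershell
# Hypothetical helper (not part of the SMB module): counts witness clients per file server node.
function Get-ClientCountPerNode {
    param(
        [object[]] $WitnessClients   # objects with a FileServerNodeName property
    )

    $WitnessClients |
        Group-Object -Property FileServerNodeName |
        Sort-Object -Property Name |
        Select-Object @{ Name = 'Node';    Expression = { $_.Name } },
                      @{ Name = 'Clients'; Expression = { $_.Count } }
}

# Usage on a cluster node:
# Get-ClientCountPerNode -WitnessClients (Get-SmbWitnessClient)
```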

There are situations that might cause a cluster to become unbalanced. For instance, if you have a two-node Scale-Out File Server and one node fails, all clients will end up connected to the surviving node.

It's great that the SMB Transparent Failover feature makes that event cause no disruption to any of the clients. However, after your second node recovers, the clients won't automatically move back to the original node.

To remedy this situation, you can use a simple cmdlet to move the SMB client between Scale-Out File Server nodes. Here's how you do it:

Move-SmbWitnessClient -ClientName Client -DestinationNode Node 

You can also combine the two cmdlets mentioned above with the Get-ClusterNode cmdlet to create a PowerShell script to automatically spread all clients evenly across the cluster nodes:

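# Witness-registered clients and the nodes that are currently up, both in a predictable order
$Clients = Get-SmbWitnessClient | Sort-Object ClientName
$Nodes = Get-ClusterNode | Where-Object State -eq "Up" | Sort-Object Name
$MaxNode = $Nodes.Count
$Node = 0   # index of the node that receives the next client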

foreach ($Client in $Clients) {

    $ClientName = $Client.ClientName
    $NodeName = $Nodes[$Node].Name

    "Moving client " + $ClientName + " to node " + $NodeName
    Move-SmbWitnessClient -ClientName $ClientName -DestinationNode $NodeName -Confirm:$false
    Start-Sleep -Seconds 5

    $Node++
    if ($Node -ge $MaxNode) {$Node=0}

}

Notes about the script:

  • Only clients with an SMB witness connection will be enumerated and able to move. If you just brought up a failed node in a two-node cluster, it might take a few minutes for the clients to re-establish the witness connections.
  • It will take some time for each actual move to occur. Each SMB client will be notified that it should move, then it will perform the move lazily. Give it a minute or so before checking whether the moves happened via Get-SmbWitnessClient.
  • The move is transparent and does not cause any disruption to clients.
  • Make sure to validate the script in a test environment to confirm it's behaving as you expected.
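
Since the moves happen lazily, it can help to compare what you asked for against what Get-SmbWitnessClient reports a minute later. The sketch below is one way to do that. Get-PendingMove and the shape of the plan objects (ClientName, DestinationNode) are assumptions made for illustration, and it assumes a single witness entry per client name.

```powershell
# Hypothetical helper: returns the planned moves that have not taken effect yet.
function Get-PendingMove {
    param(
        [object[]] $Plan,            # objects with ClientName and DestinationNode
        [object[]] $WitnessClients   # current output of Get-SmbWitnessClient
    )

    # Index the node each client is using right now.
    $current = @{}
    foreach ($c in $WitnessClients) {
        $current[$c.ClientName] = $c.FileServerNodeName
    }

    # A move is still pending if the client is missing or not yet on its destination.
    foreach ($p in $Plan) {
        if ($current[$p.ClientName] -ne $p.DestinationNode) { $p }
    }
}

# Usage on a cluster node, a minute or so after issuing the moves:
# Get-PendingMove -Plan $plan -WitnessClients (Get-SmbWitnessClient)
```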

This is the simplest form of this script and there are at least a few better ways to do it. For instance, you could take into account the current node for every client and do a minimum number of moves to achieve balance. You could also look at the number of open files per client (using Get-SmbOpenFile) and balance taking that information into account, trying to balance the number of open files per node instead of the number of clients. Those would obviously be a bit more complicated to write. If you invest the time to create a better version of the script, share it in the comments...
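
To make the "minimum number of moves" idea a bit more concrete, here is a rough sketch of how the planning part could look. It is not a tested production script. Get-MinimalMovePlan and the SourceNode/DestinationNode property names are made up for illustration, and it assumes the FileServerNodeName values match the names returned by Get-ClusterNode. The function only computes a plan, so you can inspect it before moving anything.

```powershell
# Hypothetical helper: plans the fewest moves that even out the client count per node.
function Get-MinimalMovePlan {
    param(
        [object[]] $WitnessClients,                    # objects with ClientName and FileServerNodeName
        [Parameter(Mandatory)] [string[]] $NodeNames   # names of the nodes that are up
    )

    # Bucket the clients by the node they use today (clients on unknown nodes are ignored).
    $byNode = @{}
    foreach ($n in $NodeNames) { $byNode[$n] = New-Object System.Collections.ArrayList }
    foreach ($c in $WitnessClients) {
        if ($byNode.ContainsKey($c.FileServerNodeName)) { [void]$byNode[$c.FileServerNodeName].Add($c) }
    }

    $total = 0
    foreach ($n in $NodeNames) { $total += $byNode[$n].Count }
    $extra = $total % $NodeNames.Count
    $floor = ($total - $extra) / $NodeNames.Count

    # The busiest nodes keep the larger quota, so fewer clients have to leave them.
    $ordered = @($NodeNames | Sort-Object -Property @{ Expression = { $byNode[$_].Count }; Descending = $true },
                                                    @{ Expression = { $_ } })
    $quota = @{}
    for ($i = 0; $i -lt $ordered.Count; $i++) {
        $q = $floor
        if ($i -lt $extra) { $q = $floor + 1 }
        $quota[$ordered[$i]] = $q
    }

    # Collect the clients that exceed each node's quota.
    $surplus = New-Object System.Collections.Queue
    foreach ($n in $ordered) {
        for ($i = $quota[$n]; $i -lt $byNode[$n].Count; $i++) { $surplus.Enqueue($byNode[$n][$i]) }
    }

    # Hand them to the nodes that are below their quota.
    foreach ($n in $ordered) {
        $missing = $quota[$n] - $byNode[$n].Count
        for ($i = 0; $i -lt $missing; $i++) {
            $c = $surplus.Dequeue()
            [pscustomobject]@{
                ClientName      = $c.ClientName
                SourceNode      = $c.FileServerNodeName
                DestinationNode = $n
            }
        }
    }
}

# Usage on a cluster node:
# $nodes = (Get-ClusterNode | Where-Object State -eq "Up").Name
# Get-MinimalMovePlan -WitnessClients (Get-SmbWitnessClient) -NodeNames $nodes | ForEach-Object {
#     Move-SmbWitnessClient -ClientName $_.ClientName -DestinationNode $_.DestinationNode -Confirm:$false
# }
```

Clients that are already on a node within its quota are never touched, which is what keeps the number of moves down compared to the round-robin script.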

 

-----------

 

P.S.: Jeromy Statia from Microsoft IT shared the following script, which he wrote to take into consideration the open files per node and to move as few clients as needed to provide a balanced SMB Scale-Out File Server cluster.

He plans to have this running as a scheduled task which executes once every hour or so. This is definitely a more advanced solution than my script above. Thanks for sharing, Jeromy…

 

$clusterNodes = Get-ClusterNode | ? State -eq "Up" | Sort-Object Name | Select-Object -ExpandProperty Name
Write-Host "Grabbing all witness client information..."

$witnessClientObject = @(Get-SmbWitnessClient | %{
    $clientObj = @{};
    $clientObj['WitnessClient'] = $_;
    $clientObj['OpenFileCount'] = @(Get-SmbOpenFile -ClientUserName "*$($_.ClientName)*").Count;
    New-Object PSObject -Property $clientObj
    } | sort-object OpenFileCount -Descending)

if($witnessClientObject.count -gt 0)
{
    Write-Host "Found $($witnessClientObject.Count) objects"
    $witnessClientObject | ft {$_.witnessclient.ClientName}, {$_.OpenFileCount} -a
    Write-Host "Getting node distribution"
    $distributionOfFiles = @($witnessClientObject | Group-Object {$_.WitnessClient.FileServerNodeName})
    $distributionObjects = @()

    foreach($distribution in $distributionOfFiles)
    {
        $distributionObject = @{}
        $distributionObject['FileServerNodeName'] = $distribution.Name
        $distributionObject['OpenFileCount'] = ($distribution.Group | Measure-Object OpenFileCount -Sum).Sum
        $distributionObject['Clients'] = $distribution.Group
        $distributionObjects += New-Object PSObject -Property $distributionObject
    }

    #add in any cluster nodes that have 0 witness connections

    foreach($unusedClusterNode in ($clusterNodes |? { $name = $_; -not($distributionOfFiles |?{ $_.Name -match $name}) }))
    {
        $distributionObject = @{}
        $distributionObject['FileServerNodeName'] = $unusedClusterNode
        $distributionObject['OpenFileCount'] = 0
        $distributionObject['Clients'] = @()
        $distributionObjects += New-Object PSObject -Property $distributionObject
    }

    #sort by the number of open files per server node

    $sortedDistribution = $distributionObjects | Sort-Object OpenFileCount -Descending
    Write-Host ""
    Write-Host "Distribution OpenFileCounts:"
    $sortedDistribution |%{ Write-Host "$($_.FileServerNodeName) - $($_.OpenFileCount)"}
    Write-Host ""

    #Balance where needed

    for($step = 0; $step -lt $sortedDistribution.Count/2; ++$step)
    {
        #Get the difference between the largest and smallest file counts for this step
        #divide by two so we don't flop a single connection back and forth on each run
        $currentFileOpenVariance = [Math]::Ceiling(($sortedDistribution[$step].OpenFileCount - $sortedDistribution[-1 - $step].OpenFileCount)/2)
        Write-Host "Variance for step $($step): $($currentFileOpenVariance)"
        $moveTargets = @()
        $moveOpenFiles = 0

        foreach($client in $sortedDistribution[$step].Clients)
        {
            if($client.OpenFileCount -gt 0)
            {
                $varianceAfterMove = ($moveOpenFiles + $client.OpenFileCount)
                Write-Host "Checking $($varianceAfterMove) to be less than or equal to $($currentFileOpenVariance) to be a move target"
                if($varianceAfterMove -le $currentFileOpenVariance)
                {
                    Write-Host "Client $($client.WitnessClient.ClientName) is a target for move"
                    $moveTargets += $client.WitnessClient
                    $moveOpenFiles += $client.OpenFileCount
                }
            }
        }

        if($moveTargets.Count -gt 0)
        {
            foreach($moveTarget in $moveTargets)
            {
                Write-Host "Moving witness client $($moveTarget.ClientName) to SMB file server node $($sortedDistribution[-1 - $step].FileServerNodeName)"
                Move-SmbWitnessClient -ClientName $moveTarget.ClientName `
                                      -DestinationNode $sortedDistribution[-1 - $step].FileServerNodeName `
                                      -Confirm:$false `
                                      -ErrorAction Continue | Out-Null
            }
        }
        else
        {
            Write-Host "No move targets available"
        }
    }
}

Write-Host "SMB Witness client connections should now be as balanced as possible"
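
To see how the balancing step in this script picks its move targets, it helps to isolate that logic in a small function and feed it sample data. This is only an illustrative restatement of the loop above. Select-MoveTarget and its parameter names are made up for this sketch, and the input objects only need the ClientName and OpenFileCount properties.

```powershell
# Hypothetical helper that mirrors the selection loop: walk the busiest node's clients
# (largest first) and keep those that fit within half of the open-file gap.
function Select-MoveTarget {
    param(
        [object[]] $Clients,        # objects with ClientName and OpenFileCount, largest first
        [int] $BusiestOpenFiles,    # open files on the busiest node
        [int] $LeastOpenFiles       # open files on the least busy node
    )

    # Only close half of the gap, so a client does not bounce back on the next run.
    $budget = [Math]::Ceiling(($BusiestOpenFiles - $LeastOpenFiles) / 2)
    $moved = 0

    foreach ($client in $Clients) {
        if ($client.OpenFileCount -gt 0 -and ($moved + $client.OpenFileCount) -le $budget) {
            $moved += $client.OpenFileCount
            $client
        }
    }
}

# Example: the busiest node has 10 open files, the least busy node has 2.
$sample = @(
    [pscustomobject]@{ ClientName = 'A'; OpenFileCount = 5 },
    [pscustomobject]@{ ClientName = 'B'; OpenFileCount = 3 },
    [pscustomobject]@{ ClientName = 'C'; OpenFileCount = 2 }
)
Select-MoveTarget -Clients $sample -BusiestOpenFiles 10 -LeastOpenFiles 2
```

With these numbers the budget is 4 open files, so client A (5) is skipped, client B (3) is selected, and client C (2) no longer fits. Moving B leaves the two nodes at 7 and 5 open files.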