Offline Files and Continuous Availability: the monstrous union you should not consecrate

Hi all, Ned here again with a quick chat about mixing Continuous Availability (CA) and Offline Files. As you know, we have several public docs recommending against combining CA and Client Side Caching (aka CSC aka Offline Files), because when users attempt to go offline, the transition can take up to six minutes. This usually leads to unhappy humans and applications. Today I’ll explain more and give you some options.

The inherent problem

CA was designed for the Scale-out File Server (SOFS) workload, and it provides both disk write-through guarantees and “transparent failover”, where a client re-attaches to file handles after a cluster failover, thanks to a resume key filter running on the server. This means that applications like Hyper-V and SQL Server continue to dish out their virtual machines and databases when a storage cluster node reboots. You can enable CA on non-SOFS shares in a cluster, and you can use end-user applications like Word with them, but for a variety of reasons, it’s not something we recommend. On a standalone file server, you cannot configure it at all. Windows Server 2012 set the precedent of enabling CA by default on all clustered shares, something I now regret but cannot change.

CSC was designed for branch office and mobile users back when networks were hilarious. A user could cache their unstructured data locally and synchronize with a file server over SMB. By Windows 7 and 8, it was a pretty decent system, with background sync and offline functionality that allowed a user to seamlessly roam while IT got centralized backups.

The root cause of the issues between CSC and CA? That’s easy: we wrote Offline Files in 1998 and Continuous Availability in 2010. They are products of very different networks, clients, and strategies – but all laid on top of a single protocol family, SMB. They were never designed to interoperate. Heck, we didn’t find the six-minute timeout issue – a customer did, more than two years after release. Their offline transitions did not happen quickly enough, so applications saw long hangs when trying to access an unreachable share. Or the opposite happened, and data was saved to the local cache instead of being durably persisted on the server. The experience is crummy.

They were justifiably… displeased.

image
Or maybe he just tried one of Rick Claus’ beers?

The difference between Windows 8 and Windows 10

Windows 8.1 and Windows 10 support SMB3, allowing them to utilize CA, transparent failover, and Scale-out File Servers. But due to the aforementioned timeout problem that many customers reported, we decided in Windows 10 to fix the glitch.

Starting in Win10, when you connect to a CA-enabled share, there is no longer an option to use Offline Files. No matter the settings, files will not cache and the user will not run into timeout problems.

When you mark a share as “continuously available” you are essentially committing to the following contract:

  1. The SMB server must always be available.
  2. Any data written to the server needs durable persistence on disk and must be resilient to disk failures.
  3. The network between the client and the server is expected to be fault tolerant and high speed.

The CA feature tries to hide transient failures at any of the above three interfaces from applications by holding and resuming handles. Remember, it’s for high-throughput, high-availability, high-IO, mission-critical server applications like SQL Server and Hyper-V. #3 above directly conflicts with Offline Files, which assumes flaky, slow, and intermittent network connectivity.

Naturally, this decision to change the behavior in Win10 may not make your day, which leads me to:

A variety of alternatives

  • Use Work Folders. The future is definitely Work Folders. CSC hasn’t had a feature update since Windows 8, and that should be a strong sign that it probably never will. We have moved into a new phase, where users want to sync their data from PCs, phones, and tablets – not all of them necessarily running Windows or speaking SMB. Work Folders brings all that to the table, and is actively under development and accepting feedback. Heck, Jane the Work Folders PM wrote about it constantly. She never sleeps.
  • Just use CA. If you are looking for data consistency and transparent failover for non-mobile users, stop using CSC. Disable it on all your CA shares using Server Manager, Failover Cluster Manager, or Set-SmbShare.
  • Just use CSC. If you need to support mobile users and crumbling networks, stop using CA. Disable it on your shares using Server Manager, Failover Cluster Manager, Set-SmbShare, or Explorer.
  • Use a two-share combination. If you have a hybrid set of mobile users and desktop users all accessing the same data, nothing is stopping you from creating two shares that point to the same data – one with CA enabled and one with CSC enabled. Then your users can select the share that matches their needs.

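For the PowerShell-inclined, here is a rough sketch of what the last three options might look like with the SMB share cmdlets. Treat it as an illustration, not a recipe: the share names (Projects, ProjectsCA, ProjectsMobile) and the folder path are made up, I'm assuming an elevated session on the cluster node that owns the storage, and I've left out share permissions and the -ScopeName you may need for a clustered file server role.

```powershell
# Sketch only. All share names and the path below are hypothetical placeholders.

# Audit first: find shares that are CA-enabled but still allow Offline Files caching.
Get-SmbShare |
    Where-Object { $_.ContinuouslyAvailable -and $_.CachingMode -ne 'None' } |
    Select-Object Name, ScopeName, ContinuouslyAvailable, CachingMode

# "Just use CA": keep CA on, and turn off client-side caching for the share.
Set-SmbShare -Name 'Projects' -CachingMode None -Force

# "Just use CSC": turn CA off instead, and let users choose what to cache.
Set-SmbShare -Name 'Projects' -ContinuouslyAvailable $false -CachingMode Manual -Force

# "Two-share combination": two shares over the same folder, one per audience.
$path = 'C:\ClusterStorage\Volume1\Projects'   # hypothetical path
New-SmbShare -Name 'ProjectsCA'     -Path $path -ContinuouslyAvailable $true  -CachingMode None
New-SmbShare -Name 'ProjectsMobile' -Path $path -ContinuouslyAvailable $false -CachingMode Manual
```

Pick one of the two Set-SmbShare lines per share, not both. The audit query is read-only, so it's a safe place to start before changing anything.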

The future

We heard the Windows 8.1 feedback and changed the behavior in Windows 10 to stop the timeout issue. We then had feedback from Windows 10 customers who wanted the old behavior back. So we decided to add this ability back into a future release of Windows 10; look for it here in group policy:

Software\Policies\Microsoft\Windows\LanmanWorkstation\AllowOfflineFilesforCAShares
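If you want to see whether that policy has landed on a given client, a read-only check along these lines should do. A couple of assumptions on my part: I'm guessing the path lives under the machine hive (HKLM), since the text above only gives the relative path, and the snippet just reports whatever it finds rather than assuming what any particular value means.

```powershell
# Sketch only: assumes the policy path above sits under HKLM (not stated in the text).
$key  = 'HKLM:\Software\Policies\Microsoft\Windows\LanmanWorkstation'
$name = 'AllowOfflineFilesforCAShares'

# Read the value if it exists; stay quiet if the key or value is missing.
$policy = Get-ItemProperty -Path $key -Name $name -ErrorAction SilentlyContinue

if ($null -eq $policy) {
    "$name is not configured on this machine."
} else {
    "$name = $($policy.$name)"
}
```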

When you have hundreds of millions of Windows computers, it’s a tricky balancing act to please everyone (or anyone). But this is proof that we are always listening and adjusting.

Finally

Hibbert: You know, isn’t it interesting how the left – or sinister – twin is invariably the evil one. I had this theory that… Wait a minute. Hugo’s scar is on the wrong side. He couldn’t have been the evil left twin. That means the evil twin is, and always has been… Bart!

Bart: Oh, don’t look so shocked.

Hibbert: Well, chalk this one up to carelessness on my part.

    – Pobody’s Nerfect, “Treehouse of Horror VII”, The Simpsons

Until next time,

– Ned “Hugo” Pyle