Guest Post: Distributed Cache Service in SharePoint 2013

Introduction

A busy SharePoint 2013 site can generate a lot of new content. In a large organisation, the improved social features in SharePoint 2013 can generate dramatic amounts of new, frequently changing content. Because this content surfaces throughout SharePoint, it could cause far more database access and page slowdown than in previous versions.

To the rescue steps forward the Distributed Cache service. Based upon the AppFabric Cache Service, this is a requirement for SharePoint that is installed by the pre-requisite installer as part of your farm setup.

It’s not just social data that benefits from being cached. The Distributed Cache service also caches Newsfeeds, Microblogging, Conversations, Security Trimming, OneNote client access and more! In fact, it even removes the need for farms that use Claims Authentication to implement session affinity load balancing.

How to plan for the Distributed Cache Service

First of all, download and read the Distributed Cache planning overview from Microsoft. When you install AppFabric as part of the SharePoint 2013 pre-requisites, the cache is automatically allocated 10% of the server's total physical RAM. If you install the pre-requisites manually, make sure you use the /gac switch when you install AppFabric. If, while building your farm, you find that AppFabric was already installed before you ran the pre-requisites installer, it is strongly recommended that you uninstall AppFabric first.

Servers running the Distributed Cache service are referred to as Cache Hosts. Every SharePoint farm needs at least one server running this service. By default, as you build your farm, the Distributed Cache service gets started on each server you join. When you have more than one instance of this service running in a farm, you have a Cache Cluster. In practice, when you have built your farm, you then proceed to switch off the Distributed Cache on any servers that you decide shouldn’t run that service, using the PowerShell cmdlets below.
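To see which servers in the farm currently have a Distributed Cache service instance, something along these lines should work from the SharePoint 2013 Management Shell. It is a minimal sketch: the service name string is the one Microsoft's guidance uses for this service, and the snap-in line is only needed in a plain PowerShell window.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List every Distributed Cache service instance in the farm with its status.
Get-SPServiceInstance |
    Where-Object { $_.Service.ToString() -eq "SPDistributedCacheService Name=AppFabricCachingService" } |
    Select-Object Server, Status
```

Servers that you have decided should not be Cache Hosts can then be taken out of the cluster with the cmdlets covered under Starting and Stopping the Service.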

Each item of data stored by the Distributed Cache is stored once only, and exists only on one server at a time. It’s worth noting that although AppFabric supports high availability, the SharePoint implementation of the Distributed Cache does not. If one of your cache servers dies, the cached items on it will be lost. In practice this means that performance will be reduced for that data until it has been cached again on another Cache Host in the Cluster.

Warning – Important Safety Tip

Do not administer the Distributed Cache through the Services window in Administrative Tools under Control Panel, or through the AppFabric for Windows Server application on the Start menu. This could get the Distributed Cache service into a state where you might need to rebuild your farm!

Dedicated or Collocated?

There are two modes in which you can run the Distributed Cache service. You can run it as a dedicated service, with no other SharePoint services running on that server. Alternatively, you can run it collocated with other SharePoint services on the same server. For large scale production use, the recommendation is to have dedicated servers hosting your cache.

Microsoft recommends you avoid starting a Distributed Cache service instance on servers that are already running SQL Server, Search, Excel Services or Project Server services.

Network Connectivity

If you plan to have more than one Cache Host, the first server added should be configured to allow inbound ICMPv4 traffic. If you are using Windows Firewall, you can enable this in PowerShell with the Set-NetFirewallRule cmdlet. The display name of the rule is “File and Printer Sharing (Echo Request - ICMPv4-In)”. Notice also that the -Enabled parameter doesn’t take a Boolean ($true), but rather the value True. Don’t forget to Import-Module NetSecurity first, though!

[Screenshot: enabling the ICMPv4 firewall rule in PowerShell]
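As a rough sketch of the firewall change described above, assuming Windows Firewall on Windows Server 2012 or later and the default English rule name:

```powershell
# The NetSecurity module provides the firewall cmdlets.
Import-Module NetSecurity

# Enable the built-in inbound ICMPv4 echo request rule.
# -Enabled takes the value True, not the Boolean $true.
Set-NetFirewallRule -DisplayName "File and Printer Sharing (Echo Request - ICMPv4-In)" -Enabled True

# Confirm the result.
Get-NetFirewallRule -DisplayName "File and Printer Sharing (Echo Request - ICMPv4-In)" |
    Select-Object DisplayName, Enabled
```

On a localised build of Windows the display name will differ, so you may need to look the rule up first with Get-NetFirewallRule.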

Starting and Stopping the Service

Once the Distributed Cache service instance is started on any server in your farm, it will become part of your Cache Cluster.

The right way to start the service is with the Add-SPDistributedCacheServiceInstance PowerShell cmdlet. You run this on the SharePoint server you would like to add to your Cache Cluster, which makes that server a Cache Host. Stopping is a different matter: simply stopping the service instance would cause the contents of the cache on that server to be lost, degrading performance.

If you need to remove a server from the Cache Cluster, the safe way to do this is first to use Stop-SPDistributedCacheServiceInstance with the -Graceful parameter. This transfers any cached data to another server, and can therefore take some time to perform. Afterwards you can safely run Remove-SPDistributedCacheServiceInstance to make the current server a non-Cache Host.

[Screenshot: PowerShell session for the cmdlets described above]
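The add and remove sequences described above might look like this. It is a minimal sketch: each command acts on the server it is run on, so run it on the server you want to add or remove.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# To make the current server a Cache Host:
Add-SPDistributedCacheServiceInstance

# To take the current server out of the Cache Cluster safely:
# first move its cached data to the other hosts (this can take a while)...
Stop-SPDistributedCacheServiceInstance -Graceful

# ...then remove the service instance from this server.
Remove-SPDistributedCacheServiceInstance
```

You would normally run either the first command or the last two, not all three in one go.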

If you get a Health Analyzer Rule violation in Central Administration saying that “The Distributed Cache host may cause cache reliability problems” it is likely that a Distributed Cache service instance has been stopped on a server without removing the server from the Cache Cluster. To resolve this, you can either start the service instance again using the Add-SPDistributedCacheServiceInstance cmdlet, or remove it with Remove-SPDistributedCacheServiceInstance as above.
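Before deciding whether to start or remove the instance, it can help to look at the cluster as AppFabric sees it. A small sketch, assuming it is run from the SharePoint 2013 Management Shell on a server that is a Cache Host:

```powershell
# Connect the session to the farm's Cache Cluster.
Use-CacheCluster

# List each Cache Host with its port, service name and service status.
Get-CacheHost
```

These cmdlets only read the cluster state. To change anything, stick to the SharePoint cmdlets described in this article.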

Memory Management

Getting the memory allocation right is critical to SharePoint performance. You tune it by changing the amount of memory allocated to the cache on each server. If you later change the amount of installed RAM, the Distributed Cache service does not update its memory allocation automatically, so you need to reconfigure it yourself.

In a small farm with fewer than 10,000 users, Microsoft recommends allocating 1GB of RAM for the Distributed Cache. This can be either a dedicated server or collocated with other SharePoint services, such as the Web Application Service. Beyond this the recommendation is dedicated servers for the cache. A medium farm with fewer than 100,000 users should look to allocate around 2.5GB for the cache, and a large farm with up to 500,000 users should set aside around 12GB of RAM for the cache.

The Distributed Cache service actually uses twice the allocated amount of RAM, using the extra for housekeeping.

Memory Limits

It is a very strong recommendation that you should not allocate more than 16GB to any one Cache Host. A larger allocation may cause the Cache Service to time out during housekeeping operations and become unresponsive for several seconds at a time. If you need a cache size of greater than 16GB, it is better to use multiple servers in a Cache Cluster. You can have up to a maximum of 16 hosts in a Cache Cluster.
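To make the arithmetic concrete, here is a small planning sketch. Get-CacheHostPlan is a hypothetical helper written for this article, not a SharePoint cmdlet. It uses only the figures quoted here (16GB per host, 16 hosts per cluster, roughly twice the allocation in real memory use) and assumes the cache is split evenly across hosts.

```powershell
# Hypothetical planning helper: not part of SharePoint or AppFabric.
function Get-CacheHostPlan {
    param(
        [Parameter(Mandatory = $true)]
        [int]$TotalCacheGB
    )

    $maxPerHostGB = 16   # recommended ceiling per Cache Host
    $maxHosts     = 16   # maximum hosts in a Cache Cluster

    # Smallest number of hosts that keeps each one at or under 16GB.
    $hostCount = [int][math]::Ceiling($TotalCacheGB / $maxPerHostGB)
    if ($hostCount -gt $maxHosts) {
        throw "A cache of $TotalCacheGB GB would need more than $maxHosts hosts."
    }

    $perHostGB = $TotalCacheGB / $hostCount

    [pscustomobject]@{
        Hosts                    = $hostCount
        CacheSizePerHostMB       = [int]($perHostGB * 1024)
        # The service uses about twice its allocation, the rest being housekeeping.
        ApproxFootprintPerHostGB = $perHostGB * 2
    }
}

# Example: plan for a 40GB cache.
Get-CacheHostPlan -TotalCacheGB 40
```

Treat the result as a starting point for sizing, since a collocated server also has to leave room for its other SharePoint services.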

Memory Allocation

For the large farm example, we would use the Update-SPDistributedCacheSize cmdlet with the -CacheSizeInMB parameter. The value is in megabytes, so 12GB of RAM is specified as 12288. If you need to find out how much RAM is currently allocated, you can issue the Use-CacheCluster and Get-AFCacheHostConfiguration cmdlets.

[Screenshot: PowerShell session for the cmdlets described above]
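A sketch of checking and then changing the allocation on a Cache Host. It assumes the default cache port of 22233 and that the commands are run on the Cache Host itself. Check Microsoft's guidance on the order of operations before resizing, as it may require the Distributed Cache service to be stopped on the Cache Hosts first.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Show the current configuration for this host, including the cache size in MB.
Use-CacheCluster
Get-AFCacheHostConfiguration -ComputerName $env:COMPUTERNAME -CachePort 22233

# Allocate 12GB to the cache. The parameter is in megabytes: 12 * 1024 = 12288.
Update-SPDistributedCacheSize -CacheSizeInMB 12288
```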

Distributed Cache Service Account

When AppFabric is installed as part of the SharePoint pre-requisites, it is configured to run under the credentials of the server farm account. This is far from ideal, and will eventually trigger a violation of a Health Analyzer Rule. To avoid this, you can change the account used by the Distributed Cache service. In the example below, we’re retrieving a managed account that has already been registered with our farm, called “CONTOSO\my_managed_account”, with the Get-SPManagedAccount cmdlet. We then set that as the ManagedAccount property of the ProcessIdentity object of the Distributed Cache (“AppFabricCachingService”) SPService.

[Screenshot: PowerShell script that sets the Distributed Cache service account]
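A script along the following lines performs the change described above. It is a sketch that follows the pattern in Microsoft's guidance, and it assumes “CONTOSO\my_managed_account” is already registered as a managed account in the farm.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Find the Distributed Cache service in the farm.
$farm = Get-SPFarm
$cacheService = $farm.Services | Where-Object { $_.Name -eq "AppFabricCachingService" }

# Retrieve the managed account that the service should run as.
$account = Get-SPManagedAccount -Identity "CONTOSO\my_managed_account"

# Point the service's process identity at that account and apply the change.
$cacheService.ProcessIdentity.CurrentIdentityType = "SpecificUser"
$cacheService.ProcessIdentity.ManagedAccount = $account
$cacheService.ProcessIdentity.Update()
$cacheService.ProcessIdentity.Deploy()
```

The Deploy() call is the step that pushes the new identity out to the servers, so it is the one most likely to take time or report errors.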

Troubleshooting

It is possible that after invoking the .Deploy() line in the above PowerShell script you will encounter an error such as “TCP port 22234 is already in use.”

[Screenshot: the error message]

Further attempts to work with the cache might also generate errors such as “Specified host is not present in cluster”:

[Screenshot: the error message]

You may even receive error messages saying “cacheHostInfo is null”.

Not to worry! Microsoft has an article on how to repair a broken Cache Host. First you need to get a reference to the broken Distributed Cache service instance, for example by filtering the results from Get-SPServiceInstance passing in the name of the affected host as the Server parameter, and then invoking Delete() on the service instance. Finally, you can restart the service instance with Add-SPDistributedCacheServiceInstance as below:

[Screenshot: PowerShell commands for the repair described above]
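The repair steps can be sketched like this. It assumes the script is run on the affected Cache Host, because Add-SPDistributedCacheServiceInstance acts on the local server. The $hostName and $cacheServiceName variables are illustrative names, and the service name string is the one used in Microsoft's guidance.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: we are logged on to the broken Cache Host.
$hostName = $env:COMPUTERNAME
$cacheServiceName = "SPDistributedCacheService Name=AppFabricCachingService"

# Get a reference to the broken Distributed Cache service instance on this host.
$serviceInstance = Get-SPServiceInstance -Server $hostName |
    Where-Object { $_.Service.ToString() -eq $cacheServiceName }

# Delete the broken instance, if one was found.
if ($serviceInstance) {
    $serviceInstance.Delete()
}

# Re-create and start the service instance on this server.
Add-SPDistributedCacheServiceInstance
```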

Conclusion

The Distributed Cache service is an enabler for many of the new social features in SharePoint 2013. We couldn’t have the rapid, almost real-time conversations in SharePoint’s feeds and microblogging features without it. Although it is tricky to configure, the Distributed Cache service is something you need to plan for in your SharePoint 2013 farms, and is best implemented with dedicated servers.

About the Author:

Joel Jeffery is a SharePoint 2013 MCSE and Microsoft Certified Trainer. He runs JFDI Phoenix Ltd, a Microsoft Gold Partner based in Brighton, UK. He and his team provide live SharePoint support for admins, devs, architects and end users at SharePoint Doctors. He’s been working with SharePoint and developing systems in .NET for more than a decade. Joel can also be found teaching a series of SharePoint courses for Firebrand Training. Catch up with Joel at JoelBlogs.