The Distributed Cache (DC) is a new component in SharePoint 2013. It supports social computing features: social networking tools such as My Sites, and social content technologies such as microblogs, activity feeds and news feeds. It also caches authentication tokens. This makes it one of the most critical parts of SharePoint 2013 in terms of social computing.
The Distributed Cache service uses Windows AppFabric caching technology behind the scenes.
The cache can consume a lot of memory on the application and web servers. When implementing the DC service, there are two modes that can be used:
1. Collocated mode – in this mode, the Distributed
Cache service runs together with other services on the application
server.
2. Dedicated mode – in this mode, all services other than the Distributed Cache service are stopped on the application server that runs the Distributed Cache service.
Microsoft recommends using dedicated mode in the SharePoint farm.
Capacity planning is an important factor when you implement DC in the SharePoint farm. These are Microsoft's recommendations for Distributed Cache capacity:
| Deployment size | Small farm | Medium farm | Large farm |
|---|---|---|---|
| Total number of users | < 10,000 | < 100,000 | < 500,000 |
| Recommended cache size for the Distributed Cache service | 1 GB | 2.5 GB | 12 GB |
| Total memory allocation for the Distributed Cache service (double the recommended cache size above, plus reserve 2 GB for the OS) | 2 GB | 5 GB | 34 GB |
| Recommended architectural configuration | Dedicated server or co-located on a front-end server | Dedicated server | Dedicated server |
| Minimum cache hosts per farm | 1 | 1 | 2 |
Note: The cache size of the Distributed Cache service should not exceed 16 GB per server, so Microsoft recommends that you use two servers in a large farm environment.
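To make the sizing arithmetic concrete, here is a minimal sketch that follows the rule of thumb from Microsoft's guidance: reserve 2 GB for the OS and give the cache half of what remains. The function name Get-DCCacheSizeMB is my own (it is not a SharePoint cmdlet), and the 16 GB cap comes from the note above.

```powershell
# Hypothetical helper (not a SharePoint cmdlet): estimate the cache size for one cache host.
function Get-DCCacheSizeMB {
    param([Parameter(Mandatory = $true)][double]$TotalMemoryGB)

    $reservedGB = 2                                   # memory kept back for the OS and other processes
    $cacheGB = ($TotalMemoryGB - $reservedGB) / 2     # the cache gets half of what remains
    if ($cacheGB -le 0) { throw "Not enough memory for a cache host." }
    $cacheGB = [math]::Min($cacheGB, 16)              # the cache size should not exceed 16 GB
    return [int][math]::Floor($cacheGB * 1024)        # convert GB to MB
}

Get-DCCacheSizeMB -TotalMemoryGB 8    # 3072 (MB) for a server with 8 GB RAM
```

The result is the kind of value you would pass to Update-SPDistributedCacheSize -CacheSizeInMB, after gracefully stopping the Distributed Cache service on the cache hosts.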
When implementing DC, it is better to have a dedicated server, even in a small farm.
I found that troubleshooting for DC is not well documented on TechNet, especially for the issues you actually run into. Fortunately, there are blogs that help with troubleshooting DC. I have listed all the references at the end of this blog post.
My SharePoint Server 2013 farm is as follows:
OS: Windows Server 2012
SharePoint Version:
SharePoint Server 2013 Standard, Build number: 15.0.4420.1017 (RTM)
SQL Server:
SQL Server 2012
A) App Server, 8 GB RAM
B) Web Front End 01, 3 GB RAM
C) Web Front End 02, 3 GB RAM
First things first. I will list all the prerequisites for Distributed Cache to function properly, so that you do not pull your hair out and become frustrated like I did! :)
- Warning when setting up the DC service:
Do not restart the AppFabric Caching service in the Services console. Microsoft strongly recommends against this, and if you do it, you might need to rebuild your farm.
- Always use the Distributed Cache PowerShell cmdlets.
- Firewall ports
- Distributed Cache requires the following high ports: 22233, 22234, 22235, 22236.
Note: If you set up DC with the Distributed Cache PowerShell cmdlets, the DC ports are opened in the firewall automatically.
- ICMPv4 and ICMPv6 have to be allowed for DC to function properly.
Besides these, the following ports have to be opened as well: 8, 138, 139, 445.
- Firewalls in the organization
If the network topology has 2 to 3 firewalls for the SharePoint farm, the ports have to be opened on all of those firewalls as well.
- Search and User Profile requirements
- Search: Continuous crawl has to be enabled.
- User Profile: The service account of the application pool of the My Site web application should have Full Control.
- Use Stop-SPDistributedCacheServiceInstance -Graceful to stop the Distributed Cache instance on any SharePoint server.
- Assign the Distributed Cache memory when you set up the Distributed Cache instance on each SharePoint server. DC eats memory like crazy, and users will complain later on.
- Remote services have to be enabled.
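As a quick way to check the port prerequisite, here is a minimal sketch that tries a TCP connection to each Distributed Cache port on each cache host. The helper name Test-TcpPort is my own, and the host names are placeholders taken from my farm. Keep in mind that a "False" result can also mean that nothing is listening on that port yet, not only that a firewall is blocking it.

```powershell
# Sketch: check that the Distributed Cache ports answer on each cache host.
# Host names are placeholders; replace them with your own servers.
$cacheHosts = @("SP13WFE01.contoso.com", "SP13WFE02.contoso.com")
$ports = 22233..22236

# Hypothetical helper: returns $true if a TCP connection succeeds within the timeout.
function Test-TcpPort {
    param([string]$ComputerName, [int]$Port, [int]$TimeoutMs = 2000)
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $async = $client.BeginConnect($ComputerName, $Port, $null, $null)
        if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMs)) { return $false }
        $client.EndConnect($async)
        return $true
    } catch {
        return $false      # connection refused, name not resolved, etc.
    } finally {
        $client.Close()
    }
}

foreach ($h in $cacheHosts) {
    foreach ($p in $ports) {
        $open = Test-TcpPort -ComputerName $h -Port $p
        "{0}:{1} reachable = {2}" -f $h, $p, $open
    }
}
```

I used the .NET TcpClient class here rather than a newer cmdlet so that the sketch does not depend on the PowerShell version installed on the server.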
I will cover both collocated and dedicated modes for DC configuration.
- In the collocated configuration, each server in the farm has a DC instance with the STARTED status.
- In the dedicated configuration, you choose one server to be the dedicated Distributed Cache server, and the other web servers MUST have the STOPPED status. The Distributed Cache instance MUST still be available on all SharePoint servers.
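To see which mode a farm is effectively running in, you can list the Distributed Cache instance on every server. This is only a sketch; it assumes the SharePoint snap-in is available and filters on the type name "Distributed Cache". Note that PowerShell reports the status as Online or Disabled, which corresponds to Started and Stopped in Central Administration.

```powershell
# Run in the SharePoint 2013 Management Shell (or load the snap-in first).
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List the Distributed Cache instance and its status on every server in the farm.
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq "Distributed Cache" } |
    Select-Object @{ Name = "Server"; Expression = { $_.Server.Name } }, Status

# Dedicated mode: run this locally on each server that should NOT host the cache.
# Stop-SPDistributedCacheServiceInstance -Graceful
```

The stop command is left commented out on purpose, because it acts on the local server and should only be run where you really want the cache stopped.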
-----------------------------------------------------------------------------------
Issue #1 Error: cacheHostInfo is null when removing an existing DC instance with Remove-SPDistributedCacheServiceInstance
Fix:
Forcefully delete the Distributed Cache Instance as follows:
$instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq "SP2013App"}
$serviceInstance.Delete()
Add-SPDistributedCacheServiceInstance
Issue #2 Error starting the Distributed Cache instance
While provisioning the DC instance, you may receive the above error.
Fix:
Remove and add the DC instance.
# Remove the service from SharePoint on the local host.
Stop-SPDistributedCacheServiceInstance -Graceful
Remove-SPDistributedCacheServiceInstance
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.ToString()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
$serviceInstance.Delete()
#Add DC Instance
$SPFarm = Get-SPFarm
$cacheClusterName = "SPDistributedCacheCluster_" + $SPFarm.Id.ToString()
$cacheClusterManager = [Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheClusterInfoManager]::Local
$cacheClusterInfo = $cacheClusterManager.GetSPDistributedCacheClusterInfo($cacheClusterName);
$instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.Service.Tostring()) -eq $instanceName -and ($_.Server.Name) -eq $env:computername}
$serviceInstance.Delete()
Add-SPDistributedCacheServiceInstance
Issue #3 ErrorCode<ERRPS002>:SubStatus<ES0001>:Invalid provider and connection string read. Please provide the values manually.
Fix:
Somehow, the connection string has gone missing, and we need to manually add the database entry for AppFabric as follows:
a) Run (Windows + R) and enter Regedit
b) HKEY_LOCAL_MACHINE >> SOFTWARE >> MICROSOFT >> AppFabric >> V1.0 >> CONFIGURATION
c) Enter Connection String and Provider as follows:
Connection String:
Data Source=spsql;Initial Catalog=SPFarm_SharePoint_Config;Integrated Security=True;Enlist=False
Provider:
SPDistributedCacheClusterProvider
Then use PowerShell to verify the Distributed Cache
Use-CacheCluster
Get-CacheHost
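If you prefer not to click through Regedit, the same values can be inspected from PowerShell. This is only a sketch: I am assuming the registry value names are ConnectionString and Provider, and the connection string shown is the one from my farm. The write commands are commented out so that nothing changes unless you deliberately enable them.

```powershell
# Sketch: inspect the AppFabric configuration values in the registry.
$key = "HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration"

# Assumed value names: ConnectionString and Provider.
Get-ItemProperty -Path $key | Select-Object ConnectionString, Provider

# Values from my farm; adjust Data Source and Initial Catalog to your own config database.
# $conn = "Data Source=spsql;Initial Catalog=SPFarm_SharePoint_Config;Integrated Security=True;Enlist=False"
# Set-ItemProperty -Path $key -Name ConnectionString -Value $conn
# Set-ItemProperty -Path $key -Name Provider -Value "SPDistributedCacheClusterProvider"
```

Run it in an elevated PowerShell session, since the key lives under HKEY_LOCAL_MACHINE.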
Issue #4 Page load takes 6 seconds.
Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage 'DistributedViewStateCache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One ormore specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.) ---> System.ServiceModel.ProtocolException
Page load took more than 6 seconds in Developer Dashboard as shown:
and you can see that it is exactly 6 seconds in the Developer Dashboard.
In my SharePoint environment, I was getting the following errors with all servers in collocated mode for DC.
Fix:
It took me more than 4 weeks to find the actual issue. To troubleshoot the Distributed Cache, we first need to know which settings were incorrect in my environment:
As mentioned, I have 3 SharePoint servers: 1 application server and 2 web front ends.
a) On App Server
Use-CacheCluster
Get-CacheHost
Only the App server status is UP.
Apps02: UP
Wfe01: Unknown
Wfe02: Unknown
The other WFE servers were showing the errors below:
Error: SubStatus(ES0001): Cache host SP13WFE01.contoso.com is not reachable.
Error: SubStatus(ES0001): Cache host SP13WFE02.contoso.com is not reachable.
b) First Front End Server
Apps02: Unknown
Wfe01: Down
Wfe02: Unknown
c) Second Front End Server
Apps02: Unknown
Wfe01: Unknown
Wfe02: Down
| Cache host status | Seen from App02 | Seen from Wfe01 | Seen from Wfe02 |
|---|---|---|---|
| Apps02 | UP | Unknown | Unknown |
| Wfe01 | Unknown | Down | Unknown |
| Wfe02 | Unknown | Unknown | Down |
Clearly, the cache hosts are not able to connect to each other. On the App server, the local server (Apps02) shows the UP status, whereas the WFEs show the UNKNOWN status. Similarly, WFE01 and WFE02 each report the other servers as UNKNOWN. During my troubleshooting, I found that if any server has the UNKNOWN status, some configuration has to be fixed.
Collocated mode
Step1: Create an inbound rule for the Distributed Cache ports (22233 - 22236) in the firewall on each server.
Perform this for each server.
Now, in my SharePoint farm, WFE02 shows these settings; we have to open the firewall for WFE01 as well.
Step2: Start the Remote services on each server as shown:
Step3: Turn on Ping for all SharePoint servers.
Now, each SharePoint server has server status as UP.
Use-CacheCluster
Get-CacheHost
App Server:
This works perfectly in the collocated mode for Distributed Cache.
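The firewall parts of the steps above (Step1 and Step3) could also be scripted instead of clicked through. This is a minimal sketch using the built-in Windows Firewall cmdlets on Windows Server 2012; the rule display names are my own choice. It has to be run in an elevated session on each SharePoint server.

```powershell
# Sketch: allow the Distributed Cache ports inbound (run on each SharePoint server).
# The display names are arbitrary labels of my own.
New-NetFirewallRule -DisplayName "SharePoint DC ports" -Direction Inbound -Protocol TCP -LocalPort "22233-22236" -Action Allow

# Allow ping (ICMPv4 echo request, type 8) so the cache hosts can reach each other.
New-NetFirewallRule -DisplayName "Allow ICMPv4 echo request" -Direction Inbound -Protocol ICMPv4 -IcmpType 8 -Action Allow
```

Afterwards, running Use-CacheCluster and Get-CacheHost again should show whether the hosts now report each other as UP.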
Next, verify the page load. In my environment, the page load took 288.69 milliseconds with Distributed Cache started.
To simulate a dedicated Distributed Cache server, I stopped the DC instance on both WFEs and left only the Application server to manage the Distributed Cache instance.
APP02
WFE01
WFE02
I hope this blog post helps someone.
References
1. Plan for feeds and the Distributed Cache service in SharePoint Server 2013
https://technet.microsoft.com/en-us/library/jj219572.aspx?f=255&MSPPError=-2147217396#plandc
2. Manage the Distributed Cache service in SharePoint Server 2013
https://technet.microsoft.com/en-us/library/jj219613.aspx
3. File and Printer Sharing Firewall Rules Explained
http://tritoneco.com/2013/09/18/file-and-printer-sharing-firewall-rules-explained
cacheHostInfo is null:
4. cacheHostInfo is null 01
http://almondlabs.com/blog/manage-the-distributed-cache/
5. cacheHostInfo is null 02
http://sharepointengineer.com/2014/11/04/sharepoint-2013-distributed-cache-cachehostinfo-is-null/
6. cacheHostInfo is null 03
http://strangelittletech.blogspot.com/2014/07/cachehostinfo-is-null-repair.html
Troubleshooting:
7. MSDN Blog 1
http://blogs.msdn.com/b/sambetts/archive/2014/03/19/sharepoint-2013-distributed-cache-appfabric-troubleshooting.aspx
8. Blog 3
http://habaneroconsulting.com/insights/SharePoint-2013-Distributed-Cache-Bug#.Vod2VBV96Um
9. SharePoint 2013 Distributed Cache: Boon or Bane.
http://sharepointjournal.com/tag/invalid-provider-and-connection-string-read-spdistributedcacheserviceinstance/
Articles:
10. The Five Minute Cheat-Sheet on SharePoint 2013's Distributed Cache Service
http://blog.idera.com/sharepoint/the-five-minute-cheat-sheet-on-sharepoint-2013s-distributed-cache-service/
Social MSDN:
11. Distributed Cache Slowing Down All My Farm
https://social.technet.microsoft.com/forums/sharepoint/en-US/7b695bf8-bebd-4c23-9025-f8ea02ba902d/distributed-cache-slowing-down-all-my-farm
12. Dedicated Distributed Cache Server and ports issue
https://social.msdn.microsoft.com/Forums/office/en-US/31b6b982-d49b-470b-b1e7-ac7e3bd34bdd/dedicated-distributed-cache-server-and-ports-issue?forum=sharepointadmin