Very slow startup of AppFabric Distributed Cache Service


When working with AppFabric Distributed Cache v1.1 (v1.0 does not have this issue) you might experience very slow service startup. Sometimes you simply need to restart the service or even reboot the machine.
In both cases the service will start (and also stop) very quickly, but it will not really be available. In other words, the service is reported as started, but caching does not work.
Depending on the available memory (not the cache size) this can take a very long time. For example, on machines with 8 GB RAM startup can take 10-15 minutes. That means that after a restart the cache host is not working, at least from the point of view of the cache consumer (your application). In development and testing environments, where this host is the only one, this can be a nightmare.

What is the problem?

First of all, it is important to know that this problem exists no matter how many nodes you are running (N = 1, 2, …). However, if you are running more than one node, you will probably never notice the issue. This is because the cluster serves your requests from a node which has already been started. This is something you usually never find out. To you the system will appear to be working. But don’t think that the issue is not there.

The cause is the memory management, which changed from v1.0 to v1.1. The new version v1.1 is intelligent enough to watch the available memory and not to take everything.
Version v1.0 took almost all available memory, which caused other running applications like IIS to crash. After the coexisting cache was enabled in Windows Azure, this was changed in v1.1. The bad thing is that v1.1 now tries to allocate all required memory on startup. The startup time seems to depend on the available memory. I could not verify this myself, but some unofficial talks indicate that this might be the case.

How to repro?

Stop or kill the caching service and then start it.

Execute in PowerShell: get-cacheclusterhealth.
This command will run into a timeout (2 min.). You can repeat it over and over again; it keeps timing out until, after 10-15 minutes, it finally works.
You can also try to connect from an application using the Cache Client API, and the result will be the same: a timeout.
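
If you want to measure how long the host stays unavailable, the repro steps above can be sketched as a small polling loop. This is only a sketch: the helper name Wait-ForProbe and its parameters are my own invention, not part of AppFabric, and I am assuming that get-cacheclusterhealth reports its timeout as an error which -ErrorAction Stop turns into an exception.

```powershell
# Hypothetical helper (not part of AppFabric): runs $Probe repeatedly until it
# succeeds or $TimeoutSeconds have elapsed, and reports how long that took.
function Wait-ForProbe {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Probe,
        [int]$TimeoutSeconds = 1800,
        [int]$DelaySeconds = 30
    )
    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    $attempts = 0
    while ($true) {
        $attempts++
        try {
            # The probe must throw on failure (for cmdlets: -ErrorAction Stop).
            & $Probe | Out-Null
            return New-Object PSObject -Property @{
                Succeeded = $true; Attempts = $attempts; Elapsed = $watch.Elapsed
            }
        }
        catch {
            if ($watch.Elapsed.TotalSeconds -ge $TimeoutSeconds) {
                return New-Object PSObject -Property @{
                    Succeeded = $false; Attempts = $attempts; Elapsed = $watch.Elapsed
                }
            }
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Only run against the cache if the administration module is installed.
if (Get-Module -ListAvailable -Name DistributedCacheAdministration) {
    Import-Module DistributedCacheAdministration
    Use-CacheCluster
    # Assumption: the timeout surfaces as an error that -ErrorAction Stop makes catchable.
    Wait-ForProbe -Probe { Get-CacheClusterHealth -ErrorAction Stop }
}
```

The returned object tells you whether the cluster answered at all, how many attempts were needed, and how much time passed since the loop was started.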

How to work around this?

Unfortunately, if you are running only one node, the cache will simply not work for 10-15 minutes.
To work around this, do not stop or kill the service yourself!
Use the following cmdlet:

stop-cachecluster

This will perform a regular stop of all nodes (hosts) in the cluster.

Finally start the cluster with:

start-cachecluster

This will start all nodes (hosts), and the cluster will be available almost immediately (15-30 sec.).

If your cache service or machine was not stopped gracefully, you should start it with start-cachecluster (do not start the service yourself!).
This will bring the cluster up immediately.

If you want to automate this, the best option is to create the following PowerShell script and run it on every machine reboot.

 

# Wait for the other services to start.
Start-Sleep -s 60

Import-Module DistributedCacheAdministration

# Kill the cache process if it is running. It is recommended to set the Distributed Cache Service to manual start.
Get-Process distributedcacheservice -ErrorAction SilentlyContinue | Stop-Process -Force

Use-CacheCluster

# This ensures a graceful start of the cache cluster. It starts the cache services on all nodes in the cluster.
Start-CacheCluster

Start-Sleep -s 10

# Optional: fill the cache with data.
Start-Process "your program which fills the cache.exe"
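
One possible way to run the script above on every reboot is a scheduled task that fires at system startup. The following is a rough sketch and not something from my setup: the function names, the task name and the script path are made up for illustration, and I am assuming the Windows service is registered as AppFabricCachingService (check services.msc on your machine).

```powershell
# Hypothetical helper: builds the argument list for schtasks.exe.
# $ScriptPath must not contain spaces, because it is passed unquoted to -File.
function Get-StartupTaskArguments {
    param(
        [Parameter(Mandatory = $true)][string]$ScriptPath,
        [string]$TaskName = 'StartAppFabricCacheCluster'   # made-up task name
    )
    $command = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ScriptPath"
    return @('/Create', '/TN', $TaskName, '/TR', $command,
             '/SC', 'ONSTART', '/RU', 'SYSTEM', '/F')
}

# Hypothetical helper: sets the service to manual start and registers the task.
# Must be run from an elevated PowerShell prompt.
function Register-CacheStartupTask {
    param([Parameter(Mandatory = $true)][string]$ScriptPath)

    # Assumption: the service name is AppFabricCachingService.
    Set-Service -Name 'AppFabricCachingService' -StartupType Manual

    $taskArgs = Get-StartupTaskArguments -ScriptPath $ScriptPath
    & schtasks.exe @taskArgs
}
```

With a hypothetical script location you would then call, once per machine: Register-CacheStartupTask -ScriptPath 'C:\Scripts\Start-CacheCluster.ps1'. Setting the service to manual start matches the recommendation in the script comment, so the service is only ever started through start-cachecluster.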


Last but not least
There are also some other issues with memory management which are not related to this problem. Sometimes your application might block for a while, because the GC is running on your application thread.
If you want to run the GC in the background, take a look at this KB.

At the time of writing, the newest cumulative update is number 4: http://support.microsoft.com/kb/2800726


Posted Aug 13 2013, 07:53 AM by Damir Dobric

Comments

Damir wrote re: Very slow startup of AppFabric Distributed Cache Service
on 10-06-2013 15:00

Please check out more about memory management in this post: blogs.msdn.com/.../windows-server-appfabric-memory-consumption-questions.aspx

developers.de is a .Net Community Blog powered by daenet GmbH.