Azure WebJob with IDLE_TIMEOUT abort

If you are reading this, you have probably implemented and tested your job successfully, and now it suddenly fails.
I have found many posts about this issue, but none of them explains the real problem behind it.
SCM, the Kudu service of your App Service, is the component that takes care of the life cycle of the container (host) of your WebJob.
It performs a kind of heartbeat check on the job. If the job produces no output and no CPU activity for 120 seconds (the default), it is declared aborted and you get the following error:

Command 'cmd /c ""MyJob.Webjob ...' was aborted due to no output nor CPU activity for 121 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.
cmd /c ""MyJob.Webjob.StuffJob.exe""

This error happens if the job uses a TimerTrigger, for example:

public static void RunMe([TimerTrigger("0 */1 * * * *")] TimerInfo timer, ILogger logger)

There are two reasons why SCM may not detect your job as alive:

  1. The job is long-running
  2. The job is not running

If the job is long-running, it is probably still working in the background, but SCM does not know this. In this case, set WEBJOBS_IDLE_TIMEOUT and/or SCM_COMMAND_IDLE_TIMEOUT (simply set both) to a reasonable value. For example, if your job runs for 7 minutes on average, set both to 10 minutes. Note that the values are given in seconds, so 10 minutes is 600.
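As an illustration of the 10-minute example, the two settings could look like this in the "Advanced edit" view of the App Service application settings. This is a sketch, not a required configuration: the value 600 is simply 10 minutes in seconds, and "slotSetting": false is an assumption that you do not use slot-specific settings.

```json
[
  { "name": "WEBJOBS_IDLE_TIMEOUT", "value": "600", "slotSetting": false },
  { "name": "SCM_COMMAND_IDLE_TIMEOUT", "value": "600", "slotSetting": false }
]
```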
Even better, have the job write regularly to the log while it is running. This helps the operator of the job as well as SCM.
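A minimal sketch of such periodic logging is shown here. The Heartbeat class and RunWithHeartbeatAsync are hypothetical names, not part of the WebJobs SDK. The sketch assumes that whatever you pass as log ends up in the output of the job, for example Console.WriteLine, or an ILogger that writes to the console.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the WebJobs SDK.
public static class Heartbeat
{
    // Runs `work` and writes a progress line every `interval` until it ends.
    public static async Task RunWithHeartbeatAsync(
        Func<Task> work, TimeSpan interval, Action<string> log)
    {
        using var cts = new CancellationTokenSource();
        Task beat = BeatAsync(interval, log, cts.Token);
        try
        {
            await work();
        }
        finally
        {
            // Stop the heartbeat when the work finishes or throws.
            cts.Cancel();
            await beat;
        }
    }

    private static async Task BeatAsync(
        TimeSpan interval, Action<string> log, CancellationToken token)
    {
        DateTime started = DateTime.UtcNow;
        try
        {
            while (true)
            {
                await Task.Delay(interval, token);
                double seconds = (DateTime.UtcNow - started).TotalSeconds;
                log($"Job still running after {seconds:F0} s");
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the work has completed and the heartbeat was cancelled.
        }
    }
}
```

Inside the timer function (declared as async Task instead of void) this could be called as await Heartbeat.RunWithHeartbeatAsync(() => DoStuffAsync(), TimeSpan.FromSeconds(30), Console.WriteLine); where DoStuffAsync stands for your own long-running work. Any interval well below the idle timeout should do.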

But what if your job is not long-running? This is exactly the part that is not well documented: the case when the job is not running at all. It sounds strange, but it is easy to understand. If the App Service plan is on the Free tier, the "Always On" feature is not supported. In general this is not a problem, but consider the following.
Assume you set your job timer to 1 or 2 minutes. You trigger it manually and let it run. Everything works fine.
Now you set the timer to 3 minutes. The job starts and then stays idle for 3 minutes. That is one minute longer than the idle timeout of 2 minutes (120 seconds), so SCM declares the job as not running and puts it into the aborted state, with the error shown above.

Recap

To conclude, there are a few ways to fix this error, and none of them is magic.
If the job is long-running, use WEBJOBS_IDLE_TIMEOUT and SCM_COMMAND_IDLE_TIMEOUT as described above.

If the job is not long-running, its timer interval should be less than 2 minutes, which will probably work well for testing only.

Finally, the ultimate solution is to use a Basic or Standard App Service plan.
In that case you can enable Always On to keep the container loaded all the time.
However, WEBJOBS_IDLE_TIMEOUT and SCM_COMMAND_IDLE_TIMEOUT must still be set as described above. Continuous WebJobs and WebJobs triggered by a CRON (TimerTrigger) expression WILL NOT run reliably without Always On.

