IoT Edge: Optimizing for performance

Reasons for device instability

When working with Azure IoT Edge, you have to be aware of the size of your device. By default, IoT Edge is optimized for performance: the edgeHub module allocates a fair amount of memory for internal buffers. Unfortunately, this approach is not the best choice on small devices such as a Raspberry Pi.
If you run this default, performance-optimized configuration on a small device, you will very likely experience instability.

In this case, edgeHub might simply crash from time to time. The following is a typical error found in the log of the edgeHub module:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeout>b__61_1(Task t)
   at System.Threading.Tasks.ContinuationTaskFromTask.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__278_1(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.InternalClient.OnReceiveEventMessageCalled(String input, Message message)
   at Microsoft.Azure.Devices.Client.Transport.AmqpTransportHandler.ProcessReceivedEventMessage(AmqpMessage amqpMessage)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<>c.<ThrowAsync>b__7_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.<>c.<.cctor>b__5_0(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Another symptom is an unexpected disconnection of modules. Here is an example:

Module ai-02-arm32v7/fncrecognizer is not connected

Once this happens, the module will not receive any routed messages, and most likely edgeHub will not be able to route messages to IoT Hub either.

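When a module enters this unstable state, the edgeHub log looks like the following one:

iotedge logs -f edgeHub

Note: fncrecognizer is the name of my module. The "is not connected" errors about it are reported by edgeHub, so they appear in the edgeHub log rather than in the module's own log.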

2018-09-02 17:42:13.594 +00:00 [INF] - Installing intermediate certificates.
2018-09-02 17:42:14.165 +00:00 [INF] - Attempting to connect to IoT Hub for client ai-02-arm32v7/$edgeHub via AMQP...
2018-09-02 17:42:14.513 +00:00 [INF] - Connected to IoT Hub for client ai-02-arm32v7/$edgeHub via AMQP, with client operation timeout 60000.
2018-09-02 17:42:14.558 +00:00 [INF] - Created cloud connection for client ai-02-arm32v7/$edgeHub
2018-09-02 17:42:14.565 +00:00 [INF] - New cloud connection created for device ai-02-arm32v7/$edgeHub
2018-09-02 17:42:14.601 +00:00 [INF] - Initializing configuration
2018-09-02 17:42:20.477 +00:00 [INF] - Created persistent store at /tmp/edgeHub
2018-09-02 17:42:20.628 +00:00 [INF] - Created new message store
2018-09-02 17:42:20.631 +00:00 [INF] - Started task to cleanup processed and stale messages
2018-09-02 17:42:20.998 +00:00 [INF] - New device connection for device ai-02-arm32v7/$edgeHub
2018-09-02 17:42:26.343 +00:00 [INF] - Exiting disconnected state
2018-09-02 17:42:26.432 +00:00 [INF] - Entering connected state
2018-09-02 17:42:28.456 +00:00 [INF] - Obtained edge hub config from module twin
2018-09-02 17:42:29.179 +00:00 [INF] - Set the following 2 route(s) in edge hub
2018-09-02 17:42:29.185 +00:00 [INF] - sinusgenFncRecognizer: FROM /messages/modules/sinusgen/outputs/output1 INTO BrokeredEndpoint("/modules/fncrecognizer/inputs/input1")
2018-09-02 17:42:29.186 +00:00 [INF] - fncrecognizerToIoTHub: FROM /messages/modules/fncrecognizer/outputs/anomalyoutput INTO $upstream
2018-09-02 17:42:29.192 +00:00 [INF] - Updated message store TTL to 7200 seconds
2018-09-02 17:42:29.194 +00:00 [INF] - Updated the edge hub store and forward configuration
2018-09-02 17:42:29.197 +00:00 [INF] - Initialized edge hub configuration
2018-09-02 17:42:29.694 +00:00 [INF] - Starting protocol heads - (MQTT, AMQP, HTTP)
2018-09-02 17:42:29.720 +00:00 [INF] - Starting MQTT head
2018-09-02 17:42:30.395 +00:00 [INF] - Initializing TLS endpoint on port 8883 for MQTT head.
2018-09-02 17:42:30.731 +00:00 [INF] - Starting AMQP head
2018-09-02 17:42:30.776 +00:00 [INF] - Started MQTT head
2018-09-02 17:42:31.349 +00:00 [INF] - Started AMQP head
2018-09-02 17:42:31.363 +00:00 [INF] - Starting HTTP head
2018-09-02 17:42:31.916 +00:00 [INF] - User profile is available. Using '"/home/edgehubuser/.aspnet/DataProtection-Keys"' as key repository; keys will not be encrypted at rest.
2018-09-02 17:42:34.018 +00:00 [WRN] - Overriding address(es) '"http://+:80"'. Binding to endpoints defined in "UseKestrel()" instead.
2018-09-02 17:42:34.160 +00:00 [INF] - Started HTTP head
2018-09-02 17:42:34.453 +00:00 [INF] - Attempting to connect to IoT Hub for client ai-02-arm32v7/sinusgen via AMQP...
2018-09-02 17:42:34.466 +00:00 [INF] - Connected to IoT Hub for client ai-02-arm32v7/sinusgen via AMQP, with client operation timeout 60000.
2018-09-02 17:42:34.468 +00:00 [INF] - Created cloud connection for client ai-02-arm32v7/sinusgen
2018-09-02 17:42:34.468 +00:00 [INF] - New cloud connection created for device ai-02-arm32v7/sinusgen
2018-09-02 17:42:34.556 +00:00 [INF] - New token requested by client ai-02-arm32v7/sinusgen, but using existing token as it is usable.
2018-09-02 17:42:34.950 +00:00 [INF] - New device connection for device ai-02-arm32v7/sinusgen
2018-09-02 17:42:35.052 +00:00 [INF] - Bind device proxy for device ai-02-arm32v7/sinusgen
2018-09-02 17:42:35.058 +00:00 [INF] - Initialized AMQP connection handler for ai-02-arm32v7/sinusgen
2018-09-02 17:42:35.120 +00:00 [INF] - Opened link Events for ai-02-arm32v7/sinusgen
2018-09-02 17:42:36.007 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:42:37.066 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:42:39.356 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:42:42.939 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:42:51.768 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:43:07.334 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:43:39.786 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:44:39.792 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:45:39.795 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:46:39.796 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:47:39.805 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:48:39.806 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:49:39.815 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:50:39.816 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected
2018-09-02 17:51:39.817 +00:00 [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected

At this point the module was restarted (for example with iotedge restart fncrecognizer). After the restart, everything works fine again.

2018-09-02 17:52:09.860 +00:00 [INF] - Attempting to connect to IoT Hub for client ai-02-arm32v7/fncrecognizer via AMQP...
2018-09-02 17:52:09.882 +00:00 [INF] - Connected to IoT Hub for client ai-02-arm32v7/fncrecognizer via AMQP, with client operation timeout 60000.
2018-09-02 17:52:09.883 +00:00 [INF] - Created cloud connection for client ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:09.883 +00:00 [INF] - New cloud connection created for device ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:09.885 +00:00 [INF] - New token requested by client ai-02-arm32v7/fncrecognizer, but using existing token as it is usable.
2018-09-02 17:52:10.299 +00:00 [INF] - New device connection for device ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:10.361 +00:00 [INF] - Bind device proxy for device ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:10.361 +00:00 [INF] - Initialized AMQP connection handler for ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:10.361 +00:00 [INF] - Opened link Events for ai-02-arm32v7/fncrecognizer
2018-09-02 17:52:10.663 +00:00 [INF] - Opened link ModuleMessages for ai-02-arm32v7/fncrecognizer
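Reading these logs by hand gets tedious, so here is a small Python sketch that scans a saved edgeHub log for the two symptoms shown above: repeated "Module ... is not connected" errors and the "Unhandled Exception" line of a crash. The regular expressions are written against the log lines quoted in this post only; the script name scan_edgehub_log.py and the function scan_edgehub_log are my own, hypothetical names, not part of the IoT Edge tooling.

```python
import re
import sys
from collections import Counter

# edgeHub error line for a module it cannot reach, e.g.
# "... [ERR] - Module ai-02-arm32v7/fncrecognizer is not connected"
NOT_CONNECTED = re.compile(r"\[ERR\] - Module (\S+) is not connected")

# First line of an unhandled .NET exception, e.g.
# "Unhandled Exception: System.NullReferenceException: Object reference ..."
UNHANDLED = re.compile(r"^Unhandled Exception: (\S+?):")


def scan_edgehub_log(lines):
    """Return (per-module 'not connected' counts, list of unhandled exception types)."""
    disconnected = Counter()
    crashes = []
    for line in lines:
        match = NOT_CONNECTED.search(line)
        if match:
            disconnected[match.group(1)] += 1
            continue
        match = UNHANDLED.match(line)
        if match:
            crashes.append(match.group(1))
    return disconnected, crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_edgehub_log.py edgehub.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        disconnected, crashes = scan_edgehub_log(log_file)
    for module, count in disconnected.most_common():
        print(f"{module}: {count} 'is not connected' errors")
    for exception_type in crashes:
        print(f"edgeHub crashed with {exception_type}")
```

For example, save the log with iotedge logs edgeHub > edgehub.log 2>&1 and then run python3 scan_edgehub_log.py edgehub.log. A module that keeps showing up with a growing count is a good candidate for the unstable state described above.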

How to make it stable?

To solve this problem, set the environment variable OptimizeForPerformance of the edgeHub module to false.
This requests a lower memory allocation at the cost of performance.

You can do this either directly in the Azure Portal or in the deployment configuration.

Set optimization in Portal

The following image shows how to set the performance optimization in the Azure Portal under Set Modules | "Configure advanced Edge runtime settings".

[Image: 369_optforperf, setting OptimizeForPerformance in the Azure Portal]

Set optimization in deployment file

The following configuration snippet shows how to set the performance optimization in the deployment.template.json file, which can be found in the root folder of the modules solution in VS Code.

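   "systemModules": {
     "edgeAgent": {
       "type": "docker",
       "settings": {
         "image": "mcr.microsoft.com/azureiotedge-agent:1.0",
         "createOptions": ""
       }
     },
     "edgeHub": {
       "type": "docker",
       "status": "running",
       "restartPolicy": "always",
       "settings": {
         "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
         "createOptions": ""
       },
       "env": {
         "OptimizeForPerformance": {
           "value": "false"
         }
       }
     }
   }

Note that the variable belongs to edgeHub, not to edgeAgent. Keep the other edgeHub settings from your template (for example the port bindings in createOptions) unchanged; only the env section needs to be added.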
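If you prefer to script this change, for example when you maintain several deployment templates, the following Python sketch sets OptimizeForPerformance to "false" in the edgeHub entry of systemModules. It assumes the template is plain JSON, and it searches for the first "systemModules" object instead of hard-coding the full path, since the surrounding keys can differ between template versions. The script name set_edgehub_env.py and both function names are hypothetical, my own helpers rather than part of the IoT Edge tooling.

```python
import json
import sys


def find_system_modules(node):
    """Depth-first search for the first "systemModules" object in a manifest."""
    if not isinstance(node, dict):
        return None
    if "systemModules" in node:
        return node["systemModules"]
    for value in node.values():
        found = find_system_modules(value)
        if found is not None:
            return found
    return None


def disable_optimize_for_performance(manifest):
    """Set OptimizeForPerformance=false in the edgeHub env of a deployment manifest."""
    system_modules = find_system_modules(manifest)
    if not system_modules or "edgeHub" not in system_modules:
        raise KeyError("no systemModules.edgeHub entry found in manifest")
    # Create the env section if the template does not have one yet
    env = system_modules["edgeHub"].setdefault("env", {})
    # The value is passed as the string "false", not as a JSON boolean
    env["OptimizeForPerformance"] = {"value": "false"}
    return manifest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 set_edgehub_env.py deployment.template.json
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    disable_optimize_for_performance(manifest)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
        f.write("\n")
```

Because the file is written back with json.dump, its original formatting (indentation, spacing) is not preserved, although the key order is. Review the diff before committing the template.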

Conclusion

By default, IoT Edge is optimized for performance, which works well on "well-sized" devices. If you want to run it on small devices (e.g. a Raspberry Pi), you have to set the environment variable OptimizeForPerformance of the edgeHub module to false.
If you don't, all modules on the device might become unstable.
Unfortunately, the platform does not provide a clear definition of a "small" device, so it is hard to know for which device sizes this value has to be set.

