Monitoring batch workloads with Application Insights

Kenny Saelen

Introduction

In Dynamics 365 Finance & Supply Chain Management, you can use the Monitoring and Telemetry feature to send application telemetry to Microsoft Application Insights. If you are new to this feature, see the following documentation for an overview of its capabilities and how to get started:
Monitoring and Telemetry with Microsoft Dynamics 365 Finance and Microsoft Dynamics 365 Supply Chain Management | Microsoft Learn

Organizations using Dynamics 365 Finance & Supply Chain Management typically run critical workloads in the batch framework. Telemetry on the execution of these batch jobs is essential for the operations teams that administer and monitor these workloads. Starting with version 10.0.45 (7.0.7690.21 PU69), we are adding the capability to monitor batch workloads through Application Insights. We are also providing a backport for version 10.0.44 (7.0.7606.126 PU68).

Enable batch telemetry

To get started with batch monitoring, the following flights need to be enabled:
On developer machines, you can add the flights to the SysFlighting table. Note that this feature is currently in private preview: you will be able to test it in sandbox environments only once it enters public preview. Public preview is currently expected in September 2025, but this timeline is subject to change based on the outcomes of the private preview.

Once the flighting configuration is in place, new parameters become visible on the Configure tab of the Monitoring and Telemetry parameters form.



Querying the batch telemetry

If you are new to Application Insights, start with the following article, which introduces KQL queries for reviewing telemetry: Analyze and monitor telemetry with KQL - Finance & Operations | Dynamics 365 | Microsoft Learn.

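The following signals are currently available for you to query. All of the queries below are already packaged into a sample dashboard, which is available through our GitHub repository. More details on downloading and using the sample dashboard are in the ‘FastTrack sample accelerator dashboard for batch’ section later in this post. The queries reference the parameters _startTime, _endTime and _batchJobId, which the dashboard supplies; when you run a query directly in Application Insights, you need to define these values yourself first.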

Batch start and stop times

Provides the start and completion times of batch tasks. This is a crucial part of batch telemetry because it lets you calculate how long batch jobs take to complete. The following query captures the start time of each batch task, correlates it with the matching completion and failure events through their shared ActivityId, and calculates the elapsed time.

customEvents
| where     timestamp between (_startTime .. _endTime)
| where     name in ("BatchTaskStart","BatchTaskFinished","BatchTaskFailure")
| extend    CustomDimensionsParsed = parse_json(customDimensions)
| extend    InfoMessageParsed   = parse_json(tostring(CustomDimensionsParsed.InfoMessage))
| extend    ActivityId          = tostring(CustomDimensionsParsed.activityId)
| extend    ClassName           = tostring(CustomDimensionsParsed.ClassName)
| extend    BatchJobId          = tostring(CustomDimensionsParsed.BatchJobId)
| extend    BatchJobTaskId1      = tostring(CustomDimensionsParsed.BatchJobTaskId)
| extend    BatchJobTaskId2      = tostring(CustomDimensionsParsed.BatchTaskId)
| extend    BatchJobTaskId      = iif(isnotempty(BatchJobTaskId1), BatchJobTaskId1, BatchJobTaskId2)
| extend    StartTime  = iff(name == "BatchTaskStart"   , timestamp, datetime(null))
| extend    EndTime    = iff(name == "BatchTaskFinished", timestamp, datetime(null))
| extend    ErrorTime  = iff(name == "BatchTaskFailure" , timestamp, datetime(null))
| extend    RetryCount = iff(name == "BatchTaskStart"   , tostring(InfoMessageParsed.RetryCount), "")
| project   StartTime, EndTime, ErrorTime, RetryCount, ActivityId, ClassName, BatchJobId, RoleInstance = cloud_RoleInstance, BatchJobTaskId
| where     isempty(_batchJobId) or tostring(BatchJobId) in (_batchJobId)
| summarize   StartTime         = min(StartTime),
              CompletionTime    = max(EndTime),
              ErrorTime         = max(ErrorTime),
              RetryCount        = take_anyif(RetryCount, isnotempty(RetryCount)),
              ClassName         = take_any(ClassName),
              BatchJobId        = take_any(BatchJobId),
              BatchJobTaskId    = take_any(BatchJobTaskId),
              RoleInstance      = take_any(RoleInstance)
by ActivityId
| where isnotempty(StartTime)
| extend BatchAOS = tostring(split(RoleInstance, "-")[0])
| project ActivityId, BatchAOS, BatchJobId, BatchJobTaskId, ClassName, StartTime, CompletionTime, ErrorTime, ElapsedTime = CompletionTime - StartTime, RetryCount
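Once start and completion events are correlated, a natural follow-up question is which batch classes take the longest. The query below is a sketch and not part of the sample dashboard: it repeats the ActivityId correlation in condensed form, defines its own 7-day window with let statements so it can run directly in the Application Insights Logs view, and assumes the activityId and ClassName custom dimensions behave as in the query above.

```kql
// Sketch: elapsed-time statistics per batch class over the last 7 days.
// Assumes the activityId and ClassName custom dimensions used in the query above.
let _startTime = ago(7d);
let _endTime = now();
customEvents
| where timestamp between (_startTime .. _endTime)
| where name in ("BatchTaskStart", "BatchTaskFinished")
| extend ActivityId = tostring(customDimensions.activityId),
         ClassName  = tostring(customDimensions.ClassName)
// One row per task execution: start and completion time for each ActivityId
| summarize StartTime      = minif(timestamp, name == "BatchTaskStart"),
            CompletionTime = maxif(timestamp, name == "BatchTaskFinished"),
            ClassName      = take_anyif(ClassName, isnotempty(ClassName))
    by ActivityId
// Keep only executions that both started and finished inside the window
| where isnotnull(StartTime) and isnotnull(CompletionTime)
| extend ElapsedMinutes = (CompletionTime - StartTime) / 1m
| summarize Executions = count(),
            AvgMinutes = round(avg(ElapsedMinutes), 2),
            P95Minutes = round(percentile(ElapsedMinutes, 95), 2),
            MaxMinutes = round(max(ElapsedMinutes), 2)
    by ClassName
| order by P95Minutes desc
```

Ranking by the 95th percentile surfaces classes with occasional long runs that an average would smooth over; sort by AvgMinutes instead if typical duration matters more to you.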



Batch throttling

Provides information about throttling for batch workloads. This helps you determine whether batch jobs were throttled and what the system metrics (CPU, memory, SQL DTU) looked like during throttling. The following query returns, for each 10-minute interval, the number of batch tasks that started (RunningTasks) and the number of throttling events (ThrottledTasks).

let _scale = '10m';
customEvents
| where     timestamp between (_startTime .. _endTime)
| where     name in ("BatchThrottled", "BatchTaskStart")
| extend    customDimensionsParsed = parse_json(customDimensions)
| extend    batchJobId  = customDimensionsParsed.BatchJobId
| where     isempty(_batchJobId) or tostring(batchJobId) in (_batchJobId)
| summarize ThrottledTasks = countif(name == "BatchThrottled"), RunningTasks = countif(name == "BatchTaskStart") by TimeSum=bin(timestamp, totimespan(_scale))
| project TimeSum, RunningTasks, ThrottledTasks
| order by TimeSum asc
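The query above shows when throttling happened; to see which jobs were affected, the same BatchThrottled events can be grouped by job instead of by time. This is a sketch under the assumption, also made by the query above, that BatchThrottled events carry a BatchJobId custom dimension; the 24-hour window is an arbitrary choice.

```kql
// Sketch: throttling events per batch job over the last 24 hours.
// Assumes BatchThrottled events carry the BatchJobId custom dimension.
let _startTime = ago(1d);
let _endTime = now();
customEvents
| where timestamp between (_startTime .. _endTime)
| where name == "BatchThrottled"
| extend BatchJobId = tostring(customDimensions.BatchJobId)
| summarize ThrottleEvents = count(),
            FirstThrottled = min(timestamp),
            LastThrottled  = max(timestamp)
    by BatchJobId
| order by ThrottleEvents desc
```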


Batch threads

Provides information about the currently running threads. This helps you identify whether a batch job did not start because no threads were available on the batch AOS instances. For example, the following query calculates how many threads are available to process batch workloads.

customEvents
| where     timestamp between (_startTime .. _endTime)
| where     name == "BatchThreadInfo"
| extend    customDimensionsParsed  = parse_json(customDimensions)
| parse     tostring(customDimensionsParsed.InfoMessage) with * "GetCountCurrentBatchTasks " CountConcurrentBatchTasks ", TaskQueueCount " TaskQueueCount ", GetMaxConcurrentBatchTasks " maxBatchThreads ", GetReservedNumberOfThreads " numOfReservedBatchThreads "."
| project     timestamp
            , roleInstance              = cloud_RoleInstance
            , usedThreads               = CountConcurrentBatchTasks
            , maxBatchThreads           = maxBatchThreads
            , numOfReservedBatchThreads = numOfReservedBatchThreads
| project   timestamp, Batch = tostring(split(roleInstance, "-")[0]), AvailableThreads = todecimal(maxBatchThreads) - todecimal(usedThreads) - todecimal(numOfReservedBatchThreads)


Rendering the results on a timeline shows clearly when no threads were available for processing.
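One way to produce that timeline is a condensed variant of the query above that ends in a render operator. This sketch assumes the InfoMessage layout used in the parse statement above, parses the counters directly into typed columns, and keeps the lowest available-thread count per batch AOS in each 5-minute bin so that short periods of saturation remain visible; the bin size and the 24-hour window are arbitrary choices.

```kql
// Sketch: lowest number of available batch threads per batch AOS, in 5-minute bins.
// Assumes the same InfoMessage layout as the parse statement in the query above.
let _startTime = ago(1d);
let _endTime = now();
customEvents
| where timestamp between (_startTime .. _endTime)
| where name == "BatchThreadInfo"
| parse tostring(customDimensions.InfoMessage) with * "GetCountCurrentBatchTasks " UsedThreads:long ", TaskQueueCount " TaskQueueCount:long ", GetMaxConcurrentBatchTasks " MaxBatchThreads:long ", GetReservedNumberOfThreads " ReservedThreads:long "."
// Derive the AOS name from the role instance, as in the query above
| extend Batch = tostring(split(cloud_RoleInstance, "-")[0])
| summarize AvailableThreads = min(MaxBatchThreads - UsedThreads - ReservedThreads)
    by bin(timestamp, 5m), Batch
| order by timestamp asc
| render timechart
```

Using min() rather than avg() per bin is a deliberate choice: an average can hide a few minutes at zero available threads, which is exactly the situation this chart is meant to expose.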


Batch failures

Provides additional information when a batch job or task cannot be scheduled correctly. This complements the existing error information from the Infolog, which is already correlated to the originating batch job. For example, to get a list of batch failures including call stack information, run the following query:

customEvents
| where     timestamp between (_startTime .. _endTime)
| where     name in ("BatchTaskFailure")
| extend    CustomDimensionsParsed = parse_json(customDimensions)
| extend    BatchJobCaption     = tostring(CustomDimensionsParsed.BatchJobCaption)
| extend    ActivityId          = tostring(CustomDimensionsParsed.activityId)
| extend    ClassName           = iif((tostring(CustomDimensionsParsed.ClassName) == "<empty>"), "", tostring(CustomDimensionsParsed.ClassName))
| extend    BatchJobId          = tostring(CustomDimensionsParsed.BatchJobId)
| extend    BatchTaskId         = tostring(CustomDimensionsParsed.BatchTaskId)
| extend    EventMessage        = tostring(CustomDimensionsParsed.EventMessage)
| extend    ExceptionType       = tostring(CustomDimensionsParsed.ExceptionType)
| extend    ExceptionMessage    = tostring(CustomDimensionsParsed.ExceptionMessage)
| extend    CallStack           = tostring(CustomDimensionsParsed.CallStack)
| where     isempty(_batchJobId) or tostring(BatchJobId) in (_batchJobId)
| project   TimeStamp = timestamp, RoleInstance = cloud_RoleInstance, ActivityId, BatchJobCaption, ClassName, BatchJobId, BatchTaskId, EventMessage, ExceptionType, ExceptionMessage, CallStack
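When the raw failure list gets long, grouping it first can show which classes and exception types dominate. The following sketch is not part of the sample dashboard; it assumes the ClassName, ExceptionType and BatchJobId custom dimensions used in the query above and a 7-day window chosen for illustration. Note that dcount() returns an estimate of the number of distinct jobs.

```kql
// Sketch: batch task failures grouped by class and exception type over the last 7 days.
let _startTime = ago(7d);
let _endTime = now();
customEvents
| where timestamp between (_startTime .. _endTime)
| where name == "BatchTaskFailure"
| extend ClassName     = tostring(customDimensions.ClassName),
         ExceptionType = tostring(customDimensions.ExceptionType)
// The query above treats "<empty>" as a missing class name; do the same here
| extend ClassName = iff(ClassName == "<empty>", "", ClassName)
| summarize Failures     = count(),
            JobsAffected = dcount(tostring(customDimensions.BatchJobId)),
            LastFailure  = max(timestamp)
    by ClassName, ExceptionType
| order by Failures desc
```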


Batch queue

Provides the current sizes of the different queues in the Priority Based Scheduling framework. Use the following query to get a timeline overview of the queue sizes.

customEvents
| where     timestamp between (_startTime .. _endTime)
| where     name == "BatchPBSQueuesAndBuffersSizes"
| extend    CustomDimensionsParsed = parse_json(customDimensions)
| extend    InfoMessageParsed = parse_json(tostring(CustomDimensionsParsed.InfoMessage))
| extend    BatchLowSchedulingQueue                 = toint(InfoMessageParsed.BatchLowSchedulingQueue)
| extend    BatchNormalSchedulingQueue              = toint(InfoMessageParsed.BatchNormalSchedulingQueue)
| extend    BatchHighSchedulingQueue                = toint(InfoMessageParsed.BatchHighSchedulingQueue)
| extend    BatchCriticalSchedulingQueue            = toint(InfoMessageParsed.BatchCriticalSchedulingQueue)
| extend    BatchReservedCapacitySchedulingQueue    = toint(InfoMessageParsed.BatchReservedCapacitySchedulingQueue)
| extend    ReadyTasksBuffer                        = toint(InfoMessageParsed.ReadyTasksBuffer)
| extend    ReadyTasksBufferWithPriorities          = toint(InfoMessageParsed.ReadyTasksBufferWithPriorities)
| project timestamp, BatchLowSchedulingQueue, BatchNormalSchedulingQueue, BatchHighSchedulingQueue, BatchCriticalSchedulingQueue, BatchReservedCapacitySchedulingQueue, ReadyTasksBuffer, ReadyTasksBufferWithPriorities
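To chart these values, it can help to aggregate them into fixed intervals. The sketch below assumes the same InfoMessage JSON fields as the query above and plots the peak size of each scheduling queue per 5-minute bin; the bin size, the 24-hour window and the choice of max() over avg() are arbitrary and worth adjusting to your workload.

```kql
// Sketch: peak Priority Based Scheduling queue sizes in 5-minute bins over the last 24 hours.
// Assumes the InfoMessage JSON fields read by the query above.
let _startTime = ago(1d);
let _endTime = now();
customEvents
| where timestamp between (_startTime .. _endTime)
| where name == "BatchPBSQueuesAndBuffersSizes"
| extend Info = parse_json(tostring(customDimensions.InfoMessage))
| summarize LowQueue              = max(toint(Info.BatchLowSchedulingQueue)),
            NormalQueue           = max(toint(Info.BatchNormalSchedulingQueue)),
            HighQueue             = max(toint(Info.BatchHighSchedulingQueue)),
            CriticalQueue         = max(toint(Info.BatchCriticalSchedulingQueue)),
            ReservedCapacityQueue = max(toint(Info.BatchReservedCapacitySchedulingQueue))
    by bin(timestamp, 5m)
| order by timestamp asc
| render timechart
```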



FastTrack sample accelerator dashboard for batch

Together with the FastTrack team, we have created an accelerator dashboard that you can download and start using right away in your test environments.



How to import the sample dashboard into Azure Data Explorer

Always import the dashboard into a non-production environment first, validate that the visualizations and queries align with your organization's data model and monitoring requirements, and only then promote it to your production Application Insights workspace.
  1. Download the latest batch dashboard release from GitHub. Go to the starting page of the Dynamics 365 FastTrack FSCM Telemetry Samples repository; the release to download is listed on the right side of the main page.
  2. On the release page, under the assets, select the D365FSCM-Monitoring-Dashboard-Batch-v1.0.0.0.zip archive, then download and extract it.
  3. Open the release zip file and locate the ADE-Dashboard-D365FO-Monitoring-Batch.json file in the package.
  4. Import the file into Azure Data Explorer.
  5. Name the dashboard appropriately and then select Datasources.
  6. In the Datasources selection pane, replace the placeholder with the subscription ID of your Azure Application Insights resource.
  7. After updating the subscription ID, select Connect.
  8. From the list of databases that appears, select your Application Insights resource and save the changes.
  9. Your dashboard should now show data. Feel free to edit the queries to suit your needs.

Please don’t hesitate to share your feedback and ideas for evolving the dashboard in the comments on this post or by contacting us at D365AppInsights@microsoft.com.
In addition to the dashboard discussed in this blog post, there are several other sample dashboards available on the GitHub repository.
/**
* SAMPLE CODE NOTICE
*
* THIS SAMPLE CODE IS MADE AVAILABLE AS IS.  MICROSOFT MAKES NO WARRANTIES, WHETHER EXPRESS OR IMPLIED,
* OF FITNESS FOR A PARTICULAR PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OR CONDITIONS OF MERCHANTABILITY.
* THE ENTIRE RISK OF THE USE OR THE RESULTS FROM THE USE OF THIS SAMPLE CODE REMAINS WITH THE USER.
* NO TECHNICAL SUPPORT IS PROVIDED.  YOU MAY NOT DISTRIBUTE THIS CODE UNLESS YOU HAVE A LICENSE AGREEMENT WITH MICROSOFT THAT ALLOWS YOU TO DO SO.
*/