Looking at Activity Queue Times from Azure Data Factory with Log Analytics

I’ve been working on a project to populate an Operational Data Store using Azure Data Factory (ADF). We have been seeking to tune our pipelines so we can import data every 15 minutes. After tuning the queries and adding useful indexes to target databases, we turned our attention to the ADF activity durations and queue times.

Data Factory places the pipeline activities into a queue, where they wait until they can be executed. If your queue time is long, it can mean that the Integration Runtime on which the activity is executing is waiting on resources (CPU, memory, networking, or otherwise), or that you need to increase the concurrent job limit.

You can see queue time in the ADF Monitor by looking at the output of an activity.

[Screenshot: pipeline execution details in the ADF Monitor, with the activity output open to show the resulting JSON. A property called durationInQueue contains the queue time in seconds; the value shown is 2 seconds.]
Output from a stored procedure activity showing 2 seconds of queue time
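
For reference, the relevant slice of that activity output JSON looks roughly like this (trimmed down to the queue-time property; the full output contains many other properties):

{
  "durationInQueue": {
    "integrationRuntimeQueue": 2
  }
}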

But what if you want to see activity queue times across activities, across pipelines, or even across data factories? Then you need to send your logs somewhere that makes this kind of analysis easier.

Send ADF Logs to Log Analytics

You can output your ADF logs to a storage account, to Log Analytics, to an event hub, or to a partner solution. I prefer Log Analytics because it’s easy to query and look for trends using KQL.

To configure the output to Log Analytics, you must create a Log Analytics workspace (if you don’t have an existing one) and add a diagnostic setting to the data factory resource. Once you have data feeding into Log Analytics, you can query it.

If you choose resource-specific destination tables in the diagnostic setting, you will find a table in Log Analytics called ADFActivityRun. This table contains a column called Output. The Output column contains the JSON we see in the ADF Studio Monitor app.
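
If you just want to confirm that data is arriving and peek at the raw JSON, a quick sanity-check query like the following works (a minimal sketch; adjust the time range as needed).

ADFActivityRun
| where TimeGenerated > ago(1d)
| project TimeGenerated, PipelineName, ActivityName, Status, Output
| take 10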

KQL has functions for parsing JSON and extracting just the values I need. This means I could write a query like the following.

ADFActivityRun
| extend queuetime = extractjson('$.durationInQueue.integrationRuntimeQueue', Output, typeof(int))
| where Status == 'Succeeded'
| where queuetime > 0
| project Start, End, PipelineName, ActivityType, ActivityName, dur = datetime_diff('second', End, Start), queuetime, PipelineRunId, ActivityRunId
| sort by queuetime desc

This query gives me a list of successfully executed activities with queue times greater than zero. I can choose from any of the columns in the ADFActivityRun table, including the pipeline and activity names, start and end times, activity types, and run IDs. Duration is not an available column, so I calculated it as the difference between the start and end times. The queue time is buried in the JSON in the Output column, so I used the extractjson function to get the duration-in-queue value.

Now that I know how to get the queue duration, I can look for trends across various slices of the data. A query to get average queue time by activity type might look like the one below.

ADFActivityRun
| where Status == 'Succeeded'
| where startofday(Start) == datetime(2022-01-04)
| extend queuetime = extractjson('$.durationInQueue.integrationRuntimeQueue', Output, typeof(int))
| summarize avg_queuetime = avg(queuetime) by ActivityType
| sort by avg_queuetime desc

In this query, I am retrieving activities with successful executions that started on January 4, 2022. I added my calculation to retrieve queue time. Then I calculated average queue time by activity type using the summarize operator and sorted the result descending by queue time.

I could have filtered on any other available column: pipeline name, activity name, integration runtime, resource ID, and more. See the sketch below for one example.
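
For example, to trend queue time for a single pipeline over the course of a day, I could bucket the averages by hour. This is a sketch; the pipeline name is a placeholder for one of your own.

ADFActivityRun
| where Status == 'Succeeded'
| where startofday(Start) == datetime(2022-01-04)
| where PipelineName == 'PL_Load_ODS' // placeholder name; substitute one of your pipelines
| extend queuetime = extractjson('$.durationInQueue.integrationRuntimeQueue', Output, typeof(int))
| summarize avg_queuetime = avg(queuetime) by bin(Start, 1h)
| sort by Start asc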

This information is available via the API as well, but a Log Analytics workspace can be up and running in minutes without having to write any code to connect.

If you are used to writing SQL, the transition to KQL isn’t too bad. Check out the SQL to Kusto query translation page in Microsoft Docs.
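
As a quick illustration of how the two line up, here is a simple aggregation written both ways (my own example, not taken from the translation page), with the SQL shown as a comment above its KQL equivalent.

// SQL: SELECT Status, COUNT(*) AS runs FROM ADFActivityRun GROUP BY Status ORDER BY runs DESC
ADFActivityRun
| summarize runs = count() by Status
| sort by runs desc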

