Cannot cancel durable timer when waiting for external event with timeout

I am using durable functions to wait for external events with timeouts. Even though one of the events is received before the timeout, a TimerFired event is recognised in dfMon when none should be.

The orchestration’s logic is as follows:

  • orchstration will put a message on a queue which is monitored by an external system
  • this remote system will notify my orchestration via cmdReceived external event if the above queue message was received
  • then it will trigger a long-running local process
  • afterwards it will again notify my orchestration via cmdExecuted external event of the completion of this process

I.e. ->

  1. PUT a message on the queue
  2. timer1: wait 10mins for external event cmdReceived, continue if received, else throw
  3. timer2: wait 60mins for external event cmdExecuted, continue if received, else throw

Now the cmdReceived event is received rather quickly, after 2mins or so. The the next timer should wait for a max of 60mins for cmdExecuted to be received. This usually takes about 12mins or so. By then 10mins (from timer1) have elapsed. And a TimerFired event is logged, however no exception is thrown and the orchestration keeps on running. But I do ask myself whether there is still an impact I am not immediately aware of. The existence of the TimerFired event (timer1) I mean.

This is the relevant line of code:

await ctx.WaitForExternalEvent("cmdReceived", TimeSpan.FromMinutes(10));
var success = await ctx.WaitForExternalEvent<bool>("cmdExecuted", TimeSpan.FromMinutes(60));

My first thought was, that maybe I need to explicitly cancel the timer myself. I’ve read about this here. So then I tried:

var cts = new CancellationTokenSource();
await ctx.WaitForExternalEvent("cmdReceived", TimeSpan.FromMinutes(10), cts.Token);

var cts2 = new CancellationTokenSource();
var success = await ctx.WaitForExternalEvent<bool>("cmdExecuted", TimeSpan.FromMinutes(60), cts2.Token);

However, this didn’t really help. The first timer keeps on firing (though not throwing) after 10mins even after the event for which it was configured has already been received.

See below for a screenshot from dfMon.

enter image description here

Is there anything wrong about this? I find this extra TimerFired event confusing. Does it matter as long as it is (correctly) not throwing an exception?


>Solution :

Short answer: no it does not matter. The event is ignored by Durable Task.

The way that durable timers work with the default Storage durability provider is through scheduled queue messages.
So when you call WaitForExternalEvent, it uses CreateTimer under the hood.
When the replay finishes and Durable Task sees that there is a timer that needs to be started, it sends a scheduled message to one of the control queues.
This message will become visible at the time you specified.

Now when your orchestration receives the external event, that causes a replay, and it’ll replay over the WaitForExternalEvent call.
Here it sees it has a new event for a received event.
This resolves the Task it is waiting on and your orchestration continues.

The next time it replays the timer event arrives.
But the timer gets ignored by processing the external event first.
(Internally it uses TaskCompletionSource.TrySetException() which won’t do anything because SetResult has already been called)
It won’t throw the timeout exception.
Regardless that it did nothing, the event did occur so it is recorded.

Leave a Reply