Orchestration Process Execution Delays Caused by Stuck Loop Steps

Symptom

Orchestration processes take excessively long to complete. Loop steps that should reset and re-execute within minutes are taking hours or even days. This affects all orchestration processes in the org, not just a specific one.

Cause

Stale orchestration steps from old processes are consuming queue processing capacity. These steps are stuck in "Ready" or "In Progress" status with errors such as:

CSPOFA.ProcessGraph.ArgumentException: Loop target step could not be found in the process graph: <STEP_ID>

The orchestration poller processes steps in priority order, and the org has a configurable limit on the number of standard steps handled per transaction (for example, 25). When stale steps continuously fail and retry, each one takes up a slot within that limit, so fewer slots remain for new, legitimate steps and their processing is delayed.
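
To illustrate why a backlog of failing steps slows everything down, here is a deliberately simplified Python model. It is not the actual poller implementation. It assumes a single queue, a fixed per-transaction limit (25, taken from the example above), and that a stale step that fails is simply re-queued for a later retry. The function name and the queue behavior are assumptions made for illustration only.

```python
from collections import deque


def transactions_to_drain(legit_steps, stale_steps, limit=25):
    """Count transactions a toy poller needs to finish all legitimate steps."""
    # Stale steps start at the front, as if they had been queued long ago.
    queue = deque(["stale"] * stale_steps + ["legit"] * legit_steps)
    remaining = legit_steps
    transactions = 0
    while remaining > 0:
        transactions += 1
        # Each transaction picks up at most `limit` steps from the queue.
        for _ in range(min(limit, len(queue))):
            step = queue.popleft()
            if step == "stale":
                queue.append(step)  # assumed: fails, then re-queues for retry
            else:
                remaining -= 1  # legitimate step completes
    return transactions
```

In this toy model, 50 legitimate steps with no stale steps finish in 2 transactions, while the same 50 steps queued behind 100 stale steps need 6. The real slowdown depends on how the poller schedules retries, which this sketch does not attempt to reproduce.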

Resolution

Step 1: Identify stuck orchestration steps

Run the following SOQL query to find steps that may be blocking the queue:

SELECT LastModifiedDate, Id, Name, CSPOFA__Status__c,
  CSPOFA__Type__c,
  CSPOFA__Orchestration_Process__r.CSPOFA__Priority__c,
  CreatedDate
FROM CSPOFA__Orchestration_Step__c
WHERE CSPOFA__Status__c IN ('Ready', 'In Progress')
  AND CSPOFA__Step_On_Hold__c = false
  AND CSPOFA__Orchestration_Process__r.CSPOFA__Process_On_Hold__c = false
  AND CSPOFA__Orchestration_Process__r.CSPOFA__Processing_Mode__c != 'Foreground'
  AND CSPOFA__Orchestration_Process__r.CSPOFA__State__c = 'ACTIVE'
  AND CSPOFA__Class__r.CSPOFA__Category__c != 'Custom'
  AND CSPOFA__Class__r.Name NOT IN ('Monitor Field')
ORDER BY CSPOFA__Orchestration_Process__r.CSPOFA__Priority__c DESC,
  LastModifiedDate ASC
LIMIT 500

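The results are sorted by process priority (highest first) and then by LastModifiedDate (oldest first). Look for steps that have been in Ready or In Progress status for an unusually long time (days or weeks); compare LastModifiedDate and CreatedDate against the current date to spot them.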
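
If the query results are exported for review, a small script can flag the old rows. The following Python sketch assumes the rows are available as dictionaries keyed by the field names from the query, and that LastModifiedDate uses the timestamp format Salesforce commonly returns (for example, 2024-01-01T00:00:00.000+0000). The 7-day threshold and the function name are illustrative assumptions; pick a threshold that fits your org.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp format of exported Salesforce datetime fields.
SF_DATETIME = "%Y-%m-%dT%H:%M:%S.%f%z"


def find_stale_steps(rows, now, max_age_days=7):
    """Return rows whose LastModifiedDate is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        row for row in rows
        if datetime.strptime(row["LastModifiedDate"], SF_DATETIME) < cutoff
    ]


# Example with made-up rows; real rows come from the diagnostic query.
rows = [
    {"Name": "Step A", "LastModifiedDate": "2024-01-01T00:00:00.000+0000"},
    {"Name": "Step B", "LastModifiedDate": "2024-01-19T12:00:00.000+0000"},
]
now = datetime(2024, 1, 20, tzinfo=timezone.utc)
stale = find_stale_steps(rows, now)
```

With these sample rows, only Step A is older than the cutoff and is returned as stale.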

Step 2: Check the step history for errors

For each suspicious step, review the step history to identify errors such as CSPOFA.ProcessGraph.ArgumentException: Loop target step could not be found in the process graph.

Step 3: Put stale steps on hold

For steps that are clearly stale (old processes, error loops), set CSPOFA__Step_On_Hold__c = true to remove them from the processing queue. This immediately frees up capacity for current processes.
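
When many steps need to be put on hold, one possible approach is a bulk update from a CSV file, for example with Salesforce Data Loader. The Python sketch below only builds such a file from a list of step record Ids; it does not connect to Salesforce. The function name and the sample Ids are placeholders, and the choice of a CSV-based update is an assumption rather than a documented requirement.

```python
import csv
import io


def build_hold_csv(step_ids):
    """Build CSV text that sets CSPOFA__Step_On_Hold__c to true per step Id."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "CSPOFA__Step_On_Hold__c"])
    for step_id in step_ids:
        writer.writerow([step_id, "true"])
    return buffer.getvalue()


# Placeholder Ids; use the Ids of the steps confirmed as stale in Steps 1 and 2.
csv_text = build_hold_csv(["a0X000000000001", "a0X000000000002"])
```

Review the generated file before running the update, so that only steps confirmed as stale are put on hold.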

Step 4: Verify improvement

After putting stale steps on hold, monitor the orchestration processes to confirm that loop execution times return to normal.

Step 5: Archive old processes (optional)

Consider archiving the orchestration processes associated with the stale steps to prevent them from reactivating.

Additional Notes

The orchestration poller processes steps per transaction based on the configured limit. Stale steps that continuously fail consume capacity that would otherwise go to legitimate work.
This issue typically manifests as a gradual slowdown rather than a sudden failure, making it harder to diagnose.
Periodically running the diagnostic query and cleaning up stale steps is recommended as preventive maintenance.

Priyanka Bhotika
Posted