Symptom
Orchestration processes take excessively long to complete. Loop steps that should reset and re-execute within minutes take hours or even days. The slowdown affects every orchestration process in the org, not just a specific one.
Cause
Stale orchestration steps from old processes are consuming queue processing capacity. These steps are stuck in "Ready" or "In Progress" status with errors such as:
CSPOFA.ProcessGraph.ArgumentException: Loop target step could not be found in the process graph: <STEP_ID>
The orchestration poller processes steps in priority order, and the org has a configurable limit of standard steps per transaction (e.g., 25). Each time a stale step fails and retries, it occupies part of that per-transaction capacity, leaving less of it for new, legitimate steps and delaying their processing.
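To make the capacity effect concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model in which a fixed number of slots in every transaction is taken by failing stale steps; this is an illustration, not a description of how the poller actually schedules work. The function name and parameters are hypothetical.

```python
import math
from typing import Optional


def transactions_needed(legit_steps: int, limit: int = 25,
                        stale_per_txn: int = 0) -> Optional[int]:
    """Estimate how many poller transactions are needed to process
    `legit_steps` when `stale_per_txn` of the `limit` slots in each
    transaction are used up by stale steps that fail and retry.

    ASSUMPTION: stale steps occupy a constant number of slots per
    transaction. The real scheduling behavior may differ.
    Returns None when no capacity is left for legitimate steps.
    """
    if legit_steps < 0 or limit <= 0 or stale_per_txn < 0:
        raise ValueError("arguments must be non-negative and limit positive")
    if legit_steps == 0:
        return 0
    free_slots = limit - stale_per_txn
    if free_slots <= 0:
        return None  # every slot is consumed by stale retries
    return math.ceil(legit_steps / free_slots)


# Example: 100 legitimate steps with a limit of 25 steps per transaction.
healthy = transactions_needed(100, limit=25, stale_per_txn=0)
degraded = transactions_needed(100, limit=25, stale_per_txn=20)
print(healthy, degraded)
```

Under this model, the more slots stale steps occupy, the more transactions the same amount of legitimate work needs, which is consistent with the gradual slowdown described in this article.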
Resolution
Step 1: Identify stuck orchestration steps
Run the following SOQL query to find steps that may be blocking the queue:
SELECT LastModifiedDate, Id, Name, CSPOFA__Status__c,
CSPOFA__Type__c,
CSPOFA__Orchestration_Process__r.CSPOFA__Priority__c,
CreatedDate
FROM CSPOFA__Orchestration_Step__c
WHERE CSPOFA__Status__c IN ('Ready', 'In Progress')
AND CSPOFA__Step_On_Hold__c = false
AND CSPOFA__Orchestration_Process__r.CSPOFA__Process_On_Hold__c = false
AND CSPOFA__Orchestration_Process__r.CSPOFA__Processing_Mode__c != 'Foreground'
AND CSPOFA__Orchestration_Process__r.CSPOFA__State__c = 'ACTIVE'
AND CSPOFA__Class__r.CSPOFA__Category__c != 'Custom'
AND CSPOFA__Class__r.Name NOT IN ('Monitor Field')
ORDER BY CSPOFA__Orchestration_Process__r.CSPOFA__Priority__c DESC,
LastModifiedDate ASC
LIMIT 500
Look for steps that have sat in Ready or In Progress for an unusually long time (days or weeks), judging by their LastModifiedDate and CreatedDate values.
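If the query results are exported (for example as JSON records), the age check can be scripted. The following is a minimal sketch that assumes each row is a dictionary keyed by the queried field names and that timestamps use the `2024-01-15T10:30:00.000+0000` form commonly returned by the Salesforce REST API. The function name, the 7-day threshold, and the choice of date field are assumptions to adjust for your org.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# ASSUMPTION: timestamps look like "2024-01-15T10:30:00.000+0000".
SF_DATETIME = "%Y-%m-%dT%H:%M:%S.%f%z"


def find_stale_steps(rows: List[Dict[str, str]], now: datetime,
                     max_age: timedelta = timedelta(days=7),
                     date_field: str = "LastModifiedDate") -> List[Dict[str, str]]:
    """Return the rows whose `date_field` is older than `max_age`.

    `rows` are results of the diagnostic query, already limited to
    steps in Ready or In Progress status. The 7-day default is a
    hypothetical threshold, not a recommended value.
    """
    stale = []
    for row in rows:
        timestamp = datetime.strptime(row[date_field], SF_DATETIME)
        if now - timestamp > max_age:
            stale.append(row)
    return stale


# Example with two made-up rows (the Ids are placeholders).
example_rows = [
    {"Id": "step-old", "LastModifiedDate": "2024-01-01T00:00:00.000+0000"},
    {"Id": "step-new", "LastModifiedDate": "2024-01-29T00:00:00.000+0000"},
]
reference_time = datetime(2024, 1, 30, tzinfo=timezone.utc)
for step in find_stale_steps(example_rows, reference_time):
    print(step["Id"], step["LastModifiedDate"])
```

Passing `date_field="CreatedDate"` may be more useful if the failing retries themselves keep updating LastModifiedDate; whether they do is worth checking in your org before relying on either field.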
Step 2: Check the step history for errors
For each suspicious step, review the step history to identify errors such as CSPOFA.ProcessGraph.ArgumentException: Loop target step could not be found in the process graph.
Step 3: Put stale steps on hold
For steps that are clearly stale (old processes, error loops), set CSPOFA__Step_On_Hold__c = true to remove them from the processing queue. This immediately frees up capacity for current processes.
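When many steps need to be put on hold, preparing the update records in a script can reduce manual errors. The sketch below only builds the records (step Id plus CSPOFA__Step_On_Hold__c set to true) and splits them into batches; how they are submitted (a data loading tool, an API call, or anonymous Apex) is left open. The function name and the default batch size of 200 are assumptions.

```python
from typing import Dict, Iterable, List, Union

Record = Dict[str, Union[str, bool]]


def build_hold_updates(step_ids: Iterable[str],
                       batch_size: int = 200) -> List[List[Record]]:
    """Build update records that set CSPOFA__Step_On_Hold__c to true.

    Duplicate Ids are dropped while keeping the original order, and
    the records are split into batches of at most `batch_size`.
    ASSUMPTION: 200 is a hypothetical default; use whatever batch
    size your update tool accepts.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    unique_ids = list(dict.fromkeys(step_ids))  # de-duplicate, keep order
    records: List[Record] = [
        {"Id": step_id, "CSPOFA__Step_On_Hold__c": True}
        for step_id in unique_ids
    ]
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]


# Example with placeholder Ids; real Ids come from the diagnostic query.
for batch in build_hold_updates(["step-1", "step-2", "step-1"], batch_size=2):
    print(batch)
```

Reviewing the generated records before submitting them is a cheap safeguard against putting a step from a current, healthy process on hold.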
Step 4: Verify improvement
After putting stale steps on hold, monitor the orchestration processes to confirm that loop execution times return to normal.
Step 5: Archive old processes (optional)
Consider archiving the orchestration processes associated with the stale steps to prevent them from reactivating.
Additional Notes
- The orchestration poller processes steps per transaction based on the configured limit. Stale steps that continuously fail consume capacity that would otherwise go to legitimate work.
- This issue typically manifests as a gradual slowdown rather than a sudden failure, making it harder to diagnose.
- Periodically running the diagnostic query and cleaning up stale steps is recommended as preventive maintenance.
Priyanka Bhotika