Workflow finishes too early when parallel branches are running #6316
Comments
Not necessarily a bug, but rather a case of expecting it to do something it's not designed to do without explicit instructions. You can try to play around with "join" on the "check_step". If I structure my workflow like this:
And put a "join" on task 3, it waits for both parallel branches to finish before executing. If I don't put a "join" on it, task 3 executes as soon as whichever parallel branch finishes first. Reference: https://docs.stackstorm.com/orquesta/languages/orquesta.html#task-transition-model
The problem is that these two (or more) branches should each finish in a final state (sometimes the same, sometimes different ones) without waiting for the other(s). I don't want a join, because from the first check_step execution the two branches have to run separately. From the docs I understand that the Orquesta engine waits for the end of currently running tasks in parallel branches before exiting in case of failure (the fail-fast design, I suppose), but there's no failure here, and I can't see anything about how the workflow ends on success with more than one branch running (and without a join). I may be wrong, but I suppose that instead of "the Orquesta engine exits the workflow once all parallel branches are in succeeded state or one of them is in failed state" we have "the Orquesta engine exits the workflow once one parallel branch is in succeeded or failed state".
I altered the main workflow to include some output when the
The delay in the
We can see the step looping every 2 seconds from 22:51:05 to 22:51:13. A second step loop begins at 22:51:55, which coincides with the exit from the
What can be seen is that the action count is 0 in execution
The workflow only increments the action counter, which leads me to think the current execution context is being overwritten with stale data when the parallel task completes. I suspect this is a bug in Orquesta, not StackStorm core, but it needs to be investigated to be confirmed. These tests were run on
Despite my opinion that this is a bug, @fdrab is correct in pointing out that this is expected behaviour, which is documented here: https://docs.stackstorm.com/orquesta/context.html#assignment-scope
This basically means you'll need to rethink how you're controlling the flow of the workflow, using something other than the context. It could be an external data source such as redis/consul/etcd, or whatever best fits the use case. Some sort of locking will need to be used to ensure consistency when updating the
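A minimal sketch of that kind of external-store approach, assuming a Redis instance on localhost and redis-py installed; the key name action_count and the helper increment_action_count are made up purely for illustration, not taken from the workflows in this issue:

import redis

# Keep the shared counter outside the workflow context, guarded by a Redis lock
# so that concurrent branches cannot overwrite each other's updates.
client = redis.Redis(host="localhost", port=6379, db=0)

def increment_action_count(key="action_count"):
    # Blocking lock with a timeout so a crashed branch cannot hold it forever.
    with client.lock("lock:" + key, timeout=10):
        current = int(client.get(key) or 0)
        client.set(key, current + 1)
        return current + 1

Each branch could then call a helper like this from a small custom Python action, instead of publishing a counter into the shared workflow context.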
I don't understand why this
As far as I understand, a join naturally happens when all branches are completed, to create the workflow global context used for the output. That's not the case here: I don't do any explicit join, so there shouldn't be any merge until both branches finish. Moreover, your previous example doesn't seem to reproduce the behaviour, as all your
I tried with your timers (
I re-ran the workflow with
The display shows
It means that one of your
That's the point: I suppose that Orquesta ends the workflow too early, as one branch (and here, an action) is still running while the first one has finished. Because of this, some actions expected to run on the second branch don't execute, and the workflow is not fully completed from the user's point of view.
I see what you mean about the expected execution of 5 sleep_action steps. I dug deeper into the Orquesta code to see what's going on. It uses networkx under the hood to build a multi-edge directed graph. When the workflow YAML is processed, the following multiset of nodes/vertices and edges is produced:
NODES/VERTICES:
EDGES:
As I understand it (I may be wrong), the edges correspond to a linear evaluation of the workflow, where the first evaluation of the check step creates a pair of edges for
As a workaround, I suggest not using the pattern of returning to the same node (
I'll keep investigating, but I'm not sure I'll have enough free time to reach a full resolution of this issue.
I also dug a bit into Orquesta this morning ^^'. Here is the workflow graph data I got:
{
'directed': True,
'multigraph': True,
'graph': [],
'nodes': [
{'id': 'entrypoint'},
{'id': 'check_step'},
{'id': 'sleep_action'},
{'id': 'sleep_wf'}
],
'adjacency':
[
[
{
'criteria': [],
'ref': 0,
'id': 'check_step',
'key': 0
}
],
[
{
'criteria': ['<% ctx(nextstep) = "step1" %>'],
'ref': 0,
'id': 'sleep_action',
'key': 0
},
{
'criteria': ['<% ctx(nextstep) = "step2" %>'],
'ref': 1,
'id': 'sleep_action',
'key': 1
},
{
'criteria': ['<% ctx(nextstep) = "step3" %>'],
'ref': 2,
'id': 'sleep_action',
'key': 2
},
{
'criteria': ['<% ctx(nextstep) = "step1" %>'],
'ref': 0,
'id': 'sleep_wf',
'key': 0
}
],
[
{
'criteria': [],
'ref': 0,
'id': 'check_step',
'key': 0
}
],
[
{
'criteria': [],
'ref': 0,
'id': 'check_step',
'key': 0
}
]
]
}
However, it seems that a MultiDiGraph can support parallel edges, which is why I'm a bit surprised that Orquesta loses this feature (see the sketch at the end of this comment). Found on the NetworkX doc website:
As a beginner with the Orquesta engine, I didn't find where this kind of join/barrier is done.
>>> wf_spec.tasks.is_split_task("check_step")
True
>>> wf_spec.tasks.in_cycle("check_step")
True
Because of this check on line 64:
# Determine if the task is a split task and if it is in a cycle. If the task is a
# split task, keep track of where the split(s) occurs.
if wf_spec.tasks.is_split_task(task_name) and not wf_spec.tasks.in_cycle(task_name):
    splits.append(task_name)

if splits:
    wf_graph.update_task(task_name, splits=splits)

To me, it seems that Orquesta doesn't want a cycle on a split task (maybe to limit infinite parallel branch creation).
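For what it's worth, here is a minimal, self-contained sketch (not taken from the Orquesta code base; the node names and criteria simply mirror the dump above) showing that a networkx MultiDiGraph does keep parallel edges between the same pair of nodes, and that json_graph.adjacency_data produces the same kind of structure as that dump:

import networkx as nx
from networkx.readwrite import json_graph

# Build a tiny graph mirroring the shape of the dump above.
g = nx.MultiDiGraph()
g.add_nodes_from(["entrypoint", "check_step", "sleep_action", "sleep_wf"])

# Two parallel edges from check_step to sleep_action, distinguished by their key,
# each carrying its own transition criteria, plus one edge to sleep_wf.
g.add_edge("check_step", "sleep_action", key=0, criteria=['<% ctx(nextstep) = "step1" %>'])
g.add_edge("check_step", "sleep_action", key=1, criteria=['<% ctx(nextstep) = "step2" %>'])
g.add_edge("check_step", "sleep_wf", key=0, criteria=['<% ctx(nextstep) = "step1" %>'])

print(g.number_of_edges("check_step", "sleep_action"))  # 2, so parallel edges are preserved
print(json_graph.adjacency_data(g)["adjacency"])        # same layout as the dump above

That the parallel edges survive here suggests the behaviour discussed in this issue comes from how Orquesta composes and evaluates the graph rather than from networkx itself.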
I think I would use sub-workflows for something like your example workflow. A task that explicitly starts 2 branches (
Note: this is quirky behavior in Orquesta. It would be nice to fix this and other issues at some point. For now, hopefully this gives you a way to do what you need with Orquesta as it is today.
SUMMARY
While running our workflows, we found that some of them, which are supposed to branch under certain conditions, do not execute all the way to the expected last task on the second branch.
As we created a loop in the workflow, we don't know if it's a real bug or an improper use of the Orquesta workflow engine.
In the latter case, some docs may be missing, as there's no warning regarding loops on tasks in the Orquesta engine.
STACKSTORM VERSION
st2 --version
st2 3.5.0, on Python 3.6.8
OS, environment, install method
Running on CentOS Linux release 7.6.1810 (Core) and installed manually (with rpm + dependencies) following installation docs.
Steps to reproduce the problem
Here are two simple workflows to reproduce the problem:
tester_bug.yaml
sleep_wf.yaml
And their associated metadata:
mdt_tester_bug.yaml
mdt_sleep_wf.yaml
Expected Results
StackStorm should return a result only when both branches in tester_bug.yaml are finished.
Actual Results
Instead of returning when both branches are finished, StackStorm terminates the workflow when one of them is finished and kills the other.
Example output of this behaviour:
We can see that a sleep action (not even the last one, only the one before the last) is still running while StackStorm has already returned a successful result.