change ordering of poll queue dao by astelmashenko · Pull Request #515 · conductor-oss/conductor · GitHub

change ordering of poll queue dao #515


Open
wants to merge 1 commit into
base: main

astelmashenko
Contributor

Pull Request type

  • Bugfix
  • Feature
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • WHOSUSING.md
  • Other (please describe):

Additional information

Persistence: Postgres
Lock: Redis

Changes in this PR

We have a lot of long-running workflows, around 25,000. Some of them are waiting for external input in a WAIT task. We observed that some workflows get stuck: a workflow has a completed task, but the task that follows it is never selected for execution.
Investigation showed that PostgresQueueDAO.popMessages used this query to select workflows for scheduling:

        String POP_QUERY =
                "WITH cte AS ("
                        + "    SELECT queue_name, message_id "
                        + "    FROM queue_message "
                        + "    WHERE queue_name = ? "
                        + "      AND popped = false "
                        + "      AND deliver_on <= (current_timestamp + (1000 || ' microseconds')::interval) "
                        + "    ORDER BY deliver_on, priority DESC, created_on "
                        + "    LIMIT ? "
                        + "    FOR UPDATE SKIP LOCKED "
                        + ") "
                        + "UPDATE queue_message "
                        + "   SET popped = true "
                        + "   FROM cte "
                        + "   WHERE queue_message.queue_name = cte.queue_name "
                        + "     AND queue_message.message_id = cte.message_id "
                        + "     AND queue_message.popped = false "
                        + "   RETURNING queue_message.message_id, queue_message.priority, queue_message.payload";

In our case we had around 4,000 workflows with priority=10 and the rest with priority=0. Selecting from queue_message like this:

SELECT message_id, deliver_on, priority FROM queue_message WHERE queue_name = '_deciderQueue' AND popped = false AND deliver_on <= (current_timestamp + (1000 ||' microseconds')::interval) ORDER BY priority DESC, deliver_on, created_on limit 16;

never selected some older workflows with priority=0. I did not find where in the codebase the priority gets changed; maybe it is left over from previous versions of Conductor.

To fix the issue I had to reset all priorities to 0. I think the implementation should be changed so that workflows never get stuck: messages should be ordered by deliver_on first and only then by priority.
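
To illustrate the difference between the two orderings, here is a minimal in-memory sketch. It is not Conductor code: the class, the field names and the sample data are hypothetical, and every message is assumed to be already due and not popped. It only mimics `ORDER BY ... LIMIT ?` with comparators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PollOrderingSketch {

    // Hypothetical in-memory stand-in for a queue_message row.
    static final class Message {
        final String id;
        final long deliverOn; // epoch millis; smaller means due earlier
        final int priority;

        Message(String id, long deliverOn, int priority) {
            this.id = id;
            this.deliverOn = deliverOn;
            this.priority = priority;
        }
    }

    static final Comparator<Message> BY_PRIORITY_DESC =
            (a, b) -> Integer.compare(b.priority, a.priority);
    static final Comparator<Message> BY_DELIVER_ON =
            (a, b) -> Long.compare(a.deliverOn, b.deliverOn);

    // Corresponds to: ORDER BY priority DESC, deliver_on
    static final Comparator<Message> PRIORITY_FIRST =
            BY_PRIORITY_DESC.thenComparing(BY_DELIVER_ON);
    // Corresponds to: ORDER BY deliver_on, priority DESC
    static final Comparator<Message> DELIVER_ON_FIRST =
            BY_DELIVER_ON.thenComparing(BY_PRIORITY_DESC);

    // Mimics "ORDER BY ... LIMIT ?" over messages that are all assumed to be due.
    static List<String> pop(List<Message> queue, Comparator<Message> order, int limit) {
        List<Message> sorted = new ArrayList<>(queue);
        sorted.sort(order);
        List<String> ids = new ArrayList<>();
        for (Message m : sorted.subList(0, Math.min(limit, sorted.size()))) {
            ids.add(m.id);
        }
        return ids;
    }

    // One old priority=0 message and three newer priority=10 messages.
    static List<Message> sampleQueue() {
        return Arrays.asList(
                new Message("old-low", 1000L, 0),
                new Message("high-1", 5000L, 10),
                new Message("high-2", 6000L, 10),
                new Message("high-3", 7000L, 10));
    }

    public static void main(String[] args) {
        List<Message> queue = sampleQueue();
        System.out.println(pop(queue, PRIORITY_FIRST, 2));   // prints [high-1, high-2]
        System.out.println(pop(queue, DELIVER_ON_FIRST, 2)); // prints [old-low, high-1]
    }
}
```

With priority first, `old-low` is returned only once fewer than `limit` higher-priority messages are due, which is consistent with the stuck workflows we saw. With deliver_on first, the oldest due message is always picked up, and priority only breaks ties.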

What do you think about this issue?

@astelmashenko changed the title from "changer order by for popping messages, deliver_on first and only then…" to "change ordering of poll queue dao" on May 21, 2025