[Bug]: ServerCheckJob causing frequent MaxAttemptsExceededException - coolify-redis growing and write size increasing · Issue #5741 · coollabsio/coolify · GitHub
[Bug]: ServerCheckJob causing frequent MaxAttemptsExceededException - coolify-redis growing and write size increasing #5741
Closed
@ErikPetersenDev

Description

Error Message and Logs

Background:

  • On 4/28 I updated to beta.413 after probably a month or more of not updating (not sure prior version).
  • On 4/29 and 4/30 I noticed that my server disk write throughput was going up linearly, starting from around when I updated Coolify. Other than Coolify, the only other service is a postgres DB which is not in active use.
  • I'm currently on beta.415 but the issue is continuing.
  • I traced the issue and determined the following: ServerCheckJob is repeatedly throwing MaxAttemptsExceededException. Each occurrence is saved in coolify-redis, and the Redis instance has persistence set to 20 seconds. After two days I have over 2500 entries, almost all from this exception, and each entry appears to expire after about 7 days. This means the size of the persistence save every 20 seconds keeps growing linearly as the number of exceptions in Redis climbs.
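To put rough numbers on the growth described above, here is a minimal sketch. It assumes the exceptions arrive at a constant rate (about 2500 in 2 days, as observed) and that each entry really does expire after about 7 days. The function name and the constant-rate model are illustrative assumptions, not anything taken from Coolify itself.

```python
def stored_exceptions(days_elapsed, entries_per_day, ttl_days=7.0):
    """Estimate how many exception entries are alive in Redis.

    Assumes a constant write rate and a fixed expiry (ttl_days) per entry:
    the count grows linearly until the oldest entries start expiring,
    then levels off at entries_per_day * ttl_days.
    """
    return entries_per_day * min(days_elapsed, ttl_days)


# Rate observed in this report: roughly 2500 entries after 2 days.
observed_rate = 2500 / 2  # entries per day

for day in (2, 4, 7, 10):
    print(f"day {day}: ~{int(stored_exceptions(day, observed_rate))} entries")
```

Under those assumptions the count would stop growing after about a week, at roughly 8750 entries, so the write size would plateau rather than climb forever. If the failure rate itself changes, the plateau moves with it.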

This Discord thread has more details about how I found this, screenshots, etc.: https://discord.com/channels/459365938081431553/1367146983084658799/1367146983084658799

Possible related commit:
Andras shared a commit from within the past couple weeks that may be related: b78f2cc

At this point there are a few issues:

  1. What is happening with ServerCheckJob in the first place? Is it failing or just timing out? Is there something I can look into to figure out what's happening with that job?
  2. The failing job's exception is stored for one week, so I'm getting thousands of duplicate exception messages in coolify-redis, which is causing Redis disk writes to go up linearly. Can this be adjusted?
  3. The commit linked above may be related to the MaxAttemptsExceededException since it altered the behavior of overlapping jobs.

Steps to Reproduce

I don't have explicit steps to reproduce. I updated to the latest version and noticed the issue because my write throughput started climbing on my server monitoring.

Example Repository URL

No response

Coolify Version

v4.0.0-beta.415

Are you using Coolify Cloud?

No (self-hosted)

Operating System and Version (self-hosted)

Ubuntu 24.04.2 LTS

Additional Information

I'd ultimately like to find and correct the cause of ServerCheckJob failing, but I'm not sure if that is an independent issue or if it's failing/retrying in the first place due to a bug.

There's also a potential issue with continuously adding duplicate exception messages to Redis, keeping them in memory for a week, and persisting them to disk every 20 seconds. A single recurring error like this could start to have a large impact, depending on how fast the exceptions are being saved.
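As a rough illustration of why this matters, the sketch below estimates how much data gets written per day if every persistence save rewrites all stored entries. The 20-second interval and the 2500-entry count come from this report; the 2000-byte average entry size is a made-up placeholder (I have not measured it), and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def snapshot_bytes_per_day(entries, bytes_per_entry, save_interval_s=20):
    """Estimate bytes written per day by full-dataset persistence saves.

    Assumes every save rewrites all stored entries and that a save
    actually happens at every interval (i.e. something changed each time).
    """
    snapshots_per_day = SECONDS_PER_DAY // save_interval_s
    return snapshots_per_day * entries * bytes_per_entry


# 2500 entries as observed after two days; 2000 bytes per entry is an
# assumed placeholder, not a measured value.
daily_bytes = snapshot_bytes_per_day(entries=2500, bytes_per_entry=2000)
print(f"~{daily_bytes / 1e9:.1f} GB written per day")
```

The point is only the shape of the relationship: written volume scales with both the number of stored exceptions and the save frequency, so a faster-failing job or a longer retention period multiplies the disk writes.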
