Dropping records during decaton v9 migration due to incompatible retry task format · Issue #250 · line/decaton · GitHub

Closed
maediy opened this issue May 15, 2025 · 2 comments

Comments

maediy commented May 15, 2025

Problem Overview

During migration to Decaton v9, we saw records being dropped from retry topics because of deserialization failures. The issue occurs when a processor running with decaton.retry.task.in.legacy.format=true produces retry tasks in the legacy format but leaves the dt_meta header on them, so the payload no longer matches the format the header implies when those tasks are later consumed.


Situation

  • decaton-client: v9.1.0
  • decaton-processor: v9.1.0
    • decaton.retry.task.in.legacy.format=true
    • decaton.legacy.parse.fallback.enabled=true

I migrated Decaton according to Case D of the migration guide. After updating the clients to produce v9+ format messages, I encountered the following error logs:

Message:
  Dropping not-deserializable task [topic=xxx-retry partition=0 offset=1234]  

Stack trace:
  java.lang.IllegalArgumentException: com.google.protobuf.InvalidProtocolBufferException: Protocol message had invalid UTF-8.
  ...

Root cause

  • Decaton clients produce tasks with the dt_meta header if their version is v9 or higher.
  • The DecatonTaskRetryQueueingProcessor produces tasks in legacy format if decaton.retry.task.in.legacy.format=true and retains headers, including dt_meta.
  • The Decaton processor attempts to deserialize messages as v9+ format if the dt_meta header is present; otherwise, it uses the legacy format.

This leads to deserialization errors when processing tasks in the retry topic that have the dt_meta header and are in legacy format.
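
The three points above can be modelled in a few lines. This is a simplified sketch, not Decaton code: the class, enum, and method names are hypothetical, and headers are a plain `Map` rather than Kafka's header type. Only the `dt_meta` header name and the meaning of `decaton.retry.task.in.legacy.format` come from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the mismatch described above; all names here are hypothetical.
public class RetryFormatMismatch {
    static final String META_HEADER = "dt_meta";

    enum WireFormat { DIRECT_VALUE, LEGACY }

    // Consumer side: the format is inferred purely from whether dt_meta is present.
    static WireFormat formatAssumedByConsumer(Map<String, byte[]> headers) {
        return headers.containsKey(META_HEADER) ? WireFormat.DIRECT_VALUE : WireFormat.LEGACY;
    }

    // Retry side: the value format follows the config flag, independent of the headers.
    static WireFormat formatWrittenByRetry(boolean retryTaskInLegacyFormat) {
        return retryTaskInLegacyFormat ? WireFormat.LEGACY : WireFormat.DIRECT_VALUE;
    }

    // A record is readable only if the consumer's assumption matches what was written.
    static boolean isReadable(Map<String, byte[]> headers, WireFormat written) {
        return formatAssumedByConsumer(headers) == written;
    }

    public static void main(String[] args) {
        // A task produced by a v9+ client carries dt_meta; the retry path keeps the header.
        Map<String, byte[]> headers = new HashMap<>();
        headers.put(META_HEADER, new byte[] {1});

        WireFormat written = formatWrittenByRetry(true);
        System.out.println("written=" + written
                + ", assumed=" + formatAssumedByConsumer(headers)
                + ", readable=" + isReadable(headers, written));
        // prints: written=LEGACY, assumed=DIRECT_VALUE, readable=false
    }
}
```

With the flag set to `false`, the same headers give `readable=true`, which matches the report: only the combination of a legacy value and a retained `dt_meta` header is broken.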

Proposed Solution

Modify DecatonTaskRetryQueueingProcessor to remove the dt_meta header when decaton.retry.task.in.legacy.format=true to prevent the format mismatch.
The fix should be applied at:
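
Independently of where the change lands in Decaton itself, the idea can be sketched as a header filter applied before the retry task is produced. This is an illustration under assumptions, not the actual patch: `RetryHeaderFilter`, `headersForRetry`, and the `trace-id` header are hypothetical, and headers are modelled as a plain `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed behaviour; not the DecatonTaskRetryQueueingProcessor code.
public class RetryHeaderFilter {
    static final String META_HEADER = "dt_meta";

    // Returns the headers to attach to the retry record.
    // When the retry task is written in legacy format, dt_meta is dropped so that
    // consumers fall back to legacy parsing; all other headers are kept as they are.
    static Map<String, byte[]> headersForRetry(Map<String, byte[]> original,
                                               boolean retryTaskInLegacyFormat) {
        Map<String, byte[]> copy = new LinkedHashMap<>(original);
        if (retryTaskInLegacyFormat) {
            copy.remove(META_HEADER);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, byte[]> in = new LinkedHashMap<>();
        in.put(META_HEADER, new byte[] {1});
        in.put("trace-id", new byte[] {2}); // hypothetical application header

        System.out.println(headersForRetry(in, true).keySet());  // prints: [trace-id]
        System.out.println(headersForRetry(in, false).keySet()); // prints: [dt_meta, trace-id]
    }
}
```

Copying the headers instead of mutating them in place leaves the original record untouched, which matters if it is still referenced elsewhere in the processing pipeline.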

Member
ocadaruma commented May 15, 2025

Thanks for the detailed report.

Ugh, this is indeed a critical bug in the migration procedure, which we totally overlooked.
Let me address this and publish a fixed version soon.

Also, I'll re-check whether the invariant "dt_meta exists iff the value is stored directly" always holds.

Thanks again.

@ocadaruma
Member

Just in case, I verified with TLA+ that removing dt_meta when decaton.retry.task.in.legacy.format=true is sufficient and that there are no other overlooked races.
https://gist.github.com/ocadaruma/97b2d06dee07d653be38d26e068d594f
