fix(groups): Group Flush should handle MessageSizeTooLargeError #33585

jose-sequeira · 2025-06-12T09:13:09Z

Problem

We had an issue in production: the group flush logic did not handle Kafka message-too-large errors (MessageSizeTooLarge). These errors are non-transient, so they should produce an ingestion warning rather than be retried. This PR fixes that.
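A minimal sketch of the intended behaviour, not the actual BatchWritingGroupStore code. The `MessageSizeTooLarge` class here is a local stand-in, and `flushGroupUpdate`, `IngestionWarning`, and the warning type string are hypothetical names chosen for illustration. An oversized message is turned into a warning, while any other error still propagates:

```typescript
// Local stand-in for the error raised when a Kafka message exceeds the size limit.
class MessageSizeTooLarge extends Error {
    constructor(message: string) {
        super(message)
        // Keeps `instanceof` working even when compiling to an ES5 target.
        Object.setPrototypeOf(this, new.target.prototype)
    }
}

// Hypothetical warning shape, for illustration only.
interface IngestionWarning {
    teamId: number
    type: string
    details: Record<string, unknown>
}

// Attempts a single group write. Returns true on success, false if the write
// was dropped because the message was too large. Other errors are rethrown.
async function flushGroupUpdate(
    teamId: number,
    groupKey: string,
    write: () => Promise<void>,
    warnings: IngestionWarning[]
): Promise<boolean> {
    try {
        await write()
        return true
    } catch (error) {
        if (error instanceof MessageSizeTooLarge) {
            // Non-transient: record a warning instead of retrying.
            warnings.push({
                teamId,
                type: 'group_update_message_size_too_large', // hypothetical warning type
                details: { groupKey },
            })
            return false
        }
        throw error
    }
}
```

The caller decides what to do with the collected warnings; in this sketch they are only appended to an array.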

Changes

Did you write or update any docs for this change?

I've added or updated the docs
I've reached out for help from the docs team
No docs needed for this change

How did you test this code?

greptile-apps

PR Summary

Added error handling for MessageSizeTooLarge errors in group flush logic to prevent infinite retries on oversized Kafka messages.

Modified BatchWritingGroupStore to properly handle MessageSizeTooLarge errors during group updates, generating ingestion warnings instead of retrying
Enhanced promiseRetry utility to support non-retriable errors through new nonRetriableErrorTypes parameter
Added test coverage for MessageSizeTooLarge error scenarios in group property updates
Fixed production issue where large group properties could cause system instability due to continuous retry attempts
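To illustrate the non-retriable error idea from the summary, here is a simplified sketch of a retry helper that accepts a list of error types to fail fast on. It is not the real `promiseRetry` from `plugin-server/src/utils/retries.ts`: the function name `retryUnlessNonRetriable`, its signature, and the doubling backoff are assumptions, and `MessageSizeTooLarge` is a local stand-in class.

```typescript
// Local stand-in for the Kafka "message too large" error.
class MessageSizeTooLarge extends Error {
    constructor(message: string) {
        super(message)
        // Keeps `instanceof` working even when compiling to an ES5 target.
        Object.setPrototypeOf(this, new.target.prototype)
    }
}

type ErrorClass = new (...args: any[]) => Error

// Calls `fn` up to `attempts` times, doubling the wait between attempts.
// Errors matching `nonRetriableErrorTypes` are rethrown immediately.
async function retryUnlessNonRetriable<T>(
    fn: () => Promise<T>,
    attempts: number,
    intervalMillis: number,
    nonRetriableErrorTypes: ErrorClass[] = []
): Promise<T> {
    let lastError: unknown = new Error('no attempts were made')
    let wait = intervalMillis
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await fn()
        } catch (error) {
            if (nonRetriableErrorTypes.some((Type) => error instanceof Type)) {
                throw error // retrying cannot help, so give up right away
            }
            lastError = error
            if (attempt < attempts) {
                await new Promise<void>((resolve) => setTimeout(resolve, wait))
                wait *= 2
            }
        }
    }
    throw lastError
}
```

Passing `[MessageSizeTooLarge]` as the last argument makes an oversized message fail on the first attempt, while transient errors are still retried.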

3 files reviewed, 2 comments

plugin-server/src/worker/ingestion/groups/batch-writing-group-store.test.ts

greptile-apps · 2025-06-12T09:13:38Z

plugin-server/src/utils/retries.ts

+            logger.debug('🚫', `failed ${name}, non-retriable error encountered`, { error })
+            return Promise.reject(error)
+        }
+
        logger.debug('🔁', `failed ${name}, retrying`, { error })
        const nextInterval = Math.min(
            retryIntervalMillis * defaultRetryConfig.BACKOFF_FACTOR,
            defaultRetryConfig.MAX_INTERVAL
        )
        await new Promise((resolve) => setTimeout(resolve, retryIntervalMillis))


style: Use the sleep utility function defined in utils.ts for consistency instead of raw Promise timeout

Suggested change

await new Promise((resolve) => setTimeout(resolve, retryIntervalMillis))

await sleep(retryIntervalMillis)
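For readers unfamiliar with the helper, a sleep utility of this kind is typically a thin wrapper around `setTimeout`. The version below is an assumed shape; the actual `sleep` in utils.ts may differ.

```typescript
// Assumed shape of a sleep helper: resolves after roughly `ms` milliseconds.
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms))
}
```

With such a helper, the raw `new Promise(... setTimeout ...)` line in the retry loop reduces to `await sleep(retryIntervalMillis)`.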

pl

Small comment, but LGTM, feel free to merge.

jose-sequeira · 2025-06-16T14:55:18Z

Small comment, but LGTM, feel free to merge.

Think you may have not posted the comment 😅

pl · 2025-06-16T15:01:00Z

plugin-server/src/utils/retries.ts

 ): Promise<T> {
    if (retries <= 0) {
        logger.error('🚨', `Final retry failure for ${name}`, { previousError })
        return Promise.reject(previousError)
    }
    return fn().catch(async (error) => {
+        // Check if error is non-retriable
+        if (nonRetriableErrorTypes && nonRetriableErrorTypes.some((ErrorType) => error instanceof ErrorType)) {
+            logger.debug('🚫', `failed ${name}, non-retriable error encountered`, { error })


question: wdyt about bumping it to the warn level? Could be useful to see those errors while troubleshooting

pl · 2025-06-16T15:01:27Z

Small comment, but LGTM, feel free to merge.

Think you may have not posted the comment 😅

Oops, don't know where it went - reposted 😅

Group Flush should handle MessageSizeTooLargeError

ece3625

greptile-apps bot reviewed Jun 12, 2025

jose-sequeira changed the title ~~Group Flush should handle MessageSizeTooLargeError~~ fix(groups): Group Flush should handle MessageSizeTooLargeError Jun 12, 2025

Fix redudant mock

902d7dc

jose-sequeira requested review from a team June 16, 2025 09:27

pl approved these changes Jun 16, 2025

pl reviewed Jun 16, 2025

jose-sequeira merged commit 2cf5c5f into master Jun 16, 2025
98 checks passed

jose-sequeira deleted the group-handle-big-properties branch June 16, 2025 16:07

adamleithp pushed a commit that referenced this pull request Jun 17, 2025

fix(groups): Group Flush should handle MessageSizeTooLargeError (#33585)

3a61e24
