Automatic Pipeline deduplication · Issue #98 · hadron-project/hadron
Automatic Pipeline deduplication #98
Open
@thedodd

Description

Pipelines should deduplicate events from the source Stream as Pipeline instances are created. This would ensure that no duplicate Pipeline instances are run for a given partition. However:

  • This does not guard against valid retransmission of a root event, nor should it.
  • This does not guard against cases where an event was duplicated across different partitions.
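
To make the scope of the per-partition guarantee concrete, here is a minimal sketch in Python. It is not Hadron's implementation. The `Event` fields (`id`, `source`), the `PartitionDeduplicator` class, and the `instantiate` helper are all hypothetical names, and the sketch assumes that an event's identity is its `(source, id)` pair and that each partition keeps its own record of identities already seen.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    # Assumed identity fields; hypothetical, not taken from Hadron's API.
    id: str
    source: str


@dataclass
class PartitionDeduplicator:
    # Identities for which a Pipeline instance was already created
    # on this one partition.
    seen: set = field(default_factory=set)

    def should_instantiate(self, event: Event) -> bool:
        key = (event.source, event.id)
        if key in self.seen:
            return False  # duplicate on this partition: no new instance
        self.seen.add(key)
        return True


def instantiate(partitions: dict, partition: int, event: Event) -> bool:
    """Return True if a new Pipeline instance would be created."""
    dedup = partitions.setdefault(partition, PartitionDeduplicator())
    return dedup.should_instantiate(event)
```

Under these assumptions, writing the same event twice to partition 0 creates only one instance, while writing it to partition 1 as well creates a second one, because each partition tracks identities independently. That is the cross-partition gap described in the second bullet.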

Old proposal (invalid):

In cases where a duplicate root event has been written to the source Stream of a Pipeline, it may be convenient for users to simply return a Skip variant in the response payload, instructing Hadron to cancel the remainder of the Pipeline for that root event.

As it currently stands, without this feature, users simply need to model this by returning events of a specific type indicating that the Pipeline overall is a no-op or the like.

To implement this, we will need to change the way Pipeline instances (an execution of a Pipeline over a specific root event) are modeled.

Currently there is no real state associated with them; they are tracked via metadata offsets and the like. Explicit Pipeline instance state would also be a good fit for Pipeline monitoring in the future metrics and monitoring UI.

The above proposal is invalid because a Pipeline handler has no way to discern whether the duplicate is just a retransmission of the original root event or a new event, processed at a later point, that has the same identity.
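
The ambiguity can be illustrated with a small Python sketch. Everything in it is hypothetical: `RootEvent`, `SkipOnDuplicateHandler`, and the `"ok"` / `"skip"` return values only stand in for a handler and the proposed Skip variant, and are not part of Hadron's API. The assumption is that the handler can only compare event identities it has already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootEvent:
    # Assumed identity fields; hypothetical, for illustration only.
    id: str
    source: str


class SkipOnDuplicateHandler:
    """Hypothetical handler that returns "skip" for any identity seen before."""

    def __init__(self):
        self.seen = set()

    def handle(self, event: RootEvent) -> str:
        key = (event.source, event.id)
        if key in self.seen:
            return "skip"
        self.seen.add(key)
        return "ok"


def retransmission_case():
    # The same root event is delivered to the handler a second time.
    handler = SkipOnDuplicateHandler()
    original = RootEvent("42", "orders")
    return handler.handle(original), handler.handle(original)


def later_duplicate_case():
    # A separate event with the same identity is written later.
    handler = SkipOnDuplicateHandler()
    first = handler.handle(RootEvent("42", "orders"))
    second = handler.handle(RootEvent("42", "orders"))
    return first, second
```

Both functions return the same pair of results, so from inside the handler the two situations look identical. A Skip response would therefore cancel the remainder of the Pipeline in the retransmission case just as readily as in the true duplicate case.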
