fix: remove dedupping the sign requests based on signId #355

ppca · 2025-05-06T06:28:15Z

We used to dedup sign requests based on signIds because our eth indexer used to have to go back to the last processed block and resubscribe. Since we do not have resubscribe logic any more, we do not need to dedup

Copilot

Pull Request Overview

This PR removes deduplication of sign requests based on signId, reflecting the updated requirement that subsystems no longer need to filter duplicate requests.

Removed the contains method and its usage in the sign queue
Eliminated the unique sign requests metric from the metrics module

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
chain-signatures/node/src/protocol/signature.rs	Removed deduplication logic from the sign queue
chain-signatures/node/src/metrics.rs	Removed the metric counter for unique sign requests

Comments suppressed due to low confidence (2)

chain-signatures/node/src/protocol/signature.rs:176

[nitpick] Since deduplication is intentionally removed, consider adding a comment near this block to highlight that duplicate requests are now expected and handled elsewhere.

if self.contains(&indexed.id) {

chain-signatures/node/src/metrics.rs:496

With the removal of deduplication logic, update the metric documentation accordingly and verify that any monitoring or alerting systems are adjusted for the removal of this counter.

pub(crate) static NUM_UNIQUE_SIGN_REQUESTS: LazyLock<CounterVec> = LazyLock::new(|| {

volovyks · 2025-05-06T07:46:24Z

chain-signatures/node/src/protocol/signature.rs

-                tracing::info!(sign_id = ?indexed.id, "skipping sign request: already in the sign queue");
-                continue;
-            }
-            crate::metrics::NUM_UNIQUE_SIGN_REQUESTS


Why do we need to remove this metric?

mostly because I thought we don't care about this. Because the traffic we send for ethereum actually are half dupes, and we would want to respond to them even if they are dupes right?

Not sure I understand your logic. This is signature.rs, it is used not only for Ethereum requests.

yeah this file is for all sign requests. But this dedup logic causes problem when a duplicate request comes in before its previous request is completely finished.
For example, both node 1 and 2 have request A in their sign queue, node 1 has taken it off the sign queue and started working on it, but node 2 has not received message to join protocol so request A still in node2's sign queue. Now a dupe of request A comes in, and it will be removed as a duplicate for node2, but will be added to sign queue for node1. And when the first request A has finished generation of signature, you can imagine, node 1 will not have the 2nd request A, but node 2 will start request A.
Since we receive a lot of duplicate eth requests (likely due to retrying on our load test side), this inconsistency becomes more severe for eth requests.

volovyks · 2025-05-06T07:47:08Z

chain-signatures/node/src/protocol/signature.rs

@@ -183,14 +176,6 @@ impl SignQueue {
                other => other,
            }
        } {
-            if self.contains(&indexed.id) {
-                tracing::info!(sign_id = ?indexed.id, "skipping sign request: already in the sign queue");


This check seams important. We are planning to have many different indexing solutions with different architecture.
Should we convert it to warn and keep it?

Reason why I'm removing this logic is because we are currently skipping dupes and not generating signatures for them. But the traffic we receive for eth are half dupes. Also another issue is different nodes receive eth requests at slightly different times, and they often turn out to have different set of requests after dedupping, and that may cause some signatures don't have certain nodes joining protocol because that request is not in their sign queue.
This logic was originally added because for eth, the client will poll for the signId, so if multiple requests came in at almost same time, responding to one of them will have all clients receive the respond event for that signId. But this will make grafana tracking very hard, and it is a waste of our load testing logic as well because we won't respond to the dupe logic anyways.
I think in practice not skipping dupes at all is a cleaner solution and it achieves same experience for the user. So I'm removing this logic.

what are half dupes exactly?

Also another issue is different nodes receive eth requests at slightly different times, and they often turn out to have different set of requests after dedupping, and that may cause some signatures don't have certain nodes joining protocol because that request is not in their sign queue.

Duplicate requests should not have us end up with a sign queue with different requests from each other node so something is wrong here. It sounds like our sign queue is not well constrained or that eth indexer is not properly constructing reqeusts. Either ways, we should have a unit test for sign queue to track the behavior we're intending here.

so if you look at multichain_sign_requests_count_unique for eth, it is typically around half of multichain_sign_requests_count for eth. I think it's likely when we send eth traffic, we have a lot of retries, and we might end up sending the request with same parameters multiple times, and we will get duplicate requests that won't be handled by nodes.

Duplicate requests should not have us end up with a sign queue with different requests from each other node so something is wrong here. It sounds like our sign queue is not well constrained or that eth indexer is not properly constructing reqeusts.

I think it can happen when a duplicate request comes in before its previous request is completely finished. For example, both node 1 and 2 have request A in their sign queue, node 1 has taken it off the sign queue and started working on it, but node 2 has not received message to join protocol so request A still in node2's sign queue. Now a dupe of request A comes in, and it will be removed as a duplicate for node2, but will be added to sign queue for node1. And when the first request A has finished generation of signature, you can imagine, node 1 will not have the 2nd request A, but node 2 will start request A.

I see, then I think we should move to the sign queue not removing the request until it's fully completed. We can add additional info like if the request is in progress or not, that way duplicates do not create additional entries in the queue. Additionally, we should also store the completed request sign id in an LruCache or something similar, such that we don't receive the same request after we're done

yeah that makes sense. But it remains a problem that each node completes the signature protocol at different times, and the same problem will still happen.....
Shall we merge this first and then look into if a better fix is needed?

what do you mean it's still a problem if it completes at different times? Wouldn't my suggestion with storing the completed request/sign_id be enough to resolve this? Or is this something else I'm missing.

I would err on the side of keeping the dedup logic since fulfilling and responding to duplicate requests can be expensive, especially on eth

fix: remove dedupping the sign requests based on signId

7ee3dc6

ppca requested review from volovyks, ChaoticTempest and Copilot May 6, 2025 06:28

Copilot AI reviewed May 6, 2025

View reviewed changes

volovyks reviewed May 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: remove dedupping the sign requests based on signId #355

fix: remove dedupping the sign requests based on signId #355

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fix: remove dedupping the sign requests based on signId #355

Are you sure you want to change the base?

fix: remove dedupping the sign requests based on signId #355

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!