8000 fix: remove dedupping the sign requests based on signId by ppca · Pull Request #355 · sig-net/mpc · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: remove dedupping the sign requests based on signId #355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 0 additions & 9 deletions chain-signatures/node/src/metrics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -496,15 +496,6 @@ pub(crate) static PRESIGNATURE_BEFORE_POKE_DELAY: LazyLock<HistogramVec> = LazyL
.unwrap()
});

pub(crate) static NUM_UNIQUE_SIGN_REQUESTS: LazyLock<CounterVec> = LazyLock::new(|| {
try_create_counter_vec(
"multichain_sign_requests_count_unique",
"number of multichain sign requests, marked by sign requests indexed and deduped",
&["chain", "node_account_id"],
)
.unwrap()
});

pub(crate) static PRESIGNATURE_ACCRUED_WAIT_DELAY: LazyLock<HistogramVec> = LazyLock::new(|| {
try_create_histogram_vec(
"multichain_presignature_accrued_wait_delay_ms",
Expand Down
15 changes: 0 additions & 15 deletions chain-signatures/node/src/protocol/signature.rs
Original file line number Diff line number Diff line change
Expand Up @@ -101,13 +101,6 @@ impl SignQueue {
self.len() == 0
}

fn contains(&self, sign_id: &SignId) -> bool {
self.my_requests
.iter()
.any(|req| req.indexed.id == *sign_id)
|| self.other_requests.contains_key(sign_id)
}

fn organize_request(
&self,
threshold: usize,
Expand Down Expand Up @@ -183,14 +176,6 @@ impl SignQueue {
other => other,
}
} {
if self.contains(&indexed.id) {
tracing::info!(sign_id = ?indexed.id, "skipping sign request: already in the sign queue");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seams important. We are planning to have many different indexing solutions with different architecture.
Should we convert it to warn and keep it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason why I'm removing this logic is because we are currently skipping dupes and not generating signatures for them. But the traffic we receive for eth are half dupes. Also another issue is different nodes receive eth requests at slightly different times, and they often turn out to have different set of requests after dedupping, and that may cause some signatures don't have certain nodes joining protocol because that request is not in their sign queue.
This logic was originally added because for eth, the client will poll for the signId, so if multiple requests came in at almost same time, responding to one of them will have all clients receive the respond event for that signId. But this will make grafana tracking very hard, and it is a waste of our load testing logic as well because we won't respond to the dupe logic anyways.
I think in practice not skipping dupes at all is a cleaner solution and it achieves same experience for the user. So I'm removing this logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are half dupes exactly?

Also another issue is different nodes receive eth requests at slightly different times, and they often turn out to have different set of requests after dedupping, and that may cause some signatures don't have certain nodes joining protocol because that request is not in their sign queue.

Duplicate requests should not have us end up with a sign queue with different requests from each other node so something is wrong here. It sounds like our sign queue is not well constrained or that eth indexer is not properly constructing reqeusts. Either ways, we should have a unit test for sign queue to track the behavior we're intending here.

Copy link
Contributor Author
@ppca ppca May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if you look at multichain_sign_requests_count_unique for eth, it is typically around half of multichain_sign_requests_count for eth. I think it's likely when we send eth traffic, we have a lot of retries, and we might end up sending the request with same parameters multiple times, and we will get duplicate requests that won't be handled by nodes.

Duplicate requests should not have us end up with a sign queue with different requests from each other node so something is wrong here. It sounds like our sign queue is not well constrained or that eth indexer is not properly constructing reqeusts.

I think it can happen when a duplicate request comes in before its previous request is completely finished. For example, both node 1 and 2 have request A in their sign queue, node 1 has taken it off the sign queue and started working on it, but node 2 has not received message to join protocol so request A still in node2's sign queue. Now a dupe of request A comes in, and it will be removed as a duplicate for node2, but will be added to sign queue for node1. And when the first request A has finished generation of signature, you can imagine, node 1 will not have the 2nd request A, but node 2 will start request A.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then I think we should move to the sign queue not removing the request until it's fully completed. We can add additional info like if the request is in progress or not, that way duplicates do not create additional entries in the queue. Additionally, we should also store the completed request sign id in an LruCache or something similar, such that we don't receive the same request after we're done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that makes sense. But it remains a problem that each node completes the signature protocol at different times, and the same problem will still happen.....
Shall we merge this first and then look into if a better fix is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean it's still a problem if it completes at different times? Wouldn't my suggestion with storing the completed request/sign_id be enough to resolve this? Or is this something else I'm missing.

I would err on the side of keeping the dedup logic since fulfilling and responding to duplicate requests can be expensive, especially on eth

continue;
}
crate::metrics::NUM_UNIQUE_SIGN_REQUESTS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to remove this metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly because I thought we don't care about this. Because the traffic we send for ethereum actually are half dupes, and we would want to respond to them even if they are dupes right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand your logic. This is signature.rs, it is used not only for Ethereum requests.

Copy link
Contributor Author
@ppca ppca May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this file is for all sign requests. But this dedup logic causes problem when a duplicate request comes in before its previous request is completely finished.
For example, both node 1 and 2 have request A in their sign queue, node 1 has taken it off the sign queue and started working on it, but node 2 has not received message to join protocol so request A still in node2's sign queue. Now a dupe of request A comes in, and it will be removed as a duplicate for node2, but will be added to sign queue for node1. And when the first request A has finished generation of signature, you can imagine, node 1 will not have the 2nd request A, but node 2 will start request A.
Since we receive a lot of duplicate eth requests (likely due to retrying on our load test side), this inconsistency becomes more severe for eth requests.

.with_label_values(&[indexed.chain.as_str(), my_account_id.as_str()])
.inc();

let request = self.organize_request(threshold, &stable, indexed, false);
let is_mine = request.proposer == self.me;
if is_mine {
Expand Down
Loading
0