-
Notifications
You must be signed in to change notification settings - Fork 3
fix: remove dedupping the sign requests based on signId #355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -101,13 +101,6 @@ impl SignQueue { | |
self.len() == 0 | ||
} | ||
|
||
fn contains(&self, sign_id: &SignId) -> bool { | ||
self.my_requests | ||
.iter() | ||
.any(|req| req.indexed.id == *sign_id) | ||
|| self.other_requests.contains_key(sign_id) | ||
} | ||
|
||
fn organize_request( | ||
&self, | ||
threshold: usize, | ||
|
@@ -183,14 +176,6 @@ impl SignQueue { | |
other => other, | ||
} | ||
} { | ||
if self.contains(&indexed.id) { | ||
tracing::info!(sign_id = ?indexed.id, "skipping sign request: already in the sign queue"); | ||
continue; | ||
} | ||
crate::metrics::NUM_UNIQUE_SIGN_REQUESTS | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need to remove this metric? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. mostly because I thought we don't care about this. Because the traffic we send for ethereum actually are half dupes, and we would want to respond to them even if they are dupes right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure I understand your logic. This is signature.rs, it is used not only for Ethereum requests.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah this file is for all sign requests. But this dedup logic causes problem when a duplicate request comes in before its previous request is completely finished. |
||
.with_label_values(&[indexed.chain.as_str(), my_account_id.as_str()]) | ||
.inc(); | ||
|
||
let request = self.organize_request(threshold, &stable, indexed, false); | ||
let is_mine = request.proposer == self.me; | ||
if is_mine { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check seams important. We are planning to have many different indexing solutions with different architecture.
Should we convert it to warn and keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason why I'm removing this logic is because we are currently skipping dupes and not generating signatures for them. But the traffic we receive for eth are half dupes. Also another issue is different nodes receive eth requests at slightly different times, and they often turn out to have different set of requests after dedupping, and that may cause some signatures don't have certain nodes joining protocol because that request is not in their sign queue.
This logic was originally added because for eth, the client will poll for the signId, so if multiple requests came in at almost same time, responding to one of them will have all clients receive the respond event for that signId. But this will make grafana tracking very hard, and it is a waste of our load testing logic as well because we won't respond to the dupe logic anyways.
I think in practice not skipping dupes at all is a cleaner solution and it achieves same experience for the user. So I'm removing this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are half dupes exactly?
Duplicate requests should not have us end up with a sign queue with different requests from each other node so something is wrong here. It sounds like our sign queue is not well constrained or that eth indexer is not properly constructing reqeusts. Either ways, we should have a unit test for sign queue to track the behavior we're intending here.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if you look at multichain_sign_requests_count_unique for eth, it is typically around half of multichain_sign_requests_count for eth. I think it's likely when we send eth traffic, we have a lot of retries, and we might end up sending the request with same parameters multiple times, and we will get duplicate requests that won't be handled by nodes.
I think it can happen when a duplicate request comes in before its previous request is completely finished. For example, both node 1 and 2 have request A in their sign queue, node 1 has taken it off the sign queue and started working on it, but node 2 has not received message to join protocol so request A still in node2's sign queue. Now a dupe of request A comes in, and it will be removed as a duplicate for node2, but will be added to sign queue for node1. And when the first request A has finished generation of signature, you can imagine, node 1 will not have the 2nd request A, but node 2 will start request A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then I think we should move to the sign queue not removing the request until it's fully completed. We can add additional info like if the request is in progress or not, that way duplicates do not create additional entries in the queue. Additionally, we should also store the completed request sign id in an LruCache or something similar, such that we don't receive the same request after we're done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that makes sense. But it remains a problem that each node completes the signature protocol at different times, and the same problem will still happen.....
Shall we merge this first and then look into if a better fix is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean it's still a problem if it completes at different times? Wouldn't my suggestion with storing the completed request/sign_id be enough to resolve this? Or is this something else I'm missing.
I would err on the side of keeping the dedup logic since fulfilling and responding to duplicate requests can be expensive, especially on eth