8000 Use the first sister message as a Heartbeat, too by peci1 · Pull Request #93 · ros/bond_core · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Use the first sister message as a Heartbeat, too #93

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 25, 2025

Conversation

peci1
Copy link
@peci1 peci1 commented Apr 11, 2023

I've noticed that when the sister is killed quickly after being started, there is (very much repeatable) chance to deadlock the other Bond. Consider this sequence:

  1. Sister sends her first status message. My bond executes AwaitingSister->SisterAlive() which calls Connect(). Connect() stops the connection timer.
  2. Sister dies before sending her second message.
  3. My bond is now in Alive state. However, Heartbeat() has not yet been called (that would be done by the second message which did not come), so heartbeat_timer_ is still off and does not trigger the timeout event.

The fix I sent fixes it on the C++ side. I'm not sure about the Python side, and not sure about the SM source code. The Heartbeat() needs to be called already in the Alive state (it is undefined in AwaitingSister). Quickly skimming through the SM syntax, it doesn't seem to me it would support calling some functions before the transition and some after it.

If that would be the case, I'd suggest calling Heartbeat in the AwaitingSister state (right after calling Connected()) and defining it for AwaitingSister state to do the same as for the Alive state. That would require no additional changes to the SM definition, so it might be preferred.

I'll let the maintainers decide which approach would be better.

@peci1
Copy link
Author
peci1 commented Apr 11, 2023

A practical example where this can happen is when a nodelet is unloaded very shortly after being loaded.

@peci1
Copy link
Author
peci1 commented Jun 15, 2024

Could this please get a review?

@mjcarroll mjcarroll self-assigned this Jun 17, 2024
@mjcarroll mjcarroll self-requested a review June 17, 2024 14:04
@sloretz sloretz changed the base branch from kinetic-devel to noetic-devel April 2, 2025 20:59
Copy link
@sloretz sloretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

@sloretz sloretz merged commit c6298c2 into ros:noetic-devel Apr 25, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0