-
Notifications
You must be signed in to change notification settings - Fork 931
[Java] Make AeronCluster track connection status. #1791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] Make AeronCluster track connection status. #1791
Conversation
aeron-cluster/src/main/java/io/aeron/cluster/client/AeronCluster.java
Outdated
Show resolved
Hide resolved
aeron-cluster/src/main/java/io/aeron/cluster/client/AeronCluster.java
Outdated
Show resolved
Hide resolved
* @return this for a fluent API. | ||
*/ | ||
public Context newLeaderTimeoutNs(final long newLeaderTimeoutNs) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much work would it be to embed the leader heartbeat timeout into the detail string of the login response or add it as an extra field (SessionEvent in cluster codec). It would be nice not to require the extra configuration. (Not a stopper for this fix, but would be good if it would fit into the next release).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a look. For the detail string do you mean something like leaderHeartbeatTimeoutNs=5000|foo=bar
? Having a field would be nicer, but in this case the codec is shared with other events, so for them it would be a waste of space, especially if we decide to add more information in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my preference would be for an additional field. The only place this is used is session open/close so the redundant information wouldn't be too much overhead. Having thought about it more I think that a separate field is better than encoding it in a string. SBE has support for adding fields message versioning and compatibility where as arbitrary string encoding does not.
SBE should handle this (due to block length being encoded on the message) so older clients should safely ignore this field (but we can test for that).
aeron-cluster/src/main/java/io/aeron/cluster/client/AeronCluster.java
Outdated
Show resolved
Hide resolved
Overall looks good, right direction. |
Currently, AeronCluster does very little to track its connection status. All applications that want to react to, for example, losing network connectivity to the leader or the cluster shutting down, need to implement custom logic. This change provides such logic out of the box. If connection is lost for more than newLeaderTimeoutNs, or a message can't be sent to a new leader for messageTimeoutNs, the client will close itself. For applications which call pollEgress/controlledPollEgress and check isClosed afterward no changes are needed. For applications which use the ingress publication or the egress subscription directly, two methods were added: trackIngressPublicationResult and pollStateChanges. For the new leader timeout we currently use twice the default leader heartbeat timeout, in future we would ideally make the cluster announce the actual value. Changed the semantics of pollEgress/controlledPollEgress return value, hopefully not a breaking change. The state diagram is: stateDiagram-v2 [*] --> CONNECTED CONNECTED --> AWAIT_NEW_LEADER : ingress publication NOT_CONNECTED CONNECTED --> AWAIT_NEW_LEADER : ingress publication CLOSED CONNECTED --> PENDING_CLOSE : ingress publication MAX_POSITION_EXCEEDED CONNECTED --> AWAIT_NEW_LEADER : egress image closed CONNECTED --> AWAIT_NEW_LEADER_CONNECTION : NewLeaderEvent CONNECTED --> PENDING_CLOSE : SessionEvent CLOSED CONNECTED --> CLOSED : close AWAIT_NEW_LEADER --> AWAIT_NEW_LEADER_CONNECTION : NewLeaderEvent AWAIT_NEW_LEADER --> CLOSED : timeout AWAIT_NEW_LEADER --> CLOSED : close AWAIT_NEW_LEADER_CONNECTION --> CONNECTED : ingress offer successful AWAIT_NEW_LEADER_CONNECTION --> CLOSED : timeout AWAIT_NEW_LEADER_CONNECTION --> CLOSED : close PENDING_CLOSE --> CLOSED CLOSED --> [*]
c6ed271
to
a9d85ce
Compare
Only the |
Currently, AeronCluster does very little to track its connection status. All applications that want to react to, for example, losing network connectivity to the leader or the cluster shutting down, need to implement custom logic. This change provides such logic out of the box.
If connection is lost for more than newLeaderTimeoutNs, or a message can't be sent to a new leader for newLeaderConnectionTimeoutNs, the client will close itself.
For applications which call pollEgress/controlledPollEgress and check isClosed afterward no changes are needed. For applications which use the ingress publication or the egress subscription directly, two methods were added: trackIngressPublicationResult and pollStateChanges.
For the new leader timeout we currently use twice the default leader heartbeat timeout, in future we would ideally make the cluster announce the actual value.
Changed the semantics of pollEgress/controlledPollEgress return value, hopefully not a breaking change.
The state diagram is: