8000 [core] fix applyGroupSequences by gou4shi1 · Pull Request #2735 · Haivision/srt · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[core] fix applyGroupSequences #2735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

gou4shi1
Copy link
Contributor
@gou4shi1 gou4shi1 commented May 11, 2023

The issue

The log of receiver:

/SRT:RcvQ:w106 D:SRT.gm: applyGroupSequences: @1071584760 gets seq from @1071584834 rcv %787865610 snd %787865661 as GROUPWISE snd-seq
/SRT:RcvQ:w106 D:SRT.gm: @1071584760: synchronizeWithGroup: DERIVED ISN: RCV=%787865661 -> %787865610 (shift by -51) SND=%787865661 -> %787865661 (shift by 0)
/SRT:RcvQ:w136 D:SRT.gm: applyGroupSequences: no socket found connected and transmitting, @1071584722 not changing sequences, storing snd-seq %787865661
/SRT:RcvQ:w136 D:SRT.gm: @1071584722: synchronizeWithGroup: DEFINED ISN: RCV=%787865661 SND=%787865661

The correct recv base seq was %787865610, and then a new member @1071584722 was comming while all other members were broken at the same time.
@1071584722 didn't inherit the correct recv base seq, but defined it's own %787865661

However, at the sender side, when the peer of @1071584722 was coming, the peer of @1071584760 was not yet broken coincidently, so @1071584722 succeeded to inherit the correct send base seq %787865610 from CUDTGroup::m_iLastSchedSeqNo.

As a result, the sender sent from %787865610 while the receiver received from %787865661, the first 51 packets triggered DROPREQ, so only the following packets succeeded to transmit.

The proposed solution 1 (this PR)

CUDTGroup::m_iLastSchedSeqNo is a good design, because members status can't be consistent all the time in both side, we should store group-wise send-base-seq and recv-base-seq. It's defined by the first connected socket, and is updated by the ACK timer (not by every packet, to reduce overhead).

The proposed solution 2

rm this check

if ((!m_config.bRendezvous) && (m_ConnRes.m_iISN != m_iISN)) // secuity check
    e = CUDTException(MJ_SETUP, MN_SECURITY, 0);

so that RESPONDER can just use the CUDTGROUP::m_iLastSchedSeqNo as m_iISN, instead of w_hs.m_iISN, then the whole applyGroupSequences() function can be eliminated.

@gou4shi1 gou4shi1 marked this pull request as draft May 11, 2023 08:30
@gou4shi1 gou4shi1 force-pushed the fix-group-sync branch 3 times, most recently from 774212f to 2b345fa Compare May 11, 2023 09:35

if (was_empty)
{
gp->syncWithSocket(s->core(), HSD_RESPONDER);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should keep previous group-wise sequences even if the group is empty.

{
if (m_bConnected) // You are the first one, no need to change.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should inherit previous group-wise sequences even if the group is not connected (empty).

@@ -263,7 +201,6 @@ CUDTGroup::CUDTGroup(SRT_GROUP_TYPE gtype)
, m_bSynSending(true)
, m_bTsbPd(true)
, m_bTLPktDrop(true)
, m_iTsbPdDelay_us(0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not used.

// number will collide with any ISN provided by a socket.
// Also since now every socket will derive this ISN.
m_iLastSchedSeqNo = generateISN();
resetInitialRxSequence();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should keep previous group-wise sequences even if the group is empty.

@gou4shi1 gou4shi1 marked this pull request as ready for review May 11, 2023 10:02
@maxsharabayko maxsharabayko added this to the v1.5.2 milestone May 11, 2023
@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels May 11, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #2735 (4c68203) into master (3cefede) will decrease coverage by 0.74%.
The diff coverage is 90.47%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #2735      +/-   ##
==========================================
- Coverage   67.51%   66.78%   -0.74%     
==========================================
  Files          99       99              
  Lines       20166    20148      -18     
==========================================
- Hits        13616    13456     -160     
- Misses       6550     6692     +142     
Impacted Files Coverage Δ
srtcore/group.cpp 38.11% <84.61%> (+0.34%) ⬆️
srtcore/core.cpp 60.83% <100.00%> (-1.40%) ⬇️
srtcore/group.h 60.68% <100.00%> (-0.54%) ⬇️

... and 11 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@maxsharabayko maxsharabayko modified the milestones: v1.5.2, v1.6.0 Jun 29, 2023
@ethouris
Copy link
Collaborator

I'm sorry, but I completely don't understand the fix.

The idea for applyGroupSequences was to derive the settings from the socket to the group, if it was the first socket, or to forcefully apply the settings that the group already has otherwise.

If the problem was that the first member was broken and it caused inconsistency between connection sides, maybe it could be done differently: the ISN in case of group could be created already by a group when connecting the first socket (there is a possibility to enforce ISN when connecting) so that every socket will come in with exactly the same ISN, no matter which one succeeds as the first connection and therefore the first group member.

@ethouris
Copy link
Collaborator

Just BTW - I found #2444 that mentions a kind of this problem in the description (Problem 2). Might you be able to check if this solves the same problem or is anyhow related?

@ethouris
Copy link
Collaborator
ethouris commented Sep 6, 2024

Just FYI - I added a test #3020 to try to test the described situation. If this test doesn't represent the problem you have described in this PR, please try to improve it.

If I understand the problem correctly, that is, it came from a problem that a new empty group could derive ISN from the first socket, which in case of failure is dismissed, then this behavior is changed: the ISN for the group is set initially to a generated value, which is declared in the constructor. I believe this comment here directly refers to this problem:

    // Set this data immediately during creation before
    // two or more sockets start arguing about it.
    m_iLastSchedSeqNo = CUDT::generateISN();

@maxsharabayko maxsharabayko modified the milestones: v1.6.0, v1.5.4 Sep 9, 2024
@maxsharabayko
Copy link
Collaborator

@ethouris

If I understand the problem correctly, that is, it came from a problem that a new empty group could derive ISN from the first socket, which in case of failure is dismissed, then this behavior is changed: the ISN for the group is set initially to a generated value, which is declared in the constructor. I believe this comment here directly refers to this problem:

// Set this data immediately during creation before
// two or more sockets start arguing about it.
m_iLastSchedSeqNo = CUDT::generateISN();

This initialization was done in #1438 (SRT v1.4.2). I suppose the reported problem is on the receiver side 🤔

@maxsharabayko maxsharabayko modified the milestones: v1.5.4, v1.6.0 Sep 9, 2024
@ethouris ethouris added the help wanted Indicates that a maintainer wants help on an issue or pull request label Mar 13, 2025
@ethouris
Copy link
Collaborator

This PR cannot be merged at the moment. Changes are unclear as to what has been removed towards the base and why. Refreshing needed.

@ethouris ethouris mentioned this pull request Mar 13, 2025
31 tasks
@ethouris ethouris changed the base branch from master to dev March 18, 2025 16:40
@ethouris
Copy link
Collaborator

@gou4shi1 Would you be able to refresh this PR?

NOTE: this will be nontrivial, to say the least. Due to various circumstances, this fix was finally postponed to 1.6.0.

Conditions and CM rules have slightly changed, however. Multiple PRs have been merged into the intermediate branch, which's view is represented by #3141. Some have also affected changes you provided here.

The problem is mainly with the function srt::CUDT::dropToGroupRecvBase(), which has been completely removed; I failed to track which PR did it. Removal was likely because of some changes; #2527 is one of suspected, but there is this branch removed, too.

It would be helpful if you can provide somehow these changes. Could be also helpful if you provide some procedure that allows to reproduce the problem.

NOTE: Currently we use the dev branch, which should be the base branch for all PRs that are going to go to the next ongoing release (not the currently pending release, which will use master). The branch in that development PR is development, which is a mirror of dev that should be used for checks with the PR (a bit more experimental version of dev where hotfixes potentially to be later withdrawn are accepted).

@gou4shi1
Copy link
Contributor Author

@ethouris Thank you for still caring so much about my PR. I’m more than willing to help, but I haven’t been involved in this project for nearly two years and will need some free time.

@ethouris
82F0 Copy link
Collaborator

No hurry, we are still ongoing with finalizing 1.5.5 release, it will also take quite a long time. And only then we would be able to do tests on candidates for 1.6.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core help wanted Indicates that a maintainer wants help on an issue or pull request Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0