-
Notifications
You must be signed in to change notification settings - Fork 9k
HDFS-17121. BPServiceActor to provide new thread to handle FBR #5888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
the changes look good at high level, could you please take care of checkstyle? |
Good improvement. Not think carefully but my first feeling, IBR - FBR mis-order could trigger some issues, such as miss some block report? One case, Thanks. |
I agree with @Hexiaoqiao . We have done something similar before, and here we need to ensure the order of |
💔 -1 overall
This message was automatically generated. |
@Hexiaoqiao ,if the order of ibr and fbr is must be guaranteed at datanode side, HDFS-16016 also can not guarantee it. |
And maybe it not a problem. The FBR will only process the reported block list,the replication r will not discard. FBR |
I am sure IT IS (As @tomscut mentioned above.). IMO, if there is issue that confirmed, the proper way is fix it first then improve it. I believe this proposal is very important, actually we have deploy this feature for years, however we should be safe to land it. |
Yes,I know it. It is a problem with mis-order. |
@Hexiaoqiao , well, I user original sendIBRLock to make right order for ibr and fbr. The FBR will execute ibr first, so the sendIBRLock will be successful for it. Please review for it, Thanks. |
@Hexiaoqiao just one question: do you mean you already have IBR and FBR being processed asynchronously? or you mean you have HDFS-16016 deployed?
this sounds good |
@LiuGuH sorry i got bit confused, if we want to solve the ordering problem by using the same lock for both IBR and FBR as part of this PR, do we really need HDFS-17129 as separate Jira? |
Send FBR asynchronously. (Not include IBR, because we didn't meet some performance issue here).
+1, if we could solve this issue here, we DO NOT need create another one. |
About this PR, I didn't get the original purpose (include HDFS-16016). IIUC, we want to keep lightweight channel to send heartbeat avoid to be blocked, Right? If true, is it enough to make IBR and FBR asynchronously together and use only one thread? Thanks. |
|
got it, thank you @Hexiaoqiao! |
💔 -1 overall
This message was automatically generated. |
|
Suggest to submit at one PR (rather than different PR) , thus we could focus and discuss at the same place. Thanks. |
|
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Please check report from Yetus first. |
💔 -1 overall
This message was automatically generated. |
…xecution interval; change fullBlockReportLeaseId to volatile for synchronized in two thread(fbr and heartbeat)
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
…terval control when nn restart
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
d625c50
to
bb98740
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Please disscuss in #6005 |
This comment was marked as outdated.
This comment was marked as outdated.
💔 -1 overall
This message was automatically generated. |
@LiuGuH are you still working on this PR? |
This version is running on my production environment for long time. It is almost already done in my side. Thanks |
Description of PR
After HDFS-16016 , it makes ibr in a thread to avoid heartbeat blocking with ibr when require readlock in Datanode.
Now fbr should do as this. The reason is this:
(1)heartbeat maybe block because of fbr with readlock in datanode
(2)fbr will only may return FinalizeCommand
How was this patch tested?
There is no a new test for it. We can use original test case.