HDFS-16939. Fix the thread safety bug in LowRedundancyBlocks. #5450
💔 -1 overall
This message was automatically generated.
LGTM.
@@ -369,7 +369,7 @@ synchronized boolean remove(BlockInfo block,
    * @return true if the block was found and removed from one of the priority
    * queues
    */
-  boolean remove(BlockInfo block, int priLevel) {
+  synchronized boolean remove(BlockInfo block, int priLevel) {
Should `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` be made synchronized instead? Its other callers are synchronized methods:
This will just ensure that only one thread gets into the remove method at a time. But what if one thread is trying to remove while another is trying to add or call size()? I feel the following should help:
private Object syncOnBlockInfoSet(int priority, BlockInfo blockInfo, String operation) {
  LightWeightLinkedSet<BlockInfo> blockInfos = priorityQueues.get(priority);
  synchronized (blockInfos) {
    if ("size".equalsIgnoreCase(operation)) {
      return blockInfos.size();
    }
    if ("remove".equalsIgnoreCase(operation)) {
      return blockInfos.remove(blockInfo);
    }
    // Implement other required operations here.
    return null;
  }
}
Every piece of code that operates on an element of `priorityQueues` would then call `syncOnBlockInfoSet` to perform that operation. For example: `size = syncOnBlockInfoSet(priority, null, "size")`.
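The helper proposed in this comment dispatches on a string and returns `Object`. The same per-queue locking idea can be sketched with a lambda, so that the return type is checked at compile time. This is only an illustration under stated assumptions: `PerQueueLocking`, `withQueueLock`, and the use of `LinkedHashSet<String>` in place of `LightWeightLinkedSet<BlockInfo>` are hypothetical stand-ins, not code from Hadoop or from this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for LowRedundancyBlocks; String replaces BlockInfo.
class PerQueueLocking {
    private final List<Set<String>> priorityQueues = new ArrayList<>();

    PerQueueLocking(int levels) {
        for (int i = 0; i < levels; i++) {
            priorityQueues.add(new LinkedHashSet<>());
        }
    }

    // Runs the operation while holding the monitor of one priority queue.
    <R> R withQueueLock(int priority, Function<Set<String>, R> operation) {
        Set<String> queue = priorityQueues.get(priority);
        synchronized (queue) {
            return operation.apply(queue);
        }
    }

    boolean add(String block, int priority) {
        return withQueueLock(priority, q -> q.add(block));
    }

    boolean remove(String block, int priority) {
        return withQueueLock(priority, q -> q.remove(block));
    }

    int size(int priority) {
        return withQueueLock(priority, Set::size);
    }
}
```

Locking per queue only protects the sets if every access path goes through the helper; a single caller that touches a queue directly would bypass the lock.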
> Should `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` be made synchronized instead? Its other callers are synchronized methods:

Sorry, I don't quite understand. Since the callers are already synchronized, why is it necessary to make `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` synchronized?
> This will just ensure that only one thread gets into the remove method at a time. But what if one thread is trying to remove while another is trying to add or call size()?

`add()` and `size()` are already synchronized in the current code.
> > Should `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` be made synchronized instead? Its other callers are synchronized methods:
>
> Sorry, I don't quite understand. Since the callers are already synchronized, why is it necessary to make `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` synchronized?

Since `boolean remove(BlockInfo block, int priLevel)` forwards the call to `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)`, I am suggesting that we make `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` synchronized instead.
Because the callers of `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` are already synchronized, we can make it synchronized as well without any performance loss.
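The forwarding argument can be illustrated in isolation. The sketch below assumes a hypothetical `ReentrantRemove` class (not the Hadoop code) in which both overloads are synchronized. Because Java's intrinsic locks are reentrant, the two-argument overload can call the three-argument one while already holding the same monitor, and the inner call does not block.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the two remove overloads; String replaces BlockInfo.
class ReentrantRemove {
    private final Set<String> blocks = new HashSet<>();

    synchronized boolean add(String block) {
        return blocks.add(block);
    }

    // Outer overload: acquires the monitor of `this`, then delegates.
    synchronized boolean remove(String block, int priLevel) {
        return remove(block, priLevel, 0);
    }

    // Inner overload: re-enters the monitor that the caller already holds.
    // priLevel and oldExpectedReplicas are unused in this simplified model.
    synchronized boolean remove(String block, int priLevel, int oldExpectedReplicas) {
        if (!Thread.holdsLock(this)) {
            throw new IllegalStateException("monitor not held");
        }
        return blocks.remove(block);
    }
}
```

A call such as `new ReentrantRemove().remove("blk_1", 0)` returns normally rather than deadlocking, which is the property the suggestion relies on.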
> > This will just ensure that only one thread gets into the remove method at a time. But what if one thread is trying to remove while another is trying to add or call size()?
>
> `add()` and `size()` are already synchronized in the current code.

These methods being synchronized means that only one thread can enter a synchronized method at a time, but it doesn't make the object they work on synchronized. One thread could call `size()`, which leads to `priorityQueues.get(i).size()`, while another calls `add()`, which leads to `priorityQueues.get(priLevel).add(blockInfo)`, simultaneously.
An example of the probable issue, when adding an element:
Lines 87 to 125 in ccdb978
protected boolean addElem(final T element) {
  // validate element
  if (element == null) {
    throw new IllegalArgumentException("Null element is not supported.");
  }
  // find hashCode & index
  final int hashCode = element.hashCode();
  final int index = getIndex(hashCode);
  // return false if already present
  if (getContainedElem(index, element, hashCode) != null) {
    return false;
  }
  modification++;
  size++;
  // update bucket linked list
  DoubleLinkedElement<T> le = new DoubleLinkedElement<T>(element, hashCode);
  le.next = entries[index];
  entries[index] = le;
  // insert to the end of the all-element linked list
  le.after = null;
  le.before = tail;
  if (tail != null) {
    tail.after = le;
  }
  tail = le;
  if (head == null) {
    head = le;
    bookmark.next = head;
  }
  // Update bookmark, if necessary.
  if (bookmark.next == null) {
    bookmark.next = le;
  }
  return true;
}
Line 101 in ccdb978
  size++;
Please feel free to disagree.
Thanks.
Thanks for your suggestion. I'll make both `boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas)` and `boolean remove(BlockInfo block, int priLevel)` synchronized to ensure correctness.
However, I think you have misunderstood the semantics of a synchronized method. Refer to the Java documentation:
> When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object.
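The quoted rule can be checked with a small experiment. This is a sketch, not the Hadoop code: `GuardedQueue` is a hypothetical class that wraps a plain `HashSet` (which is not thread-safe on its own) behind synchronized `add`, `remove`, and `size` methods. Since all three methods lock the same object, several threads can call them concurrently without corrupting the set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: every access to the HashSet goes through the same monitor.
class GuardedQueue {
    private final Set<Integer> blocks = new HashSet<>();

    synchronized boolean add(int id) { return blocks.add(id); }
    synchronized boolean remove(int id) { return blocks.remove(id); }
    synchronized int size() { return blocks.size(); }

    // Each worker adds its own id range, then removes the odd offsets again.
    static int runWorkers(int threads, int perThread) {
        GuardedQueue queue = new GuardedQueue();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    queue.add(base + i);
                }
                for (int i = 1; i < perThread; i += 2) {
                    queue.remove(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return queue.size();
    }
}
```

With four workers and 1000 ids each, `GuardedQueue.runWorkers(4, 1000)` returns 2000, because every worker leaves its 500 even offsets behind and no update is lost.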
@saxenapranav Thanks for your detailed review comment. I am not sure I fully understand your points. IMO, this improvement is safe and self-contained, because `synchronized` is reentrant and exclusive, so I don't see how it could introduce other consistency issues.
I would give my +1 if you are worried only about the performance loss of a synchronized method calling another synchronized method; for this case I think it is acceptable. In any case, I totally agree that changes affecting performance should come with a benchmark comparison.
Thanks @zhangshuyan0 and @saxenapranav. Please feel free to correct me if something is wrong.
I agree with both @Hexiaoqiao and @zhangshuyan0.
Thanks @zhangshuyan0 for taking the suggestion.
💔 -1 overall
This message was automatically generated. |
LGTM. +1 from my side.
Committed to trunk. @zhangshuyan0 Thanks for your contributions. @saxenapranav And thanks for your reviews.
@zhangshuyan0 Would you mind checking whether we need to backport this to other active branches?
…#5450). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…5471). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…#5450). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…ks. (apache#5450 apache#5471). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> (cherry picked from commit 8cc57f5)
…ks. (apache#5450 apache#5471). (#60) Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org> (cherry picked from commit 8cc57f5) Co-authored-by: zhangshuyan <81411509+zhangshuyan0@users.noreply.github.com>
The remove method in LowRedundancyBlocks is not protected by synchronized. This method is package-private and is called by BlockManager. As a result, priorityQueues is at risk of being accessed concurrently by multiple threads.