HDFS-16158. Discover datanodes with unbalanced volume usage by the st… #3288
💔 -1 overall
This message was automatically generated.
These failed unit tests pass locally. Hi @tasanuma @jojochuang @Hexiaoqiao @ayushtkn, could you please help review the code? Thanks.
Hi @aajisaka, could you please help review the code? Thanks a lot.
some quick comments.
@@ -675,6 +675,10 @@ public long getDfsUsed() throws IOException {
    return volumes.getDfsUsed();
  }

  public long setDfsUsed() throws IOException {
this API doesn't appear to be used.
for (Float usage : usages) {
  dev += (usage - totalDfsUsed) * (usage - totalDfsUsed);
}
dev = (float) Math.sqrt(dev / usages.size());
can we add a check to ensure usages.size() never returns 0?
@@ -6515,15 +6518,16 @@ public String getLiveNodes() {
    .put("nonDfsUsedSpace", node.getNonDfsUsed())
    .put("capacity", node.getCapacity())
    .put("numBlocks", node.numBlocks())
    .put("version", node.getSoftwareVersion())
    .put("version", "")
why is this removed?
Thanks @jojochuang for your review, I will fix these problems ASAP.
Hi @jojochuang, this PR has a lot of changes, which can make rolling updates difficult. I re-implemented this feature and will submit another PR later. Thank you for your review.
JIRA: HDFS-16158
Discover datanodes with unbalanced volume usage by the standard deviation
In some scenarios, we may cause unbalanced datanode disk usage:
When disk usage is unbalanced, a sudden increase in datanode write traffic can make the disks with low volume usage I/O-busy, which reduces throughput across datanodes.
In this case, we need to find these nodes promptly so that we can run diskBalance or take other action. From the usage of each volume on a datanode, we can calculate the standard deviation of its volume usage. The more unbalanced the volumes, the higher the standard deviation.
To keep the namenode from becoming too busy, we can calculate the standard deviation on the datanode side, send it to the namenode through the heartbeat, and display the result on the namenode web UI. We can then sort on that value in the web UI to find the nodes whose volume usage is unbalanced.
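The calculation described above can be illustrated with a minimal sketch. It is not the code from this PR: the class name `VolumeUsageStdDev`, the method `usageStdDev`, and the representation of each volume's usage as a ratio between 0 and 1 are assumptions made for illustration. It computes the population standard deviation around the mean usage and returns 0 for an empty list, which covers the division-by-zero case raised in review.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: names and input shape are assumptions.
class VolumeUsageStdDev {

  /**
   * Population standard deviation of per-volume usage ratios.
   * Each element is assumed to be dfsUsed / capacity for one volume.
   * Returns 0 when there are no volumes, so we never divide by zero.
   */
  static double usageStdDev(List<Double> usages) {
    if (usages == null || usages.isEmpty()) {
      return 0.0;
    }
    // Mean usage across all volumes of this datanode.
    double sum = 0.0;
    for (double usage : usages) {
      sum += usage;
    }
    double mean = sum / usages.size();

    // Average squared distance from the mean, then square root.
    double squaredDiffs = 0.0;
    for (double usage : usages) {
      squaredDiffs += (usage - mean) * (usage - mean);
    }
    return Math.sqrt(squaredDiffs / usages.size());
  }

  public static void main(String[] args) {
    // Evenly filled volumes: deviation is zero.
    System.out.println(usageStdDev(Arrays.asList(0.5, 0.5, 0.5, 0.5)));
    // Two nearly full and two nearly empty volumes: deviation is about 0.4.
    System.out.println(usageStdDev(Arrays.asList(0.9, 0.1, 0.9, 0.1)));
  }
}
```

A datanode whose volumes are all half full yields 0, while one with two volumes at 90% and two at 10% yields roughly 0.4, so sorting datanodes by this value in descending order puts the most unbalanced ones first.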