-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-15987. Improve oiv tool to parse fsimage file in parallel with d… #2918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
* | ||
* STEP2: Merge files. | ||
*/ | ||
private void outputInParallel(Configuration conf, FileSummary summary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a great idea to improve OIV performance, but it seems only considering INODE_SUB section now, right? Actually there are also some other subsections, such as INODE_REFERENCE_SUB, INODE_DIR_SUB, SNAPSHOT_DIFF_SUB. I think we could process them using the same way.
cc @sodonnel @jojochuang any ideas for this improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hexiaoqiao for guidance. Other possible sub-sections can also be optimized, but may not be the focus of optimization, i think. Analyse as below.
There are several steps to parse fsimage in the case of DELIMITED format:
-
- Load string table
-
- Load inode references
-
- Handle INODE to memory or levelDB
-
- Handle INODE_DIR to memory or levelDB
-
- Output INODE
For example In our practice, it takes 7 hours for to parse a large fsimage file, and just the 5th step which only uses INODE takes more than 6 hours. So I did parallelization in 5th step.
The 3rd and 4th steps are basically memory operations, which are not very time-consuming. It may be possible to use INODE_SUB or INODE_DIR_SUB feature for parallel processing, but I am not sure if it is necessary to do so.
Hope to discuss further to clarify whether other sub-sections need to be processed, Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its a good idea to make this change in stages. If using just the parallel part of the INODE sections cuts the runtime in half, then its a good change. We can always open more Jiras to parallel the other sections after this one is done, if it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. Leave some nit comment inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @whbing for your works. leave comments inline, FYI.
sync trunk
All failed tests are passed in local. It seems unrelated to this PR. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -132,6 +133,7 @@ private static Options buildOptions() { | |||
options.addOption("delimiter", true, ""); | |||
options.addOption("sp", false, ""); | |||
options.addOption("t", "temp", true, ""); | |||
options.addOption("threads", true, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned that parameter -threads
will collide with -t
, and it could be parsed to -t hreads
here. It is safe to changes names to another one to avoid abuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be renamed and will be fixed in next commit. By the way, is there a suitable parameter name recommendation ? 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option name threads
has to rename here. Otherwise it will conflict with -t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option name
threads
has to rename here. Otherwise it will conflict with-t
.
New option as follow. Point it out If any better option, thanks!
options.addOption("m", "multiThread", true, "");
@@ -146,7 +146,13 @@ public String build() { | |||
PBImageDelimitedTextWriter(PrintStream out, String delimiter, | |||
String tempPath, boolean printStoragePolicy) | |||
throws IOException { | |||
super(out, delimiter, tempPath); | |||
this(out, delimiter, tempPath, printStoragePolicy, 1, "-"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused why we use static string "-" rather than null. any other consideration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the default value of outputFile
in OfflineImageViewerPB.java
is "-", I continue this style.
String outputFile = cmd.getOptionValue("o", "-");
@@ -640,6 +657,19 @@ long getParentId(long id) throws IOException { | |||
private void output(Configuration conf, FileSummary summary, | |||
FileInputStream fin, ArrayList<FileSummary.Section> sections) | |||
throws IOException { | |||
ArrayList<FileSummary.Section> allINodeSubSections = | |||
getINodeSubSections(sections); | |||
if (numThreads > 1 && !parallelOut.equals("-") && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same as above comments.
* | ||
* STEP2: Merge files. | ||
*/ | ||
private void outputInParallel(Configuration conf, FileSummary summary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense to me. Leave some nit comment inline.
Add a report Improve_oiv_tool_001.pdf. |
🎊 +1 overall
This message was automatically generated. |
@whbing This feature is useful for parse fsimage. It is near to check in when fix the comment. Any plan to push this forward? Thanks. |
public void visit(RandomAccessFile file) throws IOException { | ||
public void visit(String filePath) throws IOException { | ||
filename = new File(filePath); | ||
RandomAccessFile file = new RandomAccessFile(filePath, "r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is unnecessary to change here.
Thanks for attention. The PR works well and I will improve some details if needed. |
String compressionCodec, Configuration conf) | ||
throws IOException { | ||
// channel of RandomAccessFile is not thread safe, use File | ||
FileInputStream fin = new FileInputStream(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RandomAccessFile
is not thread safe.
The change here is to get stream for each thread through File filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is unnecessary to change here.
@ferhui Thanks for the detailed review.
Considering the thread-safe channel, RandomAccessFile
is avoided here. I haven't found a more elegant way for the time being.
💔 -1 overall
This message was automatically generated. |
ArrayList<FileSummary.Section> allINodeSubSections = | ||
getINodeSubSections(sections); | ||
if (numThreads > 1 && !parallelOut.equals("-") && | ||
allINodeSubSections.size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some logs showing it's using serial or parallel output. And if the user is using the tool with option "threads" but her fsimage doesn't support the feature, better add some logs notifying her it's falling back to the serial output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good advice. It will be optimized in the next commit. Thanks for review! @sodonnel
} | ||
} | ||
|
||
if (exceptions.size() != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check happening after all threads are finished? Can we return the exception promptly?
mark = end; | ||
String path = paths[i]; | ||
|
||
threads[i] = new Thread(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe thread pool is better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe thread pool is better here?
@symious Thanks! I will try this suggestion in next commit.
It seems that some comments are not fixed now. Any plan to push this improvement forwards? This PR has pending here for near year, we should check in ASAP once ready. Thanks. |
Sorry for the late follow-up and reply. The new commit will be tested later in our cluster and it will be submitted these days. |
💔 -1 overall
This message was automatically generated. |
The latest commit works well in prod env. @Hexiaoqiao @symious PTAL. |
🎊 +1 overall
This message was automatically generated. |
Thanks @whbing for your updates. It almost look good to me. Please fix checkstyle issue also. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 from my side. Will commit if no other furthermore comments util double days later.
…elimited format. (#2918). Contributed by Hongbing Wang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
Committed to trunk. Thanks @whbing for your contributions! Thanks all to reviews! |
…elimited format. (#2918). Contributed by Hongbing Wang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
revert and re-check in to correct the git message. |
…elimited format. (apache#2918). Contributed by Hongbing Wang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…l with delimited format. (apache#2918). Contributed by Hongbing Wang." This reverts commit 8897549.
…elimited format. (apache#2918). Contributed by Hongbing Wang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
…elimited format
NOTICE
Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute