Use regular expressions to parse image data text files. #1971
Fixes #1951.
This pull request consists of two changes:

1. Instead of the previous naive `ifstream` method of parsing image data files, this pull request uses regular expressions for more robust matching.
2. The same list-file parsing code appears in both `tools/convert_imageset.cpp` and `src/caffe/layers/image_data_layer.cpp`. This pull request pulls that common code out into a new function in `src/caffe/util/io.cpp` for ease of maintenance.

More details follow.
Each line of the input text file is matched against the following regular expression:
Feel free to play around with an interactive version to test it and see what it matches. This regular expression handles many cases that would have been difficult to handle with the previous naive approach: it captures whitespace within a filename, and it allows filenames to be quoted in case, for some insane reason, a file name begins with a space.
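To illustrate the general idea (this is not the pattern or code from this pull request), here is a minimal sketch using C++11 `std::regex` in place of `boost::regex`; both offer a similar `regex_match`/`smatch` interface. The pattern and the function names `ParseImageDataLine` and `ReadImageDataList` are hypothetical. The sketch accepts either a double-quoted filename or an unquoted filename with interior spaces, followed by an integer label, and wraps the per-line parser in one reader that both callers could share.

```cpp
#include <istream>
#include <regex>
#include <sstream>    // std::istringstream, used in the usage example below
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pattern, NOT the one from this pull request:
//   optional leading whitespace, then either
//     group 1: a double-quoted filename (may start with a space), or
//     group 2: an unquoted filename that starts and ends with non-whitespace
//              but may contain interior spaces,
//   then whitespace, group 3: an integer label, then optional trailing
//   whitespace (which also absorbs a Windows '\r').
// Greedy backtracking makes the label the LAST whitespace-separated integer.
static const std::regex kLinePattern(
    R"re(\s*(?:"([^"]*)"|(\S(?:.*\S)?))\s+(\d+)\s*)re");

// Parses one line; returns false if it does not match the whole pattern.
bool ParseImageDataLine(const std::string& line, std::string* filename,
                        int* label) {
  std::smatch m;
  if (!std::regex_match(line, m, kLinePattern)) return false;
  // Exactly one of the two filename groups participates in a match.
  *filename = m[1].matched ? m[1].str() : m[2].str();
  *label = std::stoi(m[3].str());  // throws std::out_of_range on overflow
  return true;
}

// Hypothetical shared helper: reads every non-blank line of a list file.
std::vector<std::pair<std::string, int>> ReadImageDataList(std::istream& in) {
  std::vector<std::pair<std::string, int>> entries;
  std::string line;
  int line_number = 0;
  while (std::getline(in, line)) {
    ++line_number;
    // Skip lines that contain only whitespace.
    if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
    std::string filename;
    int label = 0;
    if (!ParseImageDataLine(line, &filename, &label)) {
      throw std::runtime_error("malformed line " +
                               std::to_string(line_number) + ": " + line);
    }
    entries.emplace_back(filename, label);
  }
  return entries;
}
```

For example, calling `ReadImageDataList` on a `std::istringstream` holding `my holiday photo.jpg 1` and `"  leading space.jpg" 2` yields the pairs (`my holiday photo.jpg`, 1) and (`  leading space.jpg`, 2). Throwing on a malformed line is a design choice of this sketch. Caffe code would more likely use a glog `CHECK`.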
Some concrete examples of really degenerate cases that will parse correctly:
One drawback is that this introduces `boost_regex` as an additional dependency. However, since we already require Boost, this seems like an acceptable tradeoff.

Implementation-wise, this pull request should be complete, though it's lacking tests, which I will get around to writing in the near future.