Use regular expressions to parse image data text files. #1971

erictzeng · 2015-02-25T23:03:19Z


Fixes #1951.

This pull request consists of two changes:

Rather than the brittle ifstream method of parsing image data files, this pull request uses regular expressions for more robust matching.
Previously, the parsing code was duplicated across two files, tools/convert_imageset.cpp and src/caffe/layers/image_data_layer.cpp. This pull request pulls that common code out into a new function in src/caffe/util/io.cpp for ease of maintenance.
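To illustrate why whitespace-delimited stream extraction is brittle for this file format, here is a minimal sketch. It assumes the old code read each entry with `operator>>`, which this thread does not show; `ParseWithStream` is a hypothetical stand-in, not code from either file.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for whitespace-delimited parsing (an assumption
// about the old approach, not the actual Caffe code).
// Returns true only if both a filename token and an integer label were read.
bool ParseWithStream(const std::string& line, std::string* filename,
                     int* label) {
  std::istringstream in(line);
  // operator>> stops at the first whitespace, so a filename containing a
  // space is cut short and the next token is then read as the label.
  return static_cast<bool>(in >> *filename >> *label);
}
```

With this sketch, `cat.jpg 3` parses as expected, but `file name with spaces.jpg 1` fails: the filename is read as `file`, and `name` cannot be converted to an integer label.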

More details follow.

Each line of the input text file is matched against the following regular expression:

\h*("?)(.+?)\1\h+(\d+)\h*

Feel free to play around with an interactive version to see what it matches. This regular expression handles many cases that would have been difficult to handle with the previous naive approach. It captures whitespace within a filename and allows filenames to be quoted, in case for some insane reason a file name begins with a space.

Some concrete examples of really degenerate cases that will parse correctly:

file name with spaces.jpg 1
" file_name_with_leading_space.jpg" 2
file_name_with_"_symbol.jpg 3
" really disgusting " file  ""name  .jpg" 4
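The pattern and the examples above can be tried out with a short sketch. This is not the function added by this pull request: `ParseImageLine` is a hypothetical name, and the sketch uses `std::regex` rather than boost_regex so that it is self-contained. Because the ECMAScript grammar used by `std::regex` has no `\h` class, horizontal whitespace is written as `[ \t]` here, which is an approximation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not the function from this pull request).
// Returns true and fills filename/label when the whole line matches.
bool ParseImageLine(const std::string& line, std::string* filename,
                    int* label) {
  // Same shape as \h*("?)(.+?)\1\h+(\d+)\h*, with \h approximated by [ \t].
  // Group 1: optional opening quote; group 2: filename (non-greedy);
  // \1: matching closing quote, or nothing; group 3: integer label.
  static const std::regex pattern(
      R"re([ \t]*("?)(.+?)\1[ \t]+(\d+)[ \t]*)re");
  std::smatch m;
  if (!std::regex_match(line, m, pattern)) return false;
  *filename = m[2].str();
  *label = std::stoi(m[3].str());
  return true;
}
```

Under these assumptions, the first example yields the filename `file name with spaces.jpg` with label 1, and the quoted second example keeps its leading space. A line with no label, such as `no_label.jpg`, is rejected.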

One drawback is that this introduces boost_regex as an additional dependency. However, since we already require Boost, this seems like an acceptable tradeoff.

Implementation-wise, this pull request should be complete, though it's lacking tests, which I will get around to writing at some point in the near future.

shelhamer · 2015-03-07T06:22:05Z

@erictzeng this looks right -- thanks for fixing the brittle format -- but I think you need to update the travis script to install boost regex: https://github.com/BVLC/caffe/blob/master/scripts/travis/travis_install.sh.

bchu · 2016-03-30T00:01:29Z

Any updates on this?

Use regular expressions to parse image data text files.

801d217

shelhamer added the in progress label Mar 7, 2015

bchu mentioned this pull request Dec 14, 2015

Fix ImageDataLayer's silent failure on file paths with spaces #3433

Closed

bchu mentioned this pull request May 16, 2016

handle spaces in image file names #4059

Merged

malreddysid mentioned this pull request Jun 1, 2016

Resolve SIGSEGV error in image_data_layer.cpp #4218

Merged
