This repository was archived by the owner on Jan 7, 2025. It is now read-only.
Closed
Description
Currently DIGITS supports only a single integer label per image. For many applications, such as regression or multi-label classification, this is not enough. I would like to propose adding support for both of these features.
Supporting them raises several problems:
- The interface for feeding in a new dataset. Currently, DIGITS can either parse a folder containing a set of subfolders, each of which holds images from a single class, or load a pregenerated text file. The former approach cannot easily be extended to any of the proposed enhancements; the latter can, although not quite straightforwardly. Each line of such a text file is currently matched against `(.+)\s+(\d+)\s*$` (e.g. `path/to/image 123`). This could be replaced with `(.+?)((?:\s+\d+(?:\.\d*)?)+)\s*$` to accept a list of ints or floats (e.g. `path/to/image 123 4. 5.67`). The path group is non-greedy so that the second group captures every label rather than only the last one, and the fractional part is optional so that plain integers still match.
- The format for storing a dataset. Currently, that's an LMDB (I think it's always an LMDB and never LevelDB, although the code seems to support both; correct me if I'm wrong) which stores Caffe's `Datum` structs. The problem is that a `Datum` has a field for a label, and that's a single `int` (see the `Datum` message in Caffe's `caffe.proto`, where `label` is an `int32`). Currently DIGITS dumps the class label into that field. There are at least three solutions to this, but none seems particularly easy:
  - Split all databases DIGITS creates in two: one for images and one for labels. This breaks compatibility with previous versions, and there is no reason to do it for single-label classification, which is what most people use.
  - Add support for two kinds of databases to DIGITS: the old-style consolidated ones and the new-style split ones. Hard to implement and maintain.
  - Patch Caffe, adding something like `float_label`, `int_labels` or `float_labels` to `Datum`. This increases memory usage (not much) and changes a widely used structure (very bad).
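To make the text-file proposal concrete, here is a minimal sketch of parsing one line into an image path and a list of labels. It uses a variant of the proposed pattern in which the path group is non-greedy and the fractional part is optional; `parse_line` and the two pattern constants are hypothetical names for illustration, not DIGITS code, and all labels are returned as floats.

```python
import re
from typing import List, Optional, Tuple

# Pattern for the current single-label format, e.g. "path/to/image 123".
SINGLE_LABEL_RE = re.compile(r'(.+)\s+(\d+)\s*$')

# Hypothetical multi-label pattern: a path followed by one or more
# non-negative ints or floats ("123", "4.", "5.67").
MULTI_LABEL_RE = re.compile(r'(.+?)((?:\s+\d+(?:\.\d*)?)+)\s*$')


def parse_line(line: str) -> Optional[Tuple[str, List[float]]]:
    """Split one listfile line into (path, labels); return None if it does not match."""
    match = MULTI_LABEL_RE.match(line)
    if match is None:
        return None
    path = match.group(1)
    # group(2) holds every label, separated by whitespace.
    labels = [float(token) for token in match.group(2).split()]
    return path, labels
```

For example, `parse_line("path/to/image 123 4. 5.67")` yields the path `path/to/image` with labels `[123.0, 4.0, 5.67]`, while the single-label pattern rejects that line entirely.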
I am currently working on patching `create_db.py` to support split databases, although I'm not sure that this is the best approach. Comments would be very much appreciated.
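For the split-database approach, the key requirement is that each image and its label vector end up under the same key in two separate databases. Below is a minimal sketch under several assumptions: Python dicts stand in for the two LMDB environments (a real implementation would write through something like py-lmdb's `txn.put(key, value)`), the helper names and the zero-padded key format are hypothetical, and labels are packed as raw float32 purely for illustration. For Caffe's standard Data layer to read the label database, each vector would more likely be stored as a serialized `Datum` (for example in its `float_data` field).

```python
import struct
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical helpers, not part of DIGITS. Dicts stand in for the two LMDBs.


def encode_labels(labels: Sequence[float]) -> bytes:
    """Pack a label vector as little-endian float32 (assumed encoding)."""
    return struct.pack('<%df' % len(labels), *labels)


def decode_labels(blob: bytes) -> List[float]:
    """Inverse of encode_labels; the vector length follows from the blob size."""
    return list(struct.unpack('<%df' % (len(blob) // 4), blob))


def make_key(index: int) -> bytes:
    """Zero-padded key: LMDB orders keys bytewise, so both DBs iterate in the same order."""
    return b'%08d' % index


def write_split(entries: Iterable[Tuple[bytes, Sequence[float]]],
                image_db: Dict[bytes, bytes],
                label_db: Dict[bytes, bytes]) -> None:
    """Write each (encoded image, labels) pair under the same key in both databases."""
    for index, (image_bytes, labels) in enumerate(entries):
        key = make_key(index)
        image_db[key] = image_bytes            # e.g. a serialized Datum whose label is unused
        label_db[key] = encode_labels(labels)  # assumed raw float32 encoding
```

Because the keys are shared, a reader iterating both databases in key order sees matching image/label pairs, which is what a network with one input for images and another for labels would rely on.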