ENH: Make it possible to specify different splits for datasets · Issue #749 · vocalpy/vak · GitHub
ENH: Make it possible to specify different splits for datasets #749
Open
@NickleDave

Description

Related to #748

We should make it possible to specify different splits for the same dataset.
This avoids the need to re-"prep" a dataset every time we want a different split: a dataset will just be a set of files in a folder, without sub-directories for "train"/"val"/"test", and the splits will be declared in a separate file in that directory.

Dataset/datapipe classes should accept a splits_path argument that defaults to None.
If splits_path is None, the datapipe class looks in a default location for a single splits file (and raises a FileNotFoundError if it's not found).
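
A minimal sketch of that lookup, assuming a default file name of `splits.json` inside the dataset directory. The issue does not name the default location, so `DEFAULT_SPLITS_FILENAME` and the helper `resolve_splits_path` are hypothetical, not part of vak's API:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical default file name; the issue does not specify one.
DEFAULT_SPLITS_FILENAME = "splits.json"


def resolve_splits_path(
    dataset_path: str | Path, splits_path: str | Path | None = None
) -> Path:
    """Return the splits file to use for the dataset at ``dataset_path``."""
    dataset_path = Path(dataset_path)
    if splits_path is None:
        # No explicit path given: fall back to the default location
        # inside the dataset directory.
        splits_path = dataset_path / DEFAULT_SPLITS_FILENAME
    else:
        splits_path = Path(splits_path)
    if not splits_path.is_file():
        raise FileNotFoundError(f"splits file not found: {splits_path}")
    return splits_path
```

A dataset class could call a helper like this in its constructor, so that passing `splits_path=None` and passing an explicit path go through the same validation.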

The splits_path will be distinct from what we now call dataset_csv_path. It will be a JSON file, basically metadata, that declares not only what we now call dataset_csv_path but also any other paths needed for a split. In the case of a frame classification dataset, this includes the vectors of sample IDs and indices within each sample.
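
To make this concrete, here is one way such a JSON file could look and be parsed. The schema is an assumption for illustration only: the key names (`inputs_targets_paths_csv`, `splits`, `sample_ids_path`, `inds_in_sample_path`), the file names, and the `SplitPaths` / `parse_splits` helpers are all hypothetical and not decided in this issue.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path

# Hypothetical contents of a splits file for a frame classification dataset.
# All paths are relative to the dataset directory.
EXAMPLE_SPLITS_JSON = """
{
  "inputs_targets_paths_csv": "inputs_targets_paths.csv",
  "splits": {
    "train": {
      "sample_ids_path": "train-sample-ids.npy",
      "inds_in_sample_path": "train-inds-in-sample.npy"
    },
    "val": {
      "sample_ids_path": "val-sample-ids.npy",
      "inds_in_sample_path": "val-inds-in-sample.npy"
    },
    "test": {
      "sample_ids_path": "test-sample-ids.npy",
      "inds_in_sample_path": "test-inds-in-sample.npy"
    }
  }
}
"""


@dataclass(frozen=True)
class SplitPaths:
    """Paths to the per-split vectors (hypothetical structure)."""

    sample_ids_path: Path
    inds_in_sample_path: Path


def parse_splits(text: str, dataset_path: Path) -> tuple[Path, dict[str, SplitPaths]]:
    """Parse splits metadata, resolving every path against ``dataset_path``."""
    metadata = json.loads(text)
    csv_path = dataset_path / metadata["inputs_targets_paths_csv"]
    splits = {
        name: SplitPaths(
            sample_ids_path=dataset_path / paths["sample_ids_path"],
            inds_in_sample_path=dataset_path / paths["inds_in_sample_path"],
        )
        for name, paths in metadata["splits"].items()
    }
    return csv_path, splits


csv_path, splits = parse_splits(EXAMPLE_SPLITS_JSON, Path("dataset"))
```

Because the splits file only points at other files, a second split of the same dataset would be a second JSON file with different vectors, with no need to copy or move the underlying data.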

Probably we should rename dataset_csv_path to something like inputs_targets_paths_csv for clarity.

So we'll need to:

  • add splits_path to dataset classes
  • modify how prep.frame_classification works to not make split sub-directories

Metadata

Labels

ENH: enhancement (enhancement; new feature or request)
