Paidiverpy is a Python package designed to create pipelines for preprocessing image data for biodiversity analysis.
Note: This package is still in active development, and frequent updates and changes are expected. The API and features may evolve as we continue improving it.
The official documentation is hosted on ReadTheDocs.org: https://paidiverpy.readthedocs.io/
Note: Comprehensive documentation is under construction.
To install paidiverpy, run:
pip install paidiverpy
You can also install paidiverpy from source, either locally or on a notebook server such as JASMIN or the NOC Data Science Platform (DSP). The following steps apply to both environments, but steps 2 and 3 are required if you are using a notebook server.
1. Clone the repository:
   # ssh
   git clone git@github.com:paidiver/paidiverpy.git
   # https
   # git clone https://github.com/paidiver/paidiverpy.git
   cd paidiverpy
2. (Optional) Create a Python virtual environment to manage dependencies separately from other projects. For example, using conda:
   conda env create -f environment.yml
   conda activate Paidiverpy
3. Install the paidiverpy package:
   pip install -e .
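To confirm that either installation route worked, you can check whether the package is visible in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of paidiverpy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "paidiverpy") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current environment sees.
print(installed_version() or "paidiverpy is not installed in this environment")
```

If this reports that the package is missing after an editable install, the most likely cause is that the virtual environment from step 2 is not activated.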
You can run your preprocessing pipeline using Paidiverpy in several ways, typically requiring just one to three lines of code:
Install the package and utilize it in your Python scripts.
# Import the Pipeline class
from paidiverpy.pipeline import Pipeline
# Instantiate the Pipeline class with the configuration file path
# Please refer to the documentation for the configuration file format
pipeline = Pipeline(config_file_path="../examples/config_files/config_simple2.yml")
# Run the pipeline
pipeline.run()
# You can export the output images to the specified output directory
pipeline.save_images(image_format="png")
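The same three calls can be wrapped in a small function that fails early when the configuration file is missing. This is only a sketch: `run_pipeline` is a hypothetical helper, and it assumes that `Pipeline`, `run()` and `save_images()` behave as in the snippet above.

```python
from pathlib import Path


def run_pipeline(config_file, image_format="png"):
    """Run a paidiverpy pipeline from a config file and save its output images."""
    config_path = Path(config_file)
    if not config_path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")

    # Imported here so the path check above works even without paidiverpy installed.
    from paidiverpy.pipeline import Pipeline

    pipeline = Pipeline(config_file_path=str(config_path))
    pipeline.run()
    pipeline.save_images(image_format=image_format)
    return pipeline
```

For example, `run_pipeline("../examples/config_files/config_simple2.yml")` would reproduce the snippet above, returning the pipeline object so its outputs can be inspected afterwards.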
Pipelines can be executed via command-line arguments. For example:
paidiverpy -c examples/config_files/config_simple.yml
This runs the pipeline according to the configuration file, saving output images to the directory defined in output_path.
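If you have several configuration files, the command above can be driven from Python. The sketch below assumes only the `paidiverpy -c <config>` form shown here; the helpers `build_command` and `run_all` are hypothetical names, and the `.yml` extension is taken from the example paths.

```python
import subprocess
from pathlib import Path


def build_command(config_path):
    """Build the argument list for `paidiverpy -c <config>`."""
    return ["paidiverpy", "-c", str(config_path)]


def run_all(config_dir):
    """Run the CLI once per .yml file in `config_dir`, in name order.

    Returns a dict mapping each config file name to the process return code.
    """
    return_codes = {}
    for config in sorted(Path(config_dir).glob("*.yml")):
        # check=False: keep going if one configuration fails.
        completed = subprocess.run(build_command(config), check=False)
        return_codes[config.name] = completed.returncode
    return return_codes
```

Calling `run_all("examples/config_files")` would then run every configuration in that directory one after another; a non-zero return code marks a run that did not complete.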
Together with the documentation, you can explore various use cases through sample notebooks in the examples/example_notebooks directory:
- Open and display a configuration file and a metadata file
- Run processing steps without creating a pipeline
- Run a pipeline and interact with outputs
- Run pipeline steps in test mode
- Create pipelines programmatically
- Rerun pipeline steps with modified configurations
- Use parallelization with Dask
- Create a LocalCluster and run a pipeline
- Run a pipeline using a public dataset with IFDO metadata
- Run a pipeline using data on an object store
- Add a custom algorithm to a pipeline
- Open and process raw images
- Export and validate metadata
If you'd like to manually download example data for testing, you can use the following code, replacing DATASET_NAME with one of the dataset names listed below:
from paidiverpy.utils.data import PaidiverpyData
PaidiverpyData().load(DATASET_NAME)
Available datasets:
- plankton_csv: Plankton dataset with CSV file metadata
- benthic_csv: Benthic dataset with CSV file metadata
- benthic_ifdo: Benthic dataset with IFDO metadata
- nef_raw: Sample images in NEF format (raw images) with CSV file metadata
- benthic_raw_images: Benthic dataset in raw format with CSV file metadata
Example data will be automatically downloaded when running the example notebooks.
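To fetch several example datasets in one go, the call above can be looped over the names in the list. This is a sketch under assumptions: `DATASETS` and `load_datasets` are hypothetical names, and what `PaidiverpyData().load()` returns is not specified here, so the values are simply passed through.

```python
# Dataset names and descriptions, copied from the list above.
DATASETS = {
    "plankton_csv": "Plankton dataset with CSV file metadata",
    "benthic_csv": "Benthic dataset with CSV file metadata",
    "benthic_ifdo": "Benthic dataset with IFDO metadata",
    "nef_raw": "Sample images in NEF format (raw images) with CSV file metadata",
    "benthic_raw_images": "Benthic dataset in raw format with CSV file metadata",
}


def load_datasets(names):
    """Download each named example dataset; reject names not in DATASETS."""
    unknown = [name for name in names if name not in DATASETS]
    if unknown:
        raise ValueError(f"Unknown dataset name(s): {unknown}")

    # Imported here so the name check above works even without paidiverpy installed.
    from paidiverpy.utils.data import PaidiverpyData

    # One loader per call, mirroring PaidiverpyData().load(DATASET_NAME).
    return {name: PaidiverpyData().load(name) for name in names}
```

For example, `load_datasets(["plankton_csv", "benthic_ifdo"])` would download those two datasets, while a misspelled name raises a `ValueError` before anything is fetched.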
Note: Please check the documentation for more information about Paidiverpy: https://paidiverpy.readthedocs.io/
Want to support or improve paidiverpy? Check out our contribution guide to learn how to get started.
This project was supported by the UK Natural Environment Research Council (NERC) through the Tools for automating image analysis for biodiversity monitoring (AIAB) Funding Opportunity, reference code UKRI052.