Python package for generating synthetic datasets of the cellular context for Cryo-Electron Tomography.
- IMOD must be installed on the system because PolNet calls some of its standalone commands: https://bio3d.colorado.edu/imod/doc/guide.html
- Miniconda or Anaconda with Python 3.
- Git.
- IMOD can be used to visualize MRC files and ParaView to visualize VTK (.vtp) files. Pandas is recommended for managing the CSV files.
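Besides opening a tomogram in IMOD, its grid size and voxel size can be checked quickly from Python. The sketch below reads only the fixed 1024-byte MRC2014 header with the standard library. It is not part of PolNet: the function name `read_mrc_header` is hypothetical, and only the most common data modes are named. Libraries such as mrcfile provide complete MRC support.

```python
import struct

# Data types of the most common MRC2014 modes (MODE word -> NumPy-style name).
MRC_MODES = {0: "int8", 1: "int16", 2: "float32", 6: "uint16", 12: "float16"}


def read_mrc_header(path):
    """Return the grid size, data type and voxel size (in Angstroms) of an MRC file.

    Only the fixed 1024-byte MRC2014 header is read; the voxel data are not loaded.
    """
    with open(path, "rb") as fh:
        header = fh.read(1024)
    # Word 53 must hold the "MAP " identifier in MRC2014 files.
    if len(header) < 1024 or header[208:212] != b"MAP ":
        raise ValueError(f"{path} does not look like an MRC2014 file")
    # Word 54 (machine stamp): 0x11 0x11 marks big-endian data, otherwise little-endian.
    endian = ">" if header[212:214] == b"\x11\x11" else "<"
    # Words 1-4: NX, NY, NZ (grid size) and MODE (data type).
    nx, ny, nz, mode = struct.unpack(endian + "4i", header[0:16])
    # Words 8-10: MX, MY, MZ (sampling); words 11-13: cell size in Angstroms.
    mx, my, mz = struct.unpack(endian + "3i", header[28:40])
    cella = struct.unpack(endian + "3f", header[40:52])
    voxel = tuple(c / m if m else 0.0 for c, m in zip(cella, (mx, my, mz)))
    return {"shape": (nx, ny, nz), "mode": MRC_MODES.get(mode, f"mode {mode}"), "voxel_size": voxel}
```

For example, `print(read_mrc_header("tomogram.mrc"))` (hypothetical file name) prints the grid size in (NX, NY, NZ) order, the data type and the voxel size per axis.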
Here is how to install it:
- Download the PolNet source code:
  git clone https://github.com/anmartinezs/polnet.git
  cd polnet
- Create a conda virtual environment:
  conda create --name polnet pip
  conda activate polnet
- Install the PolNet package with its requirements:
  pip install -e .
Developers who do not want to install PolNet as a package in the virtual environment can install only the requirements with:
pip install -r requirements.txt
You can check all requirements in the requirements.txt file (JAX is optional).
The installation has been tested on Ubuntu 22.04 and on Windows 10 and 11.
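After installing, a quick sanity check can confirm that the external tools and the optional modules are reachable from the active environment. This is a minimal sketch, not part of PolNet: `check_prerequisites` is a hypothetical helper, and `3dmod` (the IMOD viewer) is only used as a representative IMOD command, since the exact IMOD commands PolNet calls are not listed here.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(commands=("git", "conda", "3dmod"), modules=("polnet", "jax")):
    """Report which external commands and Python modules are available."""
    report = {"python>=3": sys.version_info.major >= 3}
    for cmd in commands:
        # shutil.which returns None when the executable is not on PATH.
        report[f"cmd:{cmd}"] = shutil.which(cmd) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:20s} {'OK' if ok else 'missing'}")
```

A missing `jax` entry is expected if JAX was not installed, since it is optional.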
First, open Jupyter by running:
jupyter notebook
If you are not familiar with Jupyter notebooks, first read the getting-started guide: https://docs.jupyter.org/en/latest/running.html
To generate a synthetic dataset, run the following notebook in Jupyter: gui/gen_dataset.ipynb
To create your own structural models, the following Jupyter notebooks are available:
- Membranes: gui/create_membrane_models.ipynb
- Filaments: gui/create_filament_models.ipynb
- Macromolecules:
  - Atomic model (PDB) to electron density map (MRC): gui/atomic_to_density.ipynb
  - Only for membrane-bound macromolecules: gui/align_membrane_proteins.ipynb
  - Models: gui/create_macromolecule_models.ipynb
Example videos showing how to use the GUI are available on Zenodo.
Important note: all Jupyter notebooks are thoroughly self-documented to guide the user through the process. In addition, they contain graphical objects and default settings to make the process easier.
To use PolNet with Docker, first build the image:
bash create_docker.sh
First, modify the configuration file (for example, scripts/config_sample.yaml), then run:
bash run_docker.sh --out_dir /path/to/output/directory --config /path/to/config_script.yaml
Similarly, for the acquisition configuration, first modify the configuration file (for example, scripts/config_acquisition.yaml), then run:
bash run_docker.sh --out_dir /path/to/output/directory --config /path/to/config_script.yaml
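The Docker commands above can also be driven from Python, for example to process several configuration files in a row. The sketch below is a hypothetical wrapper (`run_polnet_docker` is not part of PolNet); it assumes it is run from the folder that contains run_docker.sh and uses only the --out_dir and --config options shown above. Creating the output folder beforehand is a choice of this sketch, not a documented requirement.

```python
import subprocess
from pathlib import Path


def run_polnet_docker(out_dir, config, script="run_docker.sh", dry_run=False):
    """Build (and optionally run) the run_docker.sh command with absolute paths."""
    config = Path(config).resolve()
    if not config.is_file():
        raise FileNotFoundError(f"config file not found: {config}")
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)  # make sure the output folder exists
    cmd = ["bash", script, "--out_dir", str(out_dir), "--config", str(config)]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `run_polnet_docker("out", "scripts/config_sample.yaml", dry_run=True)` returns the command without executing it, which is handy for checking the resolved paths before launching a long simulation.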
The docs folder contains the file default_settings.pdf, which describes the default settings of the hardcoded script for generating synthetic tomograms, scripts/data_gen/all_features.py.
In addition, the table in docs/molecules_table.md gives more detailed descriptions of the PDB models used to create the macromolecular models provided in the data folder.
- polnet: Python package implementing the functionality for generating the synthetic data.
- gui: set of Jupyter notebooks with a Graphical User Interface (GUI).
  - core: functionality required by the notebooks.
- scripts: Python scripts for generating different types of synthetic datasets. Folders:
  - data_gen: scripts for data generation.
    - deprecated: contains some scripts for evaluations carried out during software development; they are not ready for external users because some hardcoded paths need to be modified.
  - templates: scripts for building the structural units for macromolecules (requires EMAN2 to be installed). Their use is strongly deprecated; the GUI notebooks now include all of this functionality.
    - deprecated: contains some scripts for evaluations carried out during software development; they are not ready for external users because some hardcoded paths need to be modified.
  - csv: scripts for post-processing the generated CSV files.
  - data_prep: script to convert the generated dataset into nnU-Net format.
- tests: unit tests for the functionality in polnet. The script tests/test_transformations.py requires first generating at least one output tomogram with the script scripts/all_features.py and then modifying its hardcoded input paths, because the input data are too large to be uploaded to the repository.
- data: contains the input data, mainly macromolecule densities and input configuration files, that can be used to simulate tomograms. These are the default inputs; users can add, remove, or modify them using the GUI notebooks.
  - in_10A: input models for macromolecules at 10 Å voxel size.
  - in_helix: input models for helical structures.
  - in_mbsx: input models for membrane structures.
  - templates: atomic models and density maps used by the macromolecular models.
- docs:
  - API documentation.
  - A PDF with the supplementary material for [1], containing the following tables:
    - Glossary of acronyms, in order of appearance in the main text.
    - Glossary of mathematical symbols defined in the main text, organized by scope.
    - Table of variables used by the input files to model the generators.
    - Table of the structures used to simulate the cellular context.
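For reference, nnU-Net expects a fixed folder and file-naming layout for its raw datasets. The sketch below follows the nnU-Net v2 conventions (`DatasetXXX_Name/imagesTr`, `labelsTr` and `dataset.json`); it is not the data_prep script itself, which may target a different nnU-Net version. The helper names, dataset ID, dataset name and class names used here are hypothetical.

```python
import json
from pathlib import Path


def nnunet_case_paths(root, dataset_id, name, case_id, file_ending=".nii.gz", channel=0):
    """Return (image_path, label_path) for one training case in the nnU-Net v2 raw layout."""
    ds = Path(root) / f"Dataset{dataset_id:03d}_{name}"
    # Images carry a 4-digit channel suffix; labels use the bare case identifier.
    image = ds / "imagesTr" / f"{case_id}_{channel:04d}{file_ending}"
    label = ds / "labelsTr" / f"{case_id}{file_ending}"
    return image, label


def nnunet_dataset_json(labels, num_training, file_ending=".nii.gz", channel_name="cryoET"):
    """Build dataset.json content; `labels` maps class names to integer ids (0 = background)."""
    return json.dumps(
        {
            "channel_names": {"0": channel_name},
            "labels": {"background": 0, **labels},
            "numTraining": num_training,
            "file_ending": file_ending,
        },
        indent=2,
    )
```

For example, `nnunet_case_paths("raw", 1, "PolNet", "tomo_000")` gives `raw/Dataset001_PolNet/imagesTr/tomo_000_0000.nii.gz` and `raw/Dataset001_PolNet/labelsTr/tomo_000.nii.gz`, so each tomogram and its segmentation share one case identifier.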
The API documentation for polnet Python package is available in docs/apidoc/index.html
[1] Martinez-Sanchez A.*, Lamm L., Jasnin M. and Phelippeau H. (2024) "Simulating the cellular context in synthetic datasets for cryo-electron tomography." IEEE Transactions on Medical Imaging. doi:10.1109/TMI.2024.3398401