A Python package providing standardized biological constants and scoring matrices for bioinformatics pipelines. Biobase aims to eliminate the need to repeatedly recreate common biological data structures and scoring systems in your code.
- Quick Start
- Requirements
- Installation
- Running Files
- Project Structure
- Project Goals
- Contributing
- Project Status
- License
from biobase.constants.amino_acid import ONE_LETTER_CODES, MONO_MASS
print(ONE_LETTER_CODES) # 'ACDEFGHIKLMNPQRSTVWY'
print(MONO_MASS['A']) # 71.037113805
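The value printed for 'A' (71.037113805) is the alanine residue mass, so a peptide's monoisotopic mass is the sum of its residue masses plus one water (H2O) for the free termini. The standalone sketch below does not import biobase. The RESIDUE_MONO_MASS table and the peptide_mono_mass helper are hypothetical stand-ins for MONO_MASS, holding only two residues so the example runs on its own.

```python
# Hypothetical subset of monoisotopic residue masses (Da), standing in for
# biobase's MONO_MASS; the 'A' value matches the Quick Start output above.
RESIDUE_MONO_MASS: dict[str, float] = {
    "A": 71.037113805,
    "G": 57.021463735,
}

# Monoisotopic mass of water (Da), added once for the free N- and C-termini.
WATER_MONO_MASS = 18.010564684


def peptide_mono_mass(peptide: str, masses: dict[str, float] = RESIDUE_MONO_MASS) -> float:
    """Return the monoisotopic mass of an unmodified linear peptide."""
    try:
        residue_total = sum(masses[aa] for aa in peptide.upper())
    except KeyError as err:
        raise ValueError(f"no mass for residue {err.args[0]!r}") from None
    return residue_total + WATER_MONO_MASS


print(round(peptide_mono_mass("GA"), 4))  # 146.0691
```

Unknown residues raise a ValueError rather than being skipped silently, so a typo in a sequence cannot produce a plausible but wrong mass.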
from biobase.matrix import BLOSUM
blosum62 = BLOSUM(62)
print(blosum62['A']['A']) # 4
print(blosum62['W']['C']) # -2
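A common use of a substitution matrix is scoring an ungapped alignment: the score is the sum of the matrix entries for each aligned residue pair. The sketch below assumes only that the matrix supports the matrix[a][b] lookup shown above. BLOSUM62_SUBSET and ungapped_score are hypothetical names; the table holds just a few real BLOSUM62 entries so the example runs without biobase.

```python
from collections.abc import Mapping

# Hypothetical subset of BLOSUM62 as a nested dict, mimicking the
# blosum62['A']['A'] access pattern from the Quick Start.
BLOSUM62_SUBSET: dict[str, dict[str, int]] = {
    "A": {"A": 4, "C": 0, "W": -3},
    "C": {"A": 0, "C": 9, "W": -2},
    "W": {"A": -3, "C": -2, "W": 11},
}


def ungapped_score(seq1: str, seq2: str, matrix: Mapping[str, Mapping[str, int]]) -> int:
    """Sum substitution scores over aligned positions of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))


print(ungapped_score("ACW", "ACW", BLOSUM62_SUBSET))  # 24
print(ungapped_score("ACW", "AWC", BLOSUM62_SUBSET))  # 0
```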
from biobase.constants.analysis.nucleic_analysis import Dna
sequence = "ATCGTAGC"
print(Dna.transcribe(sequence)) # 'AUCGUAGC'
print(Dna.complement_dna(sequence)) # 'GCTACGAT' (reverse complement of the input)
print(Dna.calculate_gc_content(sequence)) # 50.0
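Each of these operations is a simple string transformation. The standalone sketch below reimplements them with str.translate to show what each one computes; it is not biobase's implementation, and the function names are hypothetical. It keeps the plain base-by-base complement separate from the reverse complement (the partner strand read 5' to 3'), since the two are easy to confuse.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def transcribe(dna: str) -> str:
    """DNA coding strand to mRNA: replace T with U."""
    return dna.replace("T", "U").replace("t", "u")


def complement(dna: str) -> str:
    """Base-by-base complement, read in the same direction as the input."""
    return dna.translate(_DNA_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complementary strand read 5' to 3', i.e. the complement reversed."""
    return complement(dna)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


seq = "ATCGTAGC"
print(transcribe(seq))          # AUCGUAGC
print(complement(seq))          # TAGCATCG
print(reverse_complement(seq))  # GCTACGAT
print(gc_content(seq))          # 50.0
```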
from biobase.constants.analysis.motif import find_motifs
sequence = "ACDEFGHIKLMNPQRSTVWY"
print(find_motifs(sequence, "DEF")) # [3] (1-based start position of 'DEF')
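Exact motif search can be done with repeated str.find calls. The sketch below is an illustration, not biobase's code: it reports 1-based start positions, which is what the [3] above implies for "DEF" (its 0-based index is 2), and it also counts overlapping matches, which may or may not match find_motifs. The find_motif_positions name is hypothetical.

```python
def find_motif_positions(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    if not motif:
        raise ValueError("motif must be non-empty")
    positions: list[int] = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = sequence.find(motif, start + 1)  # advance by one so overlaps are found
    return positions


print(find_motif_positions("ACDEFGHIKLMNPQRSTVWY", "DEF"))  # [3]
print(find_motif_positions("AAAA", "AA"))                   # [1, 2, 3]
```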
- Python 3.10+
- pip (for installation)
pip install biobase
Note: Biobase is not yet published on PyPI, so the command above will not work yet. For now, clone the repository and install it in editable mode:
git clone https://github.com/lignum-vitae/biobase.git
cd biobase
pip install -e .
To ensure relative imports work correctly, always run files using the module path from the project root:
python -m src.biobase.matrix
- src/biobase/matrix.py: Scoring matrix implementations (BLOSUM, PAM, etc.)
- src/biobase/constants/: Core biological constants
  - amino_acid.py: Amino acid codes, masses, and codon tables
  - nucleic_acid.py: DNA/RNA constants and complementary bases
- src/biobase/constants/analysis/
  - motif.py: Protein motif search functionality
  - nucleic_analysis.py: DNA/RNA sequence analysis tools
- src/biobase/matrices/: Scoring matrix data stored as JSON files
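Storing matrices as JSON keeps the data separate from the code that reads it. The minimal sketch below loads such data into the nested-dict shape used for lookups like blosum62['A']['A'] and checks that the matrix is square and symmetric, as BLOSUM and PAM log-odds matrices are. The JSON layout, EXAMPLE_JSON and load_matrix are all hypothetical and may not match the actual files in src/biobase/matrices/.

```python
import json

# Hypothetical JSON layout: an object of rows, each mapping a column residue
# to its score. The real matrix files may use a different schema.
EXAMPLE_JSON = """
{
  "A": {"A": 4, "C": 0},
  "C": {"A": 0, "C": 9}
}
"""


def load_matrix(text: str) -> dict[str, dict[str, int]]:
    """Parse a nested-object scoring matrix and check it is square and symmetric."""
    matrix: dict[str, dict[str, int]] = json.loads(text)
    residues = set(matrix)
    for row, scores in matrix.items():
        if set(scores) != residues:
            raise ValueError(f"row {row!r} does not cover every residue")
        for col, score in scores.items():
            if matrix[col][row] != score:
                raise ValueError(f"asymmetric entry at ({row!r}, {col!r})")
    return matrix


matrix = load_matrix(EXAMPLE_JSON)
print(matrix["C"]["C"])  # 9
```

Validating once at load time means every later lookup can assume a complete, symmetric table.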
For detailed documentation of each component, please refer to our Wiki.
Biobase aims to provide Python-friendly versions of common biological constants and tools for bioinformatics pipelines. Key objectives:
- Standardize biological data structures
- Provide efficient implementations of common scoring systems
- Ensure type safety and validation
- Maintain comprehensive documentation
- Support modern Python practices
We welcome contributions! Please read our Code of Conduct and Contributing Guidelines before submitting changes.
- ✅ BLOSUM and PAM matrix implementations
- ✅ Basic amino acid constants and conversions
- ✅ DNA/RNA sequence analysis tools
- ✅ Protein motif searching
- ✅ Core biological constants
- 🚧 Additional scoring matrices
- 🚧 Extended amino acid properties
- 📋 Protein structure constants
- 📋 Enzyme and reaction constants
- 📋 Integration with common bioinformatics tools
- ✅ GC content calculation
- ✅ DNA/RNA transcription
- ✅ DNA complementation
- ✅ Motif finding
- 📋 Statistical analysis tools
- 📋 File format parsers (FASTA, GenBank, etc.)
- ✅ Basic README
- ✅ Code of Conduct
- ✅ Contributing Guidelines
- 🚧 API Documentation
- 🚧 Wiki Pages
- 🚧 Usage Examples
- 🚧 PyPI package deployment
- 🚧 CI/CD Pipeline
- 🚧 Code Coverage
- 📋 Automated Releases
Legend:
- ✅ Complete
- 🚧 In Progress
- 📋 Planned
This project is in the alpha stage. APIs may change without warning until version 1.0.0.
This project is licensed under the MIT License - see the LICENSE file for details.