Biobase

A Python package providing standardized biological constants and scoring matrices for bioinformatics pipelines. Biobase aims to eliminate the need to repeatedly recreate common biological data structures and scoring systems in your code.

Quick Start

Access amino acid properties:

from biobase.constants.amino_acid import ONE_LETTER_CODES, MONO_MASS
print(ONE_LETTER_CODES)  # 'ACDEFGHIKLMNPQRSTVWY'
print(MONO_MASS['A'])    # 71.037113805
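
As an illustration of what a residue-mass table is typically used for, the following minimal sketch computes the monoisotopic mass of a short peptide. It does not import Biobase: `RESIDUE_MONO_MASS`, `WATER_MONO_MASS`, and `peptide_mono_mass` are hypothetical names, and only the alanine value is taken from the example above. The glycine and water masses are rounded standard values added so the sketch can run on its own.

```python
# Hypothetical stand-in for a residue-mass table (not the Biobase object).
# 'A' is the value shown in the example above; 'G' is a rounded standard value.
RESIDUE_MONO_MASS = {
    "G": 57.021464,
    "A": 71.037113805,
}
# A linear peptide carries one extra water (H on the N-terminus, OH on the C-terminus).
WATER_MONO_MASS = 18.010565


def peptide_mono_mass(peptide: str) -> float:
    """Sum the residue masses and add one water for the terminal groups."""
    try:
        residue_total = sum(RESIDUE_MONO_MASS[aa] for aa in peptide)
    except KeyError as err:
        raise ValueError(f"Unknown residue: {err.args[0]!r}") from err
    return residue_total + WATER_MONO_MASS


print(round(peptide_mono_mass("GA"), 4))  # 146.0691
```

With the real package, the same function could presumably take `MONO_MASS` in place of the toy table, assuming it is a plain mapping from one-letter codes to residue masses as the example suggests.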

Use scoring matrices:

from biobase.matrix import BLOSUM
blosum62 = BLOSUM(62)
print(blosum62['A']['A'])  # 4
print(blosum62['W']['C'])  # -2
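
The nested lookup above lends itself to scoring an ungapped alignment position by position. The sketch below assumes only that a matrix behaves like a dict of dicts, as the example suggests. `TOY_MATRIX` and `ungapped_score` are hypothetical names; the A/A and W/C scores match the example, and the remaining entries are the usual BLOSUM62 values, included only to keep the sketch self-contained.

```python
# Hypothetical three-residue excerpt in the nested-dict layout used above.
TOY_MATRIX = {
    "A": {"A": 4, "C": 0, "W": -3},
    "C": {"A": 0, "C": 9, "W": -2},
    "W": {"A": -3, "C": -2, "W": 11},
}


def ungapped_score(seq_a: str, seq_b: str, matrix: dict[str, dict[str, int]]) -> int:
    """Sum substitution scores position by position for equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


print(ungapped_score("ACW", "AWW", TOY_MATRIX))  # 4 + (-2) + 11 = 13
```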

Analyze DNA sequences:

from biobase.constants.analysis.nucleic_analysis import Dna
sequence = "ATCGTAGC"
print(Dna.transcribe(sequence))         # 'AUCGUAGC'
print(Dna.complement_dna(sequence))     # 'GCTACGAT' (the reverse complement)
print(Dna.calculate_gc_content(sequence))  # 50.0
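
For readers who want to see what these three operations amount to, here is a plain-Python sketch that reproduces the outputs shown above. It is not the Biobase implementation, and the function names are hypothetical. Note that 'GCTACGAT' is the reverse complement of the input, so the sketch reverses the sequence after complementing it.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(dna: str) -> str:
    """Replace thymine with uracil to get the RNA form of the same strand."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the result."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_content(dna: str) -> float:
    """Return the percentage of G and C bases in the sequence."""
    if not dna:
        raise ValueError("Sequence must not be empty")
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


sequence = "ATCGTAGC"
print(transcribe(sequence))          # 'AUCGUAGC'
print(reverse_complement(sequence))  # 'GCTACGAT'
print(gc_content(sequence))          # 50.0
```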

Find protein motifs:

from biobase.constants.analysis.motif import find_motifs
sequence = "ACDEFGHIKLMNPQRSTVWY"
print(find_motifs(sequence, "DEF"))  # [3]
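
The output `[3]` above is consistent with 1-based positions, since "DEF" starts at the third residue. The sketch below shows one way such a search could work, including overlapping matches. It is an assumption-laden illustration rather than the library's code: `find_motif_positions` and its `one_based` parameter are hypothetical, and the real `find_motifs` may treat overlaps or indexing differently.

```python
def find_motif_positions(sequence: str, motif: str, one_based: bool = True) -> list[int]:
    """Return the start position of every (possibly overlapping) motif match."""
    if not motif:
        raise ValueError("Motif must not be empty")
    offset = 1 if one_based else 0
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + offset)
        # Resume one character later so overlapping matches are found too.
        start = sequence.find(motif, start + 1)
    return positions


print(find_motif_positions("ACDEFGHIKLMNPQRSTVWY", "DEF"))  # [3]
print(find_motif_positions("AAAA", "AA"))                   # [1, 2, 3]
```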

Requirements

Python 3.10+
pip (for installation)

Installation

Regular Installation

pip install biobase

Note: Biobase is not yet published on PyPI, so the command above will not work. Use the development installation instead.

Development Installation

Clone the repository and install in editable mode:

git clone https://github.com/lignum-vitae/biobase.git
cd biobase
pip install -e .

Running Files

To ensure relative imports work correctly, always run files using the module path from the project root:

Run a specific file

python -m src.biobase.matrix

Project Structure

Core Components

src/biobase/matrix.py: Scoring matrix implementations (BLOSUM, PAM, etc.)
src/biobase/constants/: Core biological constants
- amino_acid.py: Amino acid codes, masses, and codon tables
- nucleic_acid.py: DNA/RNA constants and complementary bases

Analysis Tools

src/biobase/constants/analysis/:
- motif.py: Protein motif search functionality
- nucleic_analysis.py: DNA/RNA sequence analysis tools

Data Files

src/biobase/matrices/: Scoring matrix data stored in JSON file format
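
To give a sense of how JSON-backed matrix data can be read and validated, here is a small sketch. The JSON layout (row residue, then column residue, then score) is an assumption made for illustration; the actual files in src/biobase/matrices/ may be organised differently. `EXAMPLE_JSON` and `load_matrix` are hypothetical names.

```python
import json

# Hypothetical layout: row residue -> column residue -> score.
EXAMPLE_JSON = '{"A": {"A": 4, "W": -3}, "W": {"A": -3, "W": 11}}'


def load_matrix(text: str) -> dict[str, dict[str, int]]:
    """Parse a JSON matrix and check that every row covers every residue."""
    matrix = json.loads(text)
    residues = set(matrix)
    for row, columns in matrix.items():
        if set(columns) != residues:
            raise ValueError(f"Row {row!r} does not cover every residue")
    return matrix


matrix = load_matrix(EXAMPLE_JSON)
print(matrix["A"]["W"])  # -3
```

Checking that the matrix is square at load time means a malformed data file fails early rather than raising a `KeyError` in the middle of an alignment.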

For detailed documentation of each component, please refer to our Wiki.

Project Goals

Biobase aims to provide Python-friendly versions of common biological constants and tools for bioinformatics pipelines. Key objectives:

Standardize biological data structures
Provide efficient implementations of common scoring systems
Ensure type safety and validation
Maintain comprehensive documentation
Support modern Python practices

Contributing

We welcome contributions! Please read our Contributing Guidelines and Code of Conduct before submitting changes.

Project Status

Current Version: 0.4.1-alpha

Core Features

✅ BLOSUM and PAM matrix implementations
✅ Basic amino acid constants and conversions
✅ DNA/RNA sequence analysis tools
✅ Protein motif searching
✅ Core biological constants
🚧 Additional scoring matrices
🚧 Extended amino acid properties
📋 Protein structure constants
📋 Enzyme and reaction constants
📋 Integration with common bioinformatics tools

Analysis Tools

✅ GC content calculation
✅ DNA/RNA transcription
✅ DNA complementation
✅ Motif finding
📋 Statistical analysis tools
📋 File format parsers (FASTA, GenBank, etc.)

Documentation

✅ Basic README
✅ Code of Conduct
✅ Contributing Guidelines
🚧 API Documentation
🚧 Wiki Pages
🚧 Usage Examples

Development

🚧 PyPI package deployment
🚧 CI/CD Pipeline
🚧 Code Coverage
📋 Automated Releases

Legend

✅ Complete
🚧 In Progress
📋 Planned

Stability

This project is in the alpha stage. APIs may change without warning until version 1.0.0.

License

This project is licensed under the MIT License - see the LICENSE file for details.
