Bacteriophages (phages), viruses that infect bacteria, play a central role in microbial ecology and phage therapy. Yet, understanding their diversity and behavior in metagenomic samples remains a significant challenge.
Our goal: harness the representational power of DNA-specific foundation models to address two key questions:
- Can we isolate phage contigs from complex metagenomic mixtures?
- Can we infer a phage’s life cycle—virulent or temperate—directly from its genomic sequence?
This project was developed during the Phagos x AWS — Hackdays 2025, held in Paris. In 48 hours, we built an end-to-end, biology-informed ML pipeline that combines metagenomic insights with cutting-edge genomic representation models.
We combined two public datasets:
- Dataset 1 – Metagenomic fragments (bacteria, plasmids, phages)
- Dataset 2 – DeePhage phages with life cycle annotations
We ensured high-confidence labeling by retaining only overlapping phage entries between datasets and fragmenting sequences with the script from the Gauge your phage benchmark study.
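The two preparation steps above (keeping only phages present in both datasets, then cutting sequences into fragments) can be sketched as follows. This is a minimal illustration, not the fragmentation script from the benchmark study: the function names `keep_overlapping` and `fragment`, the dictionary layout, and the toy fragment size are all assumptions made for the example.

```python
from typing import Dict, Iterator, Optional, Tuple


def keep_overlapping(
    phage_sequences: Dict[str, str],
    lifestyle_labels: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Keep only phage IDs found in both datasets.

    Returns a mapping ID -> (sequence, lifestyle label).
    Both input layouts are hypothetical, chosen for illustration.
    """
    shared_ids = phage_sequences.keys() & lifestyle_labels.keys()
    return {
        pid: (phage_sequences[pid], lifestyle_labels[pid])
        for pid in sorted(shared_ids)
    }


def fragment(sequence: str, size: int, step: Optional[int] = None) -> Iterator[str]:
    """Yield fixed-length windows; a trailing piece shorter than `size` is dropped."""
    step = step or size  # default: non-overlapping windows
    for start in range(0, len(sequence) - size + 1, step):
        yield sequence[start:start + size]


# Toy data: only "p2" appears in both datasets.
sequences = {"p1": "ACGTACGT", "p2": "GGCCTTAAGG"}
labels = {"p2": "Temp", "p3": "Viru"}

retained = keep_overlapping(sequences, labels)
fragments = [
    (piece, label)
    for seq, label in retained.values()
    for piece in fragment(seq, size=4)
]
```

Dropping the short trailing window keeps every fragment the same length, which is one simple choice; the actual benchmark script may handle sequence ends differently.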
➡️ Final classes: Viru (virulent phage), Temp (temperate phage), Bact (bacterial contigs)
We used Evo, a genomic foundation model built on the StripedHyena architecture (a hybrid of attention and Hyena convolution layers) and trained on DNA sequences. Evo maps each genomic fragment to a dense vector representation that captures sequence-level context beyond motifs or k-mers.
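One common way to turn per-token model outputs into a single fixed-size vector per fragment is masked mean pooling. The sketch below shows only that pooling step on a stand-in tensor; it does not load Evo, and both the pooling strategy and the `mean_pool` helper are assumptions for illustration rather than a description of what the notebook does.

```python
import torch


def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average per-token vectors over real (non-padding) positions.

    hidden_states:  (batch, seq_len, dim) per-token representations
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    returns:        (batch, dim) one embedding per fragment
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


# Stand-in for model output: one fragment, three positions, the last is padding.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

embedding = mean_pool(hidden, mask)  # the padded position is ignored
```

Masking matters when fragments of different lengths are batched together, since padding positions would otherwise skew the average.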
- Jupyter-based PyTorch pipeline
- Evo embeddings fed to a shallow classifier
- 3-way prediction task (Viru / Temp / Bact)
- Mixed precision training with AMP (autocast, GradScaler)
- Performance monitored using accuracy_score and confusion matrices
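The pipeline outlined above can be sketched roughly as follows, assuming the embeddings are already available as a tensor. The layer sizes, learning rate, epoch count, and the random stand-in data are all illustrative assumptions, and `train_classifier` and `evaluate` are hypothetical helper names, not functions from the notebook.

```python
import torch
from torch import nn
from sklearn.metrics import accuracy_score, confusion_matrix

CLASSES = ["Viru", "Temp", "Bact"]


def train_classifier(x: torch.Tensor, y: torch.Tensor, epochs: int = 20, lr: float = 1e-2) -> nn.Module:
    """Train a shallow classifier on precomputed embeddings with AMP."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    use_amp = device.type == "cuda"  # mixed precision only when a GPU is present

    # Hypothetical architecture: one hidden layer of 32 units.
    model = nn.Sequential(
        nn.Linear(x.shape[1], 32),
        nn.ReLU(),
        nn.Linear(32, len(CLASSES)),
    ).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled
    loss_fn = nn.CrossEntropyLoss()

    x, y = x.to(device), y.to(device)
    for _ in range(epochs):
        optimizer.zero_grad()
        with torch.autocast(device_type=device.type, enabled=use_amp):
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()  # scale the loss to limit fp16 underflow
        scaler.step(optimizer)         # unscales gradients, then steps
        scaler.update()
    return model


def evaluate(model: nn.Module, x: torch.Tensor, y: torch.Tensor):
    """Return accuracy and a 3x3 confusion matrix (rows: true, columns: predicted)."""
    device = next(model.parameters()).device
    model.eval()
    with torch.no_grad():
        preds = model(x.to(device)).argmax(dim=1).cpu().numpy()
    truth = y.cpu().numpy()
    acc = accuracy_score(truth, preds)
    cm = confusion_matrix(truth, preds, labels=list(range(len(CLASSES))))
    return acc, cm


# Random stand-in embeddings: 60 fragments, 16 dimensions, 20 per class.
torch.manual_seed(0)
features = torch.randn(60, 16)
targets = torch.arange(60) % 3

model = train_classifier(features, targets)
accuracy, matrix = evaluate(model, features, targets)
```

In practice the evaluation would run on a held-out split rather than the training data; the example reuses the same tensors only to stay short.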
- Dataset 1: from Ho et al. (Microbiome, 2023); contains labeled bacterial and phage fragments.
- Dataset 2: DeePhage; includes phage genomes with lifestyle annotations (automated + expert-curated).
- Python 3.9+
- pip
- Jupyter Notebook
git clone https://github.com/phagos-hackathon25/project-gamma.git
cd project-gamma
pip install -r requirements.txt
Open prediction.ipynb using Jupyter and execute the cells sequentially.
- Ho, S.F.S. et al. Gauge your phage, Microbiome (2023): DOI
- Evo Foundation Model: https://github.com/evo-design/evo
- Kevin KURTZ @ktzkvin
- Virgile MARTEL @Skrinox
- Marion Fresquet @MarionFresquet
- Jiwoo CHOI @yellowsmob
This work demonstrates how biological insight and modern ML tooling can meet in a hackathon setting to generate meaningful scientific workflows.