DNAnalyzer is a biotechnology research and deployment company. Supported by Anthropic, our mission is to revolutionize DNA analysis by making AI-powered genomic insights accessible to all through on-device computation.
Founded by Piyush Acharya, DNAnalyzer's team includes 46 leading computational biologists and computer scientists from Microsoft Research, the University of Macedonia, and Northeastern University.
Our impact has been recognized by Y Combinator, the organizers of the AI World's Fair Expo, and the CEO of DEV.
Today's Limitation | DNAnalyzer's Innovation |
---|---|
$100 average cost for DNA sequencing | Completely Free |
Up to $600 for basic health insights | Accessible to underserved communities |
78% of companies share genetic data with third parties | 100% Private, local computation |
Data breaches expose millions (23andMe: 6.9M users in 2023) | No central database of sensitive genetic information |
"Unlike a password, compromised genetic data is permanently exposed."
Codon & Protein Detection: Rapidly identifies protein-coding regions, amino acid chains, and critical genomic indicators. | GC-rich Region Analysis: Pinpoints genomic promoter areas with significant biological implications (45-60% GC content). | Neurological Genomics: Detects genetic markers associated with neurological conditions (autism, ADHD, schizophrenia). |
Promoter Element Identification: Locates key transcription initiation sequences (BRE, TATA, INR, DPE) with pinpoint accuracy. | Multi-format FASTA Integration: Supports comprehensive DNA database analysis from uploads or external sources. | CLI Automation: Provides a CLI for scripting, automation, and large-scale analysis tasks. |
Ancestry Snapshot (Privacy-Safe): Estimates continental origin using on-device reference panels. |
See the [Ancestry Snapshot guide](docs/usage/ancestry-snapshot.md) for usage instructions.
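The GC-rich region analysis listed among the features can be illustrated with a sliding window. This is a minimal sketch, not DNAnalyzer's own implementation: the function names and the 20-base window are assumptions chosen for illustration, and only the 45-60% range comes from the feature description.

```python
def gc_fraction(seq):
    """Return the fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=20, low=0.45, high=0.60):
    """Yield (start, end, gc) for every window whose GC fraction is in [low, high].

    The window length is a hypothetical default; the 45-60% bounds mirror
    the range quoted in the feature table.
    """
    for start in range(0, len(seq) - window + 1):
        gc = gc_fraction(seq[start:start + window])
        if low <= gc <= high:
            yield start, start + window, gc
```

Overlapping windows are reported individually here; merging adjacent hits into longer regions would be a natural next step.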
New: An interactive web dashboard for in-browser visualization is now available under `web/dashboard` and communicates with the local REST API at `/api`.
After each CLI analysis, DNAnalyzer now requests two summaries from the OpenAI API:
- Researcher Report – Technical explanation with detailed statistics and terminology.
- Layperson Report – Plain-language overview highlighting key takeaways.
Both reports are printed to the console once analysis completes if an `OPENAI_API_KEY` is configured.
Ready to explore your DNA? Begin precise genomic analysis in seconds:
```bash
# Clone the repository
git clone https://github.com/VerisimilitudeX/DNAnalyzer.git
# Navigate to the project directory
cd DNAnalyzer
# Build the project (Gradle fetches the dependencies)
./gradlew build
```
Refer to our comprehensive Getting Started Guide for advanced configuration.
## Polygenic Health-Risk Scores
DNAnalyzer now includes a lightweight polygenic risk score calculator and fun trait predictions. Provide a 23andMe text file along with a CSV of SNP weights to compute scores and see traits:
```bash
./gradlew run --args='--23andme my_data.txt --prs assets/risk/heart_disease_prs.csv sample.fa'
```
Trait predictions and the risk score are printed after the standard DNA analysis.
Disclaimer: Trait predictions are provided for educational purposes only and should not be used for medical or health decisions.
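To make the calculation concrete, a polygenic risk score is commonly computed as a weighted sum of effect-allele counts. The sketch below assumes a weights CSV with `rsid`, `effect_allele`, and `weight` columns; that layout is hypothetical, since the actual format of the bundled CSV is not described here. The 23andMe parsing relies on the standard raw-data layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). It is not DNAnalyzer's own code.

```python
import csv
import io


def parse_23andme(text):
    """Map rsid -> genotype string (e.g. 'AG') from 23andMe raw-data text."""
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) >= 4:
            genotypes[fields[0]] = fields[3].strip().upper()
    return genotypes


def parse_weights(csv_text):
    """Read (rsid, effect_allele, weight) rows; the column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["rsid"], row["effect_allele"].upper(), float(row["weight"]))
        for row in reader
    ]


def polygenic_score(genotypes, weights):
    """Sum weight * (copies of the effect allele) over SNPs with a called genotype."""
    score = 0.0
    for rsid, allele, weight in weights:
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:  # SNP absent or no-call
            continue
        score += weight * genotype.count(allele)
    return score
```

SNPs missing from the genotype file are skipped rather than imputed, so scores from files with different SNP coverage are not directly comparable.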
For automated workflows, DNAnalyzer exposes a minimal REST endpoint. Start the Spring Boot application and send a FASTA file to `/server/analyze`:

```bash
curl -F file=@sample.fa http://localhost:8080/server/analyze
```
The response contains the core pipeline output serialized as JSON, allowing you to script DNAnalyzer from languages like Python or R without the GUI.
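As a sketch of scripting the endpoint from Python, the following uses only the standard library to send the same multipart upload as the `curl` command. The helper names are hypothetical. The field name `file` and the URL are taken from the `curl` example, and the response is returned as parsed JSON without assuming any particular keys.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def analyze(path, url="http://localhost:8080/server/analyze"):
    """POST a FASTA file to the endpoint and return the decoded JSON response."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(
            "file", os.path.basename(path), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `analyze("sample.fa")` requires the Spring Boot application to be running locally.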
Additionally, a `/api/file/parse` endpoint is available for uploading a FASTA or FASTQ file and receiving the parsed sequence.
An optional module using PyOpenCL provides GPU acceleration for local sequence alignment. If no compatible GPU is found, the implementation automatically falls back to a pure Python version.
Run the module directly or via the CLI:
```bash
python -m src.python.gpu_smith_waterman SEQ1 SEQ2
```
From the DNAnalyzer CLI you can request a Smith-Waterman alignment by supplying `--sw-align` together with `--align`:

```bash
java -jar dnanalyzer.jar --align reference.fa --sw-align
```
See GPU_Smith_Waterman.md for further details.
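For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of Smith-Waterman local alignment with a linear gap penalty. It is an illustration only, not the module's actual fallback code, and the scoring values (match +2, mismatch -1, gap -1) are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The matrix fill is quadratic in the sequence lengths, and each cell depends only on its upper, left, and upper-left neighbours. Cells on the same anti-diagonal are therefore independent of one another, which is what makes the algorithm suitable for GPU parallelization.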
Upcoming Development | Description |
---|---|
Optimized SQL Database | Scalable database for genomic datasets across diverse species |
Enhanced Neural Network | Integration with 3rd-party genotype datasets (23andMe, AncestryDNA) |
DIAMOND Implementation | Blending DIAMOND's speed with BLAST’s accuracy for cutting-edge analyses |
AI Trait Predictor Suite | Fun, shareable predictions—taste for cilantro, chronotype, ear-wax type—backed by peer-reviewed SNP studies |
Secure Share & Compare | Offline-generated, QR-coded summaries let users share limited insights with doctors or friends—no raw genome ever exposed. |
We welcome contributions across experience levels.
Please cite DNAnalyzer as follows:
@software{Acharya_DNAnalyzer_ML-Powered_DNA_2022,
author = {Acharya, Piyush},
doi = {10.5281/zenodo.14556577},
month = oct,
title = {{DNAnalyzer: ML-Powered DNA Analysis Platform}},
url = {https://github.com/VerisimilitudeX/DNAnalyzer},
version = {3.5.0-beta.0},
year = {2022}
}
DNAnalyzer is provided "as-is." Usage of the software implies acceptance of risks and liabilities. DNAnalyzer disclaims responsibility for any loss or damage arising from its use.
For assistance or inquiries, contact: help@dnanalyzer.org.
DNAnalyzer, © Piyush Acharya 2025. A fiscally sponsored 501(c)(3) nonprofit (EIN: 81-2908499), licensed under MIT License.
Metric | Current Value |
---|---|
GitHub Stars | 147 |
Forks | 62 |
Contributors | 46 |
Monthly FASTA files analyzed* | 5,000+ (self-reported) |
Total downloads (Gradle/CLI) | 4,042 |
Deployments via GitHub Pages | 485 |
- Discord · `#genomics-ai` channel (80+ members)
- Hackathons · Hosted annual Interlake Bio-Hack (50 participants)
- Open Issues for First-Timers · Labelled `good-first-issue` to mentor newcomers.
- Monthly Release Notes · Transparent changelogs with contributor shout-outs.
*Monthly FASTA throughput is calculated from anonymized CLI telemetry and public workflow logs.