This repository provides a Cog container for Dia, a 1.6 billion parameter text-to-speech model developed by Nari Labs. Dia generates highly realistic dialogue audio directly from text, including multiple speakers and non-verbal sounds like (laughs).
Model Links:
- Original Model: nari-labs/Dia-1.6B on Hugging Face
- Original Code: github.com/nari-labs/dia
- This Cog packaging by: zsxkib on GitHub / @zsakib_ on Twitter
Prerequisites:
- Docker: Required by Cog to build and run the container. Install Docker.
- Cog: Builds and runs this model locally. Install Cog.
- NVIDIA GPU: Required to run this model.
Running this model locally is straightforward with Cog. It handles building the environment and downloading the model weights automatically.
- Clone this repository:

  ```bash
  git clone https://github.com/zsxkib/cog-dia.git
  cd cog-dia
  ```
- Run a prediction: The first time you run `cog predict`, it builds the container and downloads the weights, which takes a few minutes. Subsequent runs are much faster.

  ```bash
  # Example prediction
  cog predict -i text="[S1] This is a test using Cog! [S2] It downloads the weights automatically. (laughs)"
  ```
Cog will output the path to the generated `.wav` file. You can pass other inputs too:

```bash
cog predict \
  -i text="[S1] Another example! [S2] With different settings." \
  -i cfg_scale=3.5 \
  -i temperature=1.1
```
Check `predict.py` for all available inputs like `audio_prompt`, `seed`, etc.
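For orientation, a Cog predictor exposing these inputs would look roughly like the sketch below. The input names come from the examples above, but the descriptions and default values are placeholders, so treat `predict.py` as the source of truth:

```python
# Hypothetical sketch of the predictor interface -- input names come from the
# examples above; the defaults and descriptions here are illustrative guesses.
from typing import Optional

from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def predict(
        self,
        text: str = Input(description="Dialogue, e.g. '[S1] Hi. [S2] Hello! (laughs)'"),
        audio_prompt: Optional[Path] = Input(description="Optional audio prompt for voice conditioning", default=None),
        cfg_scale: float = Input(description="Classifier-free guidance scale", default=3.0),
        temperature: float = Input(description="Sampling temperature", default=1.0),
        seed: Optional[int] = Input(description="Random seed; omit for a random one", default=None),
    ) -> Path:
        """Generate speech and return the path to the output .wav file."""
        ...
```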
Cog uses `cog.yaml` to define the environment and `predict.py` to run the model. The `setup()` function in `predict.py` automatically downloads the model weights from a Replicate CDN using `pget` if they aren't already cached locally within the container.
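In essence, that download step boils down to something like the snippet below, assuming `pget` is available inside the container. The cache directory and URL here are placeholders (the real values live in `predict.py`); `pget -x` downloads a tarball and extracts it in one step:

```python
# Minimal sketch of the setup() download logic. MODEL_CACHE and WEIGHTS_URL
# are placeholders, not the actual values used by this repository.
import os
import subprocess

MODEL_CACHE = "checkpoints"
WEIGHTS_URL = "https://weights.replicate.delivery/..."  # placeholder CDN URL


def download_weights(url: str, dest: str) -> None:
    """Fetch and extract the model weights with pget, skipping if already cached."""
    if not os.path.exists(dest):
        subprocess.check_call(["pget", "-x", url, dest])
```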
The original Dia model is licensed under Apache 2.0. This Cog packaging code is MIT licensed. Please respect the original model's usage restrictions.
⭐ Star this repo on GitHub!
👋 Follow me on Twitter/X