GitHub - as-himself/cog-dia: Cogified TTS model capable of generating ultra-realistic dialogue in one pass
Dia: Run realistic text-to-dialogue audio generation locally


This repository provides a Cog container for Dia, a 1.6-billion-parameter text-to-speech model developed by Nari Labs. Dia generates highly realistic dialogue audio directly from text, including multiple speakers and non-verbal sounds such as (laughs).


Prerequisites

  • Docker: needed to build and run the container.
  • Cog: needed to build and run this model locally.
  • NVIDIA GPU: required to run this model.
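Before building, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of this repository: the helper name missing_tools is hypothetical, and finding the nvidia-smi command is only a rough proxy for having a usable NVIDIA GPU and driver.

```python
import shutil

# Commands assumed to correspond to the prerequisites listed above.
# Finding nvidia-smi only suggests an NVIDIA driver is installed; it does not prove the GPU works.
REQUIRED_TOOLS = ("docker", "cog", "nvidia-smi")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisite commands were found on PATH.")
```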

Run locally with Cog

Running this model locally is straightforward with Cog, which builds the environment and downloads the model weights automatically.

  1. Clone this repository:

    git clone https://github.com/zsxkib/cog-dia.git
    cd cog-dia
  2. Run a prediction: The first time you run cog predict, it builds the container and downloads the weights, which takes a few minutes. Subsequent runs are much faster.

    # Example prediction
    cog predict -i text="[S1] This is a test using Cog! [S2] It downloads the weights automatically. (laughs)"

    Cog will output the path to the generated .wav file.

    You can pass other inputs too:

    cog predict \
        -i text="[S1] Another example! [S2] With different settings." \
        -i cfg_scale=3.5 \
        -i temperature=1.1

    See predict.py for all available inputs, such as audio_prompt and seed.
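If you script predictions, the speaker-tagged text and the repeated -i flags shown above can be assembled programmatically. This is a minimal sketch under stated assumptions: format_dialogue and cog_predict_command are hypothetical helpers, not part of this repository, and the input names are simply the ones used in the examples above.

```python
import shlex


def format_dialogue(turns):
    """Join (speaker_number, line) pairs into the "[S1] ... [S2] ..." form used above."""
    return " ".join(f"[S{speaker}] {line.strip()}" for speaker, line in turns)


def cog_predict_command(text, **inputs):
    """Build the argument list for `cog predict`, with one -i flag per input."""
    cmd = ["cog", "predict", "-i", f"text={text}"]
    for name, value in inputs.items():
        cmd += ["-i", f"{name}={value}"]
    return cmd


turns = [(1, "Another example!"), (2, "With different settings.")]
cmd = cog_predict_command(format_dialogue(turns), cfg_scale=3.5, temperature=1.1)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
# To run it from the repository directory: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell-quoting problems with the brackets and parentheses in the dialogue text.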

How it works (briefly)

Cog uses cog.yaml to define the environment and predict.py to run the model. The setup() function in predict.py automatically downloads the model weights from a Replicate CDN using pget if they aren't already cached locally within the container.
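The download-if-missing behavior described above can be sketched as follows. This is an illustration only, not the code in predict.py: the URL, the cache path, and the ensure_weights helper are hypothetical, and the sketch uses only the basic `pget <url> <dest>` invocation.

```python
import subprocess
from pathlib import Path

# Hypothetical values for illustration; the real URL and cache path are defined in predict.py.
WEIGHTS_URL = "https://example.com/dia/weights.bin"
WEIGHTS_PATH = Path("model_cache") / "weights.bin"


def ensure_weights(url, dest, runner=subprocess.check_call):
    """Download `url` to `dest` with pget unless `dest` already exists.

    Returns True if a download was triggered, False if the cached file was reused.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Basic pget usage: pget <url> <dest>
    runner(["pget", url, str(dest)])
    return True


# In a Cog predictor, a call like this would typically sit inside setup():
# ensure_weights(WEIGHTS_URL, WEIGHTS_PATH)
```

The runner parameter exists only so the logic can be exercised without pget installed; by default it calls subprocess.check_call.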

License

The original Dia model is licensed under Apache 2.0. This Cog packaging code is MIT licensed. Please respect the original model's usage restrictions.


