
wlsdml1114/diff-svc

Diff-SVC

Singing Voice Conversion via diffusion model

DDSP-SVC-KOR

There is also a DDSP-based repository. Many users report that DDSP performs better than diff-svc, so future development will move in that direction. https://github.com/wlsdml1114/DDSP-SVC-KOR

Discord server (community for this repository)

Discord

* Notes *

1. Feel free to ask questions, but replies may be slow. (Past questions and answers are under Issues at the top of the page. Asking on Discord is faster.)

2. This project was established for academic exchange and is not intended for production environments. We take no responsibility for copyright issues arising from audio produced by this project's models.

Tutorial video (matches the usage instructions below)

YouTube video

One-click training with a bat file (added 2023.08.24)

This method applies once you have completed everything up to the checkpoint download.

Thanks to 페 in arca.live

Read the explanation first!

How to use Diff-SVC on a local GPU

Install the required programs and download the code and checkpoints

  1. Install Anaconda3 (https://www.anaconda.com/products/distribution)

    • The installer asks whether to add Anaconda to the PATH environment variable; adding it at this step saves trouble later
  2. Install ffmpeg (https://www.gyan.dev/ffmpeg/builds/)

    • Add the bin folder inside the extracted folder to the PATH environment variable
    • On Debian or Ubuntu Linux, install it with the following command
    sudo apt install ffmpeg
    
  3. Install CUDA 11.6 (https://developer.nvidia.com/cuda-11-6-2-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local)

    • A reboot may be required
    • On Linux, no separate installation is needed if the NVIDIA driver is already installed.
  4. Download this repository as a .zip

    • The full extraction path should not contain any Korean characters
    • Extracting creates a diff-svc-main folder
    • On Linux, download it with the following commands
    sudo apt install git
    git clone https://github.com/wlsdml1114/diff-svc.git
    
  5. Download the checkpoints

    • Download the Hubert checkpoint (I do not hold the copyright to the Hubert ckpt file, so please do not ask me for it)
      • Join the Discord channel below
      • Pass the verification step
      • In the channel list on the left, open the ARCHIVE - pre-trained-model channel
      • At the top there is a 451.48MB drive link (starting with mega.nz/~~)
      • Download the folder
      • Move it into the folder extracted above and choose "Extract here"
    • (Optional) If your GPU has 6GB of memory or more
      • Download the Nsf Hifigan checkpoint
        • Download it from here
        • Move it into the folder extracted above and choose "Extract here"
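Before moving on, it can help to confirm that the tools installed above are reachable from the console. The following is a minimal sketch, not part of this repository: the executable names (conda, ffmpeg, git) and the helper name missing_tools are assumptions, and git is only needed if you cloned the repository instead of downloading the .zip.

```python
import shutil

# Assumed executable names for the programs installed in steps 1, 2 and 4.
REQUIRED_TOOLS = ["conda", "ffmpeg", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

If a tool is reported as missing right after installation, reopening the console usually picks up the updated PATH.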

Setting up the training environment

  1. Open a console

    • On Windows, open Anaconda Prompt as administrator
    • On Linux, open a terminal
  2. Move to the project folder (the path depends on where you extracted it)

    cd /path/to/project/diff-svc-main/
    
  3. Create and activate an Anaconda virtual environment

    conda create -n diff-svc python=3.9
    # This installs many packages; press Enter and wait until installation finishes
    conda activate diff-svc
    
  4. Install the libraries (this also installs many packages; keep pressing Enter)

    pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
    pip install -r requirements.txt
    
  5. Set the environment variables

    - On Windows:

    rem Added so that the project's modules can be imported easily
    set PYTHONPATH=.
    rem Train on the first GPU (Windows only)
    set CUDA_VISIBLE_DEVICES=0
    

    - On Linux:

    # Added so that the project's modules can be imported easily
    export PYTHONPATH=.
    # GPU selection is set when launching training.
    
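The environment set up above can be checked with a short script run inside the activated diff-svc environment. This is a sketch under assumptions, not a script shipped with the repository: check_training_env is a hypothetical helper, and it only verifies that PYTHONPATH contains '.' and that the installed torch build can see a CUDA device.

```python
import os


def check_training_env(environ=None):
    """Return a list of problems found in the training environment."""
    env = os.environ if environ is None else environ
    problems = []
    # PYTHONPATH=. lets Python import the project's own packages.
    if "." not in env.get("PYTHONPATH", "").split(os.pathsep):
        problems.append("PYTHONPATH does not contain '.'")
    try:
        import torch  # installed by the pip command in step 4
    except ImportError:
        problems.append("torch is not installed in this environment")
    else:
        # False when the driver, the torch build, or CUDA_VISIBLE_DEVICES hides the GPU.
        if not torch.cuda.is_available():
            problems.append("torch cannot see a CUDA device")
    return problems


if __name__ == "__main__":
    for problem in check_training_env() or ["environment looks ready"]:
        print(problem)
```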

Preparing the training data

In AI training the dataset accounts for more than 80% of the outcome; the quality of the dataset directly determines the quality of the model's output.

  1. Prepare wav or mp4 files, create a "preprocess" folder in the project folder, and put them all in it. Any number of files is fine, and wav and mp4 files can be mixed, but file names must not contain spaces or Korean characters. (The quality of these files determines the quality of the model's results.)
  2. Long wav or mp4 files are fine, because the preprocessing step automatically slices them into clips of about 12 seconds. (Slicing them carefully by hand gives better results, but it is very tedious.)
    1. A function that unifies the format of the prepared data has been added. It converts the audio to mono and the sample rate to 44100. If you do not want this, set use_preprocessing on line 280 of sep_wav.py to False. Details
      use_preprocessing = False
      
    2. If the prepared data already has all background sound removed and contains only the target speaker's voice, set use_extract on line 282 of sep_wav.py to False. use_extract determines whether to apply the function that automatically removes background sound.
      use_extract = False
      
    3. If you simply dropped in something like a 2-hour mp4 of a singing stream, wait patiently until the program has sliced it and removed the background music. (If donation TTS plays partway through, it is treated as voice, so remove it manually.)
  3. Once everything is in the preprocess folder, run the following command and wait until slicing and conversion finish.
    python sep_wav.py
    # A progress bar appears and you can watch the files being processed.
    # If the input files are hours long, the first stage can take a while (about 3-7 minutes per file?)
    
  4. The preprocess_out folder will contain many wav files, in the final folder (use_extract=False) or the voice folder (use_extract=True). Copy them into the raw_data_dir that you set in the config just below.
  5. Set up the training config
    1. Open config.yaml (or config_nsf.yaml) in the training folder with Notepad.
    2. The items you need to change are placed at the top for convenience. Replace every 'test' below with a name of your choice, then save. (Developers, or anyone customizing for better quality, can also change the other variables further down.)
      ## original wav dataset folder
      ## Folder that holds the sliced and converted wav files from step 3, used to build the training data
      raw_data_dir: data/raw/test
      ## after binarized dataset folder
      ## Folder where the binarized version of the data above is stored for actual training
      binary_data_dir: data/binary/test
      ## speaker name
      ## Used later when generating output
      speaker_id: test
      ## trained model will be saved in this folder
      ## Where the model trained on the training data is stored
      work_dir: checkpoints/test
      ## batch size
      ## How much the model trains on at once (reduce this number if you get a CUDA out of memory error)
      max_sentences: 10
      ## AMP (Automatic Mixed Precision) setting (GPU only) for less VRAM
      ## Whether to use AMP; training time is about the same, but a larger batch fits in memory.
      use_amp: true
      
      • AMP effect (AMP-related tips)

        AMP switch   batch size   VRAM cost (GB)         time for 100 batches
        on           32           7.9                    02:17
        off          32           OOM (Out Of Memory)    N/A
        on           16           5.4                    01:22
        off          16           7.4                    01:23
  6. Binarize the data so that it can be used for training.
    • If your GPU has less than 6GB of memory
    python preprocessing/binarize.py --config training/config.yaml
    
    • If your GPU has 6GB of memory or more
    python preprocessing/binarize.py --config training/config_nsf.yaml
    
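The file name rule in step 1 can be checked before running sep_wav.py. This is a minimal sketch, assuming the preprocess folder sits in the current directory; find_bad_names is a hypothetical helper, and it flags any non-ASCII character, which is stricter than the "no Korean characters" rule above.

```python
from pathlib import Path


def find_bad_names(names):
    """Return the names that contain whitespace or non-ASCII characters."""
    return [
        name
        for name in names
        if not name.isascii() or any(ch.isspace() for ch in name)
    ]


if __name__ == "__main__":
    folder = Path("preprocess")  # folder created in step 1
    if folder.is_dir():
        for name in find_bad_names(p.name for p in folder.iterdir()):
            print("rename before running sep_wav.py:", name)
```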

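If you prefer to convert files yourself (for example with use_preprocessing set to False), the same target format, mono at 44100 Hz, can be produced with ffmpeg. The sketch below is not the code used by sep_wav.py; it assumes ffmpeg is on PATH, and the function names are made up for illustration.

```python
import shutil
import subprocess


def ffmpeg_mono_44k_command(src, dst):
    """Build an ffmpeg command that writes dst as mono, 44100 Hz audio."""
    # -y: overwrite dst, -vn: drop any video stream,
    # -ac 1: one audio channel, -ar 44100: sample rate in Hz.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "44100", dst]


def convert(src, dst):
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_mono_44k_command(src, dst), check=True)
```

For example, convert("input.mp4", "output.wav") would write a mono 44.1kHz wav next to the source; both file names here are placeholders.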
Training the model and generating output

  1. Run the training code

    • As before, replace test in exp_name with the name you set above.

    • This takes a very long time (more than 20 hours in my case)

    • If the total loss no longer seems to decrease as training continues, press ctrl+c to stop.

    • On Windows

      • If your GPU has less than 6GB of memory
      python run.py --config training/config.yaml --exp_name test --reset
      
      • If your GPU has 6GB of memory or more
      python run.py --config training/config_nsf.yaml --exp_name test --reset
      
    • On Linux:

      • If your GPU has less than 6GB of memory
      CUDA_VISIBLE_DEVICES=0 python run.py --config training/config.yaml --exp_name test --reset
      
      • If your GPU has 6GB of memory or more
      CUDA_VISIBLE_DEVICES=0 python run.py --config training/config_nsf.yaml --exp_name test --reset
      

    To resume training

    On Windows

    python run.py --exp_name test
    

    On Linux

    CUDA_VISIBLE_DEVICES=0 python run.py --exp_name test
    

    If you run the command without the --reset and --config options, training automatically resumes from the latest ckpt. If a config.yaml error occurs, copy the config.yaml file into the checkpoints/(name) folder.

  2. When training finishes, it is time to generate output

    1. Open infer.py in Notepad and edit it to match the config you set above
      # line 76
      # Set this to the project name used in the work_dir you configured above
      project_name = "test"
      
      # line 81
      # The source file to convert, i.e. the file whose voice you want changed
      # This file must also have all background sound removed, leaving only the voice
      # 44.1kHz mono gives better quality
      # To convert several files at once, extend the list like ["test1.wav", "test2.wav"]
      file_names = ["test.wav"]
      
    2. When the settings are done, move the files whose names you put in the file_names list into the raw folder.
    3. Then run the following command; the results are written to the results folder
      python infer.py
      
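The training commands in step 1 differ only in the config file and in whether --config and --reset are present. The sketch below assembles them from those rules; build_train_command is a hypothetical helper (not part of run.py), and the GPU is still selected through CUDA_VISIBLE_DEVICES in the environment as shown in the commands above.

```python
def build_train_command(exp_name, gpu_memory_gb, resume=False):
    """Return the argument list for run.py following the rules in step 1."""
    cmd = ["python", "run.py"]
    if not resume:
        # First run: choose the config by GPU memory (6GB or more uses the nsf config).
        config = "training/config_nsf.yaml" if gpu_memory_gb >= 6 else "training/config.yaml"
        cmd += ["--config", config]
    cmd += ["--exp_name", exp_name]
    if not resume:
        # --reset is only passed on the first run; resuming omits it.
        cmd.append("--reset")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_train_command("test", gpu_memory_gb=8)))
    print(" ".join(build_train_command("test", gpu_memory_gb=8, resume=True)))
```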

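Before running infer.py, the settings from step 2 can be sanity-checked. This is a minimal sketch, assuming the last part of work_dir doubles as the project name and that the inputs live in a folder named raw, as described above; check_inference_setup is a hypothetical helper, not a function in infer.py.

```python
from pathlib import Path


def check_inference_setup(work_dir, project_name, file_names, raw_dir="raw"):
    """Return a list of problems found in the inference settings."""
    problems = []
    # project_name in infer.py should match the last part of work_dir.
    if Path(work_dir).name != project_name:
        problems.append(f"project_name {project_name!r} does not match work_dir {work_dir!r}")
    # Every entry of file_names must exist inside the raw folder.
    for name in file_names:
        if not (Path(raw_dir) / name).is_file():
            problems.append(f"{name} not found in {raw_dir}/")
    return problems


if __name__ == "__main__":
    for problem in check_inference_setup("checkpoints/test", "test", ["test.wav"]):
        print(problem)
```

An empty list means the names line up and every input file is in place.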
Q&A

Original updates, translated into English

Updates

2022.12.4 44.1kHz vocoder added, officially providing support for 44.1kHz models!

2022.11.28 Added the no_fs2 option (on by default), which streamlines part of the network, speeds up training, and reduces model size. It takes effect for newly trained models.

2022.11.23 Fixed a major bug that caused the original gt audio used for inference to be resampled to 22.05kHz. We apologize for the impact; please check your own test audio and use the updated code.

2022.11.22 Fixed many bugs, including several that significantly affected inference quality.

2022.11.20 Inference now reads and saves most audio formats, so manual conversion with other software is no longer needed.

2022.11.13 Fixed the epoch/steps display when loading a model after an interruption, added a disk cache for f0 processing, and added support files for real-time voice-changing inference.

2022.11.11 Fixed a slice duration error, added 44.1kHz adaptation, and added support for contentvec.

2022.11.4 Adding Mel Spectrum Save Function

2022.11.2 Integrate new vocoder code, update parselmouth algorithm

2022.10.29 Organize inference section, add long audio auto-slice feature.

2022.10.28 Migrate hubert's onnx inference to torch inference and organize inference logic.

If you previously downloaded the onnx hubert model, download the pt model and replace it; the config does not need to change. Inference and preprocessing can now run directly on the GPU of a 1060 with 6GB of memory; see the documentation for details.

2022.10.27 Update dependency files to remove redundant dependencies.

2022.10.27 Fixed a serious bug that made hubert run inference on the CPU even on GPU servers, slowing it down by a factor of 3-5. It affected preprocessing and inference, not training.

2022.10.26 Fix the problem that preprocessed data on windows does not work on linux, update some documents

2022.10.25 Write detailed documentation for inference/training, modify and integrate some code, add support for audio in ogg format (no need to differentiate from wav, just use it directly)

2022.10.24 Support for training on custom datasets with streamlined code

2022.10.22 Complete training on opencpop dataset and create repository

Notes

This project is established for academic exchange purposes and is not intended for production environments. We are not responsible for any copyright issues arising from the sound produced by this project's model.
If you redistribute the code in this repository or publicly publish any results produced by this project (including but not limited to video website submissions), please indicate the original author and source code (this repository).
If you use this project for any other plans, please contact and inform the author of this repository in advance. Thank you very much.

Details

This project has been trained and tested on many datasets. You can download the ckpt files, demo audio, and other files required for inference and training in the discord. For English support, you can join this discord: Discord

Acknowledgements

This project is based on diffsinger, diffsinger (openvpi maintenance version), and soft-vc. We would also like to thank the openvpi members for their help during the development and training process.
Note: This project has no connection with the paper of the same name DiffSVC, please do not confuse them!
