Diffusion Speech Quality Assessment (SQA)

This repository contains the official PyTorch implementation for the paper:

Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, "Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech", accepted at ISCA Interspeech 2025. [bibtex]

The code is largely based on the repository of [1].

Installation

Create a new virtual environment with Python >= 3.11 (we have not tested other Python versions, but they may work).
Install the package dependencies with pip install -r requirements.txt; pip resolves the dependency versions for you.
Install the EDM2 code as a submodule: git submodule update --init --recursive
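The installation steps above can be sanity-checked with a short script. This is a minimal sketch and not part of the repository: the function name check_setup is hypothetical, and it assumes that a checked-out submodule can be recognized by the presence of edm2/reconstruct_phema.py (the script used in the EMA Reconstruction section).

```python
import sys
from pathlib import Path


def check_setup(repo_root=".", min_python=(3, 11)):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper, not shipped with the repository.
    """
    problems = []

    # The README recommends Python >= 3.11; other versions are untested.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(v) for v in sys.version_info[:2])
        wanted = ".".join(str(v) for v in min_python)
        problems.append(f"Python {found} found, {wanted} or newer recommended")

    # Assumption: an initialized EDM2 submodule contains reconstruct_phema.py.
    marker = Path(repo_root) / "edm2" / "reconstruct_phema.py"
    if not marker.is_file():
        problems.append(
            "edm2 submodule looks empty; run: git submodule update --init --recursive"
        )

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

Running the script from the repository root prints one warning per failed check and nothing if both checks pass.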

Data

For training, we use the EARS-WHAM dataset.

Pretrained checkpoints

The checkpoint used in the paper can be downloaded here

Training

To train a model, execute train_sqa.py. A minimal example with the default settings (as in our paper) is

torchrun --standalone --nproc_per_node=<num-gpus> train_sqa.py --outdir=<log-dir> --data=<path-to-trainset> --batch-gpu=<batch-size-per-gpu>

where <path-to-trainset> should be a path to a folder containing clean .wav files (subdirectories are also supported).
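Before starting a long training run, it can help to confirm that the training folder actually contains audio. The following is a minimal sketch, not the repository's data loader: list_wav_files is a hypothetical helper, and matching the .wav extension case-insensitively is an assumption, since how train_sqa.py discovers files is not described here.

```python
from pathlib import Path


def list_wav_files(data_dir):
    """Recursively collect .wav files below data_dir, sorted for reproducibility.

    Hypothetical helper; the training script may discover files differently.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # rglob("*") walks all subdirectories; the suffix check ignores case.
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".wav"
    )


if __name__ == "__main__":
    import sys

    files = list_wav_files(sys.argv[1])
    print(f"Found {len(files)} .wav files")
```

Invoked as a script with the training folder as its only argument, it reports how many .wav files were found, including those in subdirectories.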

EMA Reconstruction

To reconstruct a new EMA profile with length 0.08, run

python edm2/reconstruct_phema.py --indir=<log-dir> --outdir=<reconstructed-ema-dir> --outstd=0.080

For more details on post-hoc EMA reconstruction, please refer to the EDM2 repository.

SQA

To calculate the diffusion log-likelihoods on a test set and save them in a CSV file, run

torchrun --standalone --nproc_per_node=<num-gpus> calculate_likelihood.py --checkpoint=<path-to-pkl> --data_dir=<path-to-testset> --output_file=<path-to-csv>

The --checkpoint parameter should be the path to a snapshot or a reconstructed EMA profile.
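Once the CSV file exists, the scores can be inspected with a few lines of Python. The sketch below is illustrative only: the column layout of the output file is not documented here, so the column names filename and log_likelihood are placeholder assumptions passed as parameters, and rank_by_likelihood is a hypothetical helper. Check the header of your own CSV and adjust the names accordingly.

```python
import csv


def rank_by_likelihood(csv_path, name_col="filename", score_col="log_likelihood"):
    """Return (name, score) pairs sorted from lowest to highest score.

    Both column names are assumptions; pass the names used in your CSV header.
    """
    with open(csv_path, newline="") as f:
        rows = [(row[name_col], float(row[score_col])) for row in csv.DictReader(f)]
    # Ascending order: the first entries have the lowest log-likelihood.
    return sorted(rows, key=lambda item: item[1])


if __name__ == "__main__":
    import sys

    for name, score in rank_by_likelihood(sys.argv[1])[:10]:
        print(f"{score:12.4f}  {name}")
```

If a higher log-likelihood is read as higher estimated quality, the first entries printed are the candidates for the most degraded files in the test set.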

License

The code and checkpoints are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Citations / References

We kindly ask you to cite our paper in your publication when using any of our research or code:

@misc{deoliveira2024nonintrusivespeechqualityassessment,
    title={Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech}, 
    author={Danilo de Oliveira and Julius Richter and Jean-Marie Lemercier and Simon Welker and Timo Gerkmann},
    year={2024},
    eprint={2410.17834},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2410.17834}, 
}

[1] Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine, "Analyzing and Improving the Training Dynamics of Diffusion Models", CVPR 2024. [Code]
