Code to replicate our WASPAA23 submission *An Objective Evaluation of Hearing Aids and DNN-based Binaural Speech Enhancement in Complex Acoustic Scenes*, where we benchmark commercially available Hearing Aid (HA) devices and their traditional enhancement methods against DNN-based postfilters.
Enric Gusó enric.guso@eurecat.org
Joanna Luberadzka joanna.luberadzka@eurecat.org
In this repository we provide:
- SOFA files with 50-point pseudo-anechoic HRTFs of a KU100 dummy head, with and without wearing a HA.
- 10th-order Ambisonics-to-binaural decoders for both SOFA files.
- Code for generating a binaural speech enhancement (denoising + dereverberation) dataset from Multilingual LibriSpeech Spanish speech and WHAM! binaural noises. Depending on the decoder, the dataset simulates either normal hearing or a HA.
- Code for training and evaluating two Sudo RM-RF enhancement DNNs on that dataset: a causal and a non-causal version. We also provide weights in `pretrained_models`.
- Code for generating a small test set of 10th-order Ambisonics situations. In the paper these are used to record different Hearing Aids, both with their traditional features in bypass and with each HA's enhancement enabled.
- A DNN inference template script showing how to process the bypass recordings with the trained causal model (DNN-C) and the non-causal model (DNN).
- Code to evaluate the results in terms of SI-SDR, HASPI, HASQI and MBSTOI.
## Installation

Install the dependencies. For dataset creation:

`pip install numpy scipy mat73 jupyter soundfile pyrubberband matplotlib pandas ipykernel tqdm`

For DNN training and evaluation:

`pip install comet_ml torch pyclarity seaborn`

Alternatively, install the exact versions from file with:

`pip install -r requirements.txt`
## Dataset generation

First, generate a metadata dataframe for augmenting WHAM! to match the size of the speech dataset: open Jupyter Notebook in your environment and run `microson_v1_dataset_design.ipynb`, editing your WHAM! and Multilingual LibriSpeech Spanish (MLSS) dataset paths. This generates `meta_microson_v1.csv`.
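A quick sanity check on the generated metadata might look like the following sketch (the actual column names depend on the notebook output):

```python
# Minimal sketch: inspect the generated metadata dataframe.
import pandas as pd

meta = pd.read_csv("meta_microson_v1.csv")
print(meta.shape)   # number of rows and columns
print(meta.head())  # first entries; columns depend on the notebook output
```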
We provide two 10th-order Ambisonics-to-binaural decoders: one for generating synthetic normal-hearing binaural datasets, and the default one for simulating Hearing Aids:

- (default) `decoders_ord10/RIC_Front_Omni_ALFE_Window_sinEQ_bimag.mat`: 50-point HRIR, 10th-order Ambisonics-to-binaural decoder for the KU100 dummy head wearing a hearing aid device.
- `decoders_ord10/KU100_ALFE_Window_sinEQ_bimag.mat`: 50-point HRIR, 10th-order Ambisonics-to-binaural decoder for the bare KU100 dummy head.
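If you want to inspect or apply one of these decoders outside the provided scripts, a minimal sketch could look like this; the variable name `hnm` and the filter layout (taps × Ambisonics channels × ears) are assumptions, so check the contents of the .mat file first:

```python
# Sketch: load a .mat decoder with mat73 and decode a 10th-order (121-channel)
# ambisonic signal to binaural by per-channel convolution and summation.
import mat73
import numpy as np
from scipy.signal import fftconvolve

dec = mat73.loadmat("decoders_ord10/RIC_Front_Omni_ALFE_Window_sinEQ_bimag.mat")
print(dec.keys())  # check which variables the file actually contains

hnm = dec["hnm"]  # hypothetical field: FIR decoding filters, shape (taps, 121, 2)
ambi = np.random.randn(2 * 48000, 121)  # placeholder ambisonic input (samples, 121)

# Each ear is the sum over ambisonic channels of channel-filter convolutions
binaural = np.stack(
    [
        sum(fftconvolve(ambi[:, c], hnm[:, c, ear]) for c in range(ambi.shape[1]))
        for ear in (0, 1)
    ],
    axis=1,
)
```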
Download Multilingual LibriSpeech Spanish and WHAM!. Then run `generate_microsonv1.py`, providing the paths to both datasets and the desired output path:

`python generate_microsonv1.py --mls_path <path to MLSS> --wham_path <path to WHAM!> --output <dataset directory>`

This produces the following wav files in `output`:
- `ane_ir`: the anechoic impulse response in binaural
- `anechoic`: the anechoic speech signal in binaural
- `reverberant`: the reverberant speech signal in binaural
- `ir`: the reverberant impulse response in binaural
- `mono_ir`: the reverberant impulse response in mono
- `noise`: a corresponding augmented chunk from WHAM!
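Loading one example could then look like the following sketch; the per-utterance file names and how the noisy mixture is formed are assumptions, so adapt them to the generated layout:

```python
# Sketch: read one clean/reverberant/noise triplet from the generated dataset.
import soundfile as sf

reverberant, fs = sf.read("output/reverberant/0.wav")  # binaural (samples, 2)
anechoic, _ = sf.read("output/anechoic/0.wav")         # binaural target
noise, _ = sf.read("output/noise/0.wav")               # augmented WHAM! chunk

noisy = reverberant + noise  # hypothetical mixture fed to the enhancement DNN
```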
## Training

Go to the `sudo_rm_rf` folder. In `__config__.py`, configure your CometML API key and the dataset path you set in `--output`.
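The configuration might look roughly like the sketch below; the variable names are hypothetical, so mirror whatever `__config__.py` already defines:

```python
# Hypothetical __config__.py contents
API_KEY = "your-comet-ml-api-key"      # CometML API key for experiment logging
DATASET_PATH = "/path/to/microson_v1"  # the --output path used during generation
```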
If you want to replicate our exact configurations, run the shell script recipe for each model:

- DNN: `m1_alldata_normal.sh` for a non-causal model where `target=anechoic`
- DNN-C: `m4_alldata_normal_causal.sh` for a causal model where `target=anechoic`

Edit `checkpoints_path` and then run each script on its server with `sh <script_name>.sh`. The biggest model took about 20 days on a V100 instance. Once training is complete, take the best epoch (in this case, the last one). We provide ours in `pretrained_models`.
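To reuse the provided weights, something like the following sketch should work; the checkpoint file name and whether it stores a plain `state_dict` or a wrapped dictionary are assumptions:

```python
# Sketch: inspect a provided checkpoint before wiring it into the model.
import torch

checkpoint = torch.load("pretrained_models/dnn_c.pt", map_location="cpu")
print(type(checkpoint))  # plain state_dict, full nn.Module, or a dict with metadata
```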
We also provide two additional recipes for training "mild" enhancers:

- `m5_alldata_mild_causal.sh`: causal, where `target = anechoic + 0.25 * (reverb + noise)`
- `m3_alldata_mild.sh`: non-causal, where `target = anechoic + 0.25 * (reverb + noise)`
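In code, the mild target amounts to attenuating rather than removing the residuals; a sketch, assuming `reverb` denotes the reverberant signal minus its anechoic part:

```python
# Sketch of the "mild" training target: keep anechoic speech, attenuate the rest.
def mild_target(anechoic, reverberant, noise, alpha=0.25):
    reverb = reverberant - anechoic  # assumed definition of the reverb residual
    return anechoic + alpha * (reverb + noise)
```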
## Ambisonics situations

To reproduce the rest of the paper you will probably have to adapt the remaining scripts to your particular setup; we nevertheless provide them as templates.
Download `02_Office_MOA_31ch.wav`, `07_Cafe_1_MOA_31ch.wav` and `09_Dinner_party_MOA_31ch.wav` from the Ambisonics Recordings of Typical Environments (ARTE) database and place them in a directory. Run `listening_test_scenes.ipynb` to generate the Ambisonics signals. Then decode the resulting Ambisonics signals to loudspeaker signals using the tool that suits your particular speaker setup (e.g. AllRAD decoding), and normalize so that all utterances in the loudspeaker-signal test set have the same overall energy.
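The equal-energy normalization can be done with a simple RMS-based scaling, sketched below; the target level is arbitrary and should be chosen together with the playback calibration:

```python
# Sketch: scale each utterance so all have the same overall RMS energy.
import numpy as np

def normalize_equal_energy(utterances, target_rms=0.05):
    """utterances: list of (samples, channels) loudspeaker-signal arrays."""
    out = []
    for x in utterances:
        rms = np.sqrt(np.mean(x ** 2))
        out.append(x * (target_rms / (rms + 1e-12)))
    return out
```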
Calibrate the loudspeaker setup to an equivalent level of 70 dB SPL and record each HA twice: in bypass (beamforming and the other traditional enhancement methods disabled) and with those methods enabled.
Crop these recordings with `recordings_crop.ipynb`.
DNN inference: process the bypass recordings with `recordings_process.ipynb`, adjusting the paths if necessary.

Evaluation: compute the metrics and generate the plots by running `recordings_analysis.ipynb`.
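HASPI, HASQI and MBSTOI come from pyclarity, while SI-SDR follows its standard definition; a minimal sketch for mono (or per-ear) signals:

```python
# Sketch: scale-invariant SDR between a 1-D estimate and reference.
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))
```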
## License

MIT License; see `LICENSE.txt`.

Copyright © 2023, Eurecat Centre Tecnològic de Catalunya. All rights reserved.

The research leading to these results has received funding from the European Union's Horizon Europe programme under grant agreement No 101017884 (GuestXR project).