Official Code for Target Sound Extraction under Reverberant Environments with Pitch Information (Interspeech 2024)
For the demos, please visit https://wyw97.github.io/TSE_PI/
-
Dataset: selected from FSD50K
-
RIR Simulation: Image-Source Method
Reference link: https://www.audiolabs-erlangen.de/fau/professor/habets/software/smir-generator
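For orientation, a minimal image-source RIR sketch using pyroomacoustics (an assumption for illustration only; the repo references the SMIR generator above, and all geometry, absorption, and position values here are placeholders):

```python
# Minimal image-source RIR sketch (assumption: pyroomacoustics, not the
# SMIR generator referenced above; all numeric values are placeholders).
import numpy as np
import pyroomacoustics as pra

fs = 16000
room = pra.ShoeBox(
    [6.0, 5.0, 3.0],              # room dimensions in meters (placeholder)
    fs=fs,
    materials=pra.Material(0.3),  # uniform wall absorption (placeholder)
    max_order=17,                 # image-source reflection order
)

dry = np.random.randn(fs)         # stand-in for an anechoic FSD50K clip
room.add_source([2.0, 3.0, 1.5], signal=dry)
room.add_microphone([3.5, 2.0, 1.2])

room.compute_rir()                # image-source RIRs
room.simulate()                   # convolve the source with the RIR
reverberant = room.mic_array.signals[0]
```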
-
Pitch Label: Generated by Praat under anechoic conditions and aligned to the reverberant signal with a time shift
Reference link: https://parselmouth.readthedocs.io/_/downloads/en/stable/pdf/
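A minimal sketch of extracting an F0 track with Praat via parselmouth (the file name and analysis parameters are assumptions, and the time-shift alignment mentioned above is not shown):

```python
# Minimal Praat pitch-extraction sketch via parselmouth (file path and
# analysis parameters are assumptions, not the repo's exact settings).
import numpy as np
import parselmouth

snd = parselmouth.Sound("anechoic_clip.wav")       # anechoic source clip
pitch = snd.to_pitch(time_step=0.01,               # 10 ms hop (assumed)
                     pitch_floor=75.0,
                     pitch_ceiling=600.0)

times = pitch.xs()                                 # frame centers in seconds
f0 = pitch.selected_array["frequency"]             # 0.0 marks unvoiced frames
f0[f0 == 0.0] = np.nan                             # flag unvoiced frames as NaN

np.save("pitch_label.npy", np.stack([times, f0]))  # save time/F0 pairs
```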
-
TODO: Update dataset.py (coming soon!)
-
Command: `python train_f0.py`
Thanks to Veluri et al. for open-sourcing the Waveformer and SemanticHearing code.
This part mainly adds the pitch information and trains the model in the same way as Waveformer.