Transformer4SED is a repository that collects Transformer-based sound event detection (SED) algorithms.
- Implemented in PyTorch, refactored from the official DCASE PyTorch Lightning baseline;
- Kaldi-style recipes;
- [TODO] Support for commonly used sound event detection datasets, including DESED, MAESTRO, and AudioSet-strong.
MAT-SED (Masked Audio Transformer for Sound Event Detection) is a pure Transformer-based SED model with masked-reconstruction-based pre-training.
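As a rough illustration of what a masked-reconstruction objective looks like in general, the sketch below masks blocks of consecutive frames in a feature sequence and scores a reconstruction only on the masked frames. It is not MAT-SED's actual code: the function names (`block_mask`, `apply_mask`, `masked_mse`), the block-wise masking strategy, the zero placeholder, and all parameter values are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import random


def block_mask(num_frames, block_size, mask_ratio, rng):
    """Pick whole blocks of consecutive frames to mask; returns one bool per frame."""
    num_blocks = (num_frames + block_size - 1) // block_size
    num_masked = max(1, round(num_blocks * mask_ratio))
    masked_blocks = set(rng.sample(range(num_blocks), num_masked))
    return [(t // block_size) in masked_blocks for t in range(num_frames)]


def apply_mask(frames, mask, mask_value=0.0):
    """Replace every masked frame with a constant placeholder vector (hypothetical choice)."""
    return [[mask_value] * len(f) if m else list(f) for f, m in zip(frames, mask)]


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked frames only."""
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


# Toy sequence: 20 frames with 2 features each.
rng = random.Random(0)
frames = [[float(t), 0.5 * float(t)] for t in range(20)]

mask = block_mask(len(frames), block_size=5, mask_ratio=0.5, rng=rng)
corrupted = apply_mask(frames, mask)

# A real model would predict the masked frames from `corrupted`.
# Here the corrupted input itself stands in for an untrained prediction.
loss_untrained = masked_mse(corrupted, frames, mask)
loss_perfect = masked_mse(frames, frames, mask)
```

In this toy setup, a perfect reconstruction gives a loss of zero, while returning the corrupted input unchanged gives a positive loss, so the objective only rewards recovering the hidden frames from their context.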
Prototype-based Masked Audio Model (PMAM) is a self-supervised representation learning algorithm designed for frame-level audio tasks such as sound event detection, with the goal of making better use of unlabeled data.