This is the source code repository for the paper "Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems". The paper has been accepted to the 34th USENIX Security Symposium, 2025.
AudioShield leverages a transferable universal adversarial perturbation in the latent space (LS-TUAP) to provide real-time speech privacy protection for users while meeting three key requirements: real-time performance, model agnosticism, and high audio quality. A demo page is available at https://sites.google.com/view/lstuap.
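For intuition, here is a minimal, self-contained sketch of the latent-space idea. The `Conv1d`/`ConvTranspose1d` modules are toy stand-ins for the VITS encoder and decoder actually used by AudioShield, and `delta` plays the role of the LS-TUAP; this is a conceptual illustration, not the repository's implementation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the VITS posterior encoder / decoder used in the paper.
encoder = nn.Conv1d(1, 192, kernel_size=512, stride=256, padding=128)
decoder = nn.ConvTranspose1d(192, 1, kernel_size=512, stride=256, padding=128)

# A single fixed (universal) latent perturbation, broadcast over time:
# once trained, it protects any utterance without per-utterance optimization,
# which is what makes real-time protection possible.
delta = 0.01 * torch.randn(1, 192, 1)

wav = torch.randn(1, 1, 16000)   # 1 s of 16 kHz audio: (batch, channel, time)
z = encoder(wav)                 # encode to the latent space
protected = decoder(z + delta)   # add the perturbation and decode back to audio
```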
To run the code, ensure the following dependencies are installed:
- Python == 3.8
- PyTorch == 2.2.2
- CUDA == 12.2
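A quick sanity check that the environment matches the expected versions:

```python
import torch

print(torch.__version__)           # expect 2.2.2
print(torch.version.cuda)          # CUDA version PyTorch was built against
print(torch.cuda.is_available())   # True if the GPU driver is usable
```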
Install espeak:

```sh
apt-get install espeak
```
Create a conda environment:

```sh
conda create -n AudioShield python=3.8
```
Then the required dependencies can be installed by running:

```sh
pip install -r requirements.txt
```
Build Monotonic Alignment Search for the VITS model:

```sh
cd monotonic_align
python setup.py build_ext --inplace
```
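If the build succeeded, the compiled extension should import cleanly from the repository root (assuming the layout of the upstream VITS codebase):

```python
# Run from the repository root; raises ImportError if build_ext failed.
import monotonic_align
```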
- The VITS model is used as the autoencoder. The pre-trained model can be found here.
- DeepSpeech2 is employed as the local target model. The implemented version is available here.
Download the VITS and DeepSpeech2 models for AudioShield, and place them in the following folders:

```
pretrained/vits/
pretrained/deepspeech
```

The paths can be changed to your own, but make sure they are consistent with those set in `protection.json`.
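The exact schema of `protection.json` is defined by this repository; an illustrative layout with assumed key names might look like:

```json
{
  "vits_path": "pretrained/vits/",
  "deepspeech_path": "pretrained/deepspeech"
}
```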
The LibriSpeech dataset can be downloaded from here. The `dev-clean` subset is used in this implementation.
- Execute `python data_preprocessing.py` to process the raw dataset.
- Navigate to the `datasets` folder and run `python librispeech.py` to process the latent code data.
- Return to the main directory and execute `python train.py --tgt_text "OPEN THE DOOR"` for a quick training session.
- Alternatively, the following command allows manual configuration of the arguments (a filled-in example follows the block):

```sh
python train.py \
    --training_iters <number_of_iterations> \
    --tau <tau_hyperparameter> \
    --device <device_type> \
    --tgt_text <target_text> \
    --output_dir <output_directory>
```
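For example, a filled-in invocation might look like the following; the values are illustrative placeholders, not recommended hyperparameters:

```sh
python train.py \
    --training_iters 5000 \
    --tau 0.05 \
    --device "cuda:0" \
    --tgt_text "OPEN THE DOOR" \
    --output_dir "./output"
```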
- After training is complete, use the saved perturbation for evaluation. The `ptb_path` parameter is the path where the perturbation is stored, and `output_dir` is the path where the evaluated audio files will be saved.
- Run the evaluation script with the following command:

```sh
python eval.py \
    --ptb_path "LS_TUAP.pth" \
    --device "cuda:0" \
    --output_dir "./results" \
    --sampling_rate 16000
```
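To spot-check the high-audio-quality requirement, you can compare a protected clip against its original. A small sketch; the filenames are placeholders, and `eval.py`'s actual output naming may differ:

```python
import torch
import torchaudio

# Placeholder filenames; point these at eval.py's actual inputs/outputs.
orig, sr = torchaudio.load("original.wav")
prot, _ = torchaudio.load("results/protected.wav")

n = min(orig.shape[-1], prot.shape[-1])   # align lengths before comparing
noise = prot[..., :n] - orig[..., :n]
snr = 10 * torch.log10(orig[..., :n].pow(2).sum() / noise.pow(2).sum())
print(f"SNR of protected audio: {snr.item():.2f} dB")
```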
Part of the implementation is built on VITS and DeepSpeech2; we thank the authors for their outstanding contributions.
This project is licensed under the MIT License. See the `LICENSE` file for more details.
The adversarial examples generated by AudioShield must not be used for malicious purposes, such as disrupting the normal and legitimate use of ASR systems. Any consequences arising from such misuse are the sole responsibility of the user; neither the paper's publisher nor its authors bear any responsibility.
```bibtex
@inproceedings{jin2025whispering,
  author = {Jin, Weifei and Cao, Yuxin and Su, Junjie and Wang, Derui and Zhang, Yedi and Xue, Minhui and Hao, Jie and Dong, Jin Song and Yang, Yixian},
  title = {Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  year = {2025},
  address = {Seattle, WA, USA}
}
```