```bash
conda create --name DuplexMamba python=3.9
conda activate DuplexMamba
pip install -r requirements.txt
pip install -e src/transformers/
pip install -e src/speechbrain/
```
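After installation, a quick import check can confirm that the CUDA-dependent packages built correctly (a minimal sketch; it assumes a CUDA build of torch):

```bash
# Verify torch sees a GPU and the Mamba kernels import cleanly.
python -c "import torch, torchaudio; print(torch.__version__, torch.cuda.is_available())"
python -c "import causal_conv1d, mamba_ssm; print('causal-conv1d and mamba-ssm OK')"
```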
You may need to install lower or higher versions of `torch`, `torchaudio`, `causal-conv1d`, and `mamba-ssm` depending on your hardware and system; make sure the versions you choose are compatible with one another. If the installation of `causal-conv1d` or `mamba-ssm` fails, you can manually download the corresponding `.whl` files from the [causal-conv1d releases](https://github.com/Dao-AILab/causal-conv1d/releases) and [mamba releases](https://github.com/state-spaces/mamba/releases) pages and install them.
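A manual wheel install then looks something like this (the filenames below are hypothetical; use the wheels matching your Python, CUDA, and torch versions):

```bash
# Hypothetical wheel names: substitute the files you actually downloaded
# from the release pages for your Python/CUDA/torch combination.
pip install ./causal_conv1d-1.1.1+cu118torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
pip install ./mamba_ssm-1.1.1+cu118torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```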
- Download [mamba-2.8b-hf](https://huggingface.co/state-spaces/mamba-2.8b-hf) into the `model` folder (see the download sketch after this list), then run:

  ```bash
  python safetensor2bin.py
  ```
- Download the checkpoint of our trained ASR model and the checkpoints for all four stages of the DuplexMamba model from DuplexMamba and save them in the `checkpoints` folder. If you only need the model for inference, you can simply download the Stage 4 checkpoint.
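If you use the Hugging Face CLI, the base-model download step might look like the following sketch (it assumes the weights are the `state-spaces/mamba-2.8b-hf` repository and that `safetensor2bin.py` expects them under `model/`; verify both against the script before running):

```bash
# Sketch: fetch the mamba-2.8b-hf weights into the model folder, then
# convert the safetensors weights to the .bin format used by the scripts.
huggingface-cli download state-spaces/mamba-2.8b-hf --local-dir model/mamba-2.8b-hf
python safetensor2bin.py
```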
- Our training code requires all data to be stored in a LibriSpeech-style format (see the layout sketch after this list).
- For the raw data of Stage 1 and Stage 2, you can download LibriSpeech, TED-LIUM, mls_eng_10k, and VoiceAssistant-400K.
- The state discrimination dataset we used can be accessed here.
- The preprocessed data for Stage 3 and Stage 4 can be downloaded from here.
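For reference, the LibriSpeech layout nests audio under speaker and chapter directories, with one transcript file per chapter (the IDs below are illustrative):

```
<data_folder>/train-clean-100/
└── 19/                        # speaker ID
    └── 198/                   # chapter ID
        ├── 19-198-0000.flac
        ├── 19-198-0001.flac
        └── 19-198.trans.txt   # one "UTT_ID TRANSCRIPT" line per utterance
```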
Stage 1 Multimodal Alignment:

```bash
torchrun --nproc-per-node 6 train_stage1.py hparams/S2S/train_stage1.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

Stage 2 Multimodal Instruction Tuning:

```bash
torchrun --nproc-per-node 6 train_stage2.py hparams/S2S/train_stage2.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

Stage 3 Input State Discrimination:

```bash
torchrun --nproc-per-node 6 train_stage3.py hparams/S2S/train_stage3.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```

Stage 4 Streaming Alignment:

```bash
torchrun --nproc-per-node 1 train_stage4.py hparams/S2S/train_stage4.yaml --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```
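Stages 1-3 above assume 6 GPUs and Stage 4 assumes 1; if your machine differs, adjust `--nproc-per-node` to your GPU count, e.g. a 2-GPU Stage 1 run:

```bash
# Same recipe on 2 GPUs. Per-GPU settings come from the YAML, so the
# effective global batch size will differ from the 6-GPU configuration.
torchrun --nproc-per-node 2 train_stage1.py hparams/S2S/train_stage1.yaml \
    --data_folder <YOUR_PATH_TO_DATASETS> --precision bf16
```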
```bash
python CustomGenerator.py duplex/duplex.yaml --precision bf16 --wav_path example/rlhf-57762.flac
```
We also provide the `duplex_voice_assistant()` method in the `duplex_inference.py` script for simulating duplex conversations. You can modify `wav_list` on line 236 and `output_dir` on line 239 of the script, then run the following command to start the experiment:

```bash
python duplex_inference.py duplex/duplex.yaml --precision bf16
```
We acknowledge the wonderful work of Mamba, Vision Mamba, and ConMamba; we borrowed their implementations of Mamba, bidirectional Mamba, and ConMamba, respectively. The training recipes are adapted from SpeechBrain.
If you find this work helpful, please consider citing:
```bibtex
@misc{lu2025duplexmambaenhancingrealtimespeech,
      title={DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities},
      author={Xiangyu Lu and Wang Xu and Haoyu Wang and Hongyun Zhou and Haiyan Zhao and Conghui Zhu and Tiejun Zhao and Muyun Yang},
      year={2025},
      eprint={2502.11123},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.11123},
}
```
This project is licensed under the GNU General Public License v3.0. It is based on Mamba-ASR, which is also licensed under the GPL.