Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, and Philipp Koehn
STAR is a research project that targets low-latency speech translation and transcription using a segmenter module learned from the cross-attention feedback of a Transformer encoder-decoder model.
The project is built with `poetry` for streamlined dependency management and reproducible environments. To install the environment, pull our codebase and run `poetry install`, or set up your own environment with conda or other tools by referring to our `pyproject.toml` config.
In this codebase, we refer to the module as “nugget” rather than “star.” This naming reflects our use of the cross-attention feedback mechanism first proposed in Nugget: Neural Agglomerative Embeddings of Text, now extended to speech-to-text tasks.
STAR/
├── src/                       # Core source code
│   ├── train_simul.py         # Simultaneous S2T
│   ├── train_w2v.py           # Non-streaming S2T
│   ├── lightning/             # PyTorch Lightning trainer modules
│   │   ├── simul_trainer.py
│   │   └── wav2vec_trainer.py
│   ├── data_utils/            # Data loading and preprocessing
│   │   ├── data_module.py
│   │   └── preprocess_{dataset}.py   # dataset-specific preprocessing scripts
│   └── models/                # Customized Transformer and CTC model code
│       ├── my_transformer.py
│       └── my_wav2vec.py
│
├── scripts/                   # Example run scripts
│   ├── simul_s2t.sh           # Simultaneous S2T entry point
│   └── non_streaming.sh       # Non-streaming S2T entry point
│
├── pyproject.toml             # Poetry configuration
├── README.md                  # Project overview
└── LICENSE                    # License information
In Section 3 of our paper, we present non-streaming experiments. The training script can be found in `scripts/non_streaming.sh`, and the argument `--nugget_compress_rate` controls the compression rate. For details of the segmenter training, trace how `scorer_logits` is updated in `my_transformer.py`.
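To make the selection step concrete, below is a minimal, hypothetical sketch of the idea. The class `ToyNuggetSegmenter`, its constructor arguments, and the tensor shapes are illustrative assumptions, not the actual interface of `models/my_transformer.py`: a linear scorer produces `scorer_logits` over encoder frames, the top `1/compress_rate` fraction of frames is kept as nuggets, and the selected scores are later re-injected into the decoder's cross-attention so the scorer is trained end-to-end.

```python
import torch
import torch.nn as nn


class ToyNuggetSegmenter(nn.Module):
    """Illustrative nugget segmenter: score encoder frames, keep the top-k."""

    def __init__(self, d_model: int, compress_rate: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # produces scorer_logits
        self.compress_rate = compress_rate

    def forward(self, enc: torch.Tensor):
        # enc: (batch, frames, d_model) encoder output
        scorer_logits = self.scorer(enc).squeeze(-1)        # (batch, frames)
        k = max(1, enc.size(1) // self.compress_rate)       # nuggets to keep
        top = scorer_logits.topk(k, dim=-1).indices.sort(dim=-1).values
        nuggets = enc.gather(1, top.unsqueeze(-1).expand(-1, -1, enc.size(-1)))
        # The scores of the selected frames are later added to the decoder's
        # cross-attention logits, so the otherwise non-differentiable top-k
        # selection still receives gradient feedback from the decoding loss.
        selected_scores = scorer_logits.gather(1, top)
        return nuggets, selected_scores, top


# Example: compress 100 encoder frames at rate 8 -> 12 nuggets per utterance.
segmenter = ToyNuggetSegmenter(d_model=256, compress_rate=8)
nuggets, scores, indices = segmenter(torch.randn(2, 100, 256))
print(nuggets.shape)  # torch.Size([2, 12, 256])
```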
The training script for simultaneous S2T is provided in `simul_s2t.sh`. Training is very similar to the non-streaming setup, except that we add a CIF-style regularization on the scores so that the number of activations is close to the number of target tokens. Infinite lookback can be activated by setting `--use_ilk` to True.
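For intuition, here is a hedged sketch of such a CIF-style quantity regularizer, assuming sigmoid-normalized scores and a padding mask; the function name `quantity_loss` and the exact normalization are illustrative and may differ from the code in this repo.

```python
import torch


def quantity_loss(scores: torch.Tensor,
                  frame_mask: torch.Tensor,
                  target_lengths: torch.Tensor) -> torch.Tensor:
    """CIF-style quantity regularizer (sketch, not the repo's implementation).

    scores:         (batch, frames) unnormalized segmenter scores
    frame_mask:     (batch, frames) 1.0 for real frames, 0.0 for padding
    target_lengths: (batch,)        number of target tokens per example
    """
    # Squash scores into firing weights in [0, 1], analogous to CIF's alphas.
    alpha = torch.sigmoid(scores) * frame_mask
    # Penalize the gap between the expected number of activations and the
    # number of target tokens, so roughly one segment fires per target token.
    return (alpha.sum(dim=-1) - target_lengths.float()).abs().mean()
```

In practice a term like this is typically added to the main cross-entropy objective with a small weight, so it steers the segmenter without dominating training.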
@misc{tan2024streaming,
title={Streaming Sequence Transduction through Dynamic Compression},
author={Weiting Tan and Yunmo Chen and Tongfei Chen and Guanghui Qin and Haoran Xu and Heidi C. Zhang and Benjamin Van Durme and Philipp Koehn},
year={2024},
eprint={2402.01172},
archivePrefix={arXiv},
primaryClass={cs.CL}
}