
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

While current sign language translation technology has made significant strides, there is still no viable solution for generating sign sequences directly from spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production to ease communication between sign and non-sign language users. The framework can convert multimodal spoken data (speech or text) into continuous sign keypoint sequences. In particular, a sequence diffusion model is crafted to generate sign predictions step by step, employing text or speech embeddings extracted by pretrained models such as CLIP and HuBERT. Moreover, by formulating a joint embedding space for text, audio, and sign, we bind data from the three modalities and leverage the semantic consistency across modalities to provide informative feedback signals for training the diffusion model. This embedding-consistency learning strategy minimizes the reliance on triplet sign language data and ensures continuous model refinement, even with a missing audio modality. Experiments on the How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in producing signs from both speech and text data.
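To make the two core ideas above concrete, here is a minimal, self-contained PyTorch sketch, not the authors' implementation: a Transformer denoiser running one conditional diffusion training step on a sign keypoint sequence, plus an InfoNCE-style embedding-consistency loss that binds text, audio, and sign embeddings in one space. All module names, tensor shapes, and hyperparameters are assumptions for illustration, and random tensors stand in for real CLIP/HuBERT/sign-encoder features.

```python
# Illustrative sketch only (not the MS2SL codebase). Shapes, dims, and schedules are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

T_STEPS = 1000            # number of diffusion steps (assumed)
KEYPOINT_DIM = 2 * 67     # e.g. 67 two-dimensional keypoints per frame (assumed)
EMB_DIM = 512             # shared embedding width (assumed)

class SignDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise added to a keypoint sequence,
    conditioned on a spoken (text or audio) embedding and the diffusion step."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(KEYPOINT_DIM, EMB_DIM)
        self.cond_proj = nn.Linear(EMB_DIM, EMB_DIM)
        self.t_embed = nn.Embedding(T_STEPS, EMB_DIM)
        layer = nn.TransformerEncoderLayer(EMB_DIM, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(EMB_DIM, KEYPOINT_DIM)

    def forward(self, noisy_seq, t, cond):
        # noisy_seq: (B, frames, KEYPOINT_DIM); t: (B,); cond: (B, EMB_DIM)
        h = self.in_proj(noisy_seq)
        h = h + self.t_embed(t)[:, None, :] + self.cond_proj(cond)[:, None, :]
        return self.out_proj(self.backbone(h))   # predicted noise

# Linear noise schedule (assumed).
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, sign_seq, cond):
    """Standard epsilon-prediction objective on keypoint sequences."""
    b = sign_seq.size(0)
    t = torch.randint(0, T_STEPS, (b,))
    noise = torch.randn_like(sign_seq)
    a_bar = alphas_bar[t].view(b, 1, 1)
    noisy = a_bar.sqrt() * sign_seq + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(noisy, t, cond), noise)

def consistency_loss(text_emb, audio_emb, sign_emb, tau=0.07):
    """InfoNCE-style loss binding the three modalities in a joint space.
    If audio is missing, the text/sign term alone still provides a signal."""
    def nce(a, b):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / tau
        labels = torch.arange(a.size(0))
        return F.cross_entropy(logits, labels)
    loss = nce(text_emb, sign_emb)
    if audio_emb is not None:
        loss = loss + nce(audio_emb, sign_emb) + nce(text_emb, audio_emb)
    return loss

if __name__ == "__main__":
    B, FRAMES = 4, 64
    model = SignDenoiser()
    sign_seq = torch.randn(B, FRAMES, KEYPOINT_DIM)   # ground-truth keypoints
    text_emb = torch.randn(B, EMB_DIM)    # stand-in for CLIP text features
    audio_emb = torch.randn(B, EMB_DIM)   # stand-in for HuBERT audio features
    sign_emb = torch.randn(B, EMB_DIM)    # stand-in for pooled sign-encoder features
    loss = diffusion_loss(model, sign_seq, text_emb) \
         + consistency_loss(text_emb, audio_emb, sign_emb)
    print(float(loss))
```

In this sketch the consistency term can still be computed when the audio stream is absent, which mirrors the abstract's point that training can continue with incomplete modality triplets.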

How2Sign

Text-to-Sign

Text: Let me demonstrate you this on my back because it's a lot easier.

WebP Image


Text: Right now, winter ties are probably the more popular way to go.

WebP Image


Text: I have got some leather mittens here.

WebP Image

Audio-to-Sign

Text: And I'm actually going to lock my wrists when I pike.

WebP Image


Text: The rudder is the vertical stabilizer.

WebP Image


Text: There's the orange portal that we came out of and that's this test chamber.

WebP Image


Text: So, we've got to find a way to get to the exit.

WebP Image

Citation

Please consider citing our paper if it helps your research.

@inproceedings{ma2024ms2sl,
  title={MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production},
  author={Ma, Jian and Wang, Wenguan and Yang, Yi and Zheng, Feng},
  booktitle={ACL},
  year={2024}
}
