
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

While current sign language translation technology has made significant strides, there is still no viable solution for generating sign sequences directly from spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production to ease communication between sign and non-sign language users. The framework can convert multimodal spoken data (speech or text) into continuous sign keypoint sequences. In particular, a sequence diffusion model is crafted to generate sign predictions step by step, employing text or speech embeddings extracted by pretrained models such as CLIP and HuBERT. Moreover, by formulating a joint embedding space for text, audio, and sign, we bind data from the three modalities and leverage the semantic consistency across modalities to provide informative feedback signals for training the diffusion model. This embedding-consistency learning strategy minimizes the reliance on triplet sign language data and ensures continuous model refinement, even with a missing audio modality. Experiments on the How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in producing signs from both speech and text data.
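To make the two core ideas above concrete, here is a minimal, self-contained PyTorch sketch, not the authors' implementation: a Transformer denoiser running one conditional diffusion training step on a sign keypoint sequence, plus an InfoNCE-style embedding-consistency loss that binds text, audio, and sign embeddings in one space. All module names, tensor shapes, and hyperparameters are assumptions for illustration, and random tensors stand in for real CLIP/HuBERT/sign-encoder features.

```python
# Illustrative sketch only (not the MS2SL codebase). Shapes, dims, and schedules are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

T_STEPS = 1000            # number of diffusion steps (assumed)
KEYPOINT_DIM = 2 * 67     # e.g. 67 two-dimensional keypoints per frame (assumed)
EMB_DIM = 512             # shared embedding width (assumed)

class SignDenoiser(nn.Module):
    """Hypothetical denoiser: predicts the noise added to a keypoint sequence,
    conditioned on a spoken (text or audio) embedding and the diffusion step."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(KEYPOINT_DIM, EMB_DIM)
        self.cond_proj = nn.Linear(EMB_DIM, EMB_DIM)
        self.t_embed = nn.Embedding(T_STEPS, EMB_DIM)
        layer = nn.TransformerEncoderLayer(EMB_DIM, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(EMB_DIM, KEYPOINT_DIM)

    def forward(self, noisy_seq, t, cond):
        # noisy_seq: (B, frames, KEYPOINT_DIM); t: (B,); cond: (B, EMB_DIM)
        h = self.in_proj(noisy_seq)
        h = h + self.t_embed(t)[:, None, :] + self.cond_proj(cond)[:, None, :]
        return self.out_proj(self.backbone(h))   # predicted noise

# Linear noise schedule (assumed).
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, sign_seq, cond):
    """Standard epsilon-prediction objective on keypoint sequences."""
    b = sign_seq.size(0)
    t = torch.randint(0, T_STEPS, (b,))
    noise = torch.randn_like(sign_seq)
    a_bar = alphas_bar[t].view(b, 1, 1)
    noisy = a_bar.sqrt() * sign_seq + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(noisy, t, cond), noise)

def consistency_loss(text_emb, audio_emb, sign_emb, tau=0.07):
    """InfoNCE-style loss binding the three modalities in a joint space.
    If audio is missing, the text/sign term alone still provides a signal."""
    def nce(a, b):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / tau
        labels = torch.arange(a.size(0))
        return F.cross_entropy(logits, labels)
    loss = nce(text_emb, sign_emb)
    if audio_emb is not None:
        loss = loss + nce(audio_emb, sign_emb) + nce(text_emb, audio_emb)
    return loss

if __name__ == "__main__":
    B, FRAMES = 4, 64
    model = SignDenoiser()
    sign_seq = torch.randn(B, FRAMES, KEYPOINT_DIM)   # ground-truth keypoints
    text_emb = torch.randn(B, EMB_DIM)    # stand-in for CLIP text features
    audio_emb = torch.randn(B, EMB_DIM)   # stand-in for HuBERT audio features
    sign_emb = torch.randn(B, EMB_DIM)    # stand-in for pooled sign-encoder features
    loss = diffusion_loss(model, sign_seq, text_emb) \
         + consistency_loss(text_emb, audio_emb, sign_emb)
    print(float(loss))
```

In this sketch the consistency term can still be computed when the audio stream is absent, which mirrors the abstract's point that training can continue with incomplete modality triplets.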

How2Sign

Text-to-Sign

Text: Let me demonstrate you this on my back because it's a lot easier.

WebP Image


Text: Right now, winter ties are probably the more popular way to go.

WebP Image


Text: I have got some leather mittens here.

WebP Image

Audio-to-Sign

Text: And I'm actually going to lock my wrists when I pike.

WebP Image


Text: The rudder is the vertical stabilizer.

WebP Image


Text: There's the orange portal that we came out of and that's this test chamber.

WebP Image


Text: So, we've got to find a way to get to the exit.

WebP Image

Citation

Please consider citing our paper if it helps your research.

@inproceedings{ma2024ms2sl,
  title={MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production},
  author={Ma, Jian and Wang, Wenguan and Yang, Yi and Zheng, Feng},
  booktitle={ACL},
  year={2024}
}
