@inproceedings{qian-etal-2024-anytrans,
title = "{A}ny{T}rans: Translate {A}ny{T}ext in the Image with Large Scale Models",
author = "Qian, Zhipeng and
Zhang, Pei and
Yang, Baosong and
Fan, Kai and
Ma, Yiwei and
Wong, Derek F. and
Sun, Xiaoshuai and
Ji, Rongrong",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.137/",
doi = "10.18653/v1/2024.findings-emnlp.137",
pages = "2432--2444",
abstract = "This paper introduces AnyText, an all-encompassing framework for the task{--}In-Image Machine Translation (IIMT), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during translation. The few-shot learning capability of LLMs allows for the translation of fragmented texts by considering the overall context. Meanwhile, diffusion models' advanced inpainting and editing abilities make it possible to fuse translated text seamlessly into the original image while preserving its style and realism. Our framework can be constructed entirely using open-source models and requires no training, making it highly accessible and easily expandable. To encourage advancement in the IIMT task, we have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="qian-etal-2024-anytrans">
<titleInfo>
<title>AnyTrans: Translate AnyText in the Image with Large Scale Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Zhipeng</namePart>
<namePart type="family">Qian</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pei</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Baosong</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kai</namePart>
<namePart type="family">Fan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yiwei</namePart>
<namePart type="family">Ma</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Derek</namePart>
<namePart type="given">F</namePart>
<namePart type="family">Wong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiaoshuai</namePart>
<namePart type="family">Sun</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rongrong</namePart>
<namePart type="family">Ji</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yaser</namePart>
<namePart type="family">Al-Onaizan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Bansal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper introduces AnyText, an all-encompassing framework for the task–In-Image Machine Translation (IIMT), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during translation. The few-shot learning capability of LLMs allows for the translation of fragmented texts by considering the overall context. Meanwhile, diffusion models’ advanced inpainting and editing abilities make it possible to fuse translated text seamlessly into the original image while preserving its style and realism. Our framework can be constructed entirely using open-source models and requires no training, making it highly accessible and easily expandable. To encourage advancement in the IIMT task, we have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.</abstract>
<identifier type="citekey">qian-etal-2024-anytrans</identifier>
<identifier type="doi">10.18653/v1/2024.findings-emnlp.137</identifier>
<location>
<url>https://aclanthology.org/2024.findings-emnlp.137/</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>2432</start>
<end>2444</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T AnyTrans: Translate AnyText in the Image with Large Scale Models
%A Qian, Zhipeng
%A Zhang, Pei
%A Yang, Baosong
%A Fan, Kai
%A Ma, Yiwei
%A Wong, Derek F.
%A Sun, Xiaoshuai
%A Ji, Rongrong
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F qian-etal-2024-anytrans
%X This paper introduces AnyTrans, an all-encompassing framework for the task of In-Image Machine Translation (IIMT), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during translation. The few-shot learning capability of LLMs allows for the translation of fragmented texts by considering the overall context. Meanwhile, diffusion models’ advanced inpainting and editing abilities make it possible to fuse translated text seamlessly into the original image while preserving its style and realism. Our framework can be constructed entirely using open-source models and requires no training, making it highly accessible and easily expandable. To encourage advancement in the IIMT task, we have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
%R 10.18653/v1/2024.findings-emnlp.137
%U https://aclanthology.org/2024.findings-emnlp.137/
%U https://doi.org/10.18653/v1/2024.findings-emnlp.137
%P 2432-2444
Markdown (Informal)
[AnyTrans: Translate AnyText in the Image with Large Scale Models](https://aclanthology.org/2024.findings-emnlp.137/) (Qian et al., Findings 2024)
ACL
Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, and Rongrong Ji. 2024. AnyTrans: Translate AnyText in the Image with Large Scale Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2432–2444, Miami, Florida, USA. Association for Computational Linguistics.
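
As a quick illustration of the training-free pipeline the abstract describes (OCR detection, joint LLM translation of all fragments, diffusion-based text fusion), here is a minimal hypothetical sketch in Python. Every name below (`TextRegion`, `ocr`, `llm`, `inpaint`, `translate_jointly`) is an illustrative stand-in, not the AnyTrans codebase or any real library API; translating all fragments in a single prompt mirrors the abstract's point that the LLM can use the image's overall context to handle fragmented text.

```python
# Hypothetical sketch: detect -> translate jointly -> fuse, with no training.
from dataclasses import dataclass

@dataclass
class TextRegion:
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) pixel coordinates
    text: str                       # source-language transcript from OCR

def translate_jointly(regions: list[TextRegion], target_lang: str, llm) -> list[str]:
    """Pack every OCR fragment into one prompt so the LLM sees the whole
    image's text as context when translating broken or ambiguous snippets."""
    numbered = "\n".join(f"{i}. {r.text}" for i, r in enumerate(regions, 1))
    prompt = (
        "The following numbered fragments all appear in one image. "
        f"Translate each into {target_lang} and keep the numbering:\n{numbered}"
    )
    reply = llm(prompt)  # assumed: a callable returning the model's text reply
    return [line.split(". ", 1)[1] for line in reply.strip().splitlines()]

def translate_image(image, target_lang, ocr, llm, inpaint):
    """End-to-end in-image translation built from pluggable open-source parts."""
    regions = ocr(image)  # assumed: returns list[TextRegion]
    translations = translate_jointly(regions, target_lang, llm)
    for region, new_text in zip(regions, translations):
        # assumed: a text-guided diffusion inpainter that erases the source
        # text inside region.box and renders new_text in a matching style
        image = inpaint(image, region.box, new_text)
    return image
```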