GitHub - JingMog/RFL-MSD: [AAAI'25 Oral] "RFL: Simplifying Chemical Structure Recognition with Ring-Free Language".

JingMog/RFL-MSD


🔥 RFL: Simplifying Chemical Structure Recognition with Ring-Free Language 🔥

This is the official implementation of our paper: "RFL: Simplifying Chemical Structure Recognition with Ring-Free Language". It was accepted to AAAI 2025 as an oral presentation.

Paper arxiv: Paper

🔥 News:

  • 2025.01.20. Our paper was selected as an AAAI 2025 oral presentation, congratulations 👏👏👏.
  • The source code, including training and inference, has been released.

TODO:

  • Update paper link in arxiv.
  • Update Source Code.

⭐ Overview

The primary objective of Optical Chemical Structure Recognition is to convert chemical structure images into corresponding markup sequences. In this work, we propose a novel Ring-Free Language (RFL), which utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form. RFL allows complex molecular structures to be decomposed into multiple parts, which significantly reduces the learning difficulty for recognition models. Leveraging RFL, we propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings, along with a branch classification module for predicting branch information. Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios.

Comparison of RFL with previous modeling languages:

Introduction

Our Model Architecture:

model architecture

🎈 Datasets

In our paper, we use the following two datasets.

  • EDU-CHEMC : A dataset for handwritten chemical structure recognition.
  • Mini-CASIA-CSDB : A dataset for printed chemical structure recognition.

📝 Ring-Free Language

Our Ring-Free Language (RFL) utilizes a divide-and-conquer strategy to describe chemical structures in a hierarchical form. A molecular structure $G$ is equivalently converted into a molecular skeleton $S$, individual ring structures $R$, and branch information $F$.
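The decomposition idea can be illustrated with a small toy sketch. This is not the implementation in ./RFL/RFL.py, and it does not produce real RFL strings. It only assumes a molecule given as a list of atom-index bonds, and all function names (`build_adjacency`, `connected`, `decompose`) are hypothetical. It also makes two simplifying assumptions: fused rings are grouped into a single ring system, and branch information is approximated as the list of attachment points between a ring atom and its neighbor outside the ring.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (atom, atom) bonds."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected(adj, start, goal, skip_edge):
    """Depth-first reachability from start to goal while ignoring one bond."""
    banned = {skip_edge, skip_edge[::-1]}
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj[node]:
            if (node, nxt) not in banned and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def decompose(edges):
    """Toy split of a molecular graph G into (skeleton S, rings R, branches F)."""
    adj = build_adjacency(edges)
    # A bond lies on a ring if its endpoints stay connected without it.
    ring_edges = {e for e in edges if connected(adj, e[0], e[1], e)}
    ring_adj = build_adjacency(ring_edges)

    # Group ring atoms into ring systems (fused rings end up together).
    ring_of, rings = {}, []
    for atom in sorted(ring_adj):
        if atom in ring_of:
            continue
        ring_id = f"R{len(rings)}"
        ring_of[atom] = ring_id
        members, stack = [], [atom]
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in ring_adj[node]:
                if nxt not in ring_of:
                    ring_of[nxt] = ring_id
                    stack.append(nxt)
        rings.append(sorted(members))

    # Skeleton: non-ring bonds, with every ring collapsed to a placeholder node.
    # Branches: (ring id, ring atom, neighbor outside that ring).
    skeleton, branches = [], []
    for a, b in edges:
        if (a, b) in ring_edges:
            continue
        skeleton.append((ring_of.get(a, a), ring_of.get(b, b)))
        for ring_atom, other in ((a, b), (b, a)):
            if ring_atom in ring_of:
                branches.append((ring_of[ring_atom], ring_atom, other))
    return skeleton, rings, branches


# Toluene-like graph: a six-membered ring (atoms 0-5) with one substituent (atom 6).
toluene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
S, R, F = decompose(toluene)
print(S)  # [('R0', 6)]
print(R)  # [[0, 1, 2, 3, 4, 5]]
print(F)  # [('R0', 0, 6)]
```

The skeleton keeps only a placeholder for the ring, so a sequence model would see a much shorter structure first and handle each ring separately, which is the divide-and-conquer idea described above.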

You can use the following command to generate the Ring-Free Language for single samples. We have provided some typical examples for testing in ./RFL/RFL.py:

cd RFL
python RFL.py

Batch generation for multiple samples using multiprocessing:

cd RFL
bash RFL_gen.sh

💡 Training

You can start training using the following command:

bash train.sh

Note: The dataset path and related parameters need to be modified in rain\config.py

✈️ Evaluation

bash test_organic.sh

🚀 Experiment Results

Comparison with state-of-the-art methods on the handwritten dataset (EDU-CHEMC) and the printed dataset (Mini-CASIA-CSDB).

Result

Ablation study on the EDU-CHEMC dataset, with all systems based on MSD-DenseWAP.

System MSD [conn] EM Struct-EM
T1 × × 38.70 49.45
T2 × 44.02 55.77
T3 × 52.76 58.58
T4 64.96 73.15

To prove that RFL and MSD can simplify molecular structure recognition and enhance generalization ability, we design experiments on molecule complexity.

Generalization

Exact match rate (in %) of DenseWAP and MSD-DenseWAP on test sets with different structural complexity. The model in the left subplot is trained on complexity {1,2}, and the model in the right subplot is trained on complexity {1,2,3}.
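For readers unfamiliar with the metric, the following is a minimal sketch of an exact match rate, assuming predictions and labels are whitespace-separated token strings. The function name `exact_match_rate` and the example tokens are hypothetical; the evaluation run by test_organic.sh may normalize sequences differently, and Struct-EM is not covered here.

```python
def exact_match_rate(predictions, references):
    """Percentage of samples whose predicted token sequence equals the label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Compare token lists so that differences in spacing do not count as errors.
    hits = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical token sequences, for illustration only.
preds = ["C C O", "C = O"]
labels = ["C C O", "C O"]
print(exact_match_rate(preds, labels))  # 50.0
```

A sample counts as correct only if every token matches, so a single wrong bond or atom makes the whole structure a miss.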

Case Study:

Case Study

📰 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{chang2025rfl,
  title={RFL: Simplifying Chemical Structure Recognition with Ring-Free Language},
  author={Chang, Qikai and Chen, Mingjun and Pi, Changpeng and Hu, Pengfei and Zhang, Zhenrong and Ma, Jiefeng and Du, Jun and Yin, Baocai and Hu, Jinshui},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={2},
  pages={2007--2015},
  year={2025}
}

If you have any questions, please feel free to contact me: qkchang@mail.ustc.edu.cn
