Improving Distantly Supervised Relation Extraction with Self-Ensemble Noise Filtering

Tapas Nayak, Navonil Majumder, Soujanya Poria

Abstract

Distantly supervised models are very popular for relation extraction since we can obtain a large amount of training data using the distant supervision method without human annotation. In distant supervision, a sentence is considered as a source of a tuple if the sentence contains both entities of the tuple. However, this condition is too permissive and does not guarantee the presence of relevant relation-specific information in the sentence. As such, distantly supervised training data contains much noise which adversely affects the performance of the models. In this paper, we propose a self-ensemble filtering mechanism to filter out the noisy samples during the training process. We evaluate our proposed framework on the New York Times dataset which is obtained via distant supervision. Our experiments with multiple state-of-the-art neural relation extraction models show that our proposed filtering mechanism improves the robustness of the models and increases their F1 scores.
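
As a minimal illustration of the distant supervision heuristic described in the abstract, the sketch below labels a sentence with a knowledge-base tuple whenever both entities of the tuple appear in it. This is not the authors' code: the function name `distant_label`, the toy knowledge base, the example sentences, and the use of plain substring matching to detect entities are all assumptions made for illustration. It does not implement the self-ensemble filter, whose details the abstract does not give.

```python
from typing import List, Tuple

# (head entity, relation, tail entity); a hypothetical toy representation.
Triple = Tuple[str, str, str]


def distant_label(sentence: str, kb: List[Triple]) -> List[Triple]:
    """Return every KB tuple whose two entities both occur in the sentence.

    Entity detection here is plain substring matching (an assumption);
    real pipelines typically use entity linking.
    """
    return [(h, r, t) for (h, r, t) in kb if h in sentence and t in sentence]


# Toy knowledge base and sentences, invented for this example.
kb: List[Triple] = [("Barack Obama", "born_in", "Hawaii")]
sentences = [
    "Barack Obama was born in Hawaii.",        # expresses the relation
    "Barack Obama visited Hawaii last week.",  # both entities, wrong relation
    "Hawaii is an island state.",              # only one entity
]

labels = [distant_label(s, kb) for s in sentences]
```

The second sentence receives the `born_in` label even though it says nothing about a birthplace, which is the kind of noisy sample the abstract says the proposed mechanism filters out during training.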

Anthology ID: 2021.ranlp-1.116
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month: September
Year: 2021
Address: Held Online
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 1031–1039
URL: https://aclanthology.org/2021.ranlp-1.116/
Cite (ACL): Tapas Nayak, Navonil Majumder, and Soujanya Poria. 2021. Improving Distantly Supervised Relation Extraction with Self-Ensemble Noise Filtering. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1031–1039, Held Online. INCOMA Ltd.
Cite (Informal): Improving Distantly Supervised Relation Extraction with Self-Ensemble Noise Filtering (Nayak et al., RANLP 2021)
PDF: https://aclanthology.org/2021.ranlp-1.116.pdf
Code: nayakt/SENF4DSRE
