ydc/ctrd

Introduction

CTRD is a new Chinese Theme-Rheme Discourse Dataset for Chinese discourse analysis. It contains 525 manually annotated news articles (45,591 sentences in total) extracted from OntoNotes 4.0. Unlike the Penn Discourse TreeBank (PDTB) and datasets based on Rhetorical Structure Theory (RST), CTRD is annotated according to a novel discourse annotation scheme for Chinese based on Halliday’s Systemic Functional Grammar (SFG) and Thematic Progression Patterns (TPP).

Citations

If you use the dataset in your work, please cite the following papers:

[1] Yiqi Tong, Jiangbin Zheng, Hongkang Zhu, Yidong Chen, Xiaodong Shi. A Document-Level Neural Machine Translation Model with Dynamic Caching Guided by Theme-Rheme Information. In: Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), Dec. 8-13, 2020, pp. 4385–4395.

[2] Biao Fu, Yiqi Tong, Dawei Tian, Yidong Chen, Xiaodong Shi, Ming Zhu. CTRD: A Chinese Theme-Rheme Discourse Dataset. In: Proceedings of the 10th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2021), Qingdao, China, Oct. 13-17, 2021.
