Introduction

CTRD is a new Chinese Theme-Rheme Discourse Dataset for Chinese discourse analysis. It contains 525 manually annotated news articles, 45,591 sentences in total, extracted from OntoNotes 4.0. Unlike the Penn Discourse TreeBank (PDTB) and datasets based on Rhetorical Structure Theory (RST), CTRD was annotated according to a novel discourse annotation scheme for Chinese based on Halliday’s Systemic Functional Grammar (SFG) and Thematic Progression Patterns (TPP).

Citations

If you use the dataset in your work, please cite the following papers:

[1] Yiqi Tong, Jiangbin Zheng, Hongkang Zhu, Yidong Chen, Xiaodong Shi. A Document-Level Neural Machine Translation Model with Dynamic Caching Guided by Theme-Rheme Information. In: Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), Dec. 8-13, 2020, pp. 4385–4395.

[2] Biao Fu, Yiqi Tong, Dawei Tian, Yidong Chen, Xiaodong Shi, Ming Zhu. CTRD: A Chinese Theme-Rheme Discourse Dataset. In: Proceedings of the 10th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2021), Qingdao, China, Oct. 13-17, 2021.
