CTRD is a new Chinese Theme-Rheme Discourse Dataset for Chinese discourse analysis. It contains 525 manually annotated news articles, totaling 45,591 sentences, extracted from OntoNotes 4.0.…

6 Updated Aug 14, 2021

ownthink / KnowledgeGraphData

The largest-scale Chinese knowledge graph to date (140 million), open-sourced for download

Python 5,057 729 Updated Dec 6, 2023

commonsense / conceptnet5

Code for building ConceptNet from raw data.

Roff 2,867 353 Updated Jan 19, 2023

leileibama / AlphaReadabilityChinese

AlphaReadabilityChinese is a tool that calculates the readability of Chinese texts, which includes indices at lexical, syntactic, and semantic levels.

30 3 Updated Mar 30, 2024

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 146,874 29,625 Updated Jul 13, 2025

TianFengshou / pyhanlp_user_guide

This is the pyhanlp user guide that I maintain myself. It aims to help you get started with pyhanlp quickly and master it.

Jupyter Notebook 32 9 Updated Jul 19, 2019

hankcs / pyhanlp

Chinese word segmentation

Python 3,188 803 Updated Jan 16, 2025

hankcs / HanLP

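Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional Chinese conversion, natural language processing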

Python 35,355 10,692 Updated May 15, 2025

srx-2000 / spider_collection

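Python crawlers. Currently included: NetEase Cloud Music song crawler, Bilibili video crawler, Zhihu Q&A crawler, wallpaper crawler, xvideos video crawler, audiobook crawler, Weibo crawler, Anjuke listing crawler with data visualization, Bilibili video cover extractor, IP proxy pool wrapper, Zhihu million-user crawler with data analysis, GitHub user crawler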

Python 1,476 235 Updated Apr 23, 2024

cv-cat / Spider_XHS

Xiaohongshu crawler for data collection; a full-domain operations solution for Xiaohongshu

JavaScript 2,183 385 Updated Jun 7, 2025

WANG XIAORAN SerenaXatu

Stars

ymcui / Chinese-BERT-wwm

TapXWorld / ChinaTextbook

yanshanjing / ChineseDiachronicCorpus

prnake / CialloCorpus

YuhuYang / QuanSyn

PKU-TANGENT / NeuralEDUSeg

scrosseye / CLEAR-Corpus

shibing624 / text2vec

pwxcoo / chinese-xinhua

ydc / ctrd