SerenaXatu (WANG XIAORAN) / Starred · GitHub
  • Shanghai International Studies University

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)

Python 10,021 1,395 Updated Jul 31, 2023

PDF textbooks for all levels: primary school, middle school, high school, and university.

Roff 44,652 9,982 Updated May 18, 2025

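ChineseDiachronicCorpus, a Chinese diachronic corpus spanning more than sixty years. It comprises the Tencent diachronic news corpus (2000-2016), the People's Daily diachronic corpus (1946-2003), and the Reference News (参考消息) diachronic corpus (1957-2002). As a diachronic corpus of circulated texts, it provides foundational corpus support for computing diachronic language change, language monitoring, and research on social and cultural change.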

16 57 Updated Jan 10, 2021

People's Daily (1946-2024), a database of Xi Jinping's series of important speeches, and classical Chinese poetry and prose

66 2 Updated Mar 23, 2025

QuanSyn: A Python Package for Quantitative Syntax Analysis.

Python 36 3 Updated Apr 13, 2025

A toolkit for discourse segmentation (EDU segmentation).

Python 102 45 Updated Mar 24, 2023

Repository for the CommonLit Ease of Readability Corpus

24 5 Updated Apr 17, 2024

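text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.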

Python 4,789 417 Updated Jun 13, 2025

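📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.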

Python 11,256 2,628 Updated Dec 26, 2023

CTRD is a new Chinese Theme-Rheme Discourse Dataset for Chinese discourse analysis. It contains 525 manually annotated news articles, 45,591 sentences in total, extracted from OntoNotes 4.0.…

6 Updated Aug 14, 2021

Open-source download of the largest Chinese knowledge graph to date, at a scale of 140 million

Python 5,057 729 Updated Dec 6, 2023

Code for building ConceptNet from raw data.

Roff 2,867 353 Updated Jan 19, 2023

AlphaReadabilityChinese is a tool that calculates the readability of Chinese texts, which includes indices at lexical, syntactic, and semantic levels.

30 3 Updated Mar 30, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 146,874 29,625 Updated Jul 13, 2025

This is a user guide to pyhanlp that I maintain myself. It aims to help you get started with pyhanlp quickly and master it.

Jupyter Notebook 32 9 Updated Jul 19, 2019

Chinese word segmentation

Python 3,188 803 Updated Jan 16, 2025

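Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing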

Python 35,355 10,692 Updated May 15, 2025

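Python crawlers. Currently available: NetEase Cloud Music song scraping, Bilibili video scraping, Zhihu Q&A scraping, wallpaper scraping, xvideos video scraping, audiobook scraping, Weibo crawler, Anjuke listing scraping plus data visualization, Bilibili video cover extractor, IP proxy pool wrapper, Zhihu million-user crawler plus data analysis, GitHub user crawler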

Python 1,476 235 Updated Apr 23, 2024

Xiaohongshu crawler and data collection; a full-domain operations solution for Xiaohongshu

JavaScript 2,183 385 Updated Jun 7, 2025