Shanghai International Studies University
Stars
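Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
ChineseDiachronicCorpus, a Chinese diachronic corpus spanning more than sixty years. It comprises Tencent diachronic news (2000-2016), the People's Daily diachronic corpus (1946-2003), and the Reference News (参考消息) diachronic corpus (1957-2002). Built on diachronic corpora of circulated texts, it provides foundational corpus support for computing diachronic language change, language monitoring, and research on social and cultural change.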
QuanSyn: A Python Package for Quantitative Syntax Analysis.
A toolkit for discourse segmentation (EDU segmentation).
Repository for the CommonLit Ease of Readability Corpus
text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.
CTRD is a new Chinese Theme-Rheme Discourse Dataset for Chinese discourse analysis. It contains 525 manually annotated news articles (45,591 sentences in total) extracted from OntoNotes 4.0.…
Code for building ConceptNet from raw data.
AlphaReadabilityChinese is a tool that calculates the readability of Chinese texts, which includes indices at lexical, syntactic, and semantic levels.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
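This is the pyhanlp user guide that I maintain myself. It aims to help you get started with pyhanlp quickly and master it.
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing
Python crawlers. Currently available: NetEase Cloud Music song crawler, Bilibili video crawler, Zhihu Q&A crawler, wallpaper crawler, xvideos video crawler, audiobook crawler, Weibo crawler, Anjuke listing crawler + data visualization, Bilibili video cover extractor, IP proxy pool wrapper, Zhihu million-user crawler + data analysis, GitHub user crawler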