Stars
🤗 Transformers: the model-definition framework for state-of-the-art text, vision, audio, and multimodal machine learning models, for both inference and training.
Implementation of GraphReader paper: https://arxiv.org/abs/2406.14550
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022
Supercharge Your LLM Application Evaluations 🚀
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
A comic app built with Flutter, supporting multiple comic sources.
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc. on tasks like multi-label classification, named entity recognition,…
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
A modular graph-based Retrieval-Augmented Generation (RAG) system
[Paper List] Papers integrating knowledge graphs (KGs) and large language models (LLMs)
Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents
Free ChatGPT & DeepSeek API key. Free access to the DeepSeek API and GPT4 API, with support for top-ranked, commonly used large models such as gpt | deepseek | claude | gemini | grok.
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
Retrieval and Retrieval-augmented LLMs
Question and Answer based on Anything.
Toolkit for Prompt Compression
[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Awesome multilingual OCR and Document Parsing toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools,…
A @ClickHouse fork that supports high-performance vector search and full-text search.
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation,…
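This project includes a long-form novel generator based on large language models such as GPT, along with various novel-generation prompts and tutorials. We welcome community contributions and update continuously to provide the best novel-writing experience.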