Stars
Save matplotlib figures as TikZ/PGFplots for smooth integration into LaTeX.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow
Domain adaptation made easy. Fully featured, modular, and customizable.
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
DSPy: The framework for programming—not prompting—language models
A browser automation framework and ecosystem.
Scrapy, a fast high-level web crawling & scraping framework for Python.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Given a scholarly PDF, extract figures, tables, captions, and section titles.
A guidance language for controlling large language models.
This is the base repo for generating single-page annotations
A fast inference library for running LLMs locally on modern consumer-class GPUs
Test Software for the Characterization of AI Technologies
Always know what to expect from your data.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
Scripts and docs that help us run cost-effective experiments with OpenAI APIs
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
Official repository for the EMNLP: Findings Paper “On Event Individuation for Document-Level Information Extraction”
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Machine learning metrics for distributed, scalable PyTorch applications.
Fit interpretable models. Explain blackbox machine learning.
ShellCheck, a static analysis tool for shell scripts