Starred repositories
Code repository for O'Reilly book
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
LexEval: A Comprehensive Benchmark for Evaluating Large Language Models in Legal Domain
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
JavaScript scraping module based on Puppeteer for many different search engines...
The ultimate LLM/AI application development framework in Golang.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Statistics of Common Crawl monthly archives mined from URL index files
Automatic extraction of relevant features from time series:
A research prototype of a human-centered web agent
Memory for AI Agents; Announcing OpenMemory MCP - local and secure memory management.
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
🧠 Curated collection of system prompts for top AI tools. Perfect for AI agent builders and prompt engineers. Including: ChatGPT, Claude, Perplexity, Manus, Claude-Code, Loveable, v0, Grok, same new,…
Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app
Surf is a computer use AI agent powered by OpenAI that interacts with E2B's virtual desktop environment through natural language instructions
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
AI computer use powered by open source LLMs and E2B Desktop Sandbox
Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
Implementation of my RAG system that won all categories in Enterprise RAG Challenge 2
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)