Starred repositories
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for re…
Open source platform for the machine learning lifecycle
Custom AI assistant platform to speed up your work.
This is a repo with links to everything you'd ever want to learn about data engineering
LlamaIndex is the leading framework for building LLM-powered agents over your data.
End-to-end Generative Optimization for AI Agents
syftr is an agent optimizer that helps you find the best agentic workflows for your budget.
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase
📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools lik…
The official Model Context Protocol (MCP) server for DataHub (https://datahub.com)
Protocol Buffers - Google's data interchange format
🚨 Design workflows of slog handlers: pipeline, middleware, fanout, routing, failover, load balancing...
💥 A Lodash-style Go library based on Go 1.18+ Generics (map, filter, contains, find...)
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
The AI-native proxy server for agents. Arch handles the pesky low-level work in building agents like clarifying vague user input, routing prompts to the right agents and unifying access to any LLM …
A caching library with advanced concurrency features designed to make I/O heavy applications robust and highly performant
lkml2cube is a tool to convert LookML models into Cube data models.
Fancy stream processing made operationally mundane. This repository is a fork of the original project before the license was changed.
Python tool for converting files and office documents to Markdown.
DiceDB is an open-source, fast, reactive, in-memory database optimized for modern hardware.
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
A fully-featured AWS Athena database driver (+ athenareader https://github.com/uber/athenadriver/tree/master/athenareader)
A syntax-highlighting pager for git, diff, grep, and blame output
Cost monitoring for Kubernetes workloads and cloud costs
Dashboards and notebooks in a single place. Create powerful and flexible dashboards using code, or build beautiful Notion-like notebooks and share them with your team.