Starred repositories
D2 is a modern diagram scripting language that turns text to diagrams.
A lightweight Python-based tool for extracting and analyzing data column lineage for dbt projects
Modin: Scale your Pandas workflows by changing a single line of code
Master programming by recreating your favorite technologies from scratch.
Curated list of project-based tutorials
A Pure Python, React-style Framework for Scaling Your Jupyter and Web Apps
Course Materials for Analytics in Stock Markets Zoomcamp
Panel: The powerful data exploration & web app framework for Python
Scalable and efficient data transformation framework - backwards compatible with dbt.
The platform that powers Airbyte. Please file issues in https://github.com/airbytehq/airbyte
Streamlit — A faster way to build and share data apps.
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
DevOps resources - Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP
Malloy is an experimental language for describing data relationships and transformations.
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
re_data - fix data issues before your users & CEO discover them 😊
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
A collaborative documentation site, powered by Google Docs.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
ClickHouse® is a real-time analytics database management system
StackGres Operator, Full Stack PostgreSQL on Kubernetes. Mirror repository of https://gitlab.com/ongresinc/stackgres; Merge Requests are only accepted there.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
JuiceFS is a distributed POSIX file system built on top of Redis and S3.