data
A playground for running DuckDB as a stateless query engine over a data lake
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
The platform that powers Airbyte. Please file issues at https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
An Open Standard for lineage metadata collection
Do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
📊 Cube’s universal semantic layer platform is the next evolution of OLAP technology for AI, BI, spreadsheets, and embedded analytics
Scalable and efficient data transformation framework - backwards compatible with dbt.
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
Compare tables within or across databases
DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
A Singer (https://singer.io) target that writes data to Google BigQuery.
PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially…
DuckDB is an analytical in-process SQL database management system
The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-…
Repository of Helm charts for deploying DataHub on a Kubernetes cluster
A web UI for Debezium. Please log issues at https://issues.redhat.com/browse/DBZ.
A series of DAGs/Workflows to help maintain the operation of Airflow
Web tool for operating Kafka Connect: https://hub.docker.com/r/officialkakao/kafka-connect-web
This repository is a getting started guide to Singer.
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.