Data
A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
Quickly ingest messy CSV and XLS files. Export to clean pandas, SQL, or Parquet
World's fastest log analysis: λ + SQL + JSON + S3
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
Provide Maximum Availability and Disaster Protection for Oracle Systems
A small library for importing/exporting BigTable instance schemas and row data.
El Carro is a new project that offers a way to run Oracle databases in Kubernetes as a portable, open source, community driven, no vendor lock-in container orchestration system. El Carro provides a…
FUSE-based file system for replicating SQLite databases across a cluster of machines
A swiss army knife CLI tool for interacting with Kafka, RabbitMQ and other messaging systems.
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…
A next-generation crawling and spidering framework.
Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
Sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, DB2
⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.
Official implementation of our CVPR 2023 paper "Compressing Volumetric Radiance Fields to 1 MB"
immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
Dataform is a framework for managing SQL based data operations in BigQuery
A better ORM for Go, based on non-empty interfaces and code generation.
JunoDB is PayPal's home-grown secure, consistent, and highly available key-value store providing low, single-digit-millisecond latency at any scale.