Stars
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Fluss is a streaming storage built for real-time analytics.
A Persistent Key-Value Store designed for Streaming processing
A blazingly fast multi-language serialization framework powered by JIT and zero-copy.
Apache Paimon Rust: the Rust implementation of Apache Paimon.
Official electron build of draw.io
Flink SQL connector for ClickHouse. Supports ClickHouseCatalog and reading/writing primary data, maps, and arrays to ClickHouse.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
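A FlinkSQL data masking and row-level permission solution, with source code. It supports user-level data masking and row-level data access control, so that a given user can access only masked data or authorized rows. This is a Flink solution for the real-time domain, similar to the Row-level Filter and Column Masking features of Hive Ranger in offline data warehouses.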
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Very fast and efficient grep for Kafka streams
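An antlr4-based SQL parser for multiple databases that extracts metadata from SQL. It can be used in many data platform scenarios: extracting metadata from DDL statements, SQL permission checks, table-level lineage, SQL syntax validation, and more. Supports Spark, Flink, Gauss, StarRocks, Oracle, MySQL, PostgreSQL, SQL Server, DB2, and others.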
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.
HyperLogLog (original and HyperLogLog++) algorithm implementation in Java.
The official home of the Presto distributed SQL query engine for big data
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
⚡ Dynamically generated stats for your GitHub READMEs