Starred repositories
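Abu quantitative trading system (stocks, options, futures, Bitcoin, machine learning): an open-source, Python-based architecture for quantitative trading and quantitative investment.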
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Blind and invisible watermark for images; the watermark can be extracted without the original image.
Blind text watermark: hides information inside text by embedding an invisible watermark.
For developers who are building real-time data-driven applications, Redis is the preferred, fastest, and most feature-rich cache, data structure server, and document and vector query engine.
Ultra fast JSON decoder and encoder written in C with Python bindings
Apache Pinot - A realtime distributed OLAP datastore
The original sources of MS-DOS 1.25, 2.0, and 4.0 for reference purposes
A graph database that supports more than 100 billion data entries, with high performance and scalability (includes OLTP engine, REST API, and backends)
PySpark methods to enhance developer productivity 📣 👯 🎉
Always know what to expect from your data.
Apache Ranger - To enable, monitor and manage comprehensive data security across the Hadoop platform and beyond
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.
Open source SQL Query Assistant service for Databases/Warehouses
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualiz…
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
CMAK is a tool for managing Apache Kafka clusters
Confluent Schema Registry for Kafka
Apache Pulsar - distributed pub-sub messaging system
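A one-stop cloud-native real-time streaming data platform that builds enterprise-grade Kafka services through a zero-intrusion, plugin-based approach, greatly lowering the barrier to operating, storing, and managing real-time streaming data.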