- Sunnyvale
Stars
MCP server for Apache Gravitino (incubating)
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
A playground to experience Gravitino
Uniffle is a high performance, general purpose Remote Shuffle Service.
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Apache InLong - a one-stop, full-scenario integration framework for massive data
TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse s…
TubeMQ has been donated to the Apache Software Foundation and renamed to InLong, please visit the new Apache repository: https://github.com/apache/incubator-inlong
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
A complete computer science study plan to become a software engineer.
Kerberos and Hadoop: The Madness beyond the Gate
A Spark Atlas connector to track data lineage in Apache Atlas
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…
An Open Source Machine Learning Framework for Everyone
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Spark Structured Streaming Kafka 0.8 Source Implementation
Livy is an open source REST interface for interacting with Apache Spark from anywhere
DEPRECATED. Zeppelin has moved to Apache. Please make pull requests there.
An example of running Apache Spark using Scala in ipython notebook
Apache Spark - A unified analytics engine for large-scale data processing
An introductory book on real-world Scala, including its mainstream frameworks and MOMs...
jerryshao / tornado
Forked from tornadoweb/tornado
Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed.
Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.