Stars
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN.
A Scala API for Apache Beam and Google Cloud Dataflow.
Apache Spark - A unified analytics engine for large-scale data processing
Protocol Buffers - Google's data interchange format