Stars
big-data
16 repositories
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Alluxio, data orchestration for analytics and machine learning in the cloud
Apache Spark - A unified analytics engine for large-scale data processing
A little word cloud generator in Python
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
The fundamental package for scientific computing with Python.
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter