Stars
A playground for experimenting with ideas that may apply to Spark SQL/Catalyst
A custom Spark SQL hint optimizer that resolves JOIN data skew caused by hot-spot data
ClickHouse® is a real-time analytics database management system
Repository for Amazon EMR and Apache Ranger integration
Machine Translation: Foundations and Models, by Xiao Tong and Zhu Jingbo
JavaCC - a parser generator for building parsers from grammars. It can generate code in Java, C++ and C#.
https://blog.csdn.net/QXC1281/article/details/89070285
TiSpark is built for running Apache Spark on top of TiDB/TiKV
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
A framework for CTR (click-through rate) prediction problems. The project includes FNN, PNN, DeepFM, NFM, etc.
RSTutorials: A Curated List of Must-read Papers on Recommender System.
Advanced Deep Learning and Reinforcement Learning course taught at UCL in partnership with DeepMind
Hand-written implementations of all the algorithms in Li Hang's book "Statistical Learning Methods"
My continuously updated Machine Learning, Probabilistic Models and Deep Learning notes and demos (2000+ slides), with links to videos
An open-source introductory Deep Learning book with hands-on case studies based on the TensorFlow 2.0 framework.
The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Code repo for the book "Feature Engineering for Machine Learning," by Alice Zheng and Amanda Casari, O'Reilly 2018
Automated Machine Learning with scikit-learn
Riak is a decentralized datastore from Basho Technologies.
A collection of various deep learning architectures, models, and tips
Interactive Linear Algebra, free online textbook at Georgia Tech