Stars
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
QuestDB is a high-performance, open-source time-series database
Guides and docs to help you get up and running with Apache Airflow.
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
Methods to get the probability of a changepoint in a time series.
A quick reference guide for the Pythonista in the process of becoming a Rustacean
Documentation on how to access and use the Quick, Draw! Dataset.
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Markov Chain combined with word vector embedding (word2vec) and part-of-speech tagging, for context-aware text generation. License: MIT
CorEx or "Correlation Explanation" discovers a hierarchy of informative latent factors. This reference implementation has been superseded by other versions below.
Submit and share your resources and ideas in the wiki for Baseball Hack Day!
📚 Freely available programming books