Stars
The Onion Name System - academic literature
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
Training Tesseract to better extract serial numbers from images of electronic items
MIT LL Text Classifier including MIRA Online classifier, SVM, and perceptron (LID, sentiment analysis, text difficulty assessment)
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Open-source JavaScript charting library behind Plotly and Dash
tools for visualizing streamcorpus data stored in kvlayer
string matching stage for the streamcorpus-pipeline
trainable text document segmenter that identifies zones of a document automatically using an SVM+HMM model, implemented as an incremental transform for streamcorpus-pipeline
A GATE-based geotagger
transforms for converting OpenSextant output into Token objects in streamcorpus.StreamItem.body.sentences
MapReduce implementation of a community detection algorithm that extends the Louvain method.
MPI-based distributed-memory implementation of the Louvain method for non-overlapping community detection
MPI-based algorithm for detecting high-centrality vertices in large graphs
Alluxio, data orchestration for analytics and machine learning in the cloud
Apache Spark - A unified analytics engine for large-scale data processing
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Uncharted Ensemble Clustering is a flexible multi-threaded clustering library for rapidly constructing tailored clustering solutions that leverage the different semantic aspects of heterogeneous da…
ApertureJS - an open, adaptable and extensible JavaScript visualization framework
Aperture-Tiles uses familiar web-based map interactions to allow exploration of arbitrarily large data sets.