- Tokyo, Japan
- http://mrorii.github.io/
Stars
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Retrieval and Retrieval-augmented LLMs
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Interact, analyze and structure massive text, image, embedding, audio and video datasets
Collection of training data management explorations for large language models
Utils for streaming large files (S3, HDFS, gzip, bz2...)
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Accessible large language models via k-bit quantization for PyTorch.
LLM training code for Databricks foundation models
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...
Faster way to switch between clusters and namespaces in kubectl
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
A constant throughput, correct latency recording variant of wrk
Java Collections till the last breadcrumb of memory and performance
A collection of research and application papers of (uncertainty) calibration techniques.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
A developer's guide to management: an open-sourced handbook for leading software engineering teams.
An open-source framework for machine learning and other computations on decentralized data.
Collection of tech talks, papers and web links on Distributed Systems, Scalability and System Design.