mrorii (Naoki Orii) / Starred · GitHub

Organizations

@oaqa

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 5,736 178 Updated Jul 8, 2025

Retrieval and Retrieval-augmented LLMs

Python 10,155 750 Updated Jul 15, 2025

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,484 194 Updated Jul 13, 2025

Curate better data for LLMs

Python 1,046 100 Updated Mar 19, 2024

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

Python 221 21 Updated Nov 16, 2024

Interact, analyze and structure massive text, image, embedding, audio and video datasets

Python 1,750 191 Updated Jun 13, 2025

Collection of training data management explorations for large language models

327 31 Updated Aug 2, 2024

Utils for streaming large files (S3, HDFS, gzip, bz2...)

Python 3,333 384 Updated Jul 10, 2025

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python 2,681 190 Updated Jan 30, 2025

🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).

JavaScript 6,662 475 Updated Jan 17, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,143 178 Updated Mar 27, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 7,227 720 Updated Jul 14, 2025

LLM training code for Databricks foundation models

Python 4,284 571 Updated Jul 14, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,701 216 Updated Jun 19, 2025

Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...

Python 317 28 Updated Dec 9, 2023

Faster way to switch between clusters and namespaces in kubectl

Go 18,811 1,316 Updated Jan 22, 2025

The simple, stupid rules engine for Java

Java 5,096 1,088 Updated May 29, 2024

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Python 14,229 1,824 Updated Jul 3, 2024

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 8,148 1,873 Updated Jul 15, 2025

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.

Java 8,273 1,723 Updated Jul 16, 2025

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,462 560 Updated Jun 24, 2025

A constant throughput, correct latency recording variant of wrk

C 4,420 400 Updated Mar 3, 2024

Java Collections till the last breadcrumb of memory and performance

Java 1,014 140 Updated Feb 1, 2017

A collection of research and application papers of (uncertainty) calibration techniques.

331 53 Updated Jan 3, 2024

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 35,989 3,298 Updated Jul 15, 2025

A developer's guide to management: an open-sourced handbook for leading software engineering teams.

1,564 95 Updated Jan 24, 2020

Apache Flink Training Exercises

Java 979 688 Updated Aug 5, 2024

Learning to Rank in TensorFlow

Python 2,778 480 Updated Mar 18, 2024

An open-source framework for machine learning and other computations on decentralized data.

Python 2,386 597 Updated Jul 15, 2025

Collection of tech talks, papers and web links on Distributed Systems, Scalability and System Design.

2,063 415 Updated Nov 8, 2023