- Santa Clara, California, United States
- https://www.linkedin.com/in/jaemincs/
Stars
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.
Development repository for the Triton language and compiler
Ongoing research training transformer models at scale
Color effects manager for Razer devices for macOS. Supports High Sierra (10.13) to Monterey (12.0). Made by the community, based on openrazer.
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
An HPC-oriented, parallel programming language targeting Charm++. Aims to be to C++ as Scala is to Java.
Graph Neural Network Library for PyTorch
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
Menubar Tool to set Charge Limits and Prolong Battery Lifespan
HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training
An implementation of a deep learning recommendation model (DLRM)
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
CLI11 is a command line parser for C++11 and beyond that provides a rich feature set with a simple and intuitive interface.
Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement as task graphs that are scheduled concurrently and asynchronously…
Examples demonstrating available options to program multiple GPUs in a single node or a cluster