- Alibaba Cloud
- Beijing
- jason.zj0619@gmail.com
Stars
antgroup / ant-ray
Forked from ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay is forked from ray, offering incremental new features on top …
Cost-efficient and pluggable Infrastructure components for GenAI inference
FlashMLA: Efficient MLA decoding kernels
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
SGLang is a fast serving framework for large language models and vision language models.
Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis
Create and manage Amazon SageMaker HyperPod clusters, run distributed model training
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Kubectl plugin for setting conditions on nodes
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Simple HTTP, REST, and SSE client library for Go
Cross-platform filesystem notifications for Go.
Go package for reading from continuously updated files (tail -f)
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
GPUd automates monitoring, diagnostics, and issue identification for GPUs
Anteon (formerly Ddosify) - Effortless Kubernetes Monitoring and Performance Testing. Available on CLI, Self-Hosted, and Cloud
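Embedding Projector is Google's open-source tool for visualizing high-dimensional data. This project builds a system for high-dimensional data analysis on top of that interactive web application.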
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also …
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2