Stars
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
[ICCV 2023] TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Espressif IoT Library. IoT Device Drivers, Documentation and Solutions.
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.
Dockerfile containing FFmpeg, OpenCV4 and Python2/3, based on Ubuntu LTS
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Real time interactive streaming digital human
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
LAVIS - A One-stop Library for Language-Vision Intelligence
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Robust Speech Recognition via Large-Scale Weak Supervision
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Official Code for DragGAN (SIGGRAPH 2023)
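An innovative cross-platform slacking-off tool with modes for novels, stocks, web pages, videos, live streams, PDFs, games and more. Built as a must-have for office workers, it makes the workday feel much easier and keeps you out of the ICU.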
A Robust, Real-time, RGB-colored, LiDAR-Inertial-Visual tightly-coupled state Estimation and mapping package
Official repository of NeuMan: Neural Human Radiance Field from a Single Video (ECCV 2022)