An easy, quick-to-adapt PyTorch-Lightning template. A simple, easy-to-use wrapper template: with minor changes to your original PyTorch code, it adapts to Lightning. You can translate your previous PyTorch code much more easily using this template, and keep your freedom to edit a…

Jupyter Notebook 1,475 194 Updated Aug 6, 2023

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,573 1,141 Updated Nov 14, 2024

AIGC-Audio / AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Python 10,165 862 Updated Jul 6, 2024

DanielMengLiu / AudioVisualLip

Python 23 1 Updated Feb 20, 2024

mkshing / Prompt-Tuning

Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning"

Jupyter Notebook 167 24 Updated Sep 8, 2021

huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 18,858 1,927 Updated Jun 26, 2025

ga642381 / SpeechPrompt

**Interspeech 2022** "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm.

Python 100 9 Updated Apr 10, 2025

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Python 16,760 1,384 Updated Jun 2, 2025

usc-sail / mica-subtitle-aligned-movie-sounds

A dataset for Audio-Visual Sound Event Detection in Movies

Python 27 1 Updated Jan 23, 2023

xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.

Jupyter Notebook 3,296 307 Updated Feb 18, 2025

IDEA-CCNL / Fengshenbang-LM

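Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language research center of the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.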

Python 4,130 384 Updated Aug 13, 2024

henrywoo / pyllama

LLaMA: Open and Efficient Foundation Language Models

Python 2,801 310 Updated Nov 8, 2023

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,881 2,527 Updated Aug 12, 2024

suno-ai / bark

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 38,079 4,523 Updated Aug 19, 2024

facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Jupyter Notebook 10,931 1,004 Updated Jun 24, 2025

modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,140 186 Updated Jun 6, 2025

Moonvy / OpenPromptStudio

🥣 Visual editor for AIGC prompts | OPS | Open Prompt Studio

Vue 6,283 741 Updated Apr 28, 2024

IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 8,314 840 Updated Aug 12, 2024

facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 50,607 5,955 Updated Sep 18, 2024

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 16,514 1,509 Updated Sep 5, 2024

fanOfJava

Stars

fishaudio / fish-speech

facebookresearch / AudioMAE

bytedance / InfiniteYou

taylover-pei / SSDG-CVPR2020

FakeSoundData / FakeSound

cwx-worst-one / EAT

m1guelpf / auto-subtitle

BradyFU / Awesome-Multimodal-Large-Language-Models

NanmiCoder / MediaCrawler

mahseema / awesome-ai-tools

miracleyoo / pytorch-lightning-template