Stars
Language-Agnostic SEntence Representations
Source code for the paper 'Audio Captioning Transformer'
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SVIT: Scaling up Visual Instruction Tuning
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Faster Whisper transcription with CTranslate2
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]
A new adversarial purification method that uses the forward and reverse processes of diffusion models to remove adversarial perturbations.
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
[ECCV 2024] Adversarial Prompt Tuning for Vision-Language Models
Targeted Adversarial Examples on Speech-to-Text systems
Implementation of "Defense against Adversarial Attacks on Audio DeepFake Detection"