YYTtyy / Starred · GitHub

Language-Agnostic SEntence Representations

Jupyter Notebook 3,639 462 Updated May 2, 2024

Source code for the paper 'Audio Captioning Transformer'

Jupyter Notebook 53 3 Updated Jan 18, 2022

Audio Captioning datasets for PyTorch.

Python 117 8 Updated Mar 18, 2025

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

Python 816 99 Updated Sep 30, 2021

Next-Token Prediction is All You Need

Python 2,127 80 Updated Mar 17, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,596 2,496 Updated Aug 12, 2024

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 49,529 6,032 Updated May 21, 2025

A family of lightweight multimodal models.

Python 1,018 74 Updated Nov 18, 2024

SVIT: Scaling up Visual Instruction Tuning

Python 162 4 Updated Jun 20, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,250 697 Updated May 20, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,923 197 Updated May 19, 2025

PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

Python 484 28 Updated Apr 29, 2025

PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"

Python 12 Updated Mar 9, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,325 283 Updated Nov 5, 2024

Faster Whisper transcription with CTranslate2

Python 16,168 1,335 Updated Apr 29, 2025
Python 42 5 Updated Dec 7, 2024

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

457 34 Updated Mar 18, 2025

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Python 48 5 Updated Dec 20, 2024
Python 35 3 Updated Dec 16, 2022

RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]

Python 712 100 Updated Mar 31, 2025

A new adversarial purification method that uses the forward and reverse processes of diffusion models to remove adversarial perturbations.

Python 301 35 Updated Jan 29, 2023

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Python 5,279 1,217 Updated May 23, 2025

Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)

Python 1,959 223 Updated May 20, 2024

ECCV2024: Adversarial Prompt Tuning for Vision-Language Models

Python 25 1 Updated Nov 19, 2024

Targeted Adversarial Examples on Speech-to-Text systems

Python 301 95 Updated Jul 24, 2022
Python 357 70 Updated Mar 8, 2024

Implementation of "Defense against Adversarial Attacks on Audio DeepFake Detection"

Jupyter Notebook 49 5 Updated Oct 20, 2023

The repo hosts the code and model of MAViL.

42 1 Updated Jul 24, 2023

An Audio Language model for Audio Tasks

Python 303 16 Updated Apr 19, 2024