Stars
A real-time multimodal emotion recognition web app for text, sound, and video inputs
A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition (ACM MM 2024 oral)
[Information Fusion (Vol. 103, Mar. 2024)] Boosting Image Matting with Pretrained Plain Vision Transformers
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (V…
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
ImageBind: One Embedding Space to Bind Them All
Explainable Multimodal Emotion Reasoning (EMER), Open-vocabulary MER (OV-MER), and AffectGPT
MIntRec2.0 is the first large-scale dataset for multimodal intent recognition and out-of-scope detection in multi-party conversations (ICLR 2024)
Set up and run a local LLM and chatbot using consumer-grade hardware.
《开源大模型食用指南》("A Guide to Consuming Open-Source Large Models"): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment
Benchmarking Legal Knowledge of Large Language Models
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Production-ready platform for agentic workflow development.
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
Always know what to expect from your data.
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collec…
Useful resources on data quality for machine learning and artificial intelligence.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis (ALMT)
Question and Answer based on Anything.
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Source code for the paper "An Unforgeable Publicly Verifiable Watermark for Large Language Models", accepted at ICLR 2024
GLM-4 series: Open Multilingual Multimodal Chat LMs
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
FinGLM: an effort to build an open, non-profit, and lasting financial large-model project, using open source to advance "AI + finance".
Retrieval and Retrieval-augmented LLMs
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at low cost, covering base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.