This document tracks recent developments in several areas I follow.
Conference
- 【ICCV 2023】Human Part-wise 3D Motion Context Learning for Sign Language Recognition. [Paper]
- 【CVPR 2023】Natural Language-Assisted Sign Language Recognition. [Paper] [Code]
- 【CVPRW 2023】Isolated Sign Language Recognition based on Tree Structure Skeleton Images. [Paper] [Code]
- 【AAAI 2023】BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization. [Paper]
- 【NeurIPS 2023】PopSign ASL v1.0: An Isolated American Sign Language Dataset Collected via Smartphones. [Paper]
- 【NeurIPS 2023】ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition. [Paper]
- 【ACMMM 2024】Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition. [Paper]
- 【COLING 2024】Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition. [Paper] [Code]
- 【CVPR 2025】Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues. [Paper]
- 【WWW 2025】Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition. [Paper]
Workshop
- 【ICCVW 2023】New keypoint-based approach for recognising British Sign Language (BSL) from sequences. [Paper]
Submission
- 【ICLR 2025】Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies. [Paper]
Journal
- (TIP 2024)Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition. [Paper] [Code]
- (TPAMI 2023)Towards Zero-Shot Sign Language Recognition. [Paper]
- (TCSVT 2024)MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition. [Paper] [Code]
- (TMM 2024)SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing. [Paper]
- #(PR 2024)Cross-lingual few-shot sign language recognition. [Paper]
Preprint
- 「Arxiv 2024.01.22」Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition. [Paper]
- 「Arxiv 2024.02.13」BdSLW60: A Word-Level Bangla Sign Language Dataset. [Paper]
- 「Arxiv 2024.03.19」Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition. [Paper]
- 「Arxiv 2024.04.15」Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets. [Paper]
- 「Arxiv 2024.04.24」Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language. [Paper]
- 「Arxiv 2024.04.29」Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation. [Paper]
- 「Arxiv 2024.06.24」PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling. [Paper]
- 「Arxiv 2024.07.02」Sign Language Recognition Based On Facial Expression and Hand Skeleton. [Paper]
- 「Arxiv 2024.06.27」A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition. [Paper]
- 「Arxiv 2024.07.07」iSign: A Benchmark for Indian Sign Language Processing. [Paper]
- 「Arxiv 2024.08.20」BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition. [Paper]
- 「Arxiv 2024.01.14」Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition. [Paper]
- 「Arxiv 2024.09.11」Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability. [Paper]
- 「Arxiv 2024.08.26」Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model. [Paper]
- 「Arxiv 2024.09.27」Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition. [Paper]
- #「Arxiv 2024.11.10」Classification in Japanese Sign Language Based on Dynamic Facial Expressions. [Paper]
- 「Arxiv 2024.12.10」Real-time Sign Language Recognition Using MobileNetV2 and Transfer Learning. [Paper]
- 「Arxiv 2024.12.16」Training Strategies for Isolated Sign Language Recognition. [Paper]
- 「Arxiv 2024.12.24」Learning Sign Language Representation using CNN LSTM, 3DCNN, CNN RNN LSTM and CCN TD. [Paper]
- #「Arxiv 2025.02.27」Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies. [Paper]
- #「Arxiv 2025.03.04」BdSLW401: Transformer-Based Word-Level Bangla Sign Language Recognition Using Relative Quantization Encoding (RQE). [Paper]
- 「Arxiv 2025.03.16」Cross-Modal Consistency Learning for Sign Language Recognition. [Paper]
- #「Arxiv 2025.03.16」ISLR101: an Iranian Word-Level Sign Language Recognition Dataset. [Paper]
- 「Arxiv 2025.04.10」Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition. [Paper]
- 「Arxiv 2025.04.23」SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition. [Paper]
Conference
- 【ACMMM 2023】AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. [Paper]
- 【ACMMM 2023】Towards Real-Time Sign Language Recognition and Translation on Edge Devices. [Paper]
- 【ICCV 2023】CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition. [Paper]
- 【ICCV 2023】Improving Continuous Sign Language Recognition with Cross-Lingual Signs. [Paper]
- 【ICCV 2023】C2ST: Cross-modal Contextualized Sequence Transduction for Continuous Sign Language Recognition. [Paper]
- 【EMNLP Findings 2023】Handshape-Aware Sign Language Recognition: Extended Datasets and Exploration of Handshape-Inclusive Methods. [Paper]
- 【AAAI 2023】Self-Emphasizing Network for Continuous Sign Language Recognition. [Paper] [Code]
- 【CVPR 2023】CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment. [Paper] [Code]
- 【CVPR 2023】Continuous Sign Language Recognition with Correlation Network. [Paper] [Code]
- 【CVPR 2023】Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition. [Paper]
- 【ECCV 2024】EvSign: Sign Language Recognition and Translation with Streaming Events. [Paper]
- 【EMNLP 2024】Towards Online Continuous Sign Language Recognition and Translation. [Paper]
- 【AAAI 2024】KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. [Paper]
- 【AAAI 2024】Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition. [Paper]
- 【AAAI 2024】TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions. [Paper]
- 【CVPR 2024】SignGraph: A Sign Sequence is Worth Graphs of Nodes. [Paper] [Code]
- 【IJCAI 2023】Contrastive Learning for Sign Language Recognition and Translation. [Paper]
Submission
- 【ICLR 2024】SignKD: Multi-modal Hierarchical Knowledge Distillation for Continuous Sign Language Recognition. [Paper]
Journal
- (TMM 2023)Prior-Aware Cross Modality Augmentation Learning for Continuous Sign Language Recognition. [Paper]
- (TETCI 2024)Spatial Temporal Aggregation for Efficient Continuous Sign Language Recognition. [Paper]
- (TIP 2024)Gloss Prior Guided Visual Feature Learning for Continuous Sign Language Recognition. [Paper]
- (TCSVT 2023)Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition. [Paper]
- (TMM 2023)Collaborative Multilingual Continuous Sign Language Recognition: A Unified Framework. [Paper]
- (TMM 2024)A Sign Language Recognition Framework Based on Cross-Modal Complementary Information Fusion. [Paper]
- (PR 2024)Scalable Frame Resolution for Efficient Continuous Sign Language Recognition. [Paper]
- (PR 2023)Multi-scale local-temporal similarity fusion for continuous sign language recognition. [Paper]
- (TPAMI 2025)MixSignGraph: A Sign Sequence is Worth Mixed Graphs of Nodes. [Paper]
Preprint
- 「Arxiv 2024.01.22」SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning. [Paper]
- 「Arxiv 2024.02.29」Continuous Sign Language Recognition Based on Motor attention mechanism and frame-level Self-distillation. [Paper]
- 「Arxiv 2024.04.12」Improving Continuous Sign Language Recognition with Adapted Image Models. [Paper] [Code]
- 「Arxiv 2024.04.17」CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation. [Paper] [Code]
- #「Arxiv 2024.04.21」Stream State-tying for Sign Language Recognition. [Paper]
- 「Arxiv 2024.05.02」A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News. [Paper]
- 「Arxiv 2024.05.16」A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision. [Paper]
- 「Arxiv 2024.05.20」Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining. [Paper]
- 「Arxiv 2024.06.26」Continuous Sign Language Recognition Using Intra-inter Gloss Attention. [Paper]
- #「Arxiv 2024.08.14」Sign language recognition based on deep learning and low-cost handcrafted descriptors. [Paper]
- 「Arxiv 2024.09.02」SCOPE: Sign Language Contextual Processing with Embedding from LLMs. [Paper]
- 「Arxiv 2024.09.18」A Chinese Continuous Sign Language Dataset Based on Complex Environments. [Paper]
- #「Arxiv 2024.11.07」Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic. [Paper]
- 「Arxiv 2025.03.11」OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition. [Paper]
- 「Arxiv 2025.03.21」Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition. [Paper]
- 「Arxiv 2025.04.02」CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition. [Paper]
- 「Arxiv 2025.04.22」SignX: The Foundation Model for Sign Recognition. [Paper]
Conference
- 【ICLR 2023】SLTUNET: A Simple Unified Model for Sign Language Translation. [Paper] [Code]
- 【ACL 2023】Gloss-Free End-to-End Sign Language Translation. [Paper] [Code]
- #【ACL 2023】Neural Machine Translation Methods for Translating Text to Sign Language Glosses. [Paper]
- #【ACL 2023】Considerations for meaningful sign language machine translation based on glosses. [Paper]
- #【ACL 2023】ISLTranslate: Dataset for Translating Indian Sign Language. [Paper] [Code]
- 【EMNLP 2023】Cross-modality Data Augmentation for End-to-End Sign Language Translation. [Paper] [Code]
- 【NeurIPS 2023】YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus. [Paper]
- 【NeurIPS 2023】Auslan-Daily: Australian Sign Language Translation for Daily Communication and News. [Paper]
- 【CVPRW 2023】Sign Language Translation from Instructional Videos. [Paper] [Project] [Code]
- 【CVPR 2023】Gloss Attention for Gloss-free Sign Language Translation. [Paper] [Code]
- 【ICCV 2023】Sign Language Translation with Iterative Prototype. [Paper]
- 【ICCV 2023】Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining. [Paper] [Code]
- 【ACL 2024】Sign Language Translation with Sentence Embedding Supervision. [Paper]
- 【ACL 2024】Unsupervised Sign Language Translation and Generation. [Paper]
- 【ICLR 2024】Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation. [Paper]
- 【AAAI 2024】Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment. [Paper] [Code]
- 【LREC-COLING 2024】Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation. [Paper]
- 【ACL 2024】Towards Privacy-Aware Sign Language Translation at Scale. [Paper] [Code]
- 【CVPR 2024】LLMs are Good Sign Language Translators. [Paper]
- 【NeurIPS 2024】Improving Gloss-free Sign Language Translation by Reducing Representation Density. [Paper] [Code]
- 【NeurIPS 2024】Scaling Sign Language Translation. [Paper]
- 【NeurIPS 2024】MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset. [Paper]
- 【ECCV 2024】A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars. [Paper]
- 【ECCV 2024】Visual Alignment Pre-training for Sign Language Translation. [Paper]
- 【IJCAI 2024】Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder. [Paper]
- 【ICLR 2025】YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus. [Paper]
Submission
- 【NeurIPS 2023】Towards Faithful Sign Language Translation. [Paper]
- 【ICLR 2025】Hybrid Model Collaboration For Sign Language Translation With VQ-VAE And RAG Enhanced LLMS. [Paper]
Journal
- (TPAMI 2023)SignNet II: A Transformer-Based Two-Way Sign Language Translation Model. [Paper]
- (TCSVT 2024)Improving End-to-End Sign Language Translation With Adaptive Video Representation Enhanced Transformer. [Paper]
- (TCSVT 2024)Overcoming Modality Bias in Question-Driven Sign Language Video Translation. [Paper]
Preprint
- 「Arxiv 2024.02.11」American Sign Language Video to Text Translation. [Paper]
- 「Arxiv 2024.02.14」Towards Privacy-Aware Sign Language Translation at Scale. [Paper]
- 「Arxiv 2024.03.19」Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation. [Paper]
- 「Arxiv 2024.05.09」Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation. [Paper]
- 「Arxiv 2024.06.10」SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation. [Paper]
- 「Arxiv 2024.06.11」SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale. [Paper]
- 「Arxiv 2024.06.16」Reconsidering Sentence-Level Sign Language Translation. [Paper]
- 「Arxiv 2024.07.12」Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing. [Paper]
- 「Arxiv 2024.07.23」E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods. [Paper]
- 「Arxiv 2024.08.13」Fingerspelling within Sign Language Translation. [Paper]
- 「Arxiv 2024.08.19」C2RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval. [Paper]
- 「Arxiv 2024.08.20」Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm. [Paper]
- 「Arxiv 2024.08.27」From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation. [Paper]
- 「Arxiv 2024.09.03」Less is more: concatenating videos for Sign Language Translation from a small set of signs. [Paper]
- 「Arxiv 2024.09.15」ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing. [Paper]
- 「Arxiv 2024.09.17」American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM. [Paper]
- 「Arxiv 2024.10.01」Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models. [Paper]
- 「Arxiv 2024.10.18」SignAttention: On the Interpretability of Transformer Models for Sign Language Translation. [Paper]
- 「Arxiv 2024.10.25」Diverse Sign Language Translation. [Paper]
- 「Arxiv 2024.11.04」A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production. [Paper]
- 「Arxiv 2024.11.15」An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs. [Paper]
- 「Arxiv 2024.11.19」Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs. [Paper]
- 「Arxiv 2024.11.19」Signformer is all you need: Towards Edge AI for Sign Language. [Paper]
- 「Arxiv 2024.11.25」SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction. [Paper]
- 「Arxiv 2024.11.25」Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation. [Paper]
- 「Arxiv 2024.12.21」Real-time Bangla Sign Language Translator. [Paper]
- 「Arxiv 2024.12.21」LLaVA-SLT: Visual Language Tuning for Sign Language Translation. [Paper]
- 「Arxiv 2024.12.24」Improvement in Sign Language Translation Using Text CTC Alignment. [Paper]
- 「Arxiv 2025.02.04」Spatio-temporal transformer to support automatic sign language translation. [Paper]
- 「Arxiv 2025.02.17」GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation. [Paper]
- 「Arxiv 2025.03.03」Co-creation for Sign Language Processing and Machine Translation. [Paper]
- 「Arxiv 2025.03.09」Sign Language Translation using Frame and Event Stream: Benchmark Dataset and Algorithms. [Paper]
- 「Arxiv 2025.03.25」A multitask transformer to sign language translation using motion gesture primitives. [Paper]
- 「Arxiv 2025.04.03」State-of-the-Art Translation of Text-to-Gloss using mBART: A case study of Bangla. [Paper]
- 「Arxiv 2025.04.16」ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation. [Paper]
Preprint
- 「Arxiv 2024.08.17」An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface. [Paper]
- 「Arxiv 2024.11.23」AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software. [Paper]
- 「Arxiv 2025.02.15」SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring. [Paper]
Conference
- 【ECCV 2024】Pose Guided Fine-Grained Sign Language Video Generation. [Paper]
- 【ECCV 2024】SignGen: End-to-End Sign Language Video Generation with Latent Diffusion. [Paper]
- 【ACL 2024】T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text. [Paper]
- 【BMVC 2024】Sign Stitching: A Novel Approach to Sign Language Production. [Paper]
- 【CVPR 2024】Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text. [Paper]
Submission
- 【ICLR 2024】NaturalSigner: Diffusion Models are Natural Sign Language Generator. [Paper]
- 【ICLR 2025】DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism. [Paper]
Preprint
- 「Arxiv 2024.04.17」Select and Reorder: A Novel Approach for Neural Sign Language Production. [Paper]
- 「Arxiv 2024.12.07」SignAvatar: Sign Language 3D Motion Reconstruction and Generation. [Paper]
- 「Arxiv 2024.05.16」Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder. [Paper]
- 「Arxiv 2024.07.04」MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production. [Paper]
- 「Arxiv 2024.11.26」DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model. [Paper]
- 「Arxiv 2024.11.26」Signs as Tokens: An Autoregressive Multilingual Sign Language Generator. [Paper]
- 「Arxiv 2024.11.29」SignLLM: Sign Language Production Large Language Models. [Paper]
- 「Arxiv 2024.12.19」Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production. [Paper] [Code]
- 「Arxiv 2024.12.22」Linguistics-Vision Monotonic Consistent Network for Sign Language Production. [Paper]
- 「Arxiv 2025.01.01」Beyond Words: AuralLLM and SignMST-C for Precise Sign Language Production and Bidirectional Accessibility. [Paper]
- 「Arxiv 2025.01.12」Comparison of Autoencoders for tokenization of ASL datasets. [Paper]
- 「Arxiv 2025.02.08」Towards AI-driven Sign Language Generation with Non-manual Markers. [Paper]
- 「Arxiv 2025.03.04」A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations. [Paper]
- 「Arxiv 2025.03.20」Text-Driven Diffusion Model for Sign Language Production. [Paper]
- 「Arxiv 2025.04.09」Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization. [Paper]
Conference
- 【ICLR 2025】Uni-Sign: Toward Unified Sign Language Understanding at Scale. [Paper] [Code]
- 【ECCV 2024】SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark. [Paper]
Journal
- (TPAMI 2023)SignBERT+: Hand-Model-Aware Self-Supervised Pre-Training for Sign Language Understanding. [Paper]
Preprint
- 「Arxiv 2024.08.16」Scaling up Multimodal Pre-training for Sign Language Understanding. [Paper]
- 「Arxiv 2024.10.07」Studying and Mitigating Biases in Sign Language Understanding Models. [Paper]
- 「Arxiv 2025.03.11」SignRep: Enhancing Self-Supervised Sign Representations. [Paper]
- Real Time American Sign Language Detection Using Yolo-v9
- Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)
- A Transformer Model for Boundary Detection in Continuous Sign Language
- 【CVPR 2024】CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
- 【ACMMM 2024】SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
- SLVideo: A Sign Language Video Moment Retrieval Framework
- 【ECCV 2024】Uncertainty-aware sign language video retrieval with probability distribution modeling
- SignCLIP: Connecting Text and Sign Language by Contrastive Learning
Preprint
- 「Arxiv 2025.03.05」Deep Understanding of Sign Language for Sign to Subtitle Alignment. [Paper]
- 「Arxiv 2025.04.11」Hands-On: Segmenting Individual Signs from Continuous Sequences. [Paper]
Conference
- 【EMNLP 2024】ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles. [Paper]
- 【CVPR 2025】Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations. [Paper]
- 【CVPR 2025】VSNet: Focusing on the Linguistic Characteristics of Sign Language. [Paper]
Submission
- 【ICLR 2023】URVoice: An Akl-Toussaint/ Graham- Sklansky Approach towards Convex Hull Computation for Sign Language Interpretation. [Paper]
Preprint
- 「Arxiv 2023.04.01」Ham2Pose: Animating Sign Language Notation Into Pose Sequences. [Paper]
- 「Arxiv 2024.06.18」A Comparative Study of Continuous Sign Language Recognition Techniques. [Paper]
- 「Arxiv 2024.12.02」Real-Time Multilingual Sign Language Processing. [Paper]
- 「Arxiv 2024.12.11」2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset. [Paper]
- 「Arxiv 2025.03.09」Virtual Co-presenter: Connecting Deaf and Hard-of-hearing Livestreamers and Hearing audience in E-commerce Livestreaming. [Paper]
- 「Arxiv 2025.04.04」See-Through Face Display for DHH People: Enhancing Gaze Awareness in Remote Sign Language Conversations with Camera-Behind Displays. [Paper]
- 「Arxiv 2025.04.08」Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners. [Paper]
Conference
- 【NeurIPS 2024】ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. [Paper]
- 【NeurIPS 2024】Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding. [Paper]
- 【NeurIPS 2024】TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment. [Paper]
- 【NeurIPS 2024】Streaming Long Video Understanding with Large Language Models. [Paper]
- 【NeurIPS 2024】VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding. [Paper]
- 【NeurIPS 2024】Video Token Merging for Long Video Understanding. [Paper]
- 【NeurIPS 2024】MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding. [Paper]
- 【ICLR 2025】TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. [Paper]
- 【ICLR 2025】CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding. [Paper]
- 【ICLR 2025】VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks. [Paper]
- 【ICLR 2025】SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding. [Paper]
- 【ICLR 2025】Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge. [Paper]
- 【CVPR 2025】STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
- 【CVPR 2025】MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
- 【CVPR 2025】OVBench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
- 【CVPR 2025】Towards Vision Language Models For Extra-Long Video Understanding
- 【CVPR 2025】VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
- 【CVPR 2025】ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
- 【CVPR 2025】Towards Universal Soccer Video Understanding
- 【CVPR 2025】Adaptive Keyframe Sampling for Long Video Understanding
- 【CVPR 2025】BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
- 【CVPR 2025】MLVU: Benchmarking Multi-task Long Video Understanding
- 【CVPR 2025】DrVideo: Document Retrieval Based Long Video Understanding
- 【CVPR 2025】VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
- 【CVPR 2025】Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
- 【CVPR 2025】DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
- 【CVPR 2025】VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
- 【CVPR 2025】Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception
- 【CVPR 2025】Re-thinking Temporal Search for Long-Form Video Understanding
- 【CVPR 2025】Apollo: An Exploration of Video Understanding in Large Multi-Modal Models
- 【CVPR 2025】HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
- 【CVPR 2025】M-LLM Based Video Frame Selection for Efficient Video Understanding
- 【CVPR 2025】VideoChat-Online: Towards Online Spatial-Temporal Video Understanding via Large Video Language Models
- Large-scale Video Classification with Convolutional Neural Networks
- Beyond Short Snippets: Deep Networks for Video Classification
Conference
- 【ICLR 2023】Temporal Coherent Test Time Optimization for Robust Video Classification. [Paper]
- Two-Stream Convolutional Networks for Action Recognition in Videos
- Learning Spatiotemporal Features with 3D Convolutional Networks
- Convolutional Two-Stream Network Fusion for Video Action Recognition
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- Non-local Neural Networks
- SlowFast Networks for Video Recognition
- Is Space-Time Attention All You Need for Video Understanding?
Conference
- 【ICLR 2023】Graph Contrastive Learning for Skeleton-based Action Recognition. [Paper]
- 【ICLR 2023】AIM: Adapting Image Models for Efficient Video Action Recognition. [Paper]
- 【ICML 2024】Memory Consolidation Enables Long-Context Video Understanding. [Paper]
- 【ICML 2024】VideoPrism: A Foundational Visual Encoder for Video Understanding. [Paper]
- 【NeurIPS 2024】CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition. [Paper]
- 【NeurIPS 2024】ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition. [Paper]
- 【NeurIPS 2024】Recovering Complete Actions for Cross-dataset Skeleton Action Recognition. [Paper]
- 【ICLR 2024】SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition. [Paper]
- 【ICLR 2024】FROSTER: Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition. [Paper]
- 【ICLR 2025】TASAR: Transfer-based Attack on Skeletal Action Recognition. [Paper]
- 【ICLR 2025】ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition. [Paper]
- 【CVPR 2025】Semantic-guided Cross-Model Prompt Learning for skeleton-based zero-shot action recognition
- 【CVPR 2025】Neuron : Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
- 【CVPR 2025】Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
- 【CVPR 2025】TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
- 【CVPR 2025】Temporal Alignment-Free Video Matching for Few-shot Action Recognition
- 【CVPR 2025】Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Conference
- 【ICML 2023】Sequence Modeling with Multiresolution Convolutional Memory. [Paper]
- 【ICML 2023】Simple Hardware-Efficient Long Convolutions for Sequence Modeling. [Paper]
- 【ICML 2023】CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. [Paper]
- 【ICLR 2023】Simplified State Space Layers for Sequence Modeling. [Paper]
- 【ICLR 2023】Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model. [Paper]
- 【ICLR 2023】Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks. [Paper]
- 【ICLR 2023】Planning with Sequence Models through Iterative Energy Minimization. [Paper]
- 【ICLR 2023】Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks. [Paper]
- 【ICLR 2023】ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length. [Paper]
- 【ICLR 2023】Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. [Paper]
- 【ICLR 2023】What Makes Convolutional Models Great on Long Sequence Modeling? [Paper]
- 【ICLR 2023】Multiple sequence alignment as a sequence-to-sequence learning problem. [Paper]
- 【ICLR 2023】Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer. [Paper]
- 【ICLR 2023】Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins. [Paper]
- 【ICLR 2023】Toeplitz Neural Network for Sequence Modeling. [Paper]
- 【ICML 2024】Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling. [Paper]
- 【ICML 2024】Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. [Paper]
- 【ICML 2024】Reinformer: Max-Return Sequence Modeling for Offline RL. [Paper]
- 【ICML 2024】VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling. [Paper]
- 【ICML 2024】Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors. [Paper]
- 【ICML 2024】FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores. [Paper]
- 【ICML 2024】IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs. [Paper]
- 【ICML 2024】Traveling Waves Encode The Recent Past and Enhance Sequence Learning. [Paper]
- 【ICML 2024】SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking. [Paper]
- 【ICML 2024】Robustifying State-space Models for Long Sequences via Approximate Diagonalization. [Paper]
- 【ICML 2024】Parallelizing non-linear sequential models over the sequence length. [Paper]
- 【NeurIPS 2024】Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling. [Paper]
- 【NeurIPS 2024】Approximation Rate of the Transformer Architecture for Sequence Modeling. [Paper]
- 【NeurIPS 2024】MambaLRP: Explaining Selective State Space Sequence Models. [Paper]
- 【NeurIPS 2024】Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training. [Paper]
- 【NeurIPS 2024】Improving Adaptivity via Over-Parameterization in Sequence Models. [Paper]
- 【NeurIPS 2024】Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum. [Paper]
- 【NeurIPS 2024】3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection. [Paper]
- 【NeurIPS 2024】Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling. [Paper]
- 【NeurIPS 2024】Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling. [Paper]
- 【NeurIPS 2024】Parallelizing Linear Transformers with the Delta Rule over Sequence Length. [Paper]
- 【NeurIPS 2024】Gated Slot Attention for Efficient Linear-Time Sequence Modeling. [Paper]
- 【NeurIPS 2024】Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement. [Paper]
- 【ICLR 2025】Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling. [Paper]
- 【ICLR 2025】Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory. [Paper]
- 【ICLR 2025】Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking. [Paper]
- 【ICLR 2025】Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences? [Paper]
- 【ICLR 2025】FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference. [Paper]
- 【ICLR 2025】Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. [Paper]
- 【ICLR 2025】Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond. [Paper]
- 【ICLR 2025】mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models. [Paper]
- 【CVPR 2025】Image Re-ranking with Long-Context Sequence Modeling
- 【CVPR 2025】Parallel Sequence Modeling via Generalization Spatial Propagation Network
- 【CVPR 2025】Bridging Gait Recognition and Large Language Models Sequence Modeling
- 【CVPR 2025】Contextual AD Narration with Interleaved Multimodal Sequence
- 【CVPR 2025】HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
- 【CVPR 2025】Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
- 【CVPR 2025】DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
- 【CVPR 2025】KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
- 「Arxiv 2018」An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
- Long Short-Term Memory
- DeltaProduct: Increasing the Expressivity of DeltaNet Through Products of Householders
- An Uncertainty Principle for Linear Recurrent Neural Networks
- SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model
「Arxiv 2024.08.11」Learning to (Learn at Test Time): RNNs with Expressive Hidden States. [Paper] [Code]
「Arxiv 2024.08.27」Gated Linear Attention Transformers with Hardware-Efficient Training. [Paper]
「Arxiv 2024.11.09」Gated Delta Networks: Improving Mamba2 with Delta Rule. [Paper] [Code]
- 【Fully-Supervised】【CVPR 2024】FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation
  - 【Scientific question】how to make frame features and action features interact
  - 【Solution】learn a set of action tokens and apply cross-attention in both directions, frame-to-action and action-to-frame
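The frame/action cross-attention recipe noted above can be sketched as follows. This is only an illustrative sketch of the idea, not the paper's implementation; the module name, token count, and dimensions are all assumptions.

```python
# Illustrative sketch (NOT the FACT code): learned action tokens attend to
# frame features, then frames attend back to the updated action tokens.
import torch
import torch.nn as nn

class FrameActionCrossAttention(nn.Module):
    def __init__(self, dim=64, num_action_tokens=10, num_heads=4):
        super().__init__()
        # A learned set of action tokens, shared across videos (assumed size).
        self.action_tokens = nn.Parameter(torch.randn(num_action_tokens, dim))
        # Action-to-frame: action tokens query the frame sequence.
        self.a2f = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Frame-to-action: frames query the updated action tokens.
        self.f2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frames):  # frames: (B, T, dim)
        B = frames.shape[0]
        actions = self.action_tokens.unsqueeze(0).expand(B, -1, -1)
        # Action tokens gather evidence from the frames.
        actions, _ = self.a2f(actions, frames, frames)
        # Frames are refined by the action summary.
        frames, _ = self.f2a(frames, actions, actions)
        return frames, actions

frames = torch.randn(2, 100, 64)  # 2 clips, 100 frames each
out_frames, out_actions = FrameActionCrossAttention()(frames)
print(out_frames.shape, out_actions.shape)
```

The two attention calls alternate, so per-frame features and per-action summaries refine each other without full frame-to-frame attention.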
- 【Fully-Supervised】【NeurIPS 2024】Activity Grammars for Temporal Action Segmentation
- Diffusion Action Segmentation
- 【Fully-Supervised】【ICCV 2023】How Much Temporal Long-Term Context is Needed for Action Segmentation?
  - 【Scientific question】how to model the semantics of very long videos
  - 【Solution】split the model into two branches: one models local context with local window attention, the other samples frames for global modeling
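The two-branch idea noted above (local window attention plus a globally sampled branch) can be sketched as below. This is a minimal illustration under assumed shapes, not the paper's code; window size, stride, and module names are hypothetical.

```python
# Illustrative sketch of a two-branch long-video model: full attention inside
# local windows, plus global attention over a strided sample of frames.
import torch
import torch.nn as nn

class LocalGlobalAttention(nn.Module):
    def __init__(self, dim=64, window=50, stride=10, num_heads=4):
        super().__init__()
        self.window, self.stride = window, stride
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, T, dim); assumes T divisible by window
        B, T, D = x.shape
        # Local branch: full attention within each non-overlapping window.
        w = x.reshape(B * (T // self.window), self.window, D)
        local, _ = self.local_attn(w, w, w)
        local = local.reshape(B, T, D)
        # Global branch: every frame queries a strided sample of frames.
        sampled = x[:, :: self.stride]  # (B, T // stride, D)
        global_out, _ = self.global_attn(x, sampled, sampled)
        return local + global_out

x = torch.randn(2, 1000, 64)  # one long clip of 1000 frames per sample
print(LocalGlobalAttention()(x).shape)
```

Cost drops from O(T^2) full attention to O(T·window + T·T/stride), which is what makes very long sequences tractable.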
- 【Fully-Supervised】【ISKE 2023】Streaming Video Temporal Action Segmentation In Real Time
- 【Fully-Supervised】【NeurIPS 2022】Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
- 【Fully-Supervised】【PR 2022】Maximization and Restoration: Action Segmentation through Dilation Passing and Temporal Reconstruction
- 【Fully-Supervised】【ICIP 2022】Mcfm: Mutual Cross Fusion Module for Intermediate Fusion-Based Action Segmentation
- 【Fully-Supervised】【IVC 2022】Multistage temporal convolution transformer for action segmentation
- 【Fully-Supervised】【Arxiv 2022】Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos
- 【Fully-Supervised】【IJCAI 2022】Uncertainty-Aware Representation Learning for Action Segmentation
- 【Fully-Supervised】【ECCV 2022】Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
- 【Fully-Supervised】【BMVC 2021】ASFormer: Transformer for Action Segmentation
- 【Fully-Supervised】【Arxiv 2021】Coarse to Fine Multi-Resolution Temporal Convolutional Network
- 【Fully-Supervised】【GCPR 2021】FIFA: Fast Inference Approximation for Action Segmentation
- 【Fully-Supervised】【CVPR 2021】Global2Local: Efficient Structure Search for Video Action Segmentation
- 【Weakly-Supervised】【CVPR 2024】Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
  - 【Scientific question】how to understand action transitions
- 【Weakly-Supervised】【IROS 2023】Is Weakly-Supervised Action Segmentation Ready for Human-Robot Interaction? No, Let's Improve It with Action-Union Learning
  - 【Scientific question】how to generate pseudo-labels for the other frames from timestamp annotations
  - 【Problem addressed】how to make better use of timestamps
- 【Weakly-Supervised】【CVPR 2023】Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation
- 【Weakly-Supervised】【IJCAI 2023】Timestamp-Supervised Action Segmentation in the Perspective of Clustering
- 【Weakly-Supervised】【ECCV 2022】A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation
- 【Weakly-Supervised】【WACV 2022】Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
- 【Weakly-Supervised】【BMVC 2022】Robust Action Segmentation from Timestamp Supervision
- 【Weakly-Supervised】【CVPR 2022】Semi-Weakly-Supervised Learning of Complex Actions from Instructional Task Videos
- 【Weakly-Supervised】【CVPR 2022】Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency
- 【Weakly-Supervised】【TMM 2022】Temporal Action Segmentation with High-level Complex Activity Labels
- 【Weakly-Supervised】【IROS 2022】Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
- 【Weakly-Supervised】【ICME 2022】Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation
- 【Weakly-Supervised】【ECCV 2022】Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
- 【Weakly-Supervised】【CVPR 2022】Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
- 【Weakly-Supervised】【CVPR 2021】Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
- 【Weakly-Supervised】【TPAMI 2021】Fast Weakly Supervised Action Segmentation Using Mutual Consistency
- 【Weakly-Supervised】【CVPR 2021】Learning Discriminative Prototypes with Dynamic Time Warping
- 【Weakly-Supervised】【CVPR 2021】Temporal Action Segmentation from Timestamp Supervision
- 【Weakly-Supervised】【CVPR 2021】Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning
- 【Weakly-Supervised】【AAAI 2021】Weakly-supervised Temporal Action Localization by Uncertainty Modeling
- 【Unsupervised】【CVPR 2024】Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
2024.11.00 ~ 2024.09.20
- 【Arxiv 2024】Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
- 【TASLP 2024】Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
- 【Arxiv 2024】Augmenting Polish Automatic Speech Recognition System With Synthetic Data
- 【Arxiv 2024】Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors
- 【Arxiv 2024】A Survey on Speech Large Language Models
- 【Arxiv 2024】We Augmented Whisper With kNN and You Won't Believe What Came Next
- 【EMNLP 2024 Findings】STTATTS: Unified Speech-To-Text And Text-To-Speech Model
- 【Arxiv 2024】Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts
- 【Arxiv 2024】Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
- 【Arxiv 2024】Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
- 【Arxiv 2024】DENOASR: Debiasing ASRs through Selective Denoising
- 【Arxiv 2024】End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
- 【Arxiv 2024】Moonshine: Speech Recognition for Live Transcription and Voice Commands
- 【Arxiv 2024】Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
- 【Arxiv 2024】AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
- 【Arxiv 2024】A two-stage transliteration approach to improve performance of a multilingual ASR
- 【Arxiv 2024】Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
- 【Arxiv 2024】Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
- 【AAAI-FSS 2024】Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
- 【Arxiv 2024】In-Materia Speech Recognition
- 【Arxiv 2024】Automatic Speech Recognition with BERT and CTC Transformers: A Review
- 【Arxiv 2024】Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
- 【ICASSP 2024】Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
- 【Arxiv 2024】Advocating Character Error Rate for Multilingual ASR Evaluation
- 【Arxiv 2024】Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
- 【Arxiv 2024】CR-CTC: Consistency regularization on CTC for improved speech recognition
- 【Arxiv 2024】Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
- 【Arxiv 2024】Reverb: Open-Source ASR and Diarization from Rev
- 【Arxiv 2024】SeeSay: An Assistive Device for the Visually Impaired Using Retrieval Augmented Generation
- 【Arxiv 2024】Efficient Streaming LLM for Speech Recognition
- 【Arxiv 2024】Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
- 【Arxiv 2024】Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems
- 【Arxiv 2024】Automatic Speech Recognition for the Ika Language
- 【EMNLP 2024】VHASR: A Multimodal Speech Recognition System With Vision Hotwords
- 【Arxiv 2024】End-to-End Speech Recognition with Pre-trained Masked Language Model
- 【ICASSP 2025】Alignment-Free Training for Transducer-based Multi-Talker ASR
- 【ICASSP 2025】Mamba for Streaming ASR Combined with Unimodal Aggregation
- 【Interspeech 2024】Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
- 【ICASSP 2025】Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
- 【ICASSP 2025】HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
- 【Interspeech 2024】Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
- 【ICASSP 2025】Efficient Long-Form Speech Recognition for General Speech In-Context Learning
- SimCLR
- BYOL
- MoCo v1
- MoCo v2
- MoCo v3
- SimSiam
- DINO
Update Log
Conference
ICML 2024
- Sign Language
- Video Understanding
- Sequence Modeling
ICML 2023
- Sign Language
- Video Understanding
- Sequence Modeling
ICLR 2025
- Sign Language
- Video Understanding
- Sequence Modeling
ICLR 2024
- Sign Language
- Video Understanding
- Sequence Modeling
ICLR 2023
- Sign Language
- Video Understanding
- Sequence Modeling
NeurIPS 2024
- Sign Language
- Video Understanding
- Sequence Modeling
NeurIPS 2023
- Sign Language
- Video Understanding
- Sequence Modeling
CVPR 2025
- Sign Language
- Video Understanding
- Video Classification
- Action Recognition
- Sequence Modeling
CVPR 2024
- Sign Language
- Video Understanding
- Sequence Modeling
CVPR 2023
- Sign Language
- Video Understanding
- Sequence Modeling
ICCV 2023
- Sign Language
- Video Understanding
- Sequence Modeling
ECCV 2024
- Sign Language
- Video Understanding
- Sequence Modeling
AAAI 2024
- Sign Language
- Video Understanding
- Sequence Modeling
AAAI 2023
- Sign Language
- Video Understanding
- Sequence Modeling
WWW 2024
- Sign Language
- Video Understanding
- Sequence Modeling
WWW 2023
- Sign Language
- Video Understanding
- Sequence Modeling
ACMMM 2024
- Sign Language
- Video Understanding
- Sequence Modeling
ACMMM 2023
- Sign Language
- Video Understanding
- Sequence Modeling
SIGIR 2024
- Sign Language
- Video Understanding
- Sequence Modeling
SIGIR 2023
- Sign Language
- Video Understanding
- Sequence Modeling
IJCAI 2024
- Sign Language
- Video Understanding
- Sequence Modeling
IJCAI 2023
- Sign Language
- Video Understanding
- Sequence Modeling
KDD 2024
- Sign Language
- Video Understanding
- Sequence Modeling
KDD 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TPAMI 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TPAMI 2023
- Sign Language
- Video Understanding
- Sequence Modeling
JMLR 2024
- Sign Language
- Video Understanding
- Sequence Modeling
JMLR 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TOG 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TOG 2023
- Sign Language
- Video Understanding
- Sequence Modeling
IJCV 2024
- Sign Language
- Video Understanding
- Sequence Modeling
IJCV 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TIP 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TIP 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TC 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TC 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TCSVT 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TCSVT 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TMM 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TMM 2023
- Sign Language
- Video Understanding
- Sequence Modeling
TVCG 2024
- Sign Language
- Video Understanding
- Sequence Modeling
TVCG 2023
- Sign Language
- Video Understanding
- Sequence Modeling
PR 2024
- Sign Language
- Video Understanding
- Sequence Modeling
PR 2023
- Sign Language
- Video Understanding
- Sequence Modeling
Arxiv
2025.03
- Sign Language
- Video Understanding
- Sequence Modeling
2025.02
- Sign Language
- Video Understanding
- Sequence Modeling
2025.01
- Sign Language
- Video Understanding
- Sequence Modeling
2024.12
- Sign Language
- Video Understanding
- Sequence Modeling
2024.11
- Sign Language
- Video Understanding
- Sequence Modeling
2024.10
- Sign Language
- Video Understanding
- Sequence Modeling
2024.09
- Sign Language
- Video Understanding
- Sequence Modeling
2024.08
- Sign Language
- Video Understanding
- Sequence Modeling
2024.07
- Sign Language
- Video Understanding
- Sequence Modeling
2024.06
- Sign Language
- Video Understanding
- Sequence Modeling
2024.05
- Sign Language
- Video Understanding
- Sequence Modeling
2024.04
- Sign Language
- Video Understanding
- Sequence Modeling
2024.03
- Sign Language
- Video Understanding
- Sequence Modeling
2024.02
- Sign Language
- Video Understanding
- Sequence Modeling
2024.01
- Sign Language
- Video Understanding
- Sequence Modeling