GitHub - guozihang/gzhlaker_awesome: An Awesome Collection for Sequence Modeling in Diversity Field

This document collects the latest developments in several fields that I follow.

Sign Language Recognition

Isolated Sign Language Recognition

Conference

  • 【ICCV 2023】Human Part-wise 3D Motion Context Learning for Sign Language Recognition. [Paper]
  • 【CVPR 2023】Natural Language-Assisted Sign Language Recognition. [Paper] [Code]
  • 【CVPRW 2023】Isolated Sign Language Recognition based on Tree Structure Skeleton Images. [Paper] [Code]
  • 【AAAI 2023】BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization. [Paper]
  • 【NeurIPS 2023】PopSign ASL v1.0: An Isolated American Sign Language Dataset Collected via Smartphones. [Paper]
  • 【NeurIPS 2023】ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition. [Paper]
  • 【ACMMM 2024】Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition. [Paper]
  • 【COLING 2024】Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition. [Paper] [Code]
  • 【CVPR 2025】Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues. [Paper]
  • 【WWW 2025】Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition. [Paper]

Workshop

  • 【ICCVW 2023】New keypoint-based approach for recognising British Sign Language (BSL) from sequences. [Paper]

Submission

  • 【ICLR 2025】Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies. [Paper]

Journal

  • (TIP 2024)Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition. [Paper] [Code]

  • (TPAMI 2023)Towards Zero-Shot Sign Language Recognition. [Paper]

  • (TCSVT 2024)MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition. [Paper] [Code]

  • (TMM 2024)SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing. [Paper]

  • #(PR 2024)Cross-lingual few-shot sign language recognition. [Paper]

Preprint

  • 「Arxiv 2024.01.22」Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition. [Paper]
  • 「Arxiv 2024.02.13」BdSLW60: A Word-Level Bangla Sign Language Dataset. [Paper]
  • 「Arxiv 2024.03.19」Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition. [Paper]
  • 「Arxiv 2024.04.15」Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets. [Paper]
  • 「Arxiv 2024.04.24」Sign Language Recognition based on YOLOv5 Algorithm for the Telugu Sign Language. [Paper]
  • 「Arxiv 2024.04.29」Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation. [Paper]
  • 「Arxiv 2024.06.24」PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling. [Paper]
  • 「Arxiv 2024.07.02」Sign Language Recognition Based On Facial Expression and Hand Skeleton. [Paper]
  • 「Arxiv 2024.06.27」A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition. [Paper]
  • 「Arxiv 2024.07.07」iSign: A Benchmark for Indian Sign Language Processing. [Paper]
  • 「Arxiv 2024.08.20」BAUST Lipi: A BdSL Dataset with Deep Learning Based Bangla Sign Language Recognition. [Paper]
  • 「Arxiv 2024.01.14」Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition. [Paper]
  • 「Arxiv 2024.09.11」Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability. [Paper]
  • 「Arxiv 2024.08.26」Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model. [Paper]
  • 「Arxiv 2024.09.27」Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition. [Paper]
  • #「Arxiv 2024.11.10」Classification in Japanese Sign Language Based on Dynamic Facial Expressions. [Paper]
  • 「Arxiv 2024.12.10」Real-time Sign Language Recognition Using MobileNetV2 and Transfer Learning. [Paper]
  • 「Arxiv 2024.12.16」Training Strategies for Isolated Sign Language Recognition. [Paper]
  • 「Arxiv 2024.12.24」Learning Sign Language Representation using CNN LSTM, 3DCNN, CNN RNN LSTM and CCN TD. [Paper]
  • #「Arxiv 2025.02.27」Representing Signs as Signs: One-Shot ISLR to Facilitate Functional Sign Language Technologies. [Paper]
  • #「Arxiv 2025.03.04」BdSLW401: Transformer-Based Word-Level Bangla Sign Language Recognition Using Relative Quantization Encoding (RQE). [Paper]
  • 「Arxiv 2025.03.16」Cross-Modal Consistency Learning for Sign Language Recognition. [Paper]
  • #「Arxiv 2025.03.16」ISLR101: an Iranian Word-Level Sign Language Recognition Dataset. [Paper]
  • 「Arxiv 2025.04.10」Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition. [Paper]
  • 「Arxiv 2025.04.23」SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition. [Paper]

Continuous Sign Language Recognition

Conference

  • 【ACMMM 2023】AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. [Paper]
  • 【ACMMM 2023】Towards Real-Time Sign Language Recognition and Translation on Edge Devices. [Paper]
  • 【ICCV 2023】CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition. [Paper]
  • 【ICCV 2023】Improving Continuous Sign Language Recognition with Cross-Lingual Signs. [Paper]
  • 【ICCV 2023】C2ST: Cross-modal Contextualized Sequence Transduction for Continuous Sign Language Recognition. [Paper]
  • 【EMNLP Findings 2023】Handshape-Aware Sign Language Recognition: Extended Datasets and Exploration of Handshape-Inclusive Methods. [Paper]
  • 【AAAI 2023】Self-Emphasizing Network for Continuous Sign Language Recognition. [Paper] [Code]
  • 【CVPR 2023】CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment. [Paper] [Code]
  • 【CVPR 2023】Continuous Sign Language Recognition with Correlation Network. [Paper] [Code]
  • 【CVPR 2023】Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition. [Paper]
  • 【ECCV 2024】EvSign: Sign Language Recognition and Translation with Streaming Events. [Paper]
  • 【EMNLP 2024】Towards Online Continuous Sign Language Recognition and Translation. [Paper]
  • 【AAAI 2024】KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation. [Paper]
  • 【AAAI 2024】Cross-Sentence Gloss Consistency for Continuous Sign Language Recognition. [Paper]
  • 【AAAI 2024】TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions. [Paper]
  • 【CVPR 2024】SignGraph: A Sign Sequence is Worth Graphs of Nodes. [Paper] [code]
  • 【IJCAI 2023】Contrastive Learning for Sign Language Recognition and Translation. [Paper]

Submission

  • 【ICLR 2024】SignKD: Multi-modal Hierarchical Knowledge Distillation for Continuous Sign Language Recognition. [Paper]

Journal

  • (TMM 2023)Prior-Aware Cross Modality Augmentation Learning for Continuous Sign Language Recognition. [Paper]

  • (TETCI 2024)Spatial Temporal Aggregation for Efficient Continuous Sign Language Recognition. [Paper]

  • (TIP 2024)Gloss Prior Guided Visual Feature Learning for Continuous Sign Language Recognition. [Paper]

  • (TCSVT 2023)Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition. [Paper]

  • (TMM 2023)Collaborative Multilingual Continuous Sign Language Recognition: A Unified Framework. [Paper]

  • (TMM 2024)A Sign Language Recognition Framework Based on Cross-Modal Complementary Information Fusion. [Paper]

  • (PR 2024)Scalable Frame Resolution for Efficient Continuous Sign Language Recognition. [Paper]

  • (PR 2023)Multi-scale local-temporal similarity fusion for continuous sign language recognition. [Paper]

  • (TPAMI 2025)MixSignGraph: A Sign Sequence is Worth Mixed Graphs of Nodes. [Paper]

Preprint

  • 「Arxiv 2024.01.22」SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning. [Paper]
  • 「Arxiv 2024.02.29」Continuous Sign Language Recognition Based on Motor attention mechanism and frame-level Self-distillation. [Paper]
  • 「Arxiv 2024.04.12」Improving Continuous Sign Language Recognition with Adapted Image Models [Paper] [Code]
  • 「Arxiv 2024.04.17」CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation [Paper] [Code]
  • #「Arxiv 2024.04.21」Stream State-tying for Sign Language Recognition. [Paper]
  • 「Arxiv 2024.05.02」A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News [Paper]
  • 「Arxiv 2024.05.16」A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision. [Paper]
  • 「Arxiv 2024.05.20」Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining. [Paper]
  • 「Arxiv 2024.06.26」Continuous Sign Language Recognition Using Intra-inter Gloss Attention. [Paper]
  • #「Arxiv 2024.08.14」Sign language recognition based on deep learning and low-cost handcrafted descriptors. [Paper]
  • 「Arxiv 2024.09.02」SCOPE: Sign Language Contextual Processing with Embedding from LLMs [Paper]
  • 「Arxiv 2024.09.18」A Chinese Continuous Sign Language Dataset Based on Complex Environments. [Paper]
  • #「Arxiv 2024.11.07」Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic. [Paper]
  • 「Arxiv 2025.03.11」OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition. [Paper]
  • 「Arxiv 2025.03.21」Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition [Paper]
  • 「Arxiv 2025.04.02」CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition. [Paper]
  • 「Arxiv 2025.04.22」SignX: The Foundation Model for Sign Recognition. [Paper]

Sign Language Translation

Conference

  • 【ICLR 2023】SLTUNET: A Simple Unified Model for Sign Language Translation. [paper] [Code]

  • 【ACL 2023】Gloss-Free End-to-End Sign Language Translation. [Paper] [Code]

  • #【ACL 2023】Neural Machine Translation Methods for Translating Text to Sign Language Glosses. [Paper]

  • #【ACL 2023】Considerations for meaningful sign language machine translation based on glosses. [Paper]

  • #【ACL 2023】ISLTranslate: Dataset for Translating Indian Sign Language. [Paper] [Code]

  • 【EMNLP 2023】Cross-modality Data Augmentation for End-to-End Sign Language Translation. [paper] [Code]

  • 【NeurIPS 2023】YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus. [paper]

  • 【NeurIPS 2023】Auslan-Daily: Australian Sign Language Translation for Daily Communication and News. [Paper]

  • 【CVPRW 2023】Sign Language Translation from Instructional Videos. [Paper] [Project] [Code]

  • 【CVPR 2023】Gloss Attention for Gloss-free Sign Language Translation. [Paper] [Code]

  • 【ICCV 2023】Sign Language Translation with Iterative Prototype. [Paper]

  • 【ICCV 2023】Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining. [paper] [Code]

  • 【ACL 2024】Sign Language Translation with Sentence Embedding Supervision. [paper]

  • 【ACL 2024】Unsupervised Sign Language Translation and Generation. [Paper]

  • 【ICLR 2024】Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation. [paper]

  • 【AAAI 2024】Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment. [paper] [Code]

  • 【LREC-COLING 2024】Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation. [paper]

  • 【ACL 2024】Towards Privacy-Aware Sign Language Translation at Scale. [Paper] [Code]

  • 【CVPR 2024】LLMs are Good Sign Language Translators. [paper]

  • 【NeurIPS 2024】Improving Gloss-free Sign Language Translation by Reducing Representation Density. [paper] [code]

  • 【NeurIPS 2024】Scaling Sign Language Translation. [Paper]

  • 【NeurIPS 2024】MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset. [Paper]

  • 【ECCV 2024】A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars. [Paper]

  • 【ECCV 2024】Visual Alignment Pre-training for Sign Language Translation. [Paper]

  • 【IJCAI 2024】Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder. [Paper]

  • 【ICLR 2025】YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus. [Paper]

Submission

  • 【NeurIPS 2023】Towards Faithful Sign Language Translation. [Paper]
  • 【ICLR 2025】Hybrid Model Collaboration For Sign Language Translation With VQ-VAE And RAG Enhanced LLMS. [Paper]

Journal

  • (TPAMI 2023)SignNet II: A Transformer-Based Two-Way Sign Language Translation Model. [Paper]
  • (TCSVT 2024)Improving End-to-End Sign Language Translation With Adaptive Video Representation Enhanced Transformer. [Paper]
  • (TCSVT 2024)Overcoming Modality Bias in Question-Driven Sign Language Video Translation. [Paper]

Preprint

  • 「Arxiv 2024.02.11」American Sign Language Video to Text Translation. [Paper]
  • 「Arxiv 2024.02.14」Towards Privacy-Aware Sign Language Translation at Scale. [Paper]
  • 「Arxiv 2024.03.19」Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation. [Paper]
  • 「Arxiv 2024.05.09」Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation. [Paper]
  • 「Arxiv 2024.06.10」SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation. [Paper]
  • 「Arxiv 2024.06.11」SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale. [Paper]
  • 「Arxiv 2024.06.16」Reconsidering Sentence-Level Sign Language Translation. [Paper]
  • 「Arxiv 2024.07.12」Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing. [Paper]
  • 「Arxiv 2024.07.23」E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods. [Paper]
  • 「Arxiv 2024.08.13」Fingerspelling within Sign Language Translation. [Paper]
  • 「Arxiv 2024.08.19」C2RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval. [Paper]
  • 「Arxiv 2024.08.20」Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm. [Paper]
  • 「Arxiv 2024.08.27」From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation. [Paper]
  • 「Arxiv 2024.09.03」Less is more: concatenating videos for Sign Language Translation from a small set of signs. [Paper]
  • 「Arxiv 2024.09.15」ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing. [Paper]
  • 「Arxiv 2024.09.17」American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM. [Paper]
  • 「Arxiv 2024.10.01」Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models. [Paper]
  • 「Arxiv 2024.10.18」SignAttention: On the Interpretability of Transformer Models for Sign Language Translation. [Paper]
  • 「Arxiv 2024.10.25」Diverse Sign Language Translation. [Paper]
  • 「Arxiv 2024.11.04」A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production. [Paper]
  • 「Arxiv 2024.11.15」An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs. [Paper]
  • 「Arxiv 2024.11.19」Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs. [Paper]
  • 「Arxiv 2024.11.19」Signformer is all you need: Towards Edge AI for Sign Language. [Paper]
  • 「Arxiv 2024.11.25」SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction. [Paper]
  • 「Arxiv 2024.11.25」Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation. [Paper]
  • 「Arxiv 2024.12.21」Real-time Bangla Sign Language Translator. [Paper]
  • 「Arxiv 2024.12.21」LLaVA-SLT: Visual Language Tuning for Sign Language Translation. [Paper]
  • 「Arxiv 2024.12.24」Improvement in Sign Language Translation Using Text CTC Alignment. [Paper]
  • 「Arxiv 2025.02.04」Spatio-temporal transformer to support automatic sign language translation. [Paper]
  • 「Arxiv 2025.02.17」GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation. [Paper]
  • 「Arxiv 2025.03.03」Co-creation for Sign Language Processing and Machine Translation. [Paper]
  • 「Arxiv 2025.03.09」Sign Language Translation using Frame and Event Stream: Benchmark Dataset and Algorithms. [Paper]
  • 「Arxiv 2025.03.25」A multitask transformer to sign language translation using motion gesture primitives. [Paper]
  • 「Arxiv 2025.04.03」State-of-the-Art Translation of Text-to-Gloss using mBART : A case study of Bangla. [Paper]
  • 「Arxiv 2025.04.16」ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation. [Paper]

Fingerspelling Recognition

Preprint

  • 「Arxiv 2024.08.17」An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface. [Paper]
  • 「Arxiv 2024.11.23」AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software. [Paper]
  • 「Arxiv 2025.02.15」SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a Ring. [Paper]

Sign Language Production

Conference

  • 【ECCV 2024】Pose Guided Fine-Grained Sign Language Video Generation. [Paper]
  • 【ECCV 2024】SignGen: End-to-End Sign Language Video Generation with Latent Diffusion. [Paper]
  • 【ACL 2024】T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text. [Paper]
  • 【BMVC 2024】Sign Stitching: A Novel Approach to Sign Language Production. [Paper]
  • 【CVPR 2024】Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text. [Paper]

Submission

  • 【ICLR 2024】NaturalSigner: Diffusion Models are Natural Sign Language Generator. [Paper]
  • 【ICLR 2025】DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism. [Paper]

Preprint

  • 「Arxiv 2024.04.17」Select and Reorder: A Novel Approach for Neural Sign Language Production. [Paper]
  • 「Arxiv 2024.12.07」SignAvatar: Sign Language 3D Motion Reconstruction and Generation. [Paper]
  • 「Arxiv 2024.05.16」Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder. [Paper]
  • 「Arxiv 2024.07.04」MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production. [Paper]
  • 「Arxiv 2024.11.26」DiffSLT: Enhancing Diversity in Sign Language Translation via Diffusion Model. [Paper]
  • 「Arxiv 2024.11.26」Signs as Tokens: An Autoregressive Multilingual Sign Language Generator. [Paper]
  • 「Arxiv 2024.11.29」SignLLM: Sign Language Production Large Language Models. [Paper]
  • 「Arxiv 2024.12.19」Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production. [Paper] [Code]
  • 「Arxiv 2024.12.22」Linguistics-Vision Monotonic Consistent Network for Sign Language Production. [Paper]
  • 「Arxiv 2025.01.01」Beyond Words: AuralLLM and SignMST-C for Precise Sign Language Production and Bidirectional Accessibility. [Paper]
  • 「Arxiv 2025.01.12」Comparison of Autoencoders for tokenization of ASL datasets. [Paper]
  • 「Arxiv 2025.02.08」Towards AI-driven Sign Language Generation with Non-manual Markers. [Paper]
  • 「Arxiv 2025.03.04」A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations. [Paper]
  • 「Arxiv 2025.03.20」Text-Driven Diffusion Model for Sign Language Production. [Paper]
  • 「Arxiv 2025.04.09」Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization. [Paper]

Sign Language Understanding

Conference

  • 【ICLR 2025】Uni-Sign: Toward Unified Sign Language Understanding at Scale. [Paper] [Code]
  • 【ECCV 2024】SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark. [Paper]

Journal

  • (TPAMI 2023)SignBERT+: Hand-Model-Aware Self-Supervised Pre-Training for Sign Language Understanding. [Paper]

Preprint

  • 「Arxiv 2024.08.16」Scaling up Multimodal Pre-training for Sign Language Understanding. [Paper]
  • 「Arxiv 2024.10.07」Studying and Mitigating Biases in Sign Language Understanding Models. [Paper]
  • 「Arxiv 2025.03.11」SignRep: Enhancing Self-Supervised Sign Representations. [Paper]

Sign Language Detection

  • Real Time American Sign Language Detection Using Yolo-v9
  • Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)
  • A Transformer Model for Boundary Detection in Continuous Sign Language

Sign Language Retrieval

  • 【CVPR 2024】CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
  • 【ACMMM 2024】SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
  • SLVideo: A Sign Language Video Moment Retrieval Framework
  • 【ECCV 2024】Uncertainty-aware sign language video retrieval with probability distribution modeling

Sign Language Pretrain

  • SignCLIP: Connecting Text and Sign Language by Contrastive Learning

Sign Language Segmentation

Preprint

  • 「Arxiv 2025.03.05」Deep Understanding of Sign Language for Sign to Subtitle Alignment. [Paper]
  • 「Arxiv 2025.04.11」Hands-On: Segmenting Individual Signs from Continuous Sequences. [Paper]

Sign Language Related

Conference

  • 【EMNLP 2024】ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles. [Paper]
  • 【CVPR 2025】Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations. [Paper]
  • 【CVPR 2025】VSNet: Focusing on the Linguistic Characteristics of Sign Language. [Paper]

Submission

  • 【ICLR 2023】URVoice: An Akl-Toussaint/Graham-Sklansky Approach towards Convex Hull Computation for Sign Language Interpretation. [Paper]

Preprint

  • 「Arxiv 2023.04.01」Ham2Pose: Animating Sign Language Notation Into Pose Sequences. [Paper]
  • 「Arxiv 2024.06.18」A Comparative Study of Continuous Sign Language Recognition Techniques. [Paper]
  • 「Arxiv 2024.12.02」Real-Time Multilingual Sign Language Processing. [Paper]
  • 「Arxiv 2024.12.11」2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset. [Paper]
  • 「Arxiv 2025.03.09」Virtual Co-presenter: Connecting Deaf and Hard-of-hearing Livestreamers and Hearing audience in E-commerce Livestreaming. [Paper]
  • 「Arxiv 2025.04.04」See-Through Face Display for DHH People: Enhancing Gaze Awareness in Remote Sign Language Conversations with Camera-Behind Displays. [Paper]
  • 「Arxiv 2025.04.08」Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners. [Paper]

Video Understanding

Conference

  • 【NeurIPS 2024】ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. [Paper]

  • 【NeurIPS 2024】Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding. [Paper]

  • 【NeurIPS 2024】TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment. [Paper]

  • 【NeurIPS 2024】Streaming Long Video Understanding with Large Language Models. [Paper]

  • 【NeurIPS 2024】VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding. [Paper]

  • 【NeurIPS 2024】Video Token Merging for Long Video Understanding. [Paper]

  • 【NeurIPS 2024】MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding. [Paper]

  • 【ICLR 2025】TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. [Paper]

  • 【ICLR 2025】CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding. [Paper]

  • 【ICLR 2025】VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks. [Paper]

  • 【ICLR 2025】SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding. [Paper]

  • 【ICLR 2025】Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge. [Paper]

  • 【CVPR 2025】STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding

  • 【CVPR 2025】MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

  • 【CVPR 2025】OVBench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

  • 【CVPR 2025】Towards Vision Language Models For Extra-Long Video Understanding

  • 【CVPR 2025】VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

  • 【CVPR 2025】ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation

  • 【CVPR 2025】Towards Universal Soccer Video Understanding

  • 【CVPR 2025】Adaptive Keyframe Sampling for Long Video Understanding

  • 【CVPR 2025】BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

  • 【CVPR 2025】MLVU: Benchmarking Multi-task Long Video Understanding

  • 【CVPR 2025】DrVideo: Document Retrieval Based Long Video Understanding

  • 【CVPR 2025】VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

  • 【CVPR 2025】Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

  • 【CVPR 2025】DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

  • 【CVPR 2025】VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

  • 【CVPR 2025】Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception

  • 【CVPR 2025】Re-thinking Temporal Search for Long-Form Video Understanding

  • 【CVPR 2025】Apollo: An Exploration of Video Understanding in Large Multi-Modal Models

  • 【CVPR 2025】HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

  • 【CVPR 2025】M-LLM Based Video Frame Selection for Efficient Video Understanding

  • 【CVPR 2025】VideoChat-Online: Towards Online Spatial-Temporal Video Understanding via Large Video Language Models

Video Classification

Large-scale Video Classification with Convolutional Neural Networks

Beyond Short Snippets: Deep Networks for Video Classification

Conference

  • 【ICLR 2023】Temporal Coherent Test Time Optimization for Robust Video Classification. [Paper]

Action Recognition

Two-Stream Convolutional Networks for Action Recognition in Videos

Learning Spatiotemporal Features with 3D Convolutional Networks

Convolutional Two-Stream Network Fusion for Video Action Recognition

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

A Closer Look at Spatiotemporal Convolutions for Action Recognition

Non-local Neural Networks

SlowFast Networks for Video Recognition

Is Space-Time Attention All You Need for Video Understanding?

Conference

  • 【ICLR 2023】Graph Contrastive Learning for Skeleton-based Action Recognition. [Paper]
  • 【ICLR 2023】AIM: Adapting Image Models for Efficient Video Action Recognition. [Paper]
  • 【ICML 2024】Memory Consolidation Enables Long-Context Video Understanding. [Paper]
  • 【ICML 2024】VideoPrism: A Foundational Visual Encoder for Video Understanding. [Paper]
  • 【NeurIPS 2024】CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition. [Paper]
  • 【NeurIPS 2024】ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition. [Paper]
  • 【NeurIPS 2024】Recovering Complete Actions for Cross-dataset Skeleton Action Recognition. [Paper]
  • 【ICLR 2024】SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition. [Paper]
  • 【ICLR 2024】FROSTER: Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition. [Paper]
  • 【ICLR 2025】TASAR: Transfer-based Attack on Skeletal Action Recognition. [Paper]
  • 【ICLR 2025】ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition. [Paper]
  • 【CVPR 2025】Semantic-guided Cross-Model Prompt Learning for skeleton-based zero-shot action recognition
  • 【CVPR 2025】Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
  • 【CVPR 2025】Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
  • 【CVPR 2025】TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
  • 【CVPR 2025】Temporal Alignment-Free Video Matching for Few-shot Action Recognition
  • 【CVPR 2025】Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?

Sequence Modeling

Conference

  • 【ICML 2023】Sequence Modeling with Multiresolution Convolutional Memory. [Paper]
  • 【ICML 2023】Simple Hardware-Efficient Long Convolutions for Sequence Modeling. [Paper]
  • 【ICML 2023】CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. [Paper]
  • 【ICLR 2023】Simplified State Space Layers for Sequence Modeling. [Paper]
  • 【ICLR 2023】Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model. [Paper]
  • 【ICLR 2023】Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks. [Paper]
  • 【ICLR 2023】Planning with Sequence Models through Iterative Energy Minimization. [Paper]
  • 【ICLR 2023】Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks. [Paper]
  • 【ICLR 2023】ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length. [Paper]
  • 【ICLR 2023】Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. [Paper]
  • 【ICLR 2023】What Makes Convolutional Models Great on Long Sequence Modeling? [Paper]
  • 【ICLR 2023】Multiple sequence alignment as a sequence-to-sequence learning problem. [Paper]
  • 【ICLR 2023】Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer. [Paper]
  • 【ICLR 2023】Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins. [Paper]
  • 【ICLR 2023】Toeplitz Neural Network for Sequence Modeling. [Paper]
  • 【ICML 2024】Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling. [Paper]
  • 【ICML 2024】Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. [Paper]
  • 【ICML 2024】Reinformer: Max-Return Sequence Modeling for Offline RL. [Paper]
  • 【ICML 2024】VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling. [Paper]
  • 【ICML 2024】Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors. [Paper]
  • 【ICML 2024】FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores. [Paper]
  • 【ICML 2024】IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs. [Paper]
  • 【ICML 2024】Traveling Waves Encode The Recent Past and Enhance Sequence Learning. [Paper]
  • 【ICML 2024】SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking. [Paper]
  • 【ICML 2024】Robustifying State-space Models for Long Sequences via Approximate Diagonalization. [Paper]
  • 【ICML 2024】Parallelizing non-linear sequential models over the sequence length. [Paper]
  • 【NeurIPS 2024】Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling. [Paper]
  • 【NeurIPS 2024】Approximation Rate of the Transformer Architecture for Sequence Modeling. [Paper]
  • 【NeurIPS 2024】MambaLRP: Explaining Selective State Space Sequence Models. [Paper]
  • 【NeurIPS 2024】Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training. [Paper]
  • 【NeurIPS 2024】Improving Adaptivity via Over-Parameterization in Sequence Models. [Paper]
  • 【NeurIPS 2024】Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum. [Paper]
  • 【NeurIPS 2024】3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection. [Paper]
  • 【NeurIPS 2024】Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling. [Paper]
  • 【NeurIPS 2024】Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling. [Paper]
  • 【NeurIPS 2024】Parallelizing Linear Transformers with the Delta Rule over Sequence Length. [Paper]
  • 【NeurIPS 2024】Gated Slot Attention for Efficient Linear-Time Sequence Modeling. [Paper]
  • 【NeurIPS 2024】Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement. [Paper]
  • 【ICLR 2025】Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling. [Paper]
  • 【ICLR 2025】Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory. [Paper]
  • 【ICLR 2025】Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking. [Paper]
  • 【ICLR 2025】Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences? [Paper]
  • 【ICLR 2025】FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference. [Paper]
  • 【ICLR 2025】Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. [Paper]
  • 【ICLR 2025】Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond. [Paper]
  • 【ICLR 2025】mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models. [Paper]
  • 【CVPR 2025】Image Re-ranking with Long-Context Sequence Modeling
  • 【CVPR 2025】Parallel Sequence Modeling via Generalized Spatial Propagation Network
  • 【CVPR 2025】Bridging Gait Recognition and Large Language Models Sequence Modeling
  • 【CVPR 2025】Contextual AD Narration with Interleaved Multimodal Sequence
  • 【CVPR 2025】HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
  • 【CVPR 2025】Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
  • 【CVPR 2025】DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
  • 【CVPR 2025】KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

「Arxiv 2018」An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Gated Linear Attention Transformers with Hardware-Efficient Training

Long Short-Term Memory

DeltaProduct: Increasing the Expressivity of DeltaNet Through Products of Householders

An Uncertainty Principle for Linear Recurrent Neural Networks

SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model

「Arxiv 2024.08.11」Learning to (Learn at Test Time): RNNs with Expressive Hidden States. [Paper] [Code]

「Arxiv 2024.08.27」Gated Linear Attention Transformers with Hardware-Efficient Training. [Paper]

「Arxiv 2024.11.09」Gated Delta Networks: Improving Mamba2 with Delta Rule. [Paper] [Code]

Temporal Action Segmentation

  1. 【Fully-Supervised】【CVPR 2024】FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation

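  2. 【Research question】How to make frames and actions interact

  3. 【Solution】Train a set of action tokens and use attention to perform frame-to-action and action-to-frame cross-attention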

  4. 【Fully-Supervised】【NeurIPS 2024】Activity Grammars for Temporal Action Segmentation

  5. Diffusion Action Segmentation

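  6. 【Research question】How to model the semantics of very long videos

  7. 【Solution】Split into two branches: one uses local window attention to model local context, the other samples frames for global modeling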

  8. 【Fully-Supervised】【ICCV 2023】How Much Temporal Long-Term Context is Needed for Action Segmentation?

  9. 【Fully-Supervised】【ISKE 2023】Streaming Video Temporal Action Segmentation In Real Time

  10. 【Fully-Supervised】【NeurIPS 2022】Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation

  11. 【Fully-Supervised】【PR 2022】Maximization and Restoration: Action Segmentation through Dilation Passing and Temporal Reconstruction

  12. 【Fully-Supervised】【ICIP 2022】Mcfm: Mutual Cross Fusion Module for Intermediate Fusion-Based Action Segmentation

  13. 【Fully-Supervised】【IVC 2022】Multistage temporal convolution transformer for action segmentation

  14. 【Fully-Supervised】【Arxiv 2022】Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos

  15. 【Fully-Supervised】【IJCAI 2022】Uncertainty-Aware Representation Learning for Action Segmentation

  16. 【Fully-Supervised】【ECCV 2022】Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

  17. 【Fully-Supervised】【BMVC 2021】ASFormer: Transformer for Action Segmentation

  18. 【Fully-Supervised】【Arxiv 2021】Coarse to Fine Multi-Resolution Temporal Convolutional Network

  19. 【Fully-Supervised】【GCPR 2021】FIFA: Fast Inference Approximation for Action Segmentation

  20. 【Fully-Supervised】【CVPR 2021】Global2Local: Efficient Structure Search for Video Action Segmentation

  21. 【Weakly-Supervised】【CVPR 2024】Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

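  22. 【Research question】How to understand action transitions

  23. 【Weakly-Supervised】【IROS 2023】Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning

  24. 【Research question】How to generate pseudo-labels for the other frames from timestamps

  25. 【Problem addressed】How to make better use of timestamps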

  26. 【Weakly-Supervised】【CVPR 2023】Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation

  27. 【Weakly-Supervised】【IJCAI 2023】Timestamp-Supervised Action Segmentation in the Perspective of Clustering

  28. 【Weakly-Supervised】【ECCV 2022】A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation

  29. 【Weakly-Supervised】【WACV 2022】Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos

  30. 【Weakly-Supervised】【BMVC 2022】Robust Action Segmentation from Timestamp Supervision

  31. 【Weakly-Supervised】【CVPR 2022】Semi-Weakly-Supervised Learning of Complex Actions from Instructional Task Videos

  32. 【Weakly-Supervised】【CVPR 2022】Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency

  33. 【Weakly-Supervised】【TMM 2022】Temporal Action Segmentation with High-level Complex Activity Labels

  34. 【Weakly-Supervised】【IROS 2022】Timestamp-Supervised Action Segmentation with Graph Convolutional Networks

  35. 【Weakly-Supervised】【ICME 2022】Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation

  36. 【Weakly-Supervised】【ECCV 2022】Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

  37. 【Weakly-Supervised】【CVPR 2022】Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

  38. 【Weakly-Supervised】【CVPR 2021】Anchor-Constrained Viterbi for Set-Supervised Action Segmentation

  39. 【Weakly-Supervised】【TPAMI 2021】Fast Weakly Supervised Action Segmentation Using Mutual Consistency

  40. 【Weakly-Supervised】【CVPR 2021】Learning Discriminative Prototypes with Dynamic Time Warping

  41. 【Weakly-Supervised】【CVPR 2021】Temporal Action Segmentation from Timestamp Supervision

  42. 【Weakly-Supervised】【CVPR 2021】Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning

  43. 【Weakly-Supervised】【AAAI 2021】Weakly-supervised Temporal Action Localization by Uncertainty Modeling

  44. 【Unsupervised】【CVPR 2024】Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Automatic Speech Recognition

2024.11.00 ~ 2024.09.20

  1. 【Arxiv 2024】Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
  2. 【TASLP 2024】Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
  3. 【Arxiv 2024】Augmenting Polish Automatic Speech Recognition System With Synthetic Data
  4. 【Arxiv 2024】Using Confidence Scores to Improve Eyes-free Detection of Speech Recognition Errors
  5. 【Arxiv 2024】A Survey on Speech Large Language Models
  6. 【Arxiv 2024】We Augmented Whisper With kNN and You Won't Believe What Came Next
  7. 【EMNLP 2024 Findings】STTATTS: Unified Speech-To-Text And Text-To-Speech Model
  8. 【Arxiv 2024】Evaluating and Improving Automatic Speech Recognition Systems for Korean Meteorological Experts
  9. 【Arxiv 2024】Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
  10. 【Arxiv 2024】Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
  11. 【Arxiv 2024】DENOASR: Debiasing ASRs through Selective Denoising
  12. 【Arxiv 2024】End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach
  13. 【Arxiv 2024】Moonshine: Speech Recognition for Live Transcription and Voice Commands
  14. 【Arxiv 2024】Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
  15. 【Arxiv 2024】AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
  16. 【Arxiv 2024】A two-stage transliteration approach to improve performance of a multilingual ASR
  17. 【Arxiv 2024】Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
  18. 【Arxiv 2024】Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
  19. 【AAAI-FSS 2024】Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
  20. 【Arxiv 2024】In-Materia Speech Recognition
  21. 【Arxiv 2024】Automatic Speech Recognition with BERT and CTC Transformers: A Review
  22. 【Arxiv 2024】Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
  23. 【ICASSP 2024】Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
  24. 【Arxiv 2024】Advocating Character Error Rate for Multilingual ASR Evaluation
  25. 【Arxiv 2024】Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
  26. 【Arxiv 2024】CR-CTC: Consistency regularization on CTC for improved speech recognition
  27. 【Arxiv 2024】Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
  28. 【Arxiv 2024】Reverb: Open-Source ASR and Diarization from Rev
  29. 【Arxiv 2024】SeeSay: An Assistive Device for the Visually Impaired Using Retrieval Augmented Generation
  30. 【Arxiv 2024】Efficient Streaming LLM for Speech Recognition
  31. 【Arxiv 2024】Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
  32. 【Arxiv 2024】Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems
  33. 【Arxiv 2024】Automatic Speech Recognition for the Ika Language
  34. 【EMNLP 2024】VHASR: A Multimodal Speech Recognition System With Vision Hotwords
  35. 【Arxiv 2024】End-to-End Speech Recognition with Pre-trained Masked Language Model
  36. 【ICASSP 2025】Alignment-Free Training for Transducer-based Multi-Talker ASR
  37. 【ICASSP 2025】Mamba for Streaming ASR Combined with Unimodal Aggregation
  38. 【Interspeech 2024】Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
  39. 【ICASSP 2025】Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
  40. 【ICASSP 2025】HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
  41. 【Interspeech 2024】Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
  42. 【ICASSP 2025】Efficient Long-Form Speech Recognition for General Speech In-Context Learning

Contrastive Learning

SimCLR

BYOL

MoCo v1

MoCo v2

MoCo v3

SimSiam

DINO


Update Log

Conference

ICML 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ICML 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ICLR 2025

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ICLR 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ICLR 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

NeurIPS 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

NeurIPS 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

CVPR 2025

  • Sign Language
  • Video Understanding
  • Video Classification
  • Action Recognition
  • Sequence

CVPR 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

CVPR 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ICCV 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ECCV 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

AAAI 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

AAAI 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

WWW 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

WWW 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ACMMM 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

ACMMM 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

SIGIR 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

SIGIR 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

IJCAI 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

IJCAI 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

KDD 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

KDD 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TPAMI 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TPAMI 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

JMLR 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

JMLR 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TOG 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TOG 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

IJCV 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

IJCV 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TIP 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TIP 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TC 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TC 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TCSVT 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TCSVT 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TMM 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TMM 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TVCG 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

TVCG 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

PR 2024

  • Sign Language
  • Video Understanding
  • Sequence Modeling

PR 2023

  • Sign Language
  • Video Understanding
  • Sequence Modeling

Arxiv

2025.03

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2025.02

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2025.01

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.12

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.11

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.10

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.09

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.08

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.07

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.06

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.05

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.04

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.03

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.02

  • Sign Language
  • Video Understanding
  • Sequence Modeling

2024.01

  • Sign Language
  • Video Understanding
  • Sequence Modeling

About

An Awesome Collection for Sequence Modeling in Diversity Field
