A log of papers read for at least one hour on the given day (not necessarily finished); a checked box means notes were taken.
- 20250213 [2020] End-to-End Object Detection with Transformers
- 20250212 [2017] Attention is All You Need
- 20250121 [2022] SVTR_ Scene Text Recognition with a Single Visual Model
- 20241111 [2022 IJCAI] SVTR_ Scene Text Recognition with a Single Visual Model
- 20241111 [2020 AAAI] Real-time Scene Text Detection with Differentiable Binarization
- 20240408 [2023] Improved Baselines with Visual Instruction Tuning
- 20240227 Skimmed literature on large-model compression
- 20240227 [2022 ICLR] Finetuned Language Models Are Zero-Shot Learners
- 20240223 [2015] Cross Modal Distillation for Supervision Transfer
- 20240222 Skimmed literature on large-model compression
- 20240222 [2022] Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
- 20240222 [2023] Multimodal Chain-of-Thought Reasoning in Language Models
- 20240221 Skimmed literature on large-model compression
- 20240221 [2018] Improving language understanding by generative pre-training
- 20240220 [2023 ACL] Distilling Step-by-Step_ Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- 20240220 [2018] Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- 20240220 [2020] DistilBERT, a distilled version of BERT_ smaller, faster, cheaper and lighter
- 20240204 [2023 CVPR] Micron-BERT_ BERT-based Facial Micro-Expression Recognition
- 20231120 [2023] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- 20231116 [2023] Shikra_ Unleashing Multimodal LLM’s Referential Dialogue Magic
- 20231116 [2023] Ferret_ Refer and Ground Anything Anywhere at Any Granularity
- 20231107 [2023] Visual Instruction Tuning
- 20231107 [2023] What Makes for Good Visual Instructions_ Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
- 20231019 [2021] Improving Calibration for Long-Tailed Recognition
- 20231017 [2023 NeurIPS] Multi-modal Queried Object Detection in the Wild
- 20231017 [2019] Objects365_ A Large-scale, High-quality Dataset for Object Detection
- 20230829 [2021] RegionCLIP_ Region-based Language-Image Pretraining
- 20230829 [2021] OpenPrompt_ An Open-source Framework for Prompt-learning
- 20230630 Skimmed literature on fine-tuning for multimodal tasks
- 20230627 [2023] Segment Anything
- 20230627 [2020] End-to-End Object Detection with Transformers
- 20230621 [2022 ECCV] Visual Prompt Tuning
- 20230621 [2023] Segment Anything in High Quality
- 20230510 Skimmed literature on vision-language pre-training
- 20230427 [2021] Swin Transformer_ Hierarchical Vision Transformer using Shifted Windows
- 20230427 [2022] Expanding Language-Image Pretrained Models for General Video Recognition
- 20230422 [2020] An image is worth 16x16 words_ Transformers for image recognition at scale
- 20230422 [2021] Align before Fuse_ Vision and Language Representation Learning with Momentum Distillation
- 20230418 [2021] BEiT_ BERT Pre-Training of Image Transformers
- 20230418 [2021] iBOT_ Image BERT Pre-Training with Online Tokenizer
- 20230418 [2023] DINOv2_ Learning Robust Visual Features without Supervision
- 20230417 [2021] Emerging Properties in Self-Supervised Vision Transformers
- 20230416 [2023] Scaling Vision Transformers to 22 Billion Parameters
- 20230416 [2022] LiT_ Zero-Shot Transfer with Locked-image text Tuning
- 20230416 [2021 ICML] Scaling up visual and vision-language representation learning with noisy text supervision
- 20230404 [2022] Confident Learning_ Estimating Uncertainty in Dataset Labels
- 20230206 [2021 NeurIPS] SegFormer_ Simple and Efficient Design for Semantic Segmentation with Transformers
- 20221212 Skimmed literature on image editing
- 20221122 [2018 ECCV] Learning to Navigate for Fine-grained Classification
- 20221121 [2022 CVPR] Fine-Grained Object Classification via Self-Supervised Pose Alignment
- 20221121 [2018 CVPR] Cascaded Pyramid Network for Multi-Person Pose Estimation
- 20221019 [2021 CVPR] Benchmarking Representation Learning for Natural World Image Collections
- 20220906 [2019 CVPR] Feature Selective Anchor-Free Module for Single-Shot Object Detection
- 20220729 [2022] Language Models are General-Purpose Interfaces
- 20220728 [2022 ICML] OFA_ Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
- 20220621 [2020] Rethinking of Pedestrian Attribute Recognition_ Realistic Datasets and A Strong Baseline
- 20220614 [2021] Are Large-scale Datasets Necessary for Self-Supervised Pre-training
- 20220430 Skimmed literature on self-supervised learning
- 20220429 Skimmed literature on self-supervised learning
- 20220418 [2021] DataPerf: Benchmarking Data for Better ML
- 20220413 [2018] Arbitrary-Oriented Scene Text Detection via Rotation Proposals
- 20220413 [2017] Attention is All You Need
- 20220412 [2022] Towards Online Domain Adaptive Object Detection
- 20220411 [2020] Channel Distillation_ Channel-Wise Attention for Knowledge Distillation
- 20220312 [2021] GAN inversion: A survey
- 20220311 [2022 WACV] Latent to Latent_ A Learned Mapper for Identity Preserving Editing of Multiple Face Attributes in StyleGAN-generated Images
- 20220308 [2021] Pivotal Tuning for Latent-based Editing of Real Images
- 20220307 [2020] LSUN-Stanford Car Dataset_ Enhancing Large-Scale Car Image Datasets Using Deep Learning for Usage in GAN Training
- 20220307 [2022] Self-Distilled StyleGAN_ Towards Generation from Internet Photos
- 20220307 [2016] Image-to-Image Translation with Conditional Adversarial Networks
- 20220307 [2017 ICCV] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- 20220304 [2019] Interpreting the Latent Space of GANs for Semantic Face Editing
- 20220303 [2020 NeurIPS] Training generative adversarial networks with limited data
- 20220301 [2021] Lite-HRNet_ A Lightweight High-Resolution Network
- 20220228 [2021] SimMIM_ A Simple Framework for Masked Image Modeling
- 20220224 [2021] End-to-End Object Detection with Fully Convolutional Network
- 20220223 [2018] Unsupervised Feature Learning via Non-Parametric Instance Discrimination
- 20220222 [2020] Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network
- 20220217 [2017 ICCV] Arbitrary style transfer in real-time with adaptive instance normalization
- 20220217 [2019 ICCV] Image2StyleGAN_ How to Embed Images Into the StyleGAN Latent Space
- 20220217 [2021] Encoding in Style_ a StyleGAN Encoder for Image-to-Image Translation
- 20220216 [2020 CVPR] Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection
- 20220216 [2017] An Implementation of Faster RCNN with Study for Region Sampling
- 20220215 [2018 ECCV] Acquisition of Localization Confidence for Accurate Object Detection
- 20220215 [2019] FCOS_ Fully Convolutional One-Stage Object Detection
- 20220214 [2021] Swin Transformer_ Hierarchical Vision Transformer using Shifted Windows
- 20220210 [2021 CVPR] Exploring Simple Siamese Representation Learning
- 20220210 [2017] Large batch training of convolutional networks
- 20220209 [2016] Perceptual Losses for Real-Time Style Transfer and Super-Resolution
- 20220127 [2021] Sample and Computation Redistribution for Efficient Face Detection
- 20220107 [2021] Residual Attention_ A Simple but Effective Method for Multi-Label Recognition
- 20220105 [2021] PP-YOLOv2_ A Practical Object Detector
- 20220104 [2016] DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations
- 20220104 [2019 CVPR] DeepFashion2_ A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
- 20211230 [2020] ELF_ An Early-Exiting Framework for Long-Tailed Classification
- 20211118 [2019] M2det_ A single-shot object detector based on multi-level feature pyramid network
- 20211118 [2018 CVPR] Scale-Transferrable Object Detection
- 20211017 [2021] You Only Look One-level Feature
- 20211015 [2021] Hand Image Understanding via Deep Multi-Task Learning
- 20210913 [2021] YOLO5Face_ Why Reinventing a Face Detector
- 20210907 [2021] You Only Look One-level Feature
- 20210824 [2020] EfficientDet_ Scalable and Efficient Object Detection
- 20210824 [2021] Revisiting Classification Perspective on Scene Text Recognition
- 20210809 [2020] Attention_ A Lightweight 2D Hand Pose Estimation Approach
- 20210730 [2021 CVPR] Exploring Simple Siamese Representation Learning
- 20210729 [2019] RetinaFace_ Single-stage Dense Face Localisation in the Wild
- 20210728 [2021 CVPR] Multi-Scale Aligned Distillation for Low-Resolution Detection
- 20210725 [2018 CVPR] Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
- 20210724 [2019] Bag of Freebies for Training Object Detection Neural Networks
- 20210723 [2021] YOLOX_ Exceeding YOLO Series in 2021
- 20210719 [2020 CVPR] CurricularFace_ Adaptive Curriculum Learning Loss For Deep Face Recognition
- 20210713 [2016] Simple Online And Realtime Tracking
- 20210713 [2017] Simple Online and Realtime Tracking with a Deep Association Metric
- 20210628 [2018] The iNaturalist Species Classification and Detection Dataset
- 20210627 [2021] TinaFace_ Strong but Simple Baseline for Face Detection
- 20210627 [2021 CVPR] EPSANet_ An Efficient Pyramid Split Attention Block on Convolutional Neural Network
- 20210617 [2018] CrowdHuman_ A Benchmark for Detecting Human in a Crowd
- 20210218 [2019 CVPR] Look More Than Once_ An Accurate Detector for Text of Arbitrary Shapes
- 20210218 [2021] Pushing the Envelope of Thin Crack Detection
- 20210202 [2018] Shape Robust Text Detection with Progressive Scale Expansion Network
- 20210202 [2017 CVPR] EAST_ An Efficient and Accurate Scene Text Detector
- 20210201 [2019 CVPR] Look More Than Once_ An Accurate Detector for Text of Arbitrary Shapes
- 20210112 [2021] Pushing the Envelope of Thin Crack Detection
- 20210112 [2016] Deeptext_ A unified framework for text proposal generation and text detection in natural images
- 20210112 [2021] Research on Fast Text Recognition Method for Financial Ticket Image
- 20210111 [2020 CVPRW] CSPNet_ A new backbone that can enhance learning capability of CNN
- 20201229 [2020] Scene Text Detection with Scribble Lines
- 20201217 [2020] Group Masked Autoencoder Based Density Estimator For Audio Anomaly Detection
- 20201217 [2019] Real-time Scene Text Detection with Differentiable Binarization
- 20201208 [2020] MAAD-Face_ A Massively Annotated Attribute Dataset for Face Images
- 20201208 [2020] OneNet_ End-to-End One-Stage Object Detection by Classification Cost
- 20201207 [2020] Cc-Loss_ Channel Correlation Loss For Image Classification
- 20201203 [2020 ECCV] PIoU Loss_ Towards Accurate Oriented Object Detection in Complex Environments
- 20201114 [2020] TResNet_ High Performance GPU-Dedicated Architecture
- 20201109 [2020] Attentional Feature Fusion
- 20201103 [2020 ECCV] Dive Deeper Into Box for Object Detection
- 20200816 [2020] Prime-Aware Adaptive Distillation
- 20200816 [2019 CVPR] C3AE: Exploring the Limits of Compact Model for Age Estimation
- 20200813 [2020] Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification
- 20200812 [2019] Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning
- 20200804 [2020] PP-YOLO_ An Effective and Efficient Implementation of Object Detector
- 20200700 [2011] Blind/Referenceless Image Spatial Quality Evaluator
- 20200616 [2020] Rethinking ImageNet Pre-training
- 20200105 [2019] MRCNet_ Crowd Counting and Density Map Estimation in Aerial and Ground Imagery
- 20170813 [2013 CVPR] Saliency Detection via Graph-Based Manifold Ranking
- 20170214 [2016] Understanding and Improving Convolutional Neural Networks via CReLU
- 20170122 [2016] Deep Learning without Poor Local Minima
- 20170120 [2016] Large-Margin Softmax Loss for Convolutional Neural Networks
- 20170000 [2016 ICLR] Deep Compression_ Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- 20161230 [2011] Fast coordinate descent methods with variable
- 20150000 [2012 NPAR] Combining Sketch and Tone for Pencil Drawing Production
- 20140000 [2010 CVPR] Detecting Text in Natural Scenes with Stroke Width Transform