- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Visual Attention and Saliency Detection
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Face Recognition and Perception
- Historical Astronomy and Related Studies
- semigroups and automata theory
- Image Retrieval and Classification Techniques
- Polynomial and algebraic computation
- Digital Imaging for Blood Diseases
- Image Enhancement Techniques
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Brain Tumor Detection and Classification
- Computability, Logic, AI Algorithms
Nankai University
2020-2024
Towson University
2013
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...
We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models....
Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI)...
Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation shows that there is still a long way to go toward a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaborately unifying contrastive learning (CL) and the masked image model (MIM) through novel designs, CMAE leverages their respective...
Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their applications in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer-style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of recent ConvNets and Vision Transformers, we propose...
This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of recent convolutional neural networks (ConvNets) and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (>=7x7) nested in convolutional layers. We build a family of hierarchical ConvNets using the proposed...
Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited to conduct both image normalization for domain adaptation and data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. We rely on the VIDIT dataset for each of our two challenge tracks, including depth information. The first track is on one-to-one relighting, where the goal is to transform the illumination...
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...
Salient object detection models often demand a considerable amount of computation to make a precise prediction for each pixel, making them hardly applicable on low-power devices. In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree. We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features, while reducing the representation...
Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential for learning expressive feature representations in visual image recognition. This work shows that CMAE also generalizes well to video action recognition without modifying the architecture or the loss criterion. By directly replacing the original pixel shift with a temporal shift, our CMAE for visual action recognition, CMAE-V for short, can generate stronger feature representations than its counterpart based on pure masked autoencoders. Notably,...
Understanding whether self-supervised learning methods can scale with unlimited data is crucial for training large-scale models. In this work, we conduct an empirical study on the scaling capability of masked image modeling (MIM) methods (e.g., MAE) for visual recognition. Unlike most previous works that depend on the widely-used ImageNet dataset, which is manually curated and object-centric, we take a step further and propose to investigate this problem in a more practical setting. Specifically, we utilize the web-collected...
The Fundamental Theorem of Algebra (FTA) has been studied for more than 300 years: more or less satisfactory proofs of the FTA emerged in the 18th and 19th centuries. Proofs denoted as 'algebraic' or 'elementary' are derived from axioms defining a Real-Closed Field (RCF). A proof is given that brings up to date the work of Gauss (1816) and P. Gordan (1879). It does not refer explicitly to complex numbers but instead works with auxiliary polynomials in two variables. We report on computer software developed to effect symbolic...
Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI) through three elaborately designed trainable modules, namely,...