NFDI4DS | UHH-SEMS - Publication Details

Visual attention network

OPENALEX - Publications

Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...

10.1007/s41095-023-0364-2 article EN cc-by Computational Visual Media 2023-07-28

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zhengning Liu, Ming-Ming Cheng, and 1 more

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models....

10.48550/arxiv.2209.08575 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Towards An End-to-End Framework for Flow-Guided Video Inpainting

Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chunle Guo, Ming-Ming Cheng

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI)...

10.1109/cvpr52688.2022.01704 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Contrastive Masked Autoencoders are Stronger Vision Learners

Zhicheng Huang, Xiaojie Jin, Cheng-Ze Lu, Qibin Hou, Ming-Ming Cheng, and 3 more

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation manifests that there is still plenty to go for making a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaboratively unifying contrastive learning (CL) and masked image model (MIM) through novel designs, CMAE leverages their respective...

10.1109/tpami.2023.3336525 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-11-28

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng

Vision Transformers have been the most popular network architecture in visual recognition recently due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their applications in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer-style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of recent ConvNets and Vision Transformers, we propose...

10.1109/tpami.2024.3401450 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-05-15

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng

This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of recent convolutional neural networks (ConvNets) and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (>=7x7) nested in convolutional layers. We build a family of hierarchical ConvNets using the proposed...

10.48550/arxiv.2211.11943 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

NTIRE 2021 Depth Guided Image Relighting Challenge

Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Maitreya Suin, and 39 more

Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited to conduct both image normalization for domain adaptation and data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. We rely on the VIDIT dataset for each of our two challenge tracks, including depth information. The first track is on one-to-one relighting where the goal is to transform the illumination...

10.1109/cvprw53098.2021.00069 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

Visual Attention Network

Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...

10.48550/arxiv.2202.09741 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Contrastive Masked Autoencoders are Stronger Vision Learners

Zhicheng Huang, Xiaojie Jin, Cheng-Ze Lu, Qibin Hou, Ming-Ming Cheng, and 3 more

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation manifests that there is still plenty to go for making a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaboratively unifying contrastive learning (CL) and masked image model (MIM) through novel designs, CMAE leverages their respective...

10.48550/arxiv.2207.13532 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Highly Efficient Salient Object Detection with 100K Parameters

Shanghua Gao, Yongqiang Tan, Ming-Ming Cheng, Cheng-Ze Lu, Yunpeng Chen, and 1 more

Salient object detection models often demand a considerable amount of computation cost to make a precise prediction for each pixel, making them hardly applicable on low-power devices. In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree. We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features, while reducing the representation...

10.48550/arxiv.2003.05643 preprint EN other-oa arXiv (Cornell University) 2020-01-01

CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition

Cheng-Ze Lu, Xiaojie Jin, Zhicheng Huang, Qibin Hou, Ming-Ming Cheng, and 1 more

Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential of learning expressive feature representations in visual image recognition. This work shows that CMAE also trivially generalizes well on video action recognition without modifying the architecture and the loss criterion. By directly replacing the original pixel shift with the temporal shift, our CMAE for visual action recognition, CMAE-V for short, can generate stronger feature representations than its counterpart based on pure masked autoencoders. Notably,...

10.48550/arxiv.2301.06018 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Delving Deeper into Data Scaling in Masked Image Modeling

Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, and 1 more

Understanding whether self-supervised learning methods can scale with unlimited data is crucial for training large-scale models. In this work, we conduct an empirical study on the scaling capability of masked image modeling (MIM) methods (e.g., MAE) for visual recognition. Unlike most previous works that depend on the widely-used ImageNet dataset, which is manually curated and object-centric, we take a step further and propose to investigate this problem in a more practical setting. Specifically, we utilize the web-collected...

10.48550/arxiv.2305.15248 preprint EN other-oa arXiv (Cornell University) 2023-01-01

COMPUTABLE IMPLEMENTATION OF "FUNDAMENTAL THEOREM OF ALGEBRA"

Jon A. Sjogren, X. Li, Minghui Zhao, Cheng-Ze Lu

The Fundamental Theorem of Algebra (FTA) has been studied for more than 300 years: more or less satisfactory proofs of FTA emerged in the 18th and 19th centuries. Proofs denoted as 'algebraic' or 'elementary' are derived from axioms defining a Real-Closed Field (RCF). A proof is given that brings up-to-date the work of Gauss (1816) and P. Gordan (1879). It does not refer explicitly to complex numbers but instead works with auxiliary polynomials in two variables. We report on computer software developed to effect symbolic...

10.12732/ijpam.v86i1.9 article EN International Journal of Pure and Applied Mathematics 2013-07-12

Towards An End-to-End Framework for Flow-Guided Video Inpainting

Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chunle Guo, Ming-Ming Cheng

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI) through elaborately designed three trainable modules, namely,...

10.48550/arxiv.2204.02663 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01