Cheng-Ze Lu

ORCID: 0000-0002-8225-6311
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Visual Attention and Saliency Detection
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image Processing Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Face Recognition and Perception
  • Historical Astronomy and Related Studies
  • Semigroups and Automata Theory
  • Image Retrieval and Classification Techniques
  • Polynomial and Algebraic Computation
  • Digital Imaging for Blood Diseases
  • Image Enhancement Techniques
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Brain Tumor Detection and Classification
  • Computability, Logic, AI Algorithms

Nankai University
2020-2024

Towson University
2013

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...

10.1007/s41095-023-0364-2 article EN cc-by Computational Visual Media 2023-07-28
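The quadratic-vs-linear cost contrast in the abstract can be made concrete with a back-of-the-envelope operation count (a rough sketch; the function names and constants are illustrative, not taken from the paper):

```python
def self_attention_flops(h, w, c):
    """Rough multiply-accumulate count for vanilla self-attention over an
    h*w token grid with channel dim c (the QK^T and attn@V products)."""
    n = h * w
    return 2 * n * n * c  # quadratic in the number of tokens

def large_kernel_attention_flops(h, w, c, k=21):
    """Rough count for a k*k depth-wise large-kernel attention map:
    one multiply per kernel tap per pixel per channel."""
    return h * w * c * k * k  # linear in the number of tokens

# Doubling the spatial resolution quadruples the token count:
# self-attention cost grows ~16x, large-kernel attention only ~4x.
base_sa = self_attention_flops(56, 56, 64)
base_lka = large_kernel_attention_flops(56, 56, 64)
big_sa = self_attention_flops(112, 112, 64)
big_lka = large_kernel_attention_flops(112, 112, 64)
print(big_sa // base_sa, big_lka // base_lka)  # → 16 4
```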

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models....

10.48550/arxiv.2209.08575 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, they are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E²FGVI)...

10.1109/cvpr52688.2022.01704 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
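The flow-guided pixel propagation the abstract describes can be illustrated by warping one frame with a per-pixel flow field (a minimal nearest-neighbour numpy sketch; `warp_by_flow` is an illustrative name, and real inpainting pipelines use learned, sub-pixel warping rather than this hand-crafted step):

```python
import numpy as np

def warp_by_flow(frame, flow):
    """Backward-warp `frame` (H, W) by a per-pixel `flow` (H, W, 2),
    sampling the nearest source pixel; out-of-bounds samples clamp
    to the border. A minimal stand-in for flow-guided propagation."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# A uniform flow of (+1, 0) pulls each pixel from its right neighbour.
frame = np.arange(9, dtype=float).reshape(3, 3)
flow = np.zeros((3, 3, 2))
flow[..., 0] = 1.0
warped = warp_by_flow(frame, flow)
print(warped[0])  # → [1. 2. 2.] (border clamped)
```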

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation manifests that there is still plenty of room to make a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaboratively unifying contrastive learning (CL) and the masked image model through novel designs, CMAE leverages their respective...

10.1109/tpami.2023.3336525 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-11-28
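The unification of a contrastive branch with masked reconstruction can be sketched as a combined objective (a minimal numpy illustration; `masked_mse`, `info_nce`, and the 0.5 weight are assumptions for exposition, not CMAE's actual losses or hyper-parameters):

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Reconstruction loss averaged over masked patches only (MIM branch)."""
    diff = (pred - target) ** 2
    return (diff * mask).sum() / mask.sum()

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two views (CL branch);
    positives are the matching rows of the two batches."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
pred, target = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
mask = (rng.random((4, 8)) < 0.75).astype(float)  # ~75% of patches masked
z = rng.normal(size=(4, 16))
total = masked_mse(pred, target, mask) + 0.5 * info_nce(z, z)
print(total > 0)
```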

Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their application in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer-style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of recent ConvNets and Transformers, we propose...

10.1109/tpami.2024.3401450 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-05-15

This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of recent convolutional neural networks (ConvNets) and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (>=7x7) nested in convolutional layers. We build a family of hierarchical ConvNets using the proposed...

10.48550/arxiv.2211.11943 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
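The convolutional modulation idea, using a convolution output as attention-like weights applied by Hadamard product, can be sketched in 1D (an illustrative numpy toy; the actual method uses 2D depth-wise convolutions with large kernels and learned linear value projections, which are simplified away here):

```python
import numpy as np

def depthwise_conv1d(x, kernel):
    """Same-padded depth-wise 1D convolution: each channel (row of x)
    is filtered independently with the shared kernel."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    return np.array([[np.dot(xp[c, i:i + k], kernel)
                      for i in range(x.shape[1])] for c in range(x.shape[0])])

def conv_modulation(x, kernel):
    """Convolutional modulation: the depth-wise conv output acts as an
    attention-like map that gates the values via element-wise product."""
    a = depthwise_conv1d(x, kernel)  # attention-like map
    v = x                            # identity value projection for brevity
    return a * v

x = np.ones((2, 5))        # 2 channels, 5 positions
kernel = np.ones(3) / 3.0  # simple averaging kernel, k=3
out = conv_modulation(x, kernel)
print(out.shape)  # → (2, 5)
```

On this constant input the interior positions stay at 1.0 while the borders shrink to 2/3, showing how the conv output rescales the values position by position.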

Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited to conduct both image normalization for domain adaptation and data augmentation. It also has multiple direct uses in photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. We rely on the VIDIT dataset for each of our two challenge tracks, including depth information. The first track is one-to-one relighting, where the goal is to transform the illumination...

10.1109/cvprw53098.2021.00069 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01

While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large...

10.48550/arxiv.2202.09741 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation manifests that there is still plenty of room to make a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable vision representations. By elaboratively unifying contrastive learning (CL) and the masked image model through novel designs, CMAE leverages their respective...

10.48550/arxiv.2207.13532 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Salient object detection models often demand a considerable amount of computation cost to make a precise prediction for each pixel, making them hardly applicable on low-power devices. In this paper, we aim to relieve the contradiction between computation cost and model performance by improving network efficiency to a higher degree. We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features, while reducing the representation...

10.48550/arxiv.2003.05643 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential for learning expressive feature representations in visual image recognition. This work shows that CMAE also trivially generalizes well to video action recognition without modifying the architecture or the loss criterion. By directly replacing the original pixel shift with a temporal shift, our CMAE for video action recognition, CMAE-V for short, can generate stronger feature representations than its counterpart based on pure masked autoencoders. Notably,...

10.48550/arxiv.2301.06018 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Understanding whether self-supervised learning methods can scale with unlimited data is crucial for training large-scale models. In this work, we conduct an empirical study on the scaling capability of masked image modeling (MIM) methods (e.g., MAE) for visual recognition. Unlike most previous works that depend on the widely-used ImageNet dataset, which is manually curated and object-centric, we take a step further and propose to investigate this problem in a more practical setting. Specifically, we utilize the web-collected...

10.48550/arxiv.2305.15248 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The Fundamental Theorem of Algebra (FTA) has been studied for more than 300 years: more or less satisfactory proofs of the FTA emerged in the 18th and 19th centuries. Proofs denoted as 'algebraic' or 'elementary' are derived from axioms defining a Real-Closed Field (RCF). A proof is given that brings up to date the work of Gauss (1816) and P. Gordan (1879). It does not refer explicitly to complex numbers but instead works with auxiliary polynomials in two variables. We report on computer software developed to effect the symbolic...

10.12732/ijpam.v86i1.9 article EN International Journal of Pure and Applied Mathematics 2013-07-12
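One elementary ingredient of such real-closed-field arguments, that every odd-degree real polynomial has a real root by the intermediate value theorem, can be illustrated numerically (a bisection sketch for exposition only, not the symbolic method the paper reports):

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial given coefficients from highest to lowest
    degree, using Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def odd_degree_real_root(coeffs, lo=-1e6, hi=1e6, tol=1e-9):
    """Bisect for a sign change: an odd-degree real polynomial takes
    opposite signs far enough to the left and right, so the intermediate
    value theorem guarantees a real root in between."""
    assert eval_poly(coeffs, lo) * eval_poly(coeffs, hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if eval_poly(coeffs, lo) * eval_poly(coeffs, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# x^3 - 2x - 5 has a single real root near 2.0945514815
root = odd_degree_real_root([1.0, 0.0, -2.0, -5.0])
print(round(root, 6))  # → 2.094551
```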

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, they are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI) with three elaborately designed trainable modules, namely,...

10.48550/arxiv.2204.02663 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01