- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Computer Graphics and Visualization Techniques
- Optical measurement and interference techniques
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- GaN-based semiconductor devices and materials
- Human Pose and Action Recognition
- Image Enhancement Techniques
- Adversarial Robustness in Machine Learning
- Explainable Artificial Intelligence (XAI)
- 3D Shape Modeling and Analysis
- Advanced Algorithms and Applications
- Video Analysis and Summarization
- Visual Attention and Saliency Detection
- Semiconductor materials and devices
- Fault Detection and Control Systems
- Advanced Computational Techniques and Applications
- Cavitation Phenomena in Pumps
- High-Voltage Power Transmission Systems
- Synthetic Aperture Radar (SAR) Applications and Techniques
Peking University First Hospital, 2025
Peking University, 2025
Nanjing University of Finance and Economics, 2024
Adobe Systems (United States), 2016-2024
Zhejiang University, 2003-2024
Xi'an Polytechnic University, 2023-2024
Beijing University of Posts and Telecommunications, 2004-2022
China Institute of Water Resources and Hydropower Research, 2021
North China Electric Power University, 2010-2021
South China University of Technology, 2020
Single image depth prediction is a challenging task due to its ill-posed nature and the difficulty of capturing ground truth for supervision. Large-scale disparity data generated from stereo photos and 3D videos is a promising source of supervision; however, such data can only approximate the inverse depth up to an affine transformation. To learn more effectively from such pseudo-depth data, we propose to use a simple pair-wise ranking loss with a novel sampling strategy. Instead of randomly sampling point pairs, we guide the sampling to better characterize structure...
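The pair-wise ranking loss named in the abstract above can be illustrated with a minimal NumPy sketch. It assumes a common formulation (a logistic penalty for mis-ordered pairs and a squared penalty for pairs judged equal) and uses plain random pair sampling, which is the baseline the abstract contrasts against; the function names, the tolerance `tau`, and the sampling helper are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_random_pairs(n_points, n_pairs, seed=0):
    """Baseline sampling: draw point pairs uniformly at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_points, n_pairs), rng.integers(0, n_points, n_pairs)

def pairwise_ranking_loss(pred, target, idx_a, idx_b, tau=0.03):
    """Ranking loss over point pairs (a, b).

    pred, target: 1-D arrays of predicted and pseudo ground-truth values
                  (targets are assumed strictly positive).
    tau: assumed relative tolerance below which two targets count as equal.
    """
    diff = pred[idx_a] - pred[idx_b]
    ratio = target[idx_a] / target[idx_b]
    # Ordinal label: +1 if a > b, -1 if a < b, 0 if roughly equal.
    label = np.where(ratio > 1 + tau, 1.0,
                     np.where(ratio < 1 / (1 + tau), -1.0, 0.0))
    ordered = np.logaddexp(0.0, -label * diff)  # log(1 + exp(-label * diff))
    equal = diff ** 2                           # pull "equal" pairs together
    return float(np.mean(np.where(label == 0, equal, ordered)))

target = np.array([1.0, 2.0, 3.0, 4.0])
idx_a, idx_b = np.array([0, 1, 2]), np.array([1, 2, 3])
loss_good = pairwise_ranking_loss(10 * target, target, idx_a, idx_b)
loss_bad = pairwise_ranking_loss(-10 * target, target, idx_a, idx_b)
```

Because the loss depends only on the ordering of each pair, it does not require the target to be metrically correct, which is why it suits data known only up to an affine transformation.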
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape, due to an unknown depth shift induced by the shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length, which allows us to recover a realistic 3D scene shape. In addition, we...
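To make "depth up to an unknown scale and shift" concrete, the sketch below recovers the two missing numbers by least squares when a reference depth is available. This is a standard alignment step often used for evaluation, not the point-cloud-encoder method the abstract describes; the function name `align_scale_shift` is hypothetical.

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Find s, t minimising ||s * pred + t - ref||^2 (closed-form least squares)."""
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return float(s), float(t)

# A prediction that is correct only up to an affine transformation.
ref_depth = np.array([1.0, 2.0, 4.0, 8.0])
pred_depth = (ref_depth - 0.5) / 2.0
s, t = align_scale_shift(pred_depth, ref_depth)
aligned = s * pred_depth + t
```

Without a reference, `s` and `t` cannot be obtained this way, which is the gap the second stage described in the abstract is meant to close.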
We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy prediction task in a single domain, our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously, hence improving the quality of the learned visual representations. By including multimodal training in a unified framework with...
Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we propose Caption-Guided Visual Saliency to expose the region-to-word mapping in modern encoder-decoder networks and demonstrate that it is learned...
Monocular depth estimation is an ill-posed problem, and as such critically relies on scene priors and semantics. Due to its complexity, we propose a deep neural network model based on a semantic divide-and-conquer approach. Our model decomposes a scene into semantic segments, such as object instances and background stuff classes, and then predicts a scale- and shift-invariant depth map for each semantic segment in a canonical space. Semantic segments of the same category share the same depth decoder, so the global depth prediction task is decomposed into a series of category-specific ones, which are...
We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance. MG Matting leverages a network (PRN) design which encourages the matting model to provide self-guidance to progressively refine the uncertain regions through the decoding process. A series of guidance mask perturbation operations are also introduced in training to further enhance its robustness to external guidance. We show that PRN can generalize to unseen types of guidance masks such as trimap and low-quality alpha matte, making it suitable for...
Modeling layout is an important first step for graphic design. Recently, methods for generating graphic layouts have progressed, particularly with Generative Adversarial Networks (GANs). However, the problem of specifying the locations and sizes of design elements usually involves constraints with respect to element attributes, such as area, aspect ratio and reading-order. Automating attribute-conditional graphic layout remains a complex, unsolved problem. In this article, we introduce Attribute-conditioned Layout GAN to incorporate...
Image harmonization aims to improve the quality of image compositing by matching the "appearance" (e.g., color tone, brightness and contrast) between foreground and background images. However, collecting large-scale annotated datasets for this task requires complex professional retouching. Instead, we propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images without being edited. We reformulate the image harmonization problem from a representation fusion perspective,...
We propose a new framework for conditional image synthesis from semantic layouts of any precision level, ranging from pure text to a 2D semantic canvas with precise shapes. More specifically, the input layout consists of one or more semantic regions with free-form text descriptions and adjustable precision levels, which can be set based on the desired controllability. The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level. By supporting the levels in-between, our framework is flexible...
Layout is important for graphic design and scene generation. We propose a novel Generative Adversarial Network, called LayoutGAN, that synthesizes layouts by modeling geometric relations of different types of 2D elements. The generator of LayoutGAN takes as input a set of randomly-placed 2D graphic elements, represented as vectors, and uses self-attention modules to refine their labels and geometric parameters jointly to produce a realistic layout. Accurate alignment is critical for good layouts. We, thus, propose a novel differentiable wireframe rendering layer...
Object compositing based on 2D images is a challenging problem since it typically involves multiple processing stages such as color harmonization, geometry correction and shadow generation to generate realistic results. Furthermore, annotating training data pairs for compositing requires substantial manual effort from professionals, and is hardly scalable. Thus, with the recent advances in generative models, in this work, we propose a self-supervised framework for object compositing by leveraging the power of conditional diffusion...
We aim to generate high resolution shallow depth-of-field (DoF) images from a single all-in-focus image with controllable focal distance and aperture size. To achieve this, we propose a novel neural network model comprised of a depth prediction module, a lens blur module, and a guided upsampling module. All modules are differentiable and are learned from data. To train our depth prediction module, we collect a dataset of 2462 RGB-D images captured by mobile phones with a dual-lens camera, and use existing segmentation datasets to improve border prediction. We further leverage...
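As background for how focal distance and aperture size control shallow depth-of-field, the sketch below evaluates the classical thin-lens circle-of-confusion formula per pixel. It is a textbook optics relation, not the learned lens blur module from the abstract; the function name and the example values are assumptions, and all lengths are taken to be in the same unit.

```python
import numpy as np

def circle_of_confusion(depth, focus_dist, focal_len, aperture_diam):
    """Thin-lens blur-circle diameter on the sensor for objects at `depth`.

    c = A * |d - d_f| / d * f / (d_f - f), with aperture diameter A,
    focal length f, focus distance d_f and object distance d.
    """
    depth = np.asarray(depth, dtype=float)
    return (aperture_diam * np.abs(depth - focus_dist) / depth
            * focal_len / (focus_dist - focal_len))

# Hypothetical scene: focus at 2 m, 50 mm lens, 25 mm aperture (f/2).
depths = np.array([1.0, 2.0, 4.0, 8.0])
coc = circle_of_confusion(depths, focus_dist=2.0, focal_len=0.05, aperture_diam=0.025)
coc_wide = circle_of_confusion(depths, focus_dist=2.0, focal_len=0.05, aperture_diam=0.05)
```

Pixels at the focus distance get zero blur, and the blur grows linearly with the aperture diameter, which is what makes the two parameters usable as user controls.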
We present a scalable approach for learning powerful visual features for emotion recognition. A critical bottleneck in emotion recognition is the lack of large scale datasets that can be used for learning visual emotion features. To this end, we curate a webly derived large scale dataset, StockEmotion, which has more than a million images. StockEmotion uses 690 emotion related tags as labels, giving us a fine-grained and diverse set of emotion labels and circumventing the difficulty of manually obtaining emotion annotations. We use this dataset to train a feature extraction network, EmotionNet,...
Image compositing is the task of combining regions from different images to compose a new image. A common use case is background replacement of portrait images. To obtain high quality composites, professionals typically manually perform multiple editing steps such as segmentation, matting and foreground color decontamination, which is very time consuming even with sophisticated photo editing tools. In this paper, we propose a new method which can automatically generate high-quality image compositing without any user input. Our method can be...
Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes, mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing relative depth estimation models often fail to recover accurate 3D scene shapes due to the unknown depth shift caused by training with the relative depth data. We tackle this problem here and attempt to estimate...
We propose BokehMe, a hybrid bokeh rendering framework that marries a neural renderer with a classical physically motivated renderer. Given a single image and a potentially imperfect disparity map, BokehMe generates high-resolution photo-realistic bokeh effects with adjustable blur size, focal plane, and aperture shape. To this end, we analyze the errors from the classical scattering-based method and derive a formulation to calculate an error map. Based on this formulation, we implement the classical renderer by a scattering-based method and propose a two-stage neural renderer to fix the erroneous areas from the classical renderer. The neural renderer employs a dynamic...
Geometric camera calibration is often required for applications that understand the perspective of the image. We propose Perspective Fields as a representation that models the local perspective properties of an image. Perspective Fields contain per-pixel information about the camera view, parameterized as an Up-vector and a Latitude value. This representation has a number of advantages: it makes minimal assumptions about the camera model and is invariant or equivariant to common image editing operations like cropping, warping, and rotation. It is also more interpretable and aligned with human perception. We train a neural...
Gait analysis is widely utilized for the diagnosis and prognosis of various diseases. Recently, innovative and convenient markerless motion capture systems have been developed to replace traditional marker-based three-dimensional motion capture systems. This study aimed to evaluate the test-retest reliability of a novel video-based markerless system (Watrix, China) and to assess its concordance with a marker-based system (BTS, Italy) in a population of young healthy subjects. Our study included 36 adult participants. Each subject underwent three assessments using...
Inefficient remaining useful life (RUL) estimation may cause unpredictable failures and unscheduled maintenance of machining tools. Multi-sensor data fusion will improve the RUL prediction reliability by fusing more sensor information related to the machining process. In this paper, a multi-sensor fusion system for online RUL prediction of machining tools is proposed. The system integrates multi-sensor signal collection, signal preprocessing by complementary ensemble empirical mode decomposition, feature extraction in the time domain, frequency domain and time-frequency domain such...
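The time-domain and frequency-domain feature extraction step mentioned in the abstract above can be sketched as follows. The abstract is truncated before it lists its features, so the ones computed here (RMS, peak, kurtosis, spectral centroid) are assumed, commonly used choices rather than the paper's list; the function name is hypothetical, and the decomposition and time-frequency steps are not covered.

```python
import numpy as np

def extract_features(signal, fs):
    """Compute a few assumed time- and frequency-domain features of one sensor signal.

    signal: 1-D array of samples; fs: sampling rate in Hz.
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    # Power spectrum of the real-valued signal and its frequency axis.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return {
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "peak": float(np.max(np.abs(x))),
        "kurtosis": float(np.mean(centered ** 4) / variance ** 2),
        "spectral_centroid": float(np.sum(freqs * power) / np.sum(power)),
    }

# Synthetic stand-in for a sensor channel: a 50 Hz sine sampled at 1 kHz for 1 s.
fs = 1000
t = np.arange(fs) / fs
features = extract_features(np.sin(2 * np.pi * 50 * t), fs)
```

In a multi-sensor setting, a vector like this would be computed per channel and per time window, then concatenated before being passed to the prediction model.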