- Advanced Vision and Imaging
- Image Enhancement Techniques
- Generative Adversarial Networks and Image Synthesis
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- Video Analysis and Summarization
- Human Motion and Animation
- Computer Graphics and Visualization Techniques
- Robotics and Sensor-Based Localization
- Advanced Image Fusion Techniques
- Color Science and Applications
- Image and Signal Denoising Methods
- Image Processing Techniques and Applications
- Visual Attention and Saliency Detection
- Face recognition and analysis
- Music and Audio Processing
- Advanced Neural Network Applications
- Color perception and design
- Video Coding and Compression Technologies
- Speech and Audio Processing
- Industrial Vision Systems and Defect Detection
- Optical measurement and interference techniques
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Infrared Target Detection Methodologies
Tencent (China), 2022-2025
Wuhan Institute of Technology, 2023-2025
Kuaishou (China), 2025
Dalian University of Technology, 2024
Anqing Normal University, 2023
Chinese University of Hong Kong, 2018-2022
Renmin University of China, 2022
Wuhan University, 2015-2019
Central South University, 2019
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, 2017-2019
In this paper, we propose a unified framework to generate a pleasant and high-quality street-view panorama by stitching multiple panoramic images captured from cameras mounted on a mobile platform. Our proposed framework comprises four major steps: image warping, color correction, optimal seam line detection, and image blending. The input images are captured without a precisely common projection center, from scenes whose depth relative to the cameras varies to different extents. As a result, such images cannot be precisely aligned in geometry. Therefore, an efficient...
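The abstract names "optimal seam line detection" as one step but is cut off before describing how it is done. The sketch below shows one classical way to find such a seam. It computes a per-pixel cost as the squared difference between two already-aligned overlap regions, then uses dynamic programming to find the top-to-bottom path of minimum total cost. This is not necessarily the authors' method. The function names are hypothetical, and the inputs are assumed to be pre-warped overlap crops of equal shape.

```python
import numpy as np


def find_vertical_seam(overlap_a, overlap_b):
    """Return, for each row, the column of a minimum-cost vertical seam.

    overlap_a, overlap_b: arrays of shape (H, W) or (H, W, C) holding the two
    images' pixels inside their shared, already-aligned overlap region.
    """
    diff = (np.asarray(overlap_a, dtype=float) - np.asarray(overlap_b, dtype=float)) ** 2
    if diff.ndim == 3:
        diff = diff.sum(axis=2)  # one cost value per pixel across colour channels
    h, w = diff.shape
    cost = diff.copy()                    # accumulated cost of the best path ending at (y, x)
    back = np.zeros((h, w), dtype=int)    # column chosen in row y-1 for the path through (y, x)
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)  # 8-connected: move at most one column
            k = lo + int(np.argmin(cost[y - 1, lo:hi]))
            back[y, x] = k
            cost[y, x] += cost[y - 1, k]
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 1, 0, -1):         # backtrack from the cheapest end point
        seam[y - 1] = back[y, seam[y]]
    return seam


def composite_along_seam(overlap_a, overlap_b, seam):
    """Take pixels left of the seam from A and the rest from B (hard cut, no feathering)."""
    out = np.array(overlap_b, dtype=float, copy=True)
    for y, x in enumerate(seam):
        out[y, :x] = overlap_a[y, :x]
    return out
```

The seam tends to pass through regions where the two images already agree, which hides parallax misalignment. In practice, a blending step such as feathering or multi-band blending is then applied across the seam rather than a hard cut.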
It is a classical task in remote sensing to automatically extract road networks from very high-resolution (VHR) images. This paper presents a novel method for extracting road networks from VHR remotely sensed images of complex urban scenes. Inspired by image segmentation, edge detection, and object skeleton extraction, we develop a multitask convolutional neural network (CNN), called RoadNet, to simultaneously predict road surfaces, edges, and centerlines, which is the first work in this field. RoadNet solves seven important issues in this...
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature of the problem and the scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping as a regression task, which suffers from the regression-to-mean problem and leads to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of a learned codebook, which effectively improves the generated motions by reducing...
Creating a vivid video from an event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for precise control. In this paper, we explore customized video generation by using text as the context description and motion structure (e.g. frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves...
In the image fusion mission, the crucial task is to generate high-quality images that highlight key objects while enhancing the scenes so that they can be understood. To accomplish this, and to provide strong interpretability as well as strong generalization ability in producing enjoyable results that suit vision tasks (such as detection and segmentation), we present a novel interpretable decomposition scheme and develop a target-aware Taylor expansion approximation (T...
We present VideoReTalking, a new system that edits the faces of a real-world talking-head video according to input audio, producing a high-quality, lip-synced output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using an expression editing network, resulting in a video with the canonical expression. This...
Image inpainting aims to fill the missing holes of an input image. This task is hard to solve efficiently for high-resolution images for two reasons: (1) a large receptive field must be handled for high-resolution image inpainting; (2) the general encoder-decoder network synthesizes many background pixels synchronously in the form of a matrix. In this paper, we try to break the above limitations for the first time, thanks to recent developments in continuous implicit representation. In detail, we down-sample and encode the degraded image to produce spatial-adaptive...
Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed model can generate realistic and cinematic-quality videos...
Talent is an important strategic resource for regional economic development. Against the background of “the talent war” that has broken out between various cities in recent years, this study empirically verified the influence of talent policies on urban innovation in 277 prefecture-level cities in China from 2010 to 2019, using a multi-period difference-in-differences model. The results indicated that the talent war caused by talent policies positively influenced urban innovation, causing, for instance, a dramatic increase in the number of invention patents. Among subsidy...
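The study relies on a multi-period (staggered) difference-in-differences design. Below is a minimal sketch of the standard two-way fixed-effects estimator behind such designs, y_it = α_i + λ_t + β·D_it + ε_it. It is fitted by ordinary least squares with unit and period dummies, where D_it = 1 once city i is under the policy in year t. The function name and the toy panel are hypothetical. The study's actual controls, clustering and robustness checks are not described in the snippet and are not reproduced here.

```python
import numpy as np


def twfe_did(y, unit, period, treated):
    """Estimate beta in y_it = a_i + l_t + beta * D_it + e_it by OLS.

    y, unit, period, treated: 1-D arrays of equal length, one row per unit-period.
    treated holds D_it (1 if unit i is under the policy in period t, else 0).
    """
    y = np.asarray(y, dtype=float)
    unit = np.asarray(unit)
    period = np.asarray(period)
    # Columns: treatment dummy, one dummy per unit (these act as intercepts),
    # and one dummy per period except the first (to avoid perfect collinearity).
    cols = [np.asarray(treated, dtype=float)]
    cols += [(unit == u).astype(float) for u in np.unique(unit)]
    cols += [(period == p).astype(float) for p in np.unique(period)[1:]]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]  # coefficient on the treatment dummy
```

With staggered adoption and treatment effects that differ across adoption cohorts, this plain estimator can be biased. It is therefore best read as a baseline rather than a complete analysis.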
Fisheye image rectification and estimation of intrinsic parameters for real scenes have been addressed in the literature by using line information on the distorted images. In this paper, we propose an easily implemented fisheye image rectification algorithm with line constraints in the undistorted perspective image plane. A novel Multi-Label Energy Optimization (MLEO) method is adopted to merge short circular arcs sharing the same or approximately the same circle parameters, and to select long circular arcs for camera rectification. Furthermore, an efficient method estimates the intrinsic parameters by automatically selecting three...
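The snippet does not say which fisheye projection model the authors assume. As a hedged illustration of what "rectification to the perspective image plane" means for a single pixel, the sketch below assumes the common equidistant model, in which a ray at angle θ from the optical axis lands at radius r_d = f·θ. That pixel is moved to the radius r_u = f·tan θ that a pinhole camera with the same focal length f and principal point (cx, cy) would produce. The function name and the shared-f assumption are illustrative only.

```python
import math


def undistort_equidistant(u, v, f, cx, cy):
    """Map a fisheye pixel (u, v) to the perspective image plane.

    Assumes the equidistant fisheye model r_d = f * theta and a pinhole target
    r_u = f * tan(theta) sharing focal length f and principal point (cx, cy).
    Only rays with theta < pi/2 (in front of the camera) can be mapped.
    """
    dx, dy = u - cx, v - cy
    r_d = math.hypot(dx, dy)
    if r_d == 0.0:
        return u, v  # the principal point maps to itself
    theta = r_d / f  # angle of the incoming ray under the equidistant model
    if theta >= math.pi / 2:
        raise ValueError("ray at or beyond 90 degrees has no perspective projection")
    scale = f * math.tan(theta) / r_d  # radial stretch from fisheye to pinhole
    return cx + dx * scale, cy + dy * scale
```

Because tan θ grows quickly near 90°, pixels near the fisheye image border are stretched strongly. This is why rectified fisheye images are usually cropped.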
Colorization is multimodal by nature and challenges existing frameworks to achieve colorful and structurally consistent results. Even sophisticated autoregressive models struggle to maintain long-distance color consistency due to the fragility of sequential dependence. To overcome this challenge, we propose a novel colorization framework that disentangles color multimodality and structural consistency through global anchors, so that both aspects can be learned effectively. Our key insight is that several carefully located anchors...
Once a color image is converted to grayscale, it is a common belief that the original color cannot be fully restored, even with state-of-the-art colorization methods. In this paper, we propose an innovative method to synthesize invertible grayscale: a grayscale image that can restore its original color. The key idea is to encode the original color information into the synthesized grayscale in such a way that users cannot recognize any anomalies. We learn and embed the color-encoding scheme via a convolutional neural network (CNN). It consists of an encoding network that converts a color image to grayscale and a decoding network that inverts the grayscale back to color. Then...
This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it. Mono-nizing means that we purposely convert a binocular video into a regular monocular video, with the stereo information implicitly encoded in a visual but nearly imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike an ordinary monocular video, however, the original binocular video can be restored from it and shown on a stereoscopic display. To start, we formulate an encoding-and-decoding framework...
Accurate story visualization requires several necessary elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new scenes and styles, and often lack the flexibility to revise the synthesized images. This paper...
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion. As a result, they often struggle with the exaggerated, non-linear, large motions and occlusions commonly found in cartoons, producing implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting...
Remote photoplethysmography (rPPG) aims to measure physiological signals without contact from facial videos, and has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, recovering the rPPG signal faces intractable challenges when real-world interference affects the video. Specifically, facial videos are inevitably affected...
Color consistency correction is a challenging problem in image stitching, because several factors, including tone, contrast and fidelity, must be handled to present a natural appearance. In this paper, we propose an effective color correction method that optimizes the color consistency across images while also guaranteeing the imaging quality of the individual images. Our method first applies well-directed alteration detection algorithms to find coherent-content regions in the inter-image overlaps, from which reliable color correspondences are extracted. Then,...
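The snippet stops before explaining how the extracted correspondences are used, so the authors' optimization is not reproduced here. As a hedged point of comparison, the sketch below shows the classical per-image gain compensation used in earlier panorama stitching work such as Brown and Lowe's. It solves for one multiplicative gain g_k per image so that overlapping regions have matching mean intensities. A weak prior pulls every gain toward 1, which rules out the trivial all-zero solution. The function name, the tuple layout of `overlaps`, and `prior_weight` are hypothetical choices.

```python
import numpy as np


def gain_compensation(n_images, overlaps, prior_weight=0.01):
    """Solve for one multiplicative gain per image by linear least squares.

    overlaps: list of (i, j, mean_i, mean_j, n_pixels), where mean_i and mean_j are
    the mean intensities of images i and j inside their shared overlap region.
    Minimises  sum n_pixels * (g_i*mean_i - g_j*mean_j)**2
             + prior_weight * sum_k (g_k - 1)**2.
    """
    rows, rhs = [], []
    for i, j, mean_i, mean_j, n_pix in overlaps:
        r = np.zeros(n_images)
        r[i], r[j] = mean_i, -mean_j
        rows.append(np.sqrt(n_pix) * r)  # overlap term, weighted by overlap size
        rhs.append(0.0)
    for k in range(n_images):
        r = np.zeros(n_images)
        r[k] = np.sqrt(prior_weight)     # prior term keeps gains near 1
        rows.append(r)
        rhs.append(np.sqrt(prior_weight))
    gains, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return gains
```

A single gain per image only adjusts overall brightness. Tone and contrast differences of the kind the abstract targets need richer per-channel or curve-based models, which is one motivation for methods like the one described above.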
Manga inpainting fills in the pixels disoccluded by the removal of dialogue balloons or "sound effect" text. The industry has long needed this process for language localization and for conversion to animated manga. It is mostly done manually, as existing methods (mostly designed for natural image inpainting) cannot produce satisfying results. Manga inpainting is trickier than natural image inpainting because of manga's highly abstract illustration style, which uses structural lines and screentone patterns and confuses both semantic interpretation and visual content synthesis. In this...