Menghan Xia

ORCID: 0000-0001-9664-4967
Research Areas
  • Advanced Vision and Imaging
  • Image Enhancement Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image and Video Retrieval Techniques
  • Advanced Image Processing Techniques
  • Video Analysis and Summarization
  • Human Motion and Animation
  • Computer Graphics and Visualization Techniques
  • Robotics and Sensor-Based Localization
  • Advanced Image Fusion Techniques
  • Color Science and Applications
  • Image and Signal Denoising Methods
  • Image Processing Techniques and Applications
  • Visual Attention and Saliency Detection
  • Face Recognition and Analysis
  • Music and Audio Processing
  • Advanced Neural Network Applications
  • Color Perception and Design
  • Video Coding and Compression Technologies
  • Speech and Audio Processing
  • Industrial Vision Systems and Defect Detection
  • Optical Measurement and Interference Techniques
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Infrared Target Detection Methodologies

Tencent (China)
2022-2025

Wuhan Institute of Technology
2023-2025

Kuaishou (China)
2025

Dalian University of Technology
2024

Anqing Normal University
2023

Chinese University of Hong Kong
2018-2022

Renmin University of China
2022

Wuhan University
2015-2019

Central South University
2019

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing
2017-2019

In this paper, we propose a unified framework to generate a pleasant and high-quality street-view panorama by stitching multiple panoramic images captured from the cameras mounted on a mobile platform. Our proposed method comprises four major steps: image warping, color correction, optimal seam line detection, and blending. Since the input images are captured without a precisely common projection center, from scenes with depth differences of different extents, such images cannot be perfectly aligned in geometry. Therefore, an efficient...

10.3390/s17010001 article EN cc-by Sensors 2016-12-22
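
As a toy illustration of the four-stage pipeline named above, the sketch below wires up simplified stand-ins: a single gain factor for color correction, a greedy seam instead of an optimal seam line, and a hard cut in place of blending. None of these are the paper's actual algorithms.

```python
# Simplified stand-ins for the stitching stages: color correction,
# seam finding in the overlap, and compositing of two warped images.
import numpy as np

def gain_correct(src, ref, mask):
    """Scale src so its mean matches ref inside the overlap mask."""
    g = ref[mask].mean() / max(src[mask].mean(), 1e-6)
    return np.clip(src * g, 0, 255)

def vertical_seam(cost):
    """Greedy low-cost vertical seam through the overlap cost map."""
    h, w = cost.shape
    seam = np.zeros(h, dtype=int)
    seam[0] = int(cost[0].argmin())
    for y in range(1, h):
        x = seam[y - 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(cost[y, lo:hi].argmin())
    return seam

def blend_pair(left, right, overlap_x0):
    """Color-correct the right image, cut along a low-cost seam, composite."""
    h, w = left.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    mask[:, overlap_x0:] = True
    right = gain_correct(right, left, mask)
    cost = np.abs(left - right).mean(axis=2)[:, overlap_x0:]
    seam = vertical_seam(cost) + overlap_x0
    out = left.copy()
    for y in range(h):
        out[y, seam[y]:] = right[y, seam[y]:]
    return out

a = np.full((4, 8, 3), 100.0)   # toy "warped" inputs that overlap
b = np.full((4, 8, 3), 120.0)   # on the right half of the canvas
print(blend_pair(a, b, overlap_x0=4).shape)
```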

It is a classical task to automatically extract road networks from very high-resolution (VHR) images in remote sensing. This paper presents a novel method for extracting road networks from VHR remotely sensed images in complex urban scenes. Inspired by image segmentation, edge detection, and object skeleton extraction, we develop a multitask convolutional neural network (CNN), called RoadNet, to simultaneously predict road surfaces, edges, and centerlines, which is the first work of its kind in this field. The RoadNet solves seven important issues in this...

10.1109/tgrs.2018.2870871 article EN IEEE Transactions on Geoscience and Remote Sensing 2018-10-24
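
The multitask idea can be sketched as a shared encoder feeding three pixel-wise prediction heads for surfaces, edges, and centerlines, trained with a joint loss. Layer sizes here are illustrative assumptions, not RoadNet's published architecture.

```python
# One shared encoder, three per-pixel heads, one joint loss.
import torch
import torch.nn as nn

class MultiTaskRoadNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        # one 1x1 head per task, each producing a per-pixel logit map
        self.surface = nn.Conv2d(ch, 1, 1)
        self.edge = nn.Conv2d(ch, 1, 1)
        self.centerline = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        f = self.encoder(x)
        return self.surface(f), self.edge(f), self.centerline(f)

net = MultiTaskRoadNet()
img = torch.randn(1, 3, 128, 128)
s, e, c = net(img)
# joint loss: sum of per-task binary cross-entropies (toy targets here)
target = torch.zeros_like(s)
loss = sum(nn.functional.binary_cross_entropy_with_logits(o, target)
           for o in (s, e, c))
print(s.shape, loss.item())
```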

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping as a regression task, which suffers from the regression-to-mean problem, leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of a learned codebook, which effectively promotes the vividness of the generated motions by reducing...

10.1109/cvpr52729.2023.01229 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
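
A minimal sketch of the code-query idea: per-frame speech features are replaced by their nearest entries in a learned motion codebook, so generation stays inside a finite proxy space. Dimensions and names are illustrative assumptions.

```python
# Vector-quantized "code query": snap features to nearest codebook entries.
import torch

def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry."""
    # features: (T, D) per-frame motion features; codebook: (K, D)
    d = torch.cdist(features, codebook)      # (T, K) pairwise distances
    idx = d.argmin(dim=1)                    # nearest code per frame
    return codebook[idx], idx

codebook = torch.randn(256, 64)              # K=256 learned motion codes
speech_feats = torch.randn(100, 64)          # T=100 frames from an encoder
motion, codes = quantize(speech_feats, codebook)
print(motion.shape, codes[:5])
```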

10.1109/cvpr52733.2024.00698 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient for conveying the overall scene context, it may be insufficient for controlling the synthesis precisely. In this paper, we explore customized video generation by utilizing text as context description and motion structure (e.g., frame-wise depth) as concrete guidance. Our method, dubbed Make-Your-Video, involves...

10.1109/tvcg.2024.3365804 article EN IEEE Transactions on Visualization and Computer Graphics 2024-01-01

In the image fusion task, the crucial goal is to generate high-quality images that highlight key objects while making the enhanced scenes easier to understand. To complete this task and provide powerful interpretability as well as strong generalization ability, producing pleasing results that also serve downstream vision tasks (such as detection and segmentation), we present a novel interpretable decomposition scheme and develop a target-aware Taylor expansion approximation...

10.1109/tcsvt.2024.3524794 article EN IEEE Transactions on Circuits and Systems for Video Technology 2025-01-01

We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-synced output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame to the same template expression using an expression editing network, resulting in a video with the canonical expression. This...

10.1145/3550469.3555399 article EN 2022-11-29
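
The three-stage decomposition can be shown as a simple data-flow sketch, with hypothetical callables standing in for the expression-editing, lip-sync, and enhancement networks.

```python
# Data flow only: each stage is a stand-in callable, not the real network.
def video_retalking_pipeline(frames, audio, expression_edit, lip_sync, enhance):
    canonical = [expression_edit(f) for f in frames]   # stage 1: template expression
    synced = lip_sync(canonical, audio)                # stage 2: audio-driven lip-sync
    return [enhance(f) for f in synced]                # stage 3: photo-realism enhancement

# toy usage with identity functions standing in for the three networks
frames = ["frame0", "frame1"]
out = video_retalking_pipeline(frames, audio=None,
                               expression_edit=lambda f: f,
                               lip_sync=lambda fs, a: fs,
                               enhance=lambda f: f)
print(out)
```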

Image inpainting aims to fill the missing holes of an input image. It is hard to solve this task efficiently when facing high-resolution images due to two reasons: (1) a large receptive field needs to be handled for high-resolution image inpainting; (2) the general encoder-decoder network synthesizes many background pixels synchronously in matrix form. In this paper, we try to break these limitations for the first time, thanks to the recent development of continuous implicit representation. In detail, we down-sample and encode the degraded image to produce spatial-adaptive...

10.1609/aaai.v37i2.25263 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
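
A sketch of decoding with a continuous implicit representation, in the spirit of the approach described: a small MLP predicts RGB at arbitrary continuous coordinates from interpolated low-resolution features, so high-resolution pixels can be queried point by point instead of synthesized as one fixed matrix. Shapes and layers are assumptions.

```python
# Query RGB at continuous coordinates from a low-res feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

mlp = nn.Sequential(nn.Linear(32 + 2, 64), nn.ReLU(), nn.Linear(64, 3))

def query_rgb(feat, coords):
    # feat: (1, 32, h, w) encoded low-res image; coords: (N, 2) in [-1, 1]
    grid = coords.view(1, 1, -1, 2)                       # sampling positions
    z = F.grid_sample(feat, grid, align_corners=False)    # (1, 32, 1, N)
    z = z.squeeze(0).squeeze(1).t()                       # (N, 32) per-point features
    return mlp(torch.cat([z, coords], dim=1))             # (N, 3) RGB predictions

feat = torch.randn(1, 32, 16, 16)          # encoded low-res degraded image
coords = torch.rand(4096, 2) * 2 - 1       # continuous high-res query points
print(query_rgb(feat, coords).shape)
```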

Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely a text-to-video (T2V) model and an image-to-video (I2V) model. The T2V model synthesizes a video based on a given text input, while the I2V model incorporates an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos...

10.48550/arxiv.2310.19512 preprint EN cc-by arXiv (Cornell University) 2023-01-01
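
A generic sketch of how a conditioned diffusion model turns noise into a video latent; the update rule is deliberately simplified and the denoiser is a stand-in, not VideoCrafter's network. An I2V variant would pass an extra image embedding alongside the text embedding.

```python
# Iterative denoising from pure noise under text conditioning (toy update).
import torch

def sample_video(denoiser, text_emb, steps=50, shape=(1, 4, 16, 32, 32)):
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t, text_emb)           # predicted noise at step t
        x = x - eps / steps                      # simplified update rule
    return x

fake_denoiser = lambda x, t, c: 0.1 * x          # stand-in network
latent = sample_video(fake_denoiser, text_emb=torch.randn(1, 77, 768))
print(latent.shape)                              # (batch, ch, frames, h, w)
```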

Talent is an important strategic resource for regional economic development. Against the background of “the talent war” that has broken out between various cities in recent years, this study empirically verified the influence of talent policy on urban innovation in 277 prefecture-level cities in China from 2010 to 2019 using a multi-period difference-in-differences model. The results indicated that the policy shock caused by the talent war positively influenced urban innovation, causing, for instance, a dramatic increase in the number of patents and inventions. Among the subsidy...

10.3390/land11091485 article EN cc-by Land 2022-09-05
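
The multi-period difference-in-differences setup can be sketched as a two-way fixed-effects regression on a simulated city-year panel; the data and variable names below are illustrative, not the study's.

```python
# Two-way fixed effects DiD on a toy city-year panel with staggered adoption.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for c in range(30):                              # 30 toy cities
    adopt = rng.integers(2012, 2022)             # staggered (some never adopt in-panel)
    for y in range(2010, 2020):
        treated = int(y >= adopt)
        rows.append({"city": c, "year": y, "treated": treated,
                     "patents": 1.0 + 0.5 * treated + rng.normal()})
df = pd.DataFrame(rows)

# city and year dummies absorb level differences; 'treated' is the effect
model = smf.ols("patents ~ treated + C(city) + C(year)", data=df).fit()
print(model.params["treated"])                   # estimated policy effect
```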

Fisheye image rectification and estimation of intrinsic parameters for real scenes have been addressed in the literature by using line information on distorted images. In this paper, we propose an easily implemented fisheye rectification algorithm with line constraints on the undistorted perspective plane. A novel Multi-Label Energy Optimization (MLEO) method is adopted to merge short circular arcs sharing the same or approximately the same circle and to select long arcs for camera rectification. Furthermore, an efficient method estimates the camera parameters by automatically selecting three...

10.1109/cvpr.2015.7299041 article EN 2015-06-01

Colorization is multimodal by nature and challenges existing frameworks to achieve colorful and structurally consistent results. Even the sophisticated autoregressive model struggles to maintain long-distance color consistency due to the fragility of sequential dependence. To overcome this challenge, we propose a novel colorization framework that disentangles multimodality and structure through global color anchors, so that both aspects can be learned effectively. Our key insight is that several carefully located anchors...

10.1145/3550454.3555432 article EN ACM Transactions on Graphics 2022-11-30
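
A sketch of the anchor insight: predict colors only at a few anchor locations, then propagate them to all pixels by feature affinity, avoiding long-range sequential dependence. All components are simplified stand-ins for illustration.

```python
# Propagate colors from sparse anchors to every pixel via feature affinity.
import torch

def propagate_from_anchors(feats, anchor_idx, anchor_colors, tau=0.1):
    # feats: (N, D) per-pixel features; anchor_idx: (K,); anchor_colors: (K, 3)
    sim = feats @ feats[anchor_idx].t()          # (N, K) affinity to anchors
    w = torch.softmax(sim / tau, dim=1)          # soft assignment per pixel
    return w @ anchor_colors                     # (N, 3) propagated colors

feats = torch.randn(1024, 64)                    # toy per-pixel features
anchor_idx = torch.randint(0, 1024, (8,))        # 8 sparsely located anchors
anchor_colors = torch.rand(8, 3)                 # colors predicted at anchors
print(propagate_from_anchors(feats, anchor_idx, anchor_colors).shape)
```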

10.1016/j.isprsjprs.2017.11.012 article EN ISPRS Journal of Photogrammetry and Remote Sensing 2017-11-24

Once a color image is converted to grayscale, it is a common belief that the original color cannot be fully restored, even with state-of-the-art colorization methods. In this paper, we propose an innovative method to synthesize an invertible grayscale: a grayscale image that can restore its original color. The key idea here is to encode the original color information into the synthesized grayscale in a way that users cannot recognize any anomalies. We learn and embed the color-encoding scheme via a convolutional neural network (CNN), which consists of an encoding network to convert a color image to grayscale and a decoding network to invert the grayscale back to color. Then...

10.1145/3272127.3275080 article EN ACM Transactions on Graphics 2018-11-28
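
The encode-then-decode scheme can be sketched as two small CNNs tied by a reconstruction loss plus a term keeping the synthesized grayscale close to plain luminance; layer sizes are illustrative assumptions, not the paper's network.

```python
# Encoder: color -> 1-channel grayscale; decoder: grayscale -> color.
import torch
import torch.nn as nn

encode = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
decode = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

color = torch.rand(1, 3, 64, 64)
# plain luminance reference the grayscale should stay visually close to
luma = (color * torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)).sum(1, keepdim=True)

gray = encode(color)                        # grayscale with hidden color cues
restored = decode(gray)                     # invert back to color
loss = nn.functional.mse_loss(restored, color) + \
       nn.functional.mse_loss(gray, luma)   # reconstruction + luminance term
print(gray.shape, restored.shape, loss.item())
```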

This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it. Mono-nize means that we purposely convert a binocular video into a regular monocular video with the stereo information implicitly encoded in a visual but nearly-imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike ordinary monocular videos, we can restore from it the original binocular video and show it on a stereoscopic display. To start, we formulate an encoding-and-decoding framework...

10.1145/3414685.3417764 article EN ACM Transactions on Graphics 2020-11-27

Accurate story visualization requires several necessary elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the synthesized images. This paper...

10.1145/3610548.3618184 article EN cc-by 2023-12-10

We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative cartoon interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion, often struggle with the exaggerated non-linear and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting...

10.1145/3687761 article EN other-oa ACM Transactions on Graphics 2024-11-19

Remote photoplethysmography (rPPG) aims to measure non-contact physiological signals from facial videos and has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, the recovery of the rPPG signal faces intractable challenges when interference from real-world scenarios appears in the video. Specifically, facial videos are inevitably affected...

10.1109/jbhi.2025.3540134 article EN IEEE Journal of Biomedical and Health Informatics 2025-01-01

Color consistency correction is a challenging problem in image stitching because it involves several factors, including tone, contrast, and fidelity, in presenting a natural appearance. In this paper, we propose an effective color correction method that optimizes the color consistency across images while guaranteeing the imaging quality of each individual image. Our method first applies well-directed alteration detection algorithms to find coherent-content regions in the inter-image overlaps, from which reliable color correspondences are extracted. Then,...

10.1109/iccvw.2017.351 article EN 2017-10-01
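
As a toy version of optimizing consistency across images, the sketch below solves per-image gains in the least-squares sense so that matched overlap regions agree; the paper's correction model is far richer than a single gain per image.

```python
# Solve global per-image gains so corresponding overlap regions match.
import numpy as np

# each tuple: (image i, image j, mean intensity in i, mean intensity in j)
pairs = [(0, 1, 120.0, 100.0), (1, 2, 110.0, 130.0), (0, 2, 125.0, 140.0)]
n = 3
A, b = [], []
for i, j, mi, mj in pairs:
    row = np.zeros(n)
    row[i], row[j] = mi, -mj          # want g_i * mi == g_j * mj
    A.append(row); b.append(0.0)
A.append(np.ones(n)); b.append(n)     # anchor: gains average to 1
g, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
print(g)                              # per-image correction gains
```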

Manga inpainting fills in the disoccluded pixels left by the removal of dialogue balloons or “sound effect” text. This process has long been needed by the industry for language localization and conversion to animated manga. It is mostly done manually, as existing methods (mostly for natural image inpainting) cannot produce satisfying results. Manga inpainting is trickier than natural image inpainting because of its highly abstract illustration using structural lines and screentone patterns, which confuses semantic interpretation and visual content synthesis. In this...

10.1145/3450626.3459822 article EN ACM Transactions on Graphics 2021-07-19