- 3D Shape Modeling and Analysis
- Generative Adversarial Networks and Image Synthesis
- Computer Graphics and Visualization Techniques
- Advanced Vision and Imaging
- Human Motion and Animation
- Face Recognition and Analysis
- Image Enhancement Techniques
- Human Pose and Action Recognition
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- 3D Surveying and Cultural Heritage
- Visual Attention and Saliency Detection
- Image Processing and 3D Reconstruction
- Advanced Image and Video Retrieval Techniques
- Video Analysis and Summarization
- Advanced Neural Network Applications
- Image Processing Techniques and Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Music and Audio Processing
- Music Technology and Sound Studies
- Hand Gesture Recognition Systems
- Optical Measurement and Interference Techniques
- Aesthetic Perception and Analysis
- Olfactory and Sensory Function Studies
OriginWater (China)
2020-2024
Kuaishou (China)
2018-2024
Beijing University of Chemical Technology
2024
Sichuan University
2024
Nanjing University of Science and Technology
2024
City University of Hong Kong
2024
Tianjin University
2024
Zhengzhou University
2023
Xinxiang Medical University
2023
Beijing University of Civil Engineering and Architecture
2023
Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development of this task. To resolve the first two issues, we...
We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. Our method can reconstruct a complete and textured 3D model of a person wearing clothes from a single input picture. Inspired by the visual hull algorithm, our implicit representation uses 2D silhouettes and 3D joints of a body pose to describe the immense shape complexity and variations of clothed people. Given a segmented 2D silhouette of a subject and its inferred 3D joints from the input picture, we first synthesize consistent silhouettes from novel view points around the subject. The...
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...
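The long-range dependencies mentioned above come from self-attention, in which every image patch attends to every other patch regardless of spatial distance, unlike the local receptive fields of CNNs. A minimal single-head sketch in NumPy (shapes and weight names are illustrative, not the StyTr2 architecture):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention: every position attends to every other,
    so dependencies are captured regardless of spatial distance."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8                        # e.g. 16 image patches, 8-dim embeddings
x = rng.standard_normal((n, d))
w = [rng.standard_normal((d, d)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)                    # (16, 8): each patch mixes information from all 16
```

Because the attention weights span the full patch grid, a style cue in one corner of the image can directly influence content anywhere else, which is the property the CNN-based methods lack.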
The artistic style within a painting is the means of expression, which includes not only the painting material, colors, and brushstrokes, but also high-level attributes, including semantic elements and object shapes. Previous arbitrary example-guided artistic image generation methods often fail to control shape changes or convey semantic elements. Pre-trained text-to-image synthesis diffusion probabilistic models have achieved remarkable quality but require extensive textual descriptions to accurately portray the attributes of a particular...
In this work, we tackle the challenging problem of arbitrary image style transfer using a novel style feature representation learning method. A suitable style representation, as a key component in image stylization tasks, is essential to achieve satisfactory results. Existing deep neural network based approaches achieve reasonable results with guidance from second-order statistics such as the Gram matrix of content features. However, they do not leverage sufficient style information, which results in artifacts such as local distortions and...
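The "second-order statistics such as the Gram matrix" referred to above are channel-wise correlations of deep feature maps; a style loss then matches them between the stylized output and the style image. A minimal sketch of the statistic itself (toy shapes, not tied to any particular network):

```python
import numpy as np

def gram_matrix(features):
    """Second-order style statistic: channel-wise feature correlations.
    features: (C, H, W) activation map from some network layer."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)      # (C, C), invariant to spatial layout

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))   # toy 4-channel feature map
g = gram_matrix(feat)
print(g.shape)                          # (4, 4), symmetric
```

Note that the Gram matrix discards all spatial arrangement of the features, which is one reason purely second-order guidance can miss style details and produce the local distortions the abstract describes.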
Video style transfer is attracting increasing attention from the artificial intelligence community because of its numerous applications, such as augmented reality and animation production. Relative to traditional image style transfer, video style transfer presents new challenges, including how to effectively generate satisfactory stylized results for any specified style while maintaining temporal coherence across frames. Towards this end, we propose a Multi-Channel Correlation network (MCCNet), which can be trained to fuse...
There are currently no solutions for enabling direct face-to-face interaction between virtual reality (VR) users wearing head-mounted displays (HMDs). The main challenge is that the headset obstructs a significant portion of a user's face, preventing effective facial capture with traditional techniques. To advance VR as a next-generation communication platform, we develop a novel HMD that enables 3D facial performance-driven animation in real-time. Our wearable system uses ultra-thin flexible electronic...
We introduce a realtime facial tracking system specifically designed for performance capture in unconstrained settings using a consumer-level RGB-D sensor. Our framework provides uninterrupted 3D facial tracking, even in the presence of extreme occlusions such as those caused by hair, hand-to-face gestures, and wearable accessories. Anyone's face can be instantly tracked, and users can be switched without an extra calibration step. During tracking, we explicitly segment face regions from any occluding parts by detecting outliers...
Human hair presents highly convoluted structures and spans an extraordinarily wide range of hairstyles, which is essential for the digitization of compelling virtual avatars but also one of the most challenging to create. Cutting-edge hair modeling techniques typically rely on expensive capture devices and significant manual labor. We introduce a novel data-driven framework that can digitize complete and highly complex 3D hairstyles from a single-view photograph. We first construct a large database of manually crafted hair models...
Recent advances in single-view 3D hair digitization have made the creation of high-quality CG characters scalable and accessible to end-users, enabling new forms of personalized VR and gaming experiences. To handle the complexity and variety of hair structures, most cutting-edge techniques rely on the successful retrieval of a particular hair model from a comprehensive hair database. Not only are the aforementioned data-driven methods storage intensive, but they are also prone to failure for highly unconstrained input images, complicated...
Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems that assess the aesthetics of visual data. However, such useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP aggregation network with various attributes, we train the model in an end-to-end manner...
We present a deep generative scene modeling technique for indoor environments. Our goal is to train a generative model using a feed-forward neural network that maps a prior distribution (e.g., a normal distribution) to the distribution of primary objects in indoor scenes. We introduce a 3D object arrangement representation that models the locations and orientations of objects, based on their size and shape attributes. Moreover, our representation is applicable to 3D objects with different multiplicities (repetition counts), selected from a database. We show a principled way to train this model by combining...
Recent years have witnessed significant progress in 3D hand mesh recovery. Nevertheless, because of the intrinsic 2D-to-3D ambiguity, recovering camera-space information from a single RGB image remains challenging. To tackle this problem, we divide camera-space mesh recovery into two sub-tasks, i.e., root-relative mesh recovery and root recovery. First, joint landmarks and silhouette are extracted from the input image to provide 2D cues for the 3D tasks. In the root-relative mesh recovery task, we exploit semantic relations among joints to generate a 3D mesh from the extracted 2D cues. Such generated 3D mesh coordinates are expressed...
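The two-sub-task split above amounts to predicting the joints relative to the hand root and then recovering the root's absolute position. Under a standard pinhole camera model, recombining them is a back-projection of the root plus a translation; a hedged sketch of that arithmetic (the intrinsics, values, and helper name are illustrative, not the paper's interface):

```python
import numpy as np

def to_camera_space(rel_joints, root_uv, root_depth, fx, fy, cx, cy):
    """Place root-relative 3D joints into camera space via a pinhole model.
    rel_joints: (J, 3) joints relative to the root; root_uv: root pixel (u, v)."""
    u, v = root_uv
    root = np.array([(u - cx) * root_depth / fx,    # back-project the root pixel
                     (v - cy) * root_depth / fy,
                     root_depth])
    return rel_joints + root                         # translate all joints by the root

rel = np.array([[0.00,  0.00, 0.00],    # the root joint itself
                [0.03, -0.01, 0.02]])   # one fingertip, in metres
cam = to_camera_space(rel, root_uv=(320, 240), root_depth=0.6,
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(cam[0])   # root lands on the optical axis at depth 0.6
```

This also shows where the 2D-to-3D ambiguity bites: any error in the estimated `root_depth` shifts every joint by the same amount along the viewing ray.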
In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence. Specifically, for 2D encoding, we design lightweight yet effective stacked structures. Regarding 3D decoding, we provide an efficient graph operator, namely depth-separable spiral convolution. Moreover, we present a novel feature lifting module for bridging the gap between 2D and 3D representations. This module begins with a map-based position...
Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages...
Despite the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user. Unlike previous image-to-image transfer approaches, text-guided stylization provides users with a more precise and intuitive way to express the desired style. However, the huge discrepancy between cross-modal inputs/outputs makes it challenging to conduct text-driven image stylization in a typical feed-forward...
We introduce a data-driven hair capture framework based on example strands generated through hair simulation. Our method can robustly reconstruct faithful 3D hair models from unprocessed input point clouds with large amounts of outliers. Current state-of-the-art techniques use geometrically-inspired heuristics to derive global hair strand structures, which can yield implausible results for hairstyles involving occlusions, multiple layers, or wisps of varying lengths. We address this problem using a voting-based fitting...
Arbitrary image stylization by neural networks has become a popular topic, and video stylization is attracting more attention as an extension of image stylization. However, when image stylization methods are applied to videos, unsatisfactory results that suffer from severe flickering effects appear. In this article, we conduct a detailed and comprehensive analysis of the cause of such flickering effects. Systematic comparisons among typical neural style transfer approaches show that the feature migration modules of state-of-the-art (SOTA) learning systems...
A variety of phenomena can be characterized by repetitive small-scale elements within a large domain. Examples include a stack of fresh produce, a plate of spaghetti, or a mosaic pattern. Although certain results can be produced via manual placement or procedural/physical simulation, these methods are labor intensive, difficult to control, or limited to specific phenomena. We present discrete element textures, a data-driven method for synthesizing elements according to an input exemplar and an output domain. Our method preserves both individual properties...
Taking a satisfactory picture in a low-light environment remains a challenging problem. Low-light imaging mainly suffers from noise due to the low signal-to-noise ratio. Many methods have been proposed for the task of image denoising, but they fail to work under extremely low-light conditions. Recently, deep learning based approaches have been presented that achieve higher objective quality than traditional methods, but they usually have a high computational cost, which makes them impractical to use in real-time applications or where processing power...
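The low signal-to-noise ratio mentioned above is, to a large extent, photon shot noise: photon arrivals are Poisson-distributed, so the SNR of a pixel grows only with the square root of the photon count. A small simulation illustrating this scaling (pure NumPy, not part of any paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

def snr(mean_photons, n_pixels=100_000):
    """Empirical signal-to-noise ratio of a flat patch under photon shot noise."""
    samples = rng.poisson(mean_photons, n_pixels).astype(float)
    return samples.mean() / samples.std()

for photons in (10, 100, 1000):
    print(f"{photons:5d} photons/pixel -> SNR ~ {snr(photons):.1f}")
```

Since a Poisson variable with mean N has standard deviation sqrt(N), the printed SNRs track sqrt(10), sqrt(100), and sqrt(1000); halving the light level costs more quality than halving it back can recover by simple averaging, which is why extreme low-light denoising is hard.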
This work presents Unified Contrastive Arbitrary Style Transfer (UCAST), a novel style representation learning and transfer framework that can fit into most existing arbitrary image style transfer models, such as CNN-based, ViT-based, and flow-based methods. As the key component in image stylization tasks, a suitable style representation is essential to achieve satisfactory results. Existing approaches based on deep neural networks typically use second-order statistics to generate the output. However, these hand-crafted features computed from a single image cannot...
In this work, we tackle the challenging problem of learning-based single-view 3D hair modeling. Due to the great difficulty of collecting paired real image and 3D hair data, using synthetic data to provide prior knowledge for the real domain becomes a leading solution. This unfortunately introduces the challenge of the domain gap. Due to the inherent difficulty of realistic hair rendering, existing methods typically use orientation maps instead of hair images as input to bridge the gap. We firmly think an intermediate representation is essential, but we argue that the orientation map using the dominant...