- Multimodal Machine Learning Applications
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Topic Modeling
- Research studies in Vietnam
- Human Motion and Animation
- Speech Recognition and Synthesis
- Advanced Sensor and Control Systems
- Image Processing and 3D Reconstruction
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Advanced Text Analysis Techniques
- Analysis of environmental and stochastic processes
- Advanced Algorithms and Applications
- Advanced Decision-Making Techniques
- Natural Language Processing Techniques
Xiamen University
2023-2025
Ministry of Education of the People's Republic of China
2025
Tencent (China)
2024
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh with the supervision of CLIP loss. However, such a text-independent architecture lacks textual guidance while predicting attributes, thus leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven framework...
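The abstract above mentions supervision with a CLIP loss. As a rough sketch only, assuming the common formulation in which the loss is one minus the cosine similarity between the embedding of a rendered image and the embedding of the target text, the computation can be written as follows. The function names and the embedding vectors are hypothetical placeholders; no CLIP model is loaded, and this is not taken from the X-Mesh implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_style_loss(image_embedding, text_embedding):
    """Assumed formulation: 1 - cosine similarity (0 when perfectly aligned)."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Made-up embeddings standing in for encoder outputs (illustration only).
rendered_view = [0.2, 0.9, 0.1]
target_text = [0.1, 1.0, 0.0]
loss = clip_style_loss(rendered_view, target_text)
```

In a stylization loop, a loss of this kind would be minimized with respect to the mesh attributes so that rendered views move closer to the text in the shared embedding space.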
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions. However, this conventional paradigm encounters significant challenges, most notably the generation of lackluster initial proposals and a pronounced deceleration in inference speed. Recognizing these limitations, we introduce an innovative end-to-end Superpoint-Text Matching Network (3D-STMN) that is enriched by...
In recent times, automatic text-to-3D content creation has made significant progress, driven by the development of pretrained 2D diffusion models. Existing text-to-3D methods typically optimize the 3D representation to ensure that the rendered image aligns well with the given text, as evaluated by the pretrained 2D diffusion model. Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects. Consequently, employing 2D diffusion models directly for...
In recent years, 3D understanding has turned to 2D vision-language pre-trained models to overcome data scarcity challenges. However, existing methods simply transfer 2D alignment strategies, aligning 3D representations with single-view 2D images and coarse-grained parent category text. These approaches introduce information degradation and insufficient synergy issues, leading to performance loss. Information degradation arises from overlooking the fact that a 3D representation should be equivalent to a series of multi-view images and more...
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) remains hindered by costly annotations. In this paper, we introduce a novel Semi-Supervised Panoptic Narrative Grounding (SS-PNG) learning scheme, capitalizing on a smaller set of labeled image-text pairs and a larger set of unlabeled pairs to achieve competitive performance. Unlike visual segmentation tasks, PNG involves one pixel belonging to multiple open-ended nouns. As a result, existing multi-class based semi-supervised segmentation frameworks cannot be directly...
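To illustrate why one pixel belonging to multiple nouns does not fit a multi-class setup, here is a small sketch contrasting a softmax over nouns (exactly one winner per pixel) with independent per-noun sigmoids (any number of nouns per pixel). The noun list, the logits, and the 0.5 threshold are invented for illustration and are not taken from the SS-PNG paper.

```python
import math


def softmax(logits):
    """Multi-class scores: probabilities compete and sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Multi-label score: each noun is judged independently."""
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical nouns that can all describe the same pixel.
nouns = ["person", "woman", "skier"]
pixel_logits = [2.0, 1.5, 1.8]

multi_class = softmax(pixel_logits)
multi_label = [sigmoid(v) for v in pixel_logits]

# Softmax forces a single label for the pixel.
predicted_single = nouns[multi_class.index(max(multi_class))]
# Independent sigmoids let the pixel keep every noun above the threshold.
predicted_multi = [n for n, p in zip(nouns, multi_label) if p > 0.5]
```

With these made-up logits the softmax view keeps only one noun for the pixel, while the sigmoid view keeps all three, which is the situation the abstract describes.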
Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual information for phrase-level understanding largely unexplored. In this paper, we utilize Panoptic Narrative Grounding (PNG) as a proxy task...
3D Referring Expression Segmentation (3D-RES) aims to segment 3D objects by correlating referring expressions with point clouds. However, traditional approaches frequently encounter issues like over-segmentation or mis-segmentation, due to insufficient emphasis on the spatial information of instances. In this paper, we introduce a Rule-Guided Spatial Awareness Network (RG-SAN) utilizing solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all...
The rising importance of 3D representation learning, pivotal in computer vision, autonomous driving, and robotics, is evident. However, a prevailing trend, which straightforwardly resorted to transferring 2D alignment strategies to the 3D domain, encounters three distinct challenges: (1) Information Degradation: This arises from the alignment of 3D data with mere single-view 2D images and generic texts, neglecting the need for multi-view images and detailed subcategory texts. (2) Insufficient Synergy: These strategies align 3D representations with image...
Data points are extremely common in many different branches of natural science. For this purpose, this mathematical article mainly focuses on how to separate or categorize data points efficiently based on their characteristics. The methods are the following. For linearly separable data, the most fundamental Support Vector Machine (SVM) model can be used, while for non-linear data, slack variables and kernel tricks are two efficient techniques. To test whether there is a significant difference among kernels, the Gaussian and polynomial kernels are chosen....
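The two kernels named in the abstract above can be sketched in plain Python as follows. This only shows the kernel functions and a Gram matrix, not a full SVM solver; the hyperparameters `gamma`, `degree`, and `coef0`, their default values, and the XOR-style toy points are assumptions made for illustration and do not come from the article.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, kernel):
    """Pairwise kernel values; this matrix is what a kernel SVM works with."""
    return [[kernel(p, q) for q in points] for p in points]


# XOR-style toy points, a standard case that no straight line separates.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
polynomial_gram = gram_matrix(points, polynomial_kernel)
gaussian_gram = gram_matrix(points, gaussian_kernel)
```

Either Gram matrix can stand in for the plain dot products of a linear SVM, which is how the kernel trick handles data that is not linearly separable in the original space.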