Haohan Weng

ORCID: 0000-0003-4954-4546
Research Areas
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Computational Geometry and Mesh Generation
  • Handwritten Text Recognition Techniques
  • Advanced Neural Network Applications
  • Image Processing and 3D Reconstruction
  • Domain Adaptation and Few-Shot Learning
  • Image Processing Techniques and Applications
  • Video Analysis and Summarization
  • Advanced Image Processing Techniques
  • Neural Networks and Applications
  • Advanced Multi-Objective Optimization Algorithms
  • Model-Driven Software Engineering Techniques
  • Multimodal Machine Learning Applications

South China University of Technology
2023-2025

We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured assets. It includes two foundation components: a shape generation model, Hunyuan3D-DiT, and a texture synthesis model, Hunyuan3D-Paint. The shape generation model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric priors, produces vibrant texture maps for either generated...

10.48550/arxiv.2501.12202 preprint EN arXiv (Cornell University) 2025-01-21
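Flow-based diffusion transformers such as Hunyuan3D-DiT are commonly trained with a flow-matching objective. The following is a rough, hypothetical sketch of that objective only (not the paper's actual implementation): sample a point on the straight path between noise and data and regress the constant velocity.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Linear-interpolation flow matching: sample x_t on the straight
    path from noise x0 to data x1; the regression target is the
    constant velocity x1 - x0."""
    t = t.reshape(-1, 1)                 # broadcast time over feature dim
    x_t = (1.0 - t) * x0 + t * x1        # point on the path at time t
    v_target = x1 - x0                   # velocity the model should predict
    return x_t, v_target

# toy "latents": batch of 4, dimension 8
x0 = rng.standard_normal((4, 8))         # noise sample
x1 = rng.standard_normal((4, 8))         # data sample
t = rng.uniform(0.0, 1.0, size=4)
x_t, v = flow_matching_targets(x0, x1, t)
```

A network would then be fit to predict `v` from `(x_t, t)`; at sampling time the learned velocity field is integrated from noise to data.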

Triangle meshes are fundamental to 3D applications, enabling efficient modification and rasterization while maintaining compatibility with standard rendering pipelines. However, current automatic mesh generation methods typically rely on intermediate representations that lack the continuous surface quality inherent to meshes. Converting these representations into meshes produces dense, suboptimal outputs. Although recent autoregressive approaches demonstrate promise in directly modeling mesh vertices and faces, they...

10.48550/arxiv.2501.14317 preprint EN arXiv (Cornell University) 2025-01-24
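Autoregressive mesh models of this kind first serialize vertices and faces into a flat token sequence. A minimal toy serialization, purely illustrative and not the paper's tokenizer:

```python
def mesh_to_tokens(faces):
    """Toy serialization of triangle faces into a flat token sequence
    for autoregressive modeling: faces are sorted for a canonical
    order, each emitting its three vertex indices plus a separator."""
    tokens = []
    for face in sorted(faces):
        tokens.extend(face)              # three vertex indices
        tokens.append(-1)                # face separator token
    return tokens

# two triangles over four vertices
faces = [(0, 1, 2), (0, 2, 3)]
toks = mesh_to_tokens(faces)
```

A transformer would then model this sequence token by token, which is why sequence length (9 coordinates per face in naive schemes) becomes the practical bottleneck.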

Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models based on image-to-image translation have no guarantee of view consistency, limiting their performance for downstream tasks like 3D reconstruction and image-to-3D generation. To empower consistency, we propose Consistent123, which synthesizes novel views simultaneously by incorporating additional cross-view attention layers and a shared self-attention mechanism. The proposed mechanism improves the interaction...

10.48550/arxiv.2310.08092 preprint EN cc-by arXiv (Cornell University) 2023-01-01
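The shared-attention idea can be illustrated by letting each view's queries attend over the keys and values of all views at once. A minimal single-head numpy sketch, not the paper's implementation:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(q, k, v):
    """q, k, v: (views, tokens, dim). Each view's queries attend to
    the keys/values of *all* views concatenated, so features are
    exchanged across views in a single attention call."""
    V, T, D = q.shape
    k_all = k.reshape(V * T, D)          # keys shared across views
    v_all = v.reshape(V * T, D)
    out = np.empty_like(q)
    for i in range(V):
        scores = q[i] @ k_all.T / np.sqrt(D)   # (T, V*T)
        out[i] = softmax(scores) @ v_all
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5, 16))      # 3 views, 5 tokens, dim 16
y = cross_view_attention(x, x, x)
```

Because each output token is a convex combination of value rows from every view, information propagates between views, which is the property the consistency mechanism relies on.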

In the process of graphic layout generation, user specifications, including element attributes and their relationships, are commonly used to constrain layouts (e.g., "put the image above the button"). It is natural to encode such spatial constraints between elements using a graph. This paper presents a two-stage generation framework: a graph generator and a subsequent layout decoder conditioned on the generator's output. Training the two highly dependent networks separately, as in previous work, we observe that the graph generator produces out-of-distribution...

10.24963/ijcai.2023/649 article EN 2023-08-01
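Spatial constraints such as "put the image above the button" map naturally onto a directed labeled graph. A toy encoding with hypothetical element and relation vocabularies, for illustration only:

```python
# Hypothetical relation vocabulary for illustration.
RELATIONS = ["above", "below", "left_of", "right_of"]

def build_constraint_graph(constraints):
    """Encode constraints like ("image", "above", "button") as a
    directed labeled graph: nodes are element indices, and each edge
    is a (source, relation-id, target) triple."""
    nodes = sorted({c[0] for c in constraints} | {c[2] for c in constraints})
    idx = {name: i for i, name in enumerate(nodes)}
    edges = [(idx[a], RELATIONS.index(rel), idx[b]) for a, rel, b in constraints]
    return nodes, edges

nodes, edges = build_constraint_graph([("image", "above", "button")])
```

A graph generator would emit such node/edge structures, and a layout decoder would place bounding boxes consistent with the edge relations.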

Image outpainting is a challenge for image processing since it needs to produce a large scenery image from only a few patches. In general, two-stage frameworks are utilized to unpack complex tasks and complete them step-by-step. However, the time consumption of training two networks hinders such methods from adequately optimizing their parameters within limited iterations. In this article, a broad generative network (BG-Net) is proposed. As the reconstruction network in the first stage, it can be quickly trained utilizing ridge regression...

10.1109/tnnls.2023.3264617 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-05-23
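Ridge regression has a closed-form solution, which is what allows a first-stage network of this kind to be fit quickly without gradient descent. A minimal sketch of that closed form, illustrative only:

```python
import numpy as np

def ridge_fit(X, Y, lam=1e-2):
    """Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y.
    One linear solve replaces many gradient iterations, which is the
    source of the training-speed advantage."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))       # 200 samples, 10 features
W_true = rng.standard_normal((10, 3))    # ground-truth linear map
Y = X @ W_true                           # noiseless targets
W = ridge_fit(X, Y, lam=1e-6)            # recovers W_true closely
```

In a broad-network setting, `X` would be randomly expanded features of the input rather than the raw input itself; only the output weights `W` are solved for.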

Templates serve as a good starting point to implement a design (e.g., a banner or slide), but it takes great effort from designers to create them manually. In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background. Different from natural images, a background image should preserve enough non-salient space for the overlaid elements. To equip existing advanced diffusion-based models with stronger spatial control, we propose two...

10.48550/arxiv.2403.09093 preprint EN arXiv (Cornell University) 2024-03-14

Generating compact and sharply detailed 3D meshes poses a significant challenge for current generative models. Different from extracting meshes from dense neural representations, some recent works try to model the native mesh distribution (i.e., a set of triangles), which generates more compact results resembling those crafted by humans. However, due to the complexity and variety of mesh topology, these methods are typically limited to small datasets with specific categories and are hard to extend. In this paper, we introduce a generic and scalable mesh generation framework...

10.48550/arxiv.2405.16890 preprint EN arXiv (Cornell University) 2024-05-27

10.1109/cvpr52733.2024.01209 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

We propose a compressive yet effective mesh representation, Blocked and Patchified Tokenization (BPT), facilitating the generation of meshes exceeding 8k faces. BPT compresses mesh sequences by employing block-wise indexing and patch aggregation, reducing their length by approximately 75% compared to the original sequences. This compression milestone unlocks the potential to utilize mesh data with significantly more faces, thereby enhancing detail richness and improving generation robustness. Empowered with BPT, we have built a foundation...

10.48550/arxiv.2411.07025 preprint EN arXiv (Cornell University) 2024-11-11
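The effect of block-wise indexing can be illustrated with a toy scheme: split each index into a (block, offset) pair and emit the block token only when it changes, so runs of nearby indices shorten the sequence. This is a simplified illustration, not BPT itself:

```python
def block_tokenize(indices, block_size=16):
    """Toy block-wise indexing: each index i becomes (i // block_size,
    i % block_size); the block token is emitted only when the block
    changes, so runs of nearby indices compress well versus a naive
    two-tokens-per-index encoding."""
    tokens, current_block = [], None
    for i in indices:
        blk, off = divmod(i, block_size)
        if blk != current_block:
            tokens.append(("B", blk))    # block token, shared by a run
            current_block = blk
        tokens.append(("O", off))        # offset token, one per index
    return tokens

seq = [3, 5, 7, 20, 21, 22, 23]          # sorted coordinate indices
toks = block_tokenize(seq)               # 9 tokens vs. 14 naive
```

Real mesh sequences are sorted and spatially coherent, so long runs share a block; patch aggregation (merging faces around a vertex) provides the rest of the reduction in the actual method.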

10.1109/smc54092.2024.10831051 article EN 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2024-10-06

Convolutional Neural Networks (CNNs) have exhibited great power in a variety of vision tasks. However, the lack of a transform-invariant property limits their further application in complicated real-world scenarios. In this work, we propose a novel generalized one-dimensional convolutional operator (OneDConv), which dynamically transforms convolution kernels based on the input features in a computationally and parametrically efficient manner. The operator can extract transform-invariant features naturally. It improves the robustness and generalization...

10.48550/arxiv.2201.05781 preprint EN other-oa arXiv (Cornell University) 2022-01-01
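The general idea of input-conditioned kernels can be sketched in a few lines: derive a transform from the input features and apply it to a base kernel before convolving. A loose toy version, not the OneDConv operator itself:

```python
import numpy as np

def dynamic_conv1d(x, base_kernel, w_gen):
    """Toy input-conditioned 1D convolution: a scalar gate computed
    from the input's mean feature rescales the base kernel before a
    plain 'valid' convolution. With w_gen = 0 the gate vanishes and
    this reduces to an ordinary static convolution."""
    gate = np.tanh(w_gen * x.mean())     # input-dependent transform
    kernel = base_kernel * (1.0 + gate)  # modulated kernel
    return np.convolve(x, kernel, mode="valid")

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([0.5, 0.5])                 # moving-average base kernel
y = dynamic_conv1d(x, k, w_gen=0.0)      # static case: [1.5, 2.5, 3.5]
```

The actual operator transforms kernels per spatial location rather than with a single scalar gate, but the principle of generating the kernel from the input is the same.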