NFDI4DS | UHH-SEMS - Publication Details

Song–Hai Zhang

ORCID: 0000-0003-0460-1586

About

Contact & Profiles

A5049883689

Research Areas

Advanced Vision and Imaging
Computer Graphics and Visualization Techniques
3D Shape Modeling and Analysis
Advanced Image and Video Retrieval Techniques
Virtual Reality Applications and Impacts
Generative Adversarial Networks and Image Synthesis
Advanced Neural Network Applications
Visual Attention and Saliency Detection
Advanced Image Processing Techniques
3D Surveying and Cultural Heritage
Video Analysis and Summarization
Image Retrieval and Classification Techniques
Video Surveillance and Tracking Methods
Human Motion and Animation
Human Pose and Action Recognition
Robotics and Sensor-Based Localization
Image Enhancement Techniques
Face recognition and analysis
Evacuation and Crowd Dynamics
Remote Sensing and LiDAR Applications
Advanced Optical Imaging Technologies
Optical measurement and interference techniques
Image Processing and 3D Reconstruction
Image and Video Quality Assessment
Image Processing Techniques and Applications

Tsinghua University
2015-2024

Qinghai University
2024

Jiangnan University
2024

Bridge University
2024

First Affiliated Hospital of Henan University of Science and Technology
2022-2024

National Engineering Research Center for Information Technology in Agriculture
2022

Xian Yang Central Hospital
2021

Henan Psychiatric Hospital
2021

Nanchang Institute of Technology
2019

Sichuan University
2018

Attention mechanisms in computer vision: A survey

OPENALEX - Publications

Meng-Hao Guo Tian-Xing Xu Jiangjiang Liu Zheng-Ning Liu Peng-Tao Jiang and 5 more

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision,...

10.1007/s41095-022-0271-y article EN cc-by Computational Visual Media 2022-03-15

Traffic-Sign Detection and Classification in the Wild

Zhe Zhu Dun Liang Song–Hai Zhang Xiaolei Huang Baoli Li and 1 more

Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real-world images. We make two contributions to this problem. Firstly, we have created a large traffic-sign benchmark from 100000 Tencent Street View panoramas, going beyond previous benchmarks. It provides 100000 images containing 30000 traffic-sign instances. These images cover large variations in illuminance and weather conditions. Each traffic sign in the benchmark is annotated with a class label, its...

10.1109/cvpr.2016.232 article EN 2016-06-01

Pose2Seg: Detection Free Human Instance Segmentation

Song–Hai Zhang Ruilong Li Xin Dong Paul L. Rosin Zixi Cai and 4 more

The standard approach to image instance segmentation is to perform the object detection first, and then segment the object from the detection bounding-box. More recently, deep learning methods like Mask R-CNN perform them jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. Moreover, the human pose skeleton can be used to better distinguish instances with heavy occlusion than using bounding-boxes. In this paper, we present a brand new pose-based instance segmentation framework for humans...

10.1109/cvpr.2019.00098 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

NeRFReN: Neural Radiance Fields with Reflections

Yuan-Chen Guo Di Kang Linchao Bao Yu He Song–Hai Zhang

Neural Radiance Fields (NeRF) has achieved unprecedented view synthesis quality using coordinate-based neural scene representations. However, NeRF's view dependency can only handle simple reflections like highlights but cannot deal with complex reflections such as those from glass and mirrors. In these scenarios, NeRF models the virtual image as real geometries, which leads to inaccurate depth estimation, and produces blurry renderings when the multi-view consistency is violated, as the reflected objects may only be seen under...

10.1109/cvpr52688.2022.01786 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Zi–Xin Zou Zhipeng Yu Yuan–Chen Guo Yangguang Li Liang Ding and 2 more

10.1109/cvpr52733.2024.00983 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Deep Online Video Stabilization With Multi-Grid Warping Transformation Learning

Miao Wang Guo-Ye Yang Jin-Kun Lin Song–Hai Zhang Ariel Shamir and 2 more

Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D-, and 3D-based stabilization techniques have been presented previously, but to the best of our knowledge, no solutions based on deep neural networks had been proposed to date. The main reason for this omission is the shortage in training data as well as the challenge of modeling the problem using neural networks. In this paper, we present a video stabilization technique using a convolutional neural network. Previous works usually propose an off-line algorithm...

10.1109/tip.2018.2884280 article EN IEEE Transactions on Image Processing 2018-11-30

Example-Guided Style-Consistent Image Synthesis From Semantic Labeling

Miao Wang Guo-Ye Yang Ruilong Li Runze Liang Song–Hai Zhang and 2 more

Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style. We use the term "style" in this problem to refer to implicit characteristics of images, for example: in portraits "style" includes gender, racial identity, age, hairstyle; in full body pictures it includes clothing; in street scenes it refers to weather and time of day and such like. A semantic label map in these cases indicates facial expression, full body pose, or scene segmentation. We propose a solution to the example-guided image synthesis problem using conditional generative adversarial...

10.1109/cvpr.2019.00159 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Traffic signal detection and classification in street views using an attention model

Yifan Lu Jiaming Lu Song–Hai Zhang Peter Hall

Detecting small objects is a challenging task. We focus on a special case: the detection and classification of traffic signals in street views. We present a novel framework that utilizes a visual attention model to make detection more efficient, without loss of accuracy, and which generalizes. The attention model is designed to generate a small set of candidate regions at a suitable scale so that small targets can be better located and classified. In order to evaluate our method in the context of traffic signal detection, we have built a traffic light benchmark with over 15,000...

10.1007/s41095-018-0116-x article EN cc-by Computational Visual Media 2018-08-04

NeRF-SR: High Quality Neural Radiance Fields using Supersampling

Chen Wang Xian Wu Yuan-Chen Guo Song–Hai Zhang Yu‐Wing Tai and 1 more

We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF), which predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF benefits from 3D consistency, which means an observed pixel absorbs information from nearby views. We first exploit it by a supersampling strategy that shoots multiple rays...

10.1145/3503161.3547808 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches

Song–Hai Zhang Yuan-Chen Guo Qing-Wen Gu

We investigate the problem of generating 3D meshes from single free-hand sketches, aiming at fast 3D modeling for novice users. It can be regarded as a single-view reconstruction problem, but with unique challenges brought by the variation and conciseness of sketches. Ambiguities in poorly-drawn sketches could make it hard to determine how the sketched object is posed. In this paper, we address the importance of viewpoint specification for overcoming such ambiguities, and propose a novel view-aware generation approach....

10.1109/cvpr46437.2021.00595 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views

Zi–Xin Zou Weihao Cheng Yan‐Pei Cao Shi-Sheng Huang Ying Shan and 1 more

Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models for generating plausible images at novel viewpoints or for distilling pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method...

10.1609/aaai.v38i7.28626 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Strategies for reducing motion sickness in virtual reality through improved handheld controller movements

Khang Yeu Tang Ge Yu Juhong Wang Yu He Sen‐Zhe Xu and 1 more

10.1016/j.gmod.2025.101254 article EN Graphical Models 2025-01-28

Vectorizing Cartoon Animations

Song–Hai Zhang Tao Chen Yifei Zhang Shi-Min Hu Ralph R. Martin

We present a system for vectorizing 2D raster format cartoon animations. The output animations are visually flicker free, smaller in file size, and easy to edit. We identify decorative lines separately from colored regions. We use an accurate and semantically meaningful image decomposition algorithm, supporting an arbitrary color model for each region. To ensure temporal coherence in the output, we reconstruct a universal background for all frames and extract foreground regions. Simple user-assistance is required to complete...

10.1109/tvcg.2009.9 article EN IEEE Transactions on Visualization and Computer Graphics 2009-01-16

Intelligent Visual Media Processing: When Graphics Meets Vision

Ming‐Ming Cheng Qibin Hou Song–Hai Zhang Paul L. Rosin

10.1007/s11390-017-1681-7 article EN Journal of Computer Science and Technology 2017-01-01

ChoreoMaster

Kang Chen Zhipeng Tan Lei Jin Song–Hai Zhang Yuan-Chen Guo and 2 more

Despite strong demand in the game and film industry, automatically synthesizing high-quality dance motions remains a challenging task. In this paper, we present ChoreoMaster, a production-ready music-driven dance motion synthesis system. Given a piece of music, ChoreoMaster can automatically generate a high-quality dance motion sequence to accompany the input music in terms of style, rhythm and structure. To achieve this goal, we introduce a novel choreography-oriented choreomusical embedding framework, which successfully constructs a unified choreomusical embedding space for both style...

10.1145/3450626.3459932 article EN ACM Transactions on Graphics 2021-07-19

TransLoc3D: point cloud based large-scale place recognition using adaptive receptive fields

Tian-Xing Xu Yuan-Chen Guo Zhiqiang Li Ge Yu Yu‐Kun Lai and 1 more

Place recognition plays an essential role in the field of autonomous driving and robot navigation. Although a number of point cloud based methods have been proposed and achieved promising results, few of them take the size difference of objects into consideration. For small objects like pedestrians and vehicles, large receptive fields will capture unrelated information, while small receptive fields would fail to encode complete geometric information for large objects such as buildings. We argue that fixed receptive fields are not well suited for place recognition, and propose...

10.4310/cis.2023.v23.n1.a3 article EN Communications in Information and Systems 2023-01-01

CXTrack: Improving 3D Point Cloud Tracking with Contextual Information

Tian-Xing Xu Yuan-Chen Guo Yu‐Kun Lai Song–Hai Zhang

3D single object tracking plays an essential role in many applications, such as autonomous driving. It remains a challenging problem due to the large appearance variation and the sparsity of points caused by occlusion and limited sensor capabilities. Therefore, contextual information across two consecutive frames is crucial for effective object tracking. However, points containing such useful information are often overlooked and cropped out in existing methods, leading to insufficient use of important contextual knowledge. To address this issue, we...

10.1109/cvpr52729.2023.00111 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Foreword to Chinagraph 2024 Special Section

Kai Xu Song–Hai Zhang Juyong Zhang

10.1016/j.cag.2025.104183 article EN Computers & Graphics 2025-03-01

Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks

Xijin Zhang Yifan Lu Song–Hai Zhang

10.1007/s11390-016-1642-6 article EN Journal of Computer Science and Technology 2016-05-01

PortraitNet: Real-time portrait segmentation network for mobile device

Song–Hai Zhang Xin Dong Hui Li Ruilong Li Yong‐Liang Yang

10.1016/j.cag.2019.03.007 article EN Computers & Graphics 2019-04-04

Multi-User Redirected Walking in Separate Physical Spaces for Online VR Scenarios

Sen‐Zhe Xu Jiahong Liu Miao Wang Fang‐Lue Zhang Song–Hai Zhang

With the recent rise of Metaverse, online multiplayer VR applications are becoming increasingly prevalent worldwide. However, as multiple users are located in different physical environments (PEs), different reset frequencies and timings can lead to serious fairness issues for online collaborative/competitive VR applications. For the fairness of online VR apps/games, an ideal redirected walking (RDW) strategy must make the locomotion opportunities of different users equal, regardless of different physical environment layouts. The existing RDW methods lack the scheme to coordinate multiple users in different PEs, and thus have the issue of triggering too many...

10.1109/tvcg.2023.3251648 article EN IEEE Transactions on Visualization and Computer Graphics 2023-03-02

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long Yuan-Chen Guo Cheng Lin Yuan Liu Zhiyang Dou and 6 more

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details....

10.48550/arxiv.2310.15008 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Knowledge graph construction with structure and parameter learning for indoor scene design

Yuan Liang Fei Xu Song–Hai Zhang Yu‐Kun Lai Tai‐Jiang Mu

We consider the problem of learning a representation of both spatial relations and dependencies between objects for indoor scene design. We propose a novel knowledge graph framework based on the entity-relation model for representation of facts in indoor scene design, and further develop a weakly-supervised algorithm for extracting the knowledge graph representation from a small dataset using both structure and parameter learning. The proposed framework is flexible, transferable, and readable. We present a variety of computer-aided indoor scene design applications using this representation, to show the usefulness and robustness of the proposed framework.

10.1007/s41095-018-0110-3 article EN cc-by Computational Visual Media 2018-03-21

DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation

Ying-Tian Liu Zhifei Zhang Yuan-Chen Guo Matthew Fisher Zhaowen Wang and 1 more

Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive"...

10.1109/cvpr52729.2023.01364 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis

Yuan-Chen Guo Yan‐Pei Cao Chen Wang Yu He Ying Shan and 1 more

With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level. Compared to traditional mesh-based assets, this volumetric representation is more powerful in expressing scene geometry but inevitably suffers from high rendering costs and can hardly be involved in further processes like editing, posing significant difficulties in combination with the existing graphics pipeline. In this paper, we present a hybrid volume-mesh representation, VMesh, which depicts...

10.1145/3610548.3618161 article EN cc-by 2023-12-10
