Song-Hai Zhang

ORCID: 0000-0003-0460-1586
Research Areas
  • Advanced Vision and Imaging
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • Advanced Image and Video Retrieval Techniques
  • Virtual Reality Applications and Impacts
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Neural Network Applications
  • Visual Attention and Saliency Detection
  • Advanced Image Processing Techniques
  • 3D Surveying and Cultural Heritage
  • Video Analysis and Summarization
  • Image Retrieval and Classification Techniques
  • Video Surveillance and Tracking Methods
  • Human Motion and Animation
  • Human Pose and Action Recognition
  • Robotics and Sensor-Based Localization
  • Image Enhancement Techniques
  • Face recognition and analysis
  • Evacuation and Crowd Dynamics
  • Remote Sensing and LiDAR Applications
  • Advanced Optical Imaging Technologies
  • Optical measurement and interference techniques
  • Image Processing and 3D Reconstruction
  • Image and Video Quality Assessment
  • Image Processing Techniques and Applications

Tsinghua University
2015-2024

Qinghai University
2024

Jiangnan University
2024

Bridge University
2024

First Affiliated Hospital of Henan University of Science and Technology
2022-2024

National Engineering Research Center for Information Technology in Agriculture
2022

Xian Yang Central Hospital
2021

Henan Psychiatric Hospital
2021

Nanchang Institute of Technology
2019

Sichuan University
2018

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision,...

10.1007/s41095-022-0271-y article EN cc-by Computational Visual Media 2022-03-15
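
The abstract above describes attention as a dynamic weight adjustment process based on input features. A minimal sketch of that idea follows, assuming dot-product scoring and softmax normalization (one common choice, not necessarily the formulation the survey emphasizes). The names `softmax` and `attend` are illustrative and do not come from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Weight each feature vector by its similarity to the query.

    features: list of equal-length feature vectors (e.g. one per image region)
    query:    vector of the same length (assumed; stands in for the input-dependent signal)
    Returns the attention weights and the weighted sum of the features.
    """
    # Dot-product similarity between each feature vector and the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    # The weights depend on the input itself, hence "dynamic".
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, pooled = attend(features, query)
```

Here the first and third regions align with the query and receive larger weights than the second, so they dominate the pooled feature.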

Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real-world images. We make contributions to this problem. Firstly, we have created a large traffic-sign benchmark from 100000 Tencent Street View panoramas, going beyond previous benchmarks. It provides images containing 30000 traffic-sign instances. These images cover variations in illuminance and weather conditions. Each traffic sign is annotated with a class label, its...

10.1109/cvpr.2016.232 article EN 2016-06-01

The standard approach to image instance segmentation is to perform the object detection first, and then segment the object from the bounding-box. More recently, deep learning methods like Mask R-CNN perform them jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. Moreover, the human pose skeleton can be used to better distinguish instances with heavy occlusion than using bounding-boxes. In this paper, we present a brand new pose-based instance segmentation framework for humans...

10.1109/cvpr.2019.00098 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Neural Radiance Fields (NeRF) has achieved unprecedented view synthesis quality using coordinate-based neural scene representations. However, NeRF's view dependency can only handle simple reflections like highlights but cannot deal with complex reflections such as those from glass and mirrors. In these scenarios, NeRF models the virtual image as real geometries, which leads to inaccurate depth estimation, and produces blurry renderings when the multi-view consistency is violated, as the reflected objects may be seen under...

10.1109/cvpr52688.2022.01786 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D-, and 3D-based stabilization techniques have been presented previously, but to the best of our knowledge, no solutions based on deep neural networks had been proposed to date. The main reason for this omission is the shortage in training data as well as the challenge of modeling the problem using neural networks. In this paper, we present a video stabilization technique using a convolutional neural network. Previous works usually propose an off-line algorithm...

10.1109/tip.2018.2884280 article EN IEEE Transactions on Image Processing 2018-11-30

Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style. We use the term "style" in this problem to refer to implicit characteristics of images, for example: in portraits "style" includes gender, racial identity, age, hairstyle; in full body pictures it includes clothing; in street scenes it refers to weather and time of day and such like. A semantic label map in these cases indicates facial expression, full body pose, or scene segmentation. We propose a solution to the example-guided image synthesis problem using conditional generative adversarial...

10.1109/cvpr.2019.00159 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Detecting small objects is a challenging task. We focus on a special case: the detection and classification of traffic signals in street views. We present a novel framework that utilizes a visual attention model to make detection more efficient, without loss of accuracy, and which generalizes. The attention model is designed to generate a set of candidate regions at a suitable scale so that small targets can be better located and classified. In order to evaluate our method in the context of traffic signal detection, we have built a traffic light benchmark with over 15,000...

10.1007/s41095-018-0116-x article EN cc-by Computational Visual Media 2018-08-04

We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF) that predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF benefits from 3D consistency, which means an observed pixel absorbs information from nearby views. We first exploit it by a supersampling strategy that shoots multiple rays...

10.1145/3503161.3547808 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10
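
The abstract above mentions a supersampling strategy that shoots multiple rays, but the text is truncated before the details. The sketch below illustrates only the generic idea of supersampling a pixel, assuming an s x s grid of sub-pixel rays whose colors are averaged. `render_ray` is a hypothetical stand-in for a radiance-field renderer, and `supersample_pixel` is an illustrative name, not the paper's API.

```python
def supersample_pixel(render_ray, px, py, s=2):
    """Average the colors of s*s rays cast through sub-pixel centers of pixel (px, py).

    render_ray: hypothetical callable (u, v) -> (r, g, b) for continuous image coordinates
    s:          sub-pixel grid size per axis (assumed, not taken from the paper)
    """
    total = [0.0, 0.0, 0.0]
    for i in range(s):
        for j in range(s):
            # Center of sub-pixel (i, j) inside the unit square of the pixel.
            u = px + (i + 0.5) / s
            v = py + (j + 0.5) / s
            color = render_ray(u, v)
            total = [t + c for t, c in zip(total, color)]
    n = s * s
    return [t / n for t in total]


def toy_render(u, v):
    """Toy renderer whose color simply encodes the ray's image coordinates."""
    return (u, v, 1.0)


color = supersample_pixel(toy_render, 3, 5, s=2)
```

With the toy renderer, the averaged color equals the value at the pixel center, which is a quick way to check that the sub-pixel offsets are symmetric.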

We investigate the problem of generating 3D meshes from single free-hand sketches, aiming at fast 3D modeling for novice users. It can be regarded as a single-view reconstruction problem, but with unique challenges, brought by the variation and conciseness of sketches. Ambiguities in poorly-drawn sketches could make it hard to determine how the sketched object is posed. In this paper, we address the importance of viewpoint specification for overcoming such ambiguities, and propose a novel view-aware generation approach....

10.1109/cvpr46437.2021.00595 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models for generating plausible images at novel viewpoints or for distilling pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a reconstruction method...

10.1609/aaai.v38i7.28626 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

We present a system for vectorizing 2D raster format cartoon animations. The output animations are visually flicker free, smaller in file size, and easy to edit. We identify decorative lines separately from colored regions. We use an accurate and semantically meaningful image decomposition algorithm, supporting an arbitrary color model for each region. To ensure temporal coherence in the output, we reconstruct a universal background for all frames and extract the foreground. Simple user-assistance is required to complete...

10.1109/tvcg.2009.9 article EN IEEE Transactions on Visualization and Computer Graphics 2009-01-16

Despite strong demand in the game and film industry, automatically synthesizing high-quality dance motions remains a challenging task. In this paper, we present ChoreoMaster, a production-ready music-driven dance motion synthesis system. Given a piece of music, ChoreoMaster can generate a dance motion sequence to accompany the input music in terms of style, rhythm and structure. To achieve this goal, we introduce a novel choreography-oriented choreomusical embedding framework, which successfully constructs a unified embedding space for both style...

10.1145/3450626.3459932 article EN ACM Transactions on Graphics 2021-07-19

Place recognition plays an essential role in the field of autonomous driving and robot navigation. Although a number of point cloud based methods have been proposed and achieved promising results, few of them take the size difference of objects into consideration. For small objects like pedestrians and vehicles, large receptive fields will capture unrelated information, while small receptive fields would fail to encode complete geometric information for large objects such as buildings. We argue that fixed receptive fields are not well suited for place recognition, and propose...

10.4310/cis.2023.v23.n1.a3 article EN Communications in Information and Systems 2023-01-01

3D single object tracking plays an essential role in many applications, such as autonomous driving. It remains a challenging problem due to the large appearance variation and the sparsity of points caused by occlusion and limited sensor capabilities. Therefore, contextual information across two consecutive frames is crucial for effective object tracking. However, points containing such useful information are often overlooked and cropped out in existing methods, leading to insufficient use of important contextual knowledge. To address this issue, we...

10.1109/cvpr52729.2023.00111 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

10.1016/j.cag.2025.104183 article EN Computers & Graphics 2025-03-01

With the recent rise of Metaverse, online multiplayer VR applications are becoming increasingly prevalent worldwide. However, as multiple users are located in different physical environments (PEs), different reset frequencies and timings can lead to serious fairness issues for online collaborative/competitive VR applications. For the fairness of online VR apps/games, an ideal online RDW strategy must make the locomotion opportunities of different users equal, regardless of different physical environment layouts. The existing RDW methods lack the scheme to coordinate multiple users in different PEs, and thus have the issue of triggering too many...

10.1109/tvcg.2023.3251648 article EN IEEE Transactions on Visualization and Computer Graphics 2023-03-02

In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details....

10.48550/arxiv.2310.15008 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

We consider the problem of learning a representation of both spatial relations and dependencies between objects for indoor scene design. We propose a novel knowledge graph framework based on the entity-relation model for representation of facts in indoor scene design, and further develop a weakly-supervised algorithm for extracting the knowledge graph representation from a small dataset using both structure and parameter learning. The proposed framework is flexible, transferable, and readable. We present a variety of computer-aided indoor scene design applications using this representation, to show the usefulness and robustness of the proposed framework.

10.1007/s41095-018-0110-3 article EN cc-by Computational Visual Media 2018-03-21

Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive"...

10.1109/cvpr52729.2023.01364 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level. Compared to traditional mesh-based assets, this volumetric representation is more powerful in expressing scene geometry but inevitably suffers from high rendering costs and can hardly be involved in further processes like editing, posing significant difficulties in combination with the existing graphics pipeline. In this paper, we present a hybrid volume-mesh representation, VMesh, which depicts...

10.1145/3610548.3618161 article EN cc-by 2023-12-10