Hao Wang

ORCID: 0000-0002-3048-8268
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Face Recognition and Analysis
  • Anomaly Detection Techniques and Applications
  • Face and Expression Recognition
  • Gait Recognition and Analysis
  • IoT-based Smart Home Systems
  • Maritime Navigation and Safety
  • Video Analysis and Summarization
  • Sleep and Work-Related Fatigue
  • Radiomics and Machine Learning in Medical Imaging
  • Oil Spill Detection and Mitigation
  • Machine Fault Diagnosis Techniques
  • Vehicle License Plate Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Geological Modeling and Analysis
  • Emotion and Mood Recognition
  • Infrastructure Maintenance and Monitoring
  • Fire Detection and Safety Systems
  • Satellite Communication Systems

Harbin Engineering University
2021-2025

Xidian University
2005-2024

Carnegie Mellon University
2024

Central South University
2020

Shandong Institute of Automation
2019

Chinese Academy of Sciences
2019

University of Chinese Academy of Sciences
2019

Qingdao University of Technology
2016

Wuhan University of Technology
2011

Face detection has achieved great success using region-based methods. In this report, we propose a region-based face detector applying deep networks in a fully convolutional fashion, named Face R-FCN. Based on Region-based Fully Convolutional Networks (R-FCN), our face detector is more accurate and computationally efficient compared with previous R-CNN based face detectors. In our approach, we adopt the fully convolutional Residual Network (ResNet) as the backbone network. Particularly, we exploit several new techniques including position-sensitive average pooling,...

10.48550/arxiv.1709.05256 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Recognizing unseen attribute-object pairs never appearing in the training data is a challenging task, since an object often refers to a specific entity while an attribute is an abstract semantic description. Besides, attributes are highly correlated to objects, i.e., an attribute tends to describe different visual features of various objects. Existing methods mainly employ two classifiers to recognize the attribute and the object separately, or simply simulate the composition of attribute and object, which ignore the inherent discrepancy and correlation between them. In this...

10.1109/iccv.2019.00384 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task that involves searching natural images through the use of free-hand sketches under the zero-shot scenario. Most previous methods project the sketch and image features into a low-dimensional common space for efficient retrieval, and in the meantime align the projected features to their semantic features (e.g., category-level word vectors) in order to transfer knowledge from seen to unseen classes. However, the projection and alignment are always coupled; as a result, there is a lack...

10.1109/tip.2020.3020383 article EN IEEE Transactions on Image Processing 2020-01-01

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) aims at searching corresponding natural images with the given free-hand sketches, under the more realistic and challenging scenario of Zero-Shot Learning (ZSL). Prior works concentrate much on aligning the sketch and image feature representations while ignoring the explicit learning of heterogeneous feature extractors to make themselves capable of aligning multi-modal features, at the expense of deteriorating the transferability from seen categories to unseen ones. To address this issue, we propose a novel...

10.1109/tpami.2021.3123315 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-10-27

Object tracking in satellite videos is a challenging task due to the small target size, low spatial resolution, limited appearance and texture information, and the potential for background confusion. While current state-of-the-art tracking methods perform well on natural images, they often produce unsatisfactory results when applied to satellite videos. In this paper, we address these challenges by leveraging location prompts and refining the feature extractor and the bounding box refinement module. Furthermore, we integrate...

10.1109/tcsvt.2024.3358549 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-01-25

Pre-trained vision-language (V-L) models such as CLIP have demonstrated impressive zero-shot performance in many downstream tasks. Since adopting contrastive video-text pair methods like CLIP to video tasks is limited by their high cost and scale, recent approaches focus on efficiently transferring the image-based CLIP to the video domain. A major finding is that fine-tuning the pre-trained model to achieve strong fully supervised performance leads to low zero-shot, few-shot, and base-to-novel generalization. Instead, freezing the backbone network to maintain...

10.1609/aaai.v38i6.28347 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features. These challenges include the inconsistency of clothing and the difficulty in learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called Attention-Enhanced Multimodal Feature Fusion...

10.1007/s40747-024-01646-2 article EN cc-by-nc-nd Complex & Intelligent Systems 2024-11-08

10.1109/icassp49660.2025.10890602 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a specific cross-modal retrieval task for searching natural images given free-hand sketches under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such projection destroys the completeness of semantic knowledge in the original semantic space, so that it is unable to transfer useful knowledge well when learning from different...

10.24963/ijcai.2020/137 preprint EN 2020-07-01

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task, where abstract sketches are used as queries to retrieve natural images under the zero-shot scenario. Most existing methods regard ZS-SBIR as a traditional classification problem and employ a cross-entropy or triplet-based loss to achieve retrieval, which neglects the problems of the domain gap between sketches and natural images and the large intra-class diversity in sketches. Toward this end, we propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR....

10.24963/ijcai.2021/158 article EN 2021-08-01

There are a large number of insulators on the transmission line, and insulator damage will have a major impact on power supply security. Image-based segmentation of the insulators in power transmission lines is a premise and also a critical task for power line inspection. In this paper, a modified conditional generative adversarial network for insulator pixel-level segmentation is proposed. The generator is reconstructed by encoder-decoder layers with an asymmetric convolution kernel, which can simplify the network complexity and extract more kinds of feature information. The discriminator is composed of a fully...

10.1155/2019/4245329 article EN cc-by Journal of Sensors 2019-11-12

Video referring segmentation focuses on segmenting out the object in a video based on the corresponding textual description. Previous works have primarily tackled this task by devising two crucial parts, an intra-modal module for context modeling and an inter-modal module for heterogeneous alignment. However, there are two essential drawbacks of this approach: (1) it lacks joint learning of context modeling and heterogeneous alignment, leading to insufficient interactions among input elements; (2) both modules require task-specific expert knowledge to design,...

10.1109/tip.2022.3161832 article EN IEEE Transactions on Image Processing 2022-01-01

Two-stream architectures have shown strong performance in the video classification task. The key idea is to learn spatiotemporal features by fusing convolutional networks spatially and temporally. However, there are some problems within such architectures. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information for video data. Third, they lack explicit semantic guidance, which greatly decreases the...

10.1609/aaai.v33i01.33019030 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

10.1007/s11063-021-10446-5 article EN Neural Processing Letters 2021-03-03

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task for searching natural images given free-hand sketches under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such projection destroys the completeness of semantic knowledge in the original semantic space, so that it is unable to transfer useful knowledge well when learning from different modalities....

10.48550/arxiv.2003.09869 preprint EN other-oa arXiv (Cornell University) 2020-01-01

10.1007/s11771-020-4281-6 article EN Journal of Central South University 2020-01-01

Considering the great influence of residual frequency offset on the phase estimate in burst-mode communication systems, a symmetrical frame structure is applied to the joint frequency-phase estimate. Firstly, based on a general data structure, the Cramer-Rao Bounds (CRBs) for asymmetrical and symmetrical burst structures with DA/NDA synchronization mode are given. Secondly, the symmetrical concept is used in the estimator, and a comparison analysis is made against the asymmetrical case. On this basis, pre-phase/post-phase estimators are proposed. Theoretical and simulation results show...

10.1109/icsp.2016.7878021 article EN 2016-11-01

In the traditional fault diagnosis method, a large number of experiments are needed to get the optimal performance classifier which diagnoses the type of fault. Because of the algorithm limit, there is no one classifier that can be applied to all kinds of fault diagnosis. In order to avoid the disadvantages caused by a single approach, a decision level fusion method based on multiple classifiers is introduced in the field of fault diagnosis. The fusion method with fuzzy comprehensive evaluation is put forward and the basic model is set up. The reasonable distribution of weight that affects...

10.14257/ijhit.2016.9.2.17 article EN International Journal of Hybrid Information Technology 2016-02-28

Lightweight ship detection offers the dual benefits of rapid detection and low computational cost, making it particularly advantageous for inland waterway safety monitoring. This study introduces YOLO-GCV, a lightweight ship detection algorithm based on YOLOv7-tiny. The proposed algorithm strikes an effective balance between detection accuracy and speed. First, the ELAN-Ghost module is integrated into the backbone network, while VoVGSCSP, another lightweight module, is introduced into the neck to further streamline the model structure. Coordinate...

10.21203/rs.3.rs-5239851/v1 preprint EN cc-by Research Square (Research Square) 2024-11-06

10.1109/biocas61083.2024.10798289 article EN 2024 IEEE Biomedical Circuits and Systems Conference (BioCAS) 2024-10-24

A random access framework based on interleaving multiplexing (RIM) is proposed to further improve the throughput for the internet of things in satellite networks, which resorts to state-of-the-art physical layer techniques that can resolve the multiple packet collisions issue. The interleavers are randomly selected by active devices from a set of available interleavers, and are used to differentiate the signals of different devices sharing a subframe that consists of multiple time slots. The message passing detector (MPD) is utilized to decode the collided packets. By...

10.1109/access.2021.3112128 article EN cc-by IEEE Access 2021-01-01

Two-stream architectures have shown strong performance in the video classification task. The key idea is to learn spatio-temporal features by fusing convolutional networks spatially and temporally. However, there are some problems within such architectures. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information for video data. Third, they lack explicit semantic guidance, which greatly decreases the...

10.48550/arxiv.1903.02155 preprint EN other-oa arXiv (Cornell University) 2019-01-01