- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- Face and Expression Recognition
- Face Recognition and Analysis
- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Medical Image Segmentation Techniques
- Natural Language Processing Techniques
- Remote-Sensing Image Classification
- Brain Tumor Detection and Classification
- Machine Learning and Data Classification
- Image Processing and 3D Reconstruction
- Video Surveillance and Tracking Methods
- Vehicle License Plate Recognition
- Anomaly Detection Techniques and Applications
- Video Analysis and Summarization
- Neural Networks and Applications
- Visual Attention and Saliency Detection
- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Organic Electronics and Photovoltaics
Tsinghua University
2025
Alibaba Group (China)
2023-2024
First Affiliated Hospital of Fujian Medical University
2024
Fujian Medical University
2024
Shenzhen University
2023
Shenzhen Academy of Robotics
2023
Wilmington University
2020-2022
National Cheng Kung University
2021-2022
Alibaba Group (United States)
2021-2022
Universitas Kristen Indonesia Maluku
2021
A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent loss functions; (2) we show that, with GPW, various existing...
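The pair-weighting view described above can be illustrated with a minimal sketch: every pair's similarity contributes to the loss through a weighting function. The `w_pos`/`w_neg` functions and the loss form below are illustrative placeholders, not the paper's exact formulation.

```python
import numpy as np

def gpw_loss(embeddings, labels, w_pos, w_neg):
    """Generic pair-weighted loss: each pair's cosine similarity contributes
    with a weight given by the (method-specific) functions w_pos / w_neg."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T                              # cosine similarity matrix
    same = labels[:, None] == labels[None, :]  # positive-pair mask
    n = len(labels)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if same[i, j]:
                loss += w_pos(sim[i, j]) * (1.0 - sim[i, j])  # pull positives
            else:
                loss += w_neg(sim[i, j]) * sim[i, j]          # push negatives
    return loss / (n * (n - 1))

# Example: constant weights recover a simple contrastive-style objective.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
lab = np.array([0, 0, 1])
l = gpw_loss(emb, lab, w_pos=lambda s: 1.0, w_neg=lambda s: 1.0)
```

Different choices of `w_pos`/`w_neg` would recover different pair-based losses under this unified view.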
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose Task-aligned One-stage Object Detection (TOOD), which explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head), which offers a better balance between learning task-interactive and task-specific features, as well as a greater flexibility to learn...
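As a toy illustration of measuring how well the two sub-tasks agree, one can combine classification confidence and localization quality (IoU) into a single alignment score. The function name and exponent defaults below are assumptions for illustration, not necessarily the paper's published configuration.

```python
def task_alignment(cls_score, iou, alpha=1.0, beta=6.0):
    """Illustrative task-alignment measure: high only when a prediction is
    both confidently classified (cls_score) and well localized (iou)."""
    return (cls_score ** alpha) * (iou ** beta)

# A confident but poorly localized box scores low, exposing misalignment.
aligned = task_alignment(1.0, 1.0)    # perfect on both sub-tasks
misaligned = task_alignment(0.9, 0.3) # confident class, weak localization
```

Such a score can then be used to rank or reweight predictions so that training favors anchors on which the two branches agree.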
Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of the target template and the search image are computed independently in a Siamese architecture. In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention. The self-attention learns strong context information via spatial attention, and selectively emphasizes...
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate the true text features in the representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that...
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a recurrent model, built on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches that recognise each character independently. Our model has a number of appealing...
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors...
In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component, and text line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers,...
Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training them jointly is non-trivial due to significant differences in learning difficulties and convergence rates. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in a unified framework. Our main contributions are three-fold: (1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance...
Mining informative negative instances is of central importance to deep metric learning (DML). However, the hard-mining ability of existing DML methods is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. In this paper, we identify a “slow drift” phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters are updating throughout the training process. This suggests that the features computed at preceding iterations can considerably approximate their...
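A minimal sketch of the cross-batch idea this "slow drift" observation motivates: keep a FIFO memory of embeddings from past iterations and mine pairs against it, since slowly drifting features remain approximately valid. The class and method names below are illustrative, not the paper's code.

```python
from collections import deque
import numpy as np

class CrossBatchMemory:
    """Illustrative FIFO memory of embeddings from past iterations.
    Because features drift slowly, stored embeddings can serve as
    approximate positives/negatives for pair mining beyond the
    current mini-batch (a sketch of the idea, not an exact design)."""
    def __init__(self, capacity):
        self.feats = deque(maxlen=capacity)   # oldest entries auto-evicted
        self.labels = deque(maxlen=capacity)

    def enqueue(self, batch_feats, batch_labels):
        for f, y in zip(batch_feats, batch_labels):
            self.feats.append(f)
            self.labels.append(y)

    def get(self):
        return np.stack(list(self.feats)), np.array(self.labels)

mem = CrossBatchMemory(capacity=4)
mem.enqueue(np.ones((3, 2)), [0, 1, 2])
mem.enqueue(np.zeros((3, 2)), [3, 4, 5])  # pushes out the two oldest entries
feats, labels = mem.get()
```

At each training step, the current batch would be compared against the whole memory, greatly enlarging the pool of candidate pairs at negligible feature-storage cost.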
We present ClothFlow, an appearance-flow-based generative model that synthesizes clothed persons for pose-guided person image generation and virtual try-on. By estimating a dense flow between source and target clothing regions, ClothFlow effectively models the geometric changes and naturally transfers the appearance to novel images, as shown in Figure 1. We achieve this with a three-stage framework: 1) Conditioned on a target pose, we first estimate a person semantic layout to provide richer guidance to the generation process. 2) Built on two feature pyramid...
Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models were mostly built in a two-stage manner involving ROI pooling, which can degrade the performance of the recognition task. In this work, we propose convolutional character networks, referred to as CharNet, a one-stage model that processes the two tasks simultaneously in one pass. CharNet directly outputs bounding boxes of words and characters, with corresponding character labels. We utilize the character as a basic...
Fine-grained image categorization is challenging due to the subtle inter-class differences. We posit that exploiting the rich relationships between channels can help capture such differences, since different channels correspond to different semantics. In this paper, we propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. For a single image, a self-channel interaction (SCI) module is proposed to explore the channel-wise correlation within the image. This allows the model to learn complementary features from...
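The within-image channel interplay can be sketched as computing a channel-by-channel affinity matrix and using it to mix channels. This is an illustrative SCI-style computation under assumed details (softmax-normalized affinities), not the paper's exact module.

```python
import numpy as np

def self_channel_interaction(feat):
    """Sketch of a self-channel interaction on a conv feature map
    (C x H x W): compute C x C channel affinities and let each channel
    aggregate information from related channels."""
    C = feat.shape[0]
    x = feat.reshape(C, -1)                      # C x (H*W), flatten space
    corr = x @ x.T                               # C x C channel affinity
    w = np.exp(corr - corr.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # softmax over channels
    return w @ x                                 # channels mixed by affinity

feat = np.arange(12.0).reshape(3, 2, 2)          # toy 3-channel feature map
out = self_channel_interaction(feat)
```

In a trained network this operation would sit inside the backbone, with the affinity computation learned end-to-end rather than fixed as here.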
Convolutional neural networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets, such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on scene recognition...
VGGNets have turned out to be effective for object recognition in still images. However, directly adapting the VGGNet models trained on the ImageNet dataset is unable to yield good performance for scene recognition. This report describes our implementation of training VGGNet models on the large-scale Places205 dataset. Specifically, we train three models, namely VGGNet-11, VGGNet-13, and VGGNet-16, using a Multi-GPU extension of the Caffe toolbox with high computational efficiency. We verify the trained Places205-VGGNet models on the following datasets:...
Heterogeneous face recognition is an important yet challenging problem in the face recognition community. It refers to matching a probe face image to a gallery of face images taken from an alternate imaging modality. The major challenge of heterogeneous face recognition lies in the great discrepancies between different modalities. Conventional feature descriptors, e.g., local binary patterns, histograms of oriented gradients, and the scale-invariant feature transform, are mostly designed in a handcrafted way and thus generally fail to extract the common discriminant information...
Convolutional neural networks (CNNs) have recently achieved remarkable successes in various image classification and understanding tasks. The deep features obtained at the top fully-connected layer of a CNN (FC-features) exhibit rich global semantic information and are extremely effective for classification. On the other hand, the convolutional features of the middle layers also contain meaningful local information, but are not fully explored for image representation. In this paper, we propose a novel Locally-Supervised Deep Hybrid...
Conventional detectors tend to make imbalanced classifications and suffer a performance drop when the distribution of the training data is severely skewed. In this paper, we propose to use the mean classification score to indicate the classification accuracy for each category during training. Based on this indicator, we balance the classification via an Equilibrium Loss (EBL) and a Memory-augmented Feature Sampling (MFS) method. Specifically, EBL increases the intensity of the adjustment of the decision boundary for the weak classes by a designed score-guided loss margin between any two...
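The mean-score indicator above can be sketched as a running average of predicted scores per category, from which a score-guided margin between a weak and a strong class is derived. Class/method names, the momentum update, and the margin form below are simplified assumptions, not the paper's exact design.

```python
import numpy as np

class MeanScoreIndicator:
    """Running mean of predicted classification scores per category,
    serving as a proxy for per-class accuracy during training."""
    def __init__(self, num_classes, momentum=0.9):
        self.mean = np.zeros(num_classes)
        self.m = momentum

    def update(self, scores, labels):
        # Update each ground-truth class with its own predicted score.
        for s, y in zip(scores, labels):
            self.mean[y] = self.m * self.mean[y] + (1 - self.m) * s[y]

    def margin(self, weak, strong, scale=1.0):
        # Larger margin when the weak class lags behind the strong one.
        return scale * max(0.0, self.mean[strong] - self.mean[weak])

ind = MeanScoreIndicator(num_classes=2, momentum=0.5)
ind.update(np.array([[0.9, 0.1], [0.8, 0.2]]), [0, 0])  # frequent class 0
ind.update(np.array([[0.7, 0.3]]), [1])                 # rare class 1
m = ind.margin(weak=1, strong=0)
```

A loss could then enlarge the decision boundary of the weak class by `m`, pushing harder on categories whose mean score, and hence implied accuracy, is low.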
Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. We present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling the generation of shape and appearance to ensure photorealistic results, our framework consists of a shape generation network and an appearance generation network. More importantly, for each generation network, we introduce two encoders interacting with one...
Large-scale image databases such as ImageNet have significantly advanced image classification and other visual recognition tasks. However, much of these datasets is constructed only for single-label, coarse object-level classification. For real-world applications, multiple labels and fine-grained categories are often needed, yet very few such datasets exist publicly, especially those of large scale and high quality. In this work, we contribute to the community a new dataset called iMaterialist Fashion Attribute...