NFDI4DS | UHH-SEMS - Publication Details

Recent advances in convolutional neural networks

OPENALEX - Publications

Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, and 6 more

10.1016/j.patcog.2017.10.013 article EN Pattern Recognition 2017-10-13

Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-identification

Jianlou Si, Honggang Zhang, Chun-Guang Li, Jason Kuen, Xiangfei Kong, and 2 more

Typical person re-identification (ReID) methods usually describe each pedestrian with a single feature vector and match them in a task-specific metric space. However, methods based on a single feature vector are not sufficient to overcome visual ambiguity, which frequently occurs in real scenarios. In this paper, we propose a novel end-to-end trainable framework, called Dual ATtention Matching network (DuATM), to learn context-aware feature sequences and perform attentive sequence comparison simultaneously. The core component of our...

10.1109/cvpr.2018.00562 article EN 2018-06-01

Recent Advances in Convolutional Neural Networks

Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, and 7 more

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among the different types of deep neural networks, convolutional neural networks have been the most extensively studied. Leveraging the rapid growth in the amount of annotated data and the great improvements in the strength of graphics processor units, research on convolutional neural networks has emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent...

10.48550/arxiv.1512.07108 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Recurrent Attentional Networks for Saliency Detection

Jason Kuen, Zhenhua Wang, Gang Wang

Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection, but they do not work well with objects of multiple scales. To overcome this limitation, we propose a recurrent attentional convolutional-deconvolution network (RACDNN). Using spatial transformer and recurrent network units, RACDNN is able to iteratively attend to selected image sub-regions to perform saliency refinement progressively. Besides tackling the scale problem, RACDNN can also learn context-aware features from past iterations...

10.1109/cvpr.2016.399 article EN 2016-06-01

Multimodal Contrastive Training for Visual Representation Learning

Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, and 3 more

We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy prediction task in a single domain, our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously, hence improving the quality of the learned visual representations. By including multimodal training in a unified framework with...

10.1109/cvpr46437.2021.00692 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
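
The abstract above mentions combining intra- and inter-modal similarity preservation objectives, but the excerpt does not give the exact losses. As a rough illustration only, the sketch below uses an InfoNCE-style contrastive loss, a common choice that is swapped in here as an assumption and is not confirmed by the listing. The function names (`info_nce`, `multimodal_loss`), the temperature, and the weighting term are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss; candidates[i] is treated as the positive for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def multimodal_loss(image_view_a, image_view_b, text_embeddings, inter_weight=1.0):
    """Hypothetical combination: an intra-modal (image-image) term plus an
    inter-modal (image-text) term. The weighting is an assumption."""
    intra = info_nce(image_view_a, image_view_b)
    inter = info_nce(image_view_a, text_embeddings)
    return intra + inter_weight * inter
```

With matched pairs the loss is close to zero, and it grows when each anchor is paired with the wrong candidate, which is the behaviour a similarity preservation objective is meant to encourage.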

Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Ping Hu, Gang Wang, Xiangfei Kong, Jason Kuen, Yap‐Peng Tan

Deep CNNs have achieved superior performance in many tasks of computer vision and image understanding. However, it is still difficult to effectively apply deep CNNs to video object segmentation (VOS), since treating video frames as separate and static will lose the information hidden in motion. To tackle this problem, we propose a Motion-guided Cascaded Refinement Network for VOS. By assuming the object motion is normally different from the background motion, for a video frame we first apply an active contour model on optical flow to coarsely segment...

10.1109/cvpr.2018.00152 article EN 2018-06-01

SelfDoc: Self-Supervised Document Representation Learning

Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, and 3 more

We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of every semantically meaningful component in a document, and it models the contextualization between each block of content. Unlike existing document pre-training models, our model is coarse-grained instead of treating individual words as input, therefore avoiding an overly fine-grained with excessive contextualization....

10.1109/cvpr46437.2021.00560 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Multi-Scale Aligned Distillation for Low-Resolution Detection

Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, and 3 more

In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option traditionally hurts detection performance considerably. This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model. We first identify the challenge of applying knowledge distillation (KD) to teacher and student networks that act on different input resolutions. To tackle it, we explore the idea of spatially aligning feature maps between...

10.1109/cvpr46437.2021.01421 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
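
The abstract above names the difficulty of distilling between networks whose feature maps have different spatial sizes. The excerpt is truncated before it says how the paper aligns them, so the sketch below is only a generic stand-in: it aligns a teacher map to a half-size student map with 2x2 average pooling (an assumption, not the paper's method) and then applies a mean squared error imitation loss. Both function names are hypothetical.

```python
def avg_pool_2x2(feature_map):
    """Downsample a 2D feature map (a list of rows) by 2x2 average pooling.
    A trailing odd row or column is dropped."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            (feature_map[r][c] + feature_map[r][c + 1]
             + feature_map[r + 1][c] + feature_map[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def feature_distillation_loss(teacher_map, student_map):
    """Mean squared error between the pooled teacher map and the student map.
    Pooling is an illustrative alignment step, assumed for this sketch."""
    aligned = avg_pool_2x2(teacher_map)
    if len(aligned) != len(student_map) or len(aligned[0]) != len(student_map[0]):
        raise ValueError("teacher and student maps are not spatially aligned")
    squared_errors = [
        (t - s) ** 2
        for teacher_row, student_row in zip(aligned, student_map)
        for t, s in zip(teacher_row, student_row)
    ]
    return sum(squared_errors) / len(squared_errors)
```

The point of the sketch is only that a per-location imitation loss requires both maps to share one spatial grid; how that grid is obtained is exactly the design question the abstract raises.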

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling

Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar

Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations. It is an important step toward reducing laborious human supervision. Most existing works first pretrain a model on captioned images covering many novel classes and then finetune it on limited base classes with mask annotations. However, the high-level textual information learned from caption pretraining alone cannot effectively encode the details required for pixelwise segmentation. To address this, we propose a cross-modal pseudo-labeling...

10.1109/cvpr52688.2022.00689 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

SceneComposer: Any-Level Semantic Image Synthesis

Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, John Collomosse, and 2 more

We propose a new framework for conditional image synthesis from semantic layouts of any precision level, ranging from pure text to a 2D semantic canvas with precise shapes. More specifically, the input layout consists of one or more semantic regions with free-form text descriptions and adjustable precision levels, which can be set based on the desired controllability. The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level. By supporting the levels in-between, our framework is flexible...

10.1109/cvpr52729.2023.02152 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Open World Entity Segmentation

Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, and 3 more

We introduce a new image segmentation task, called Entity Segmentation (ES), which aims to segment all visual entities (objects and stuffs) in an image without predicting their semantic labels. By removing the need for class label prediction, the models trained for such a task can focus more on improving segmentation quality. It has many practical applications, such as image manipulation and editing, where the quality of segmentation masks is crucial but class labels are less important. We conduct the first-ever study to investigate the feasibility of convolutional...

10.1109/tpami.2022.3227513 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Jianlou Si, Honggang Zhang, Chun-Guang Li, Jason Kuen, Xiangfei Kong, and 2 more

Typical person re-identification (ReID) methods usually describe each pedestrian with a single feature vector and match them in a task-specific metric space. However, methods based on a single feature vector are not sufficient to overcome visual ambiguity, which frequently occurs in real scenarios. In this paper, we propose a novel end-to-end trainable framework, called Dual ATtention Matching network (DuATM), to learn context-aware feature sequences and perform attentive sequence comparison simultaneously. The core component of our...

10.48550/arxiv.1803.09937 preprint EN other-oa arXiv (Cornell University) 2018-01-01

High Quality Segmentation for Ultra High-resolution Images

Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie, and 3 more

Segmenting 4K or 6K ultra high-resolution images requires extra computation consideration in image segmentation. Common strategies, such as downsampling, patch cropping, and cascade models, cannot adequately balance accuracy and computation cost. Motivated by the fact that humans distinguish among objects continuously from coarse to precise levels, we propose the Continuous Refinement Model (CRM) for the ultra high-resolution segmentation refinement task. CRM continuously aligns the feature map with the refinement target and aggregates features to reconstruct...

10.1109/cvpr52688.2022.00137 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

High Quality Entity Segmentation

Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, and 4 more

Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in the wild. The dataset contains images spanning diverse image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. Given...

10.1109/iccv51070.2023.00374 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Recurrent Attentional Networks for Saliency Detection

Jason Kuen, Zhenhua Wang, Gang Wang

Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection, but they do not work well with objects of multiple scales. To overcome this limitation, we propose a recurrent attentional convolutional-deconvolution network (RACDNN). Using spatial transformer and recurrent network units, RACDNN is able to iteratively attend to selected image sub-regions to perform saliency refinement progressively. Besides tackling the scale problem, RACDNN can also learn context-aware features from past iterations...

10.48550/arxiv.1604.03227 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

Jason Kuen, Kian Ming Lim, Chin Poo Lee

10.1016/j.patcog.2015.02.012 article EN Pattern Recognition 2015-02-26

Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Ping Hu, Gang Wang, Xiangfei Kong, Jason Kuen, Yap‐Peng Tan

In this work, we propose a motion-guided cascaded refinement network for video object segmentation. By assuming the foreground objects show different motion patterns from the background, for each video frame we first apply an active contour model on optical flow to coarsely segment the foreground. The proposed Cascaded Refinement Network (CRN) then takes the coarse segmentation as guidance to generate an accurate segmentation in full resolution. In this way, the motion information and the deep CNNs can complement each other well to accurately segment objects from video frames. To deal with...

10.1109/tpami.2019.2906175 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-03-19

Unified Pretraining Framework for Document Understanding

Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, and 3 more

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards reducing annotation efforts by training models with self-supervised objectives. However, most of the existing document pretraining methods are still language-dominated. We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending...

10.48550/arxiv.2204.10939 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Scaling Object Detection by Transferring Classification Weights

Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap‐Peng Tan

Large-scale object detection datasets are constantly increasing in size in terms of the number of classes and annotation count. Yet, the number of object-level categories annotated in detection datasets is an order of magnitude smaller than the number of image-level classification labels. State-of-the-art object detection models are trained in a supervised fashion, and this limits the number of object classes they can detect. In this paper, we propose a novel weight transfer network (WTN) to effectively and efficiently transfer knowledge from the classification network's weights to the detection network's weights to allow detection of novel classes without box supervision. We first introduce input...

10.1109/iccv.2019.00614 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

DelugeNets: Deep Networks with Efficient and Flexible Cross-Layer Information Inflows

Jason Kuen, Xiangfei Kong, Gang Wang, Yap‐Peng Tan

Deluge Networks (DelugeNets) are deep neural networks which efficiently facilitate massive cross-layer information inflows from preceding layers to succeeding layers. The connections between layers in DelugeNets are established through cross-layer depthwise convolutional layers with learnable filters, acting as a flexible yet efficient selection mechanism. DelugeNets can propagate information across many layers with greater flexibility and utilize network parameters more effectively compared to ResNets, whilst being more efficient than DenseNets. Remarkably, DelugeNet...

10.1109/iccvw.2017.117 article EN 2017-10-01
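
The abstract above describes connections between layers built from depthwise convolutions with learnable filters that act as a selection mechanism. One way to read this, offered here as an interpretation and not as the authors' implementation, is that each channel of the new input is a learned weighted sum of the same channel taken from every preceding layer. The function name and the data layout (`layer_outputs[layer][channel][position]`) are hypothetical.

```python
def cross_layer_depthwise(layer_outputs, weights):
    """Combine outputs of preceding layers channel by channel.

    layer_outputs[l][c] is a flat list of activations for channel c of layer l.
    weights[l][c] is the filter value for that layer and channel (assumed
    learnable in a real model; plain numbers here).
    Returns combined[c][n] = sum over l of weights[l][c] * layer_outputs[l][c][n].
    """
    num_layers = len(layer_outputs)
    num_channels = len(layer_outputs[0])
    num_positions = len(layer_outputs[0][0])
    combined = []
    for c in range(num_channels):
        channel = [0.0] * num_positions
        for l in range(num_layers):
            for n in range(num_positions):
                # Each channel only mixes with the same channel of other layers.
                channel[n] += weights[l][c] * layer_outputs[l][c][n]
        combined.append(channel)
    return combined
```

Under this reading the per-layer, per-channel weights decide how much of each earlier layer flows forward, and the parameter count grows with layers times channels, not with channels squared.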

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, and 2 more

It is desirable to train convolutional networks (CNNs) to run more efficiently during inference. In many cases, however, the computational budget that the system has for inference cannot be known beforehand during training, or the inference budget is dependent on the changing real-time resource availability. Thus, it is inadequate to train just inference-efficient CNNs, whose inference costs are not adjustable and cannot adapt to varied inference budgets. We propose a novel approach for cost-adjustable inference in CNNs - Stochastic Downsampling Point (SDPoint). During training, SDPoint applies...

10.1109/cvpr.2018.00827 preprint EN 2018-06-01