NFDI4DS | UHH-SEMS - Publication Details

Rethinking and Improving Relative Position Encoding for Vision Transformer

OPENALEX - Publications

Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao

Relative position encoding (RPE) is important for transformer to capture sequence ordering of input tokens. General efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and even remains controversial, e.g., whether relative position encoding can work equally well as absolute position? In order to clarify this, we first review existing relative position encoding methods and analyze their pros and cons when applied in vision transformers. We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE)....

10.1109/iccv48922.2021.00988 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, and 1 more

Object tracking has achieved significant progress over the past few years. However, state-of-the-art trackers become increasingly heavy and expensive, which limits their deployments in resource-constrained applications. In this work, we present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers. Comprehensive experiments show that our LightTrack is effective. It can find trackers that achieve superior performance compared to handcrafted SOTA trackers, such...

10.1109/cvpr46437.2021.01493 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

MiniViT: Compressing Vision Transformers with Weight Multiplexing

Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, and 2 more

Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability. However, ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited memory. To alleviate this problem, we propose MiniViT, a new compression framework, which achieves parameter reduction in vision transformers while retaining the same performance. The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks. More specifically, we make...

10.1109/cvpr52688.2022.01183 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Tooth segmentation on dental meshes using morphologic skeleton

Kan Wu, Li Chen, Jing Li, Yanheng Zhou

10.1016/j.cag.2013.10.028 article EN Computers & Graphics 2013-11-05

Automatic object extraction from images using deep neural networks and the level‐set method

Kan Wu, Yizhou Yu

The authors propose an automatic method for extracting objects with fine quality from photographs. The authors’ method starts with finding bounding boxes that enclose potential objects, which is achievable by state‐of‐the‐art object proposal methods. To further segment objects within the obtained bounding boxes, the authors propose a new multi‐pass level‐set method based on saliency detection and foreground pixel classification. The level‐set function is initially constructed with respect to the automatically detected salient parts within the bounding box, which eliminates user interaction and predicts...

10.1049/iet-ipr.2017.1144 article EN IET Image Processing 2018-02-15

Searching the Search Space of Vision Transformer

Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, and 3 more

Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and thus has been attracting fast-growing efforts on manually designing more effective architectures. In this paper, we propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space. The central idea is to gradually evolve different search dimensions guided by their E-T Error computed using a weight-sharing supernet. Moreover, we provide design...

10.48550/arxiv.2111.14725 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Weakly Learning to Match Experts in Online Community

Yujie Qian, Jie Tang, Kan Wu

In online question-and-answer (QA) websites like Quora, one central issue is to find (invite) users who are able to provide answers to a given question and at the same time would be unlikely to say "no" to the invitation. The challenge is how to trade off the matching degree between users’ expertise and the question topic, and the likelihood of positive response from the invited users. In this paper, we formally formulate the problem and develop a weakly supervised factor graph (WeakFG) model to address the problem. The model explicitly captures expertise matching degree between questions and users. To model the likelihood that an invited user...

10.24963/ijcai.2018/534 article EN 2018-07-01

RollBin: reducing code-size via loop rerolling at binary level

Tianao Ge, Zewei Mo, Kan Wu, Xianwei Zhang, Yutong Lu

Code size is an increasing concern on resource-constrained systems, ranging from embedded devices to cloud servers. To address the issue, lowering memory occupancy has become a priority in developing and deploying applications, and accordingly compiler-based optimizations have been proposed to reduce program footprint. However, prior arts generally deal with source code or intermediate representations, and are thus very limited in scope in real scenarios where only binary files are commonly provided. To fill...

10.1145/3519941.3535072 article EN 2022-06-10

Where Have You Been? Inferring Career Trajectory from Academic Social Network

Kan Wu, Jie Tang, Chenhui Zhang

A person’s career trajectory is composed of her/his past work or educational affiliations (institutions) at different points in time. Knowing people’s, especially scholars’, career trajectories can help the government make more scientific strategies to allocate resources and attract talent, and help companies make smart recruiting plans. It could also support individuals to find appropriate co-researchers or job opportunities. The paper focuses on inferring career trajectories in the academic social network. For about 1/3 of authors not having any...

10.24963/ijcai.2018/499 article EN 2018-07-01

Rethinking and Improving Relative Position Encoding for Vision Transformer

Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao

Relative position encoding (RPE) is important for transformer to capture sequence ordering of input tokens. General efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and even remains controversial, e.g., whether relative position encoding can work equally well as absolute position? In order to clarify this, we first review existing relative position encoding methods and analyze their pros and cons when applied in vision transformers. We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE)....

10.48550/arxiv.2107.14222 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Weakly Learning to Match Experts in Online Community

Yujie Qian, Jie Tang, Kan Wu

In online question-and-answer (QA) websites like Quora, one central issue is to find (invite) users who are able to provide answers to a given question and at the same time would be unlikely to say "no" to the invitation. The challenge is how to trade off the matching degree between users' expertise and the question topic, and the likelihood of positive response from the invited users. In this paper, we formally formulate the problem and develop a weakly supervised factor graph (WeakFG) model to address the problem. The model explicitly captures expertise matching degree between questions and users. To model the likelihood that an invited user...

10.48550/arxiv.1611.04363 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Dance of the Dragonfly: A Vision-Based Agile Aerial Touch Solution for IARC Mission 7

Ziliang Lai, Rui Yang, Hui Cheng, Wenjun Deng, Kan Wu, and 1 more

The International Aerial Robotics Competition (IARC) aims to move the state-of-the-art in aerial robotics forward through mission challenges. In IARC Mission 7, the aerial robot will navigate without external navigation aids, interact with autonomous ground robots, and avoid dynamic obstacles in the herding problem. In the 2017 competition, our team first accomplished interactively herding one iRobot to the end of the arena with accurate and rapid touch, and won first place in system control. This paper presents the self-localization, control...

10.1109/rcar.2018.8621785 article EN 2018 IEEE International Conference on Real-time Computing and Robotics (RCAR) 2018-08-01

Harvesting Visual Objects from Internet Images via Deep Learning Based Objectness Assessment

Kan Wu, Guanbin Li, Haofeng Li, Jianjun Zhang, Yizhou Yu

The collection of internet images has been growing at an astonishing speed. It is undoubted that these images contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this paper, we focus on the methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can help with various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional...

10.48550/arxiv.1904.00641 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment

Kan Wu, Guanbin Li, Haofeng Li, Jianjun Zhang, Yizhou Yu

The collection of internet images has been growing at an astonishing speed. It is undoubted that these images contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this article, we focus on the methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can help with various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional...

10.1145/3318463 article EN ACM Transactions on Multimedia Computing Communications and Applications 2019-08-08

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, and 1 more

Object tracking has achieved significant progress over the past few years. However, state-of-the-art trackers become increasingly heavy and expensive, which limits their deployments in resource-constrained applications. In this work, we present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers. Comprehensive experiments show that our LightTrack is effective. It can find trackers that achieve superior performance compared to handcrafted SOTA trackers, such...

10.48550/arxiv.2104.14545 preprint EN other-oa arXiv (Cornell University) 2021-01-01