NFDI4DS | UHH-SEMS - Publication Details

ResNeSt: Split-Attention Networks

OPENALEX - Publications

Hang Zhang Chongruo Wu Zhongyue Zhang Yi Zhu Haibin Lin and 7 more

The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple, modular computation block that can serve as a drop-in replacement for popular...

10.1109/cvprw56347.2022.00309 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

ResNeSt: Split-Attention Networks

Hang Zhang Chongruo Wu Zhongyue Zhang Yi Zhu Haibin Lin and 7 more

It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency...

10.48550/arxiv.2004.08955 preprint EN other-oa arXiv (Cornell University) 2020-01-01

HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds

Xiuye Gu Yijie Wang Chongruo Wu Yong Jae Lee Panqu Wang

We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. Inspired by Bilateral Convolutional Layers (BCL), we propose DownBCL, UpBCL, and CorrBCL operations that restore structural information from unstructured point clouds and fuse information from two consecutive point clouds. Operating on discrete and sparse permutohedral lattice points, our architectural design is parsimonious in computational cost. Our model can efficiently process a pair of point cloud frames...

10.1109/cvpr.2019.00337 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

A Comprehensive Study of Deep Video Action Recognition

Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong and 5 more

Video action recognition is one of the representative tasks for video understanding. Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning. But we have also encountered new challenges, including modeling long-range temporal information in videos, high computation costs, and incomparable results due to variances in datasets and evaluation protocols. In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition. We first introduce 17 video action recognition datasets that...

10.48550/arxiv.2012.06567 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Improving Semantic Segmentation via Efficient Self-Training

Yi Zhu Zhongyue Zhang Chongruo Wu Zhi Zhang Tong He and 4 more

Starting from the seminal work of Fully Convolutional Networks (FCN), there has been significant progress on semantic segmentation. However, deep learning models often require large amounts of pixelwise annotations to train accurate and robust models. Given the prohibitively expensive annotation cost of segmentation masks, we introduce a self-training framework in this paper to leverage pseudo labels generated from unlabeled data. In order to handle the data imbalance problem of semantic segmentation, we propose a centroid sampling...

10.1109/tpami.2021.3138337 article EN publisher-specific-oa IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-12-24

Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network

Xuanqing Liu Yao Li Chongruo Wu Cho‐Jui Hsieh

We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best...

10.48550/arxiv.1810.01279 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection

Ruoqi Sun Xinge Zhu Chongruo Wu Chen Huang Jianping Shi and 1 more

The success of deep neural networks for semantic segmentation heavily relies on large-scale and well-labeled datasets, which are hard to collect in practice. Synthetic data offers an alternative way to obtain ground-truth labels for free. However, models directly trained on synthetic data often struggle to generalize to real images. In this paper, we consider transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain). Unlike previous approaches that either learn mappings to the target...

10.1109/cvpr.2019.00449 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Predicting ASD diagnosis in children with synthetic and image-based eye gaze data

Sidrah Liaqat Chongruo Wu Prashanth Reddy Duggirala Sen-ching S. Cheung Chen‐Nee Chuah and 2 more

10.1016/j.image.2021.116198 article EN Signal Processing Image Communication 2021-02-17

Improving Semantic Segmentation via Self-Training

Yi Zhu Zhongyue Zhang Chongruo Wu Zhi Zhang Tong He and 4 more

Deep learning usually achieves the best results with complete supervision. In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models. In this paper, we show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data. Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top...

10.48550/arxiv.2004.14960 preprint EN other-oa arXiv (Cornell University) 2020-01-01

A Unified Efficient Pyramid Transformer for Semantic Segmentation

Fangrui Zhu Yi Zhu Li Zhang Chongruo Wu Yanwei Fu and 1 more

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature either focuses on context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling. In addition, a separate...

10.1109/iccvw54120.2021.00301 article EN 2021-10-01

Machine Learning Based Autism Spectrum Disorder Detection from Videos

Chongruo Wu Sidrah Liaqat Halil Ismail Helvaci Sen-ching Samson Cheung Chen‐Nee Chuah and 2 more

Early diagnosis of Autism Spectrum Disorder (ASD) is crucial for the best outcomes of interventions. In this paper, we present a machine learning (ML) approach to ASD diagnosis based on identifying specific behaviors from videos of infants aged 6 through 36 months. The behaviors of interest include directed gaze towards faces or objects of interest, positive affect, and vocalization. The dataset consists of 2000 videos of 3-minute duration with these behaviors manually coded by expert raters. Moreover, the dataset has statistical features including frequency...

10.1109/healthcom49281.2021.9398924 article EN 2021-03-01

Predicting Autism Diagnosis using Image with Fixations and Synthetic Saccade Patterns

Chongruo Wu Sidrah Liaqat Sen-ching S. Cheung Chen‐Nee Chuah Sally Ozonoff

Signs of autism spectrum disorder (ASD) emerge in the first year of life in many children, but diagnosis is typically made much later, at an average age of 4 years in the United States. Early intervention is highly effective for young children with ASD, but is reserved for children with a formal diagnosis, making accurate identification as early as possible imperative. A screening tool that could identify ASD risk during infancy offers the opportunity for intervention before the full set of symptoms is present. In this paper, we propose two machine learning methods,...

10.1109/icmew.2019.00125 article EN 2019-07-01

Recognizing road from satellite images by structured neural network

Guangliang Cheng Chongruo Wu Qingqing Huang Yu Meng Jianping Shi and 2 more

10.1016/j.neucom.2019.05.007 article EN Neurocomputing 2019-05-11

PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation

Jing Gu Qingyang Wu Chongruo Wu Weiyan Shi Zhou Yu

Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, Zhou Yu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.

10.18653/v1/2021.acl-short.40 article EN cc-by 2021-01-01

A Tailored Pre-Training Model for Task-Oriented Dialog Generation

Jing Gu Qingyang Wu Chongruo Wu Weiyan Shi Zhou Yu

The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on the dialog task is not as optimal as expected. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. We adopted the approach of (Wu et al., 2019), which models the two speakers separately. We also design several techniques, such as start position randomization,...

10.48550/arxiv.2004.13835 preprint EN other-oa arXiv (Cornell University) 2020-01-01

HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds

Xiuye Gu Yijie Wang Chongruo Wu Yong Jae Lee Panqu Wang

We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. Inspired by Bilateral Convolutional Layers (BCL), we propose DownBCL, UpBCL, and CorrBCL operations that restore structural information from unstructured point clouds and fuse information from two consecutive point clouds. Operating on discrete and sparse permutohedral lattice points, our architectural design is parsimonious in computational cost. Our model can efficiently process a pair of point cloud frames...

10.48550/arxiv.1906.05332 preprint EN other-oa arXiv (Cornell University) 2019-01-01

VideoSAM: Open-World Video Segmentation

Pinxue Guo Zixu Zhao Jianxiong Gao Chongruo Wu Tong He and 3 more

Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical. While the Segment Anything Model (SAM) has excelled in static image segmentation, extending its capabilities to video segmentation poses significant challenges. We tackle two major hurdles: a) SAM's embedding limitations in associating objects across frames, and b) granularity inconsistencies in object segmentation. To this end, we introduce...

10.48550/arxiv.2410.08781 preprint EN arXiv (Cornell University) 2024-10-11

A Unified Efficient Pyramid Transformer for Semantic Segmentation

Fangrui Zhu Yi Zhu Li Zhang Chongruo Wu Yanwei Fu and 1 more

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature either focuses on context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling. In addition, a separate...

10.48550/arxiv.2107.14209 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Cruciform: Solving Crosswords with Natural Language Processing

Dragomir Radev Rui Zhang Steve Wilson Derek Van Assche Henrique Spyra Gubert and 6 more

Crossword puzzles are popular word games that require not only a large vocabulary, but also a broad knowledge of topics. Answering each clue is a natural language task on its own, as many clues contain nuances, puns, or counter-intuitive definitions. Additionally, it can be extremely difficult to ascertain definitive answers without the constraints of the crossword grid itself. This task is challenging for both humans and computers. We describe here a new crossword solving system, Cruciform. We employ a group of components, which...

10.48550/arxiv.1611.02360 preprint EN other-oa arXiv (Cornell University) 2016-01-01