NFDI4DS | UHH-SEMS - Publication Details

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

OPENALEX - Publications

Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, and 5 more

MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how...

10.48550/arxiv.1512.01274 preprint EN cc-by arXiv (Cornell University) 2015-01-01

The application of two-level attention models in deep convolutional neural network for fine-grained image classification

OPENALEX - Publications

Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, and 1 more

Fine-grained classification is challenging because categories can only be discriminated by subtle and local differences. Variances in the pose, scale or rotation usually make the problem more difficult. Most fine-grained classification systems follow the pipeline of finding the foreground object or object parts (where) to extract discriminative features (what). In this paper, we propose to apply visual attention to the fine-grained classification task using a deep neural network. Our pipeline integrates three types of attention: the bottom-up attention that proposes candidate patches, the object-level...

10.1109/cvpr.2015.7298685 preprint EN 2015-06-01

Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks

OPENALEX - Publications

Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, and 10 more

Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs. In this paper, we present the design principles and implementation of Deep Graph Library (DGL). DGL distills the computational patterns of GNNs into a few generalized sparse tensor operations suitable for extensive parallelization. By advocating graph as the central programming abstraction, DGL can perform optimizations transparently. By cautiously adopting a framework-neutral design, DGL allows users to easily...

10.48550/arxiv.1909.01315 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification

OPENALEX - Publications

Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Yuxin Peng, Zheng Zhang

Supervised learning using deep convolutional neural networks has shown its promise in large-scale image classification tasks. As a building block, it is now well positioned to be part of a larger system that tackles real-life multimedia tasks. An unresolved issue is that such a model is trained on a static snapshot of data. Instead, this paper positions the training as a continuous learning process as new classes of data arrive. A system with such capability is useful in practical scenarios, as it gradually expands its capacity to predict an increasing number...

10.1145/2647868.2654926 article EN 2014-10-31

Scale-Invariant Convolutional Neural Networks

OPENALEX - Publications

Yichong Xu, Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Zheng Zhang

Even though convolutional neural networks (CNNs) have achieved near-human performance in various computer vision tasks, their ability to tolerate scale variations is limited. The popular practice is to make the model bigger first and then train it with data augmentation using extensive scale-jittering. In this paper, we propose a scale-invariant convolutional neural network (SiCNN), a model designed to incorporate multi-scale feature extraction and classification into the network structure. SiCNN uses a multi-column architecture, with each column...

10.48550/arxiv.1411.6369 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Hallucination of Multimodal Large Language Models: A Survey

OPENALEX - Publications

Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, and 2 more

This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability...

10.48550/arxiv.2404.18930 preprint EN arXiv (Cornell University) 2024-04-29

OpenVIS: Open-vocabulary Video Instance Segmentation

OPENALEX - Publications

Pinxue Guo, Hao Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, and 1 more

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose InstFormer, a carefully designed framework for the OpenVIS task that achieves powerful open-vocabulary capabilities through lightweight fine-tuning with limited-category data. InstFormer begins with an open-world mask proposal network, encouraged to propose all potential instance class-agnostic masks...

10.1609/aaai.v39i3.32338 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Learning Hierarchical Graph Neural Networks for Image Clustering

OPENALEX - Publications

Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, and 4 more

We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities using a training set of images annotated with labels belonging to a disjoint set of identities. Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level. Unlike fully unsupervised hierarchical clustering, the choice of grouping and complexity criteria stems naturally from supervision in the training set. The resulting method, Hi-LANDER, achieves an average of 49%...

10.1109/iccv48922.2021.00345 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Bridging the Gap to Real-World Object-Centric Learning

OPENALEX - Publications

Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, and 6 more

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner...

10.48550/arxiv.2209.14860 preprint EN other-oa arXiv (Cornell University) 2022-01-01

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

OPENALEX - Publications

Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, and 1 more

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts. In this paper, we propose LayoutDiffuse that adapts a foundational diffusion model pretrained on large-scale image or text-image datasets for layout-to-image generation. By adopting a novel neural adaptor based on layout attention and task-aware prompts, our method trains efficiently, generates images with both high perceptual quality and layout alignment, and needs less data. Experiments on three datasets show that our method significantly...

10.48550/arxiv.2302.08908 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Coarse-to-Fine Amodal Segmentation with Shape Prior

OPENALEX - Publications

Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, and 2 more

Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object. In this paper, we propose a novel approach, called Coarse-to-Fine Segmentation (C2F-Seg), that addresses this problem by progressively modeling the amodal segmentation. C2F-Seg initially reduces the learning space from the pixel-level image space to the vector-quantized latent space. This enables us to better handle long-range dependencies and learn a coarse-grained amodal segment from visual features and visible segments. However,...

10.1109/iccv51070.2023.00122 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Bag-of-Words Based Deep Neural Network for Image Retrieval

OPENALEX - Publications

Yalong Bai, Wei Yu, Tianjun Xiao, Chang Xu, Kuiyuan Yang, and 2 more

This work targets the image retrieval task held by the MSR-Bing Grand Challenge. Image retrieval is considered a challenging task because of the gap between low-level image representation and high-level textual query representation. Recently, further developed deep neural networks have shed light on narrowing the gap by learning high-level image representation from raw pixels. In this paper, we propose a bag-of-words based deep neural network for the image retrieval task, which learns high-level image representation and maps images into bag-of-words space. The DNN model is trained on large-scale clickthrough data, and the relevance between query and image is measured by the cosine similarity of the query's...

10.1145/2647868.2656402 article EN 2014-10-31

Object-Centric Multiple Object Tracking

OPENALEX - Publications

Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, and 11 more

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art models achieve pixel-level accuracy and temporal consistency by relying on supervised object detection with additional ID labels for the association...

10.1109/iccv51070.2023.01522 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification

OPENALEX - Publications

Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, and 1 more

Fine-grained classification is challenging because categories can only be discriminated by subtle and local differences. Variances in the pose, scale or rotation usually make the problem more difficult. Most fine-grained classification systems follow the pipeline of finding the foreground object or object parts (where) to extract discriminative features (what). In this paper, we propose to apply visual attention to the fine-grained classification task using a deep neural network. Our pipeline integrates three types of attention: the bottom-up attention that proposes candidate patches, the object-level...

10.48550/arxiv.1411.6447 preprint EN other-oa arXiv (Cornell University) 2014-01-01

OpenVIS: Open-vocabulary Video Instance Segmentation

OPENALEX - Publications

Pinxue Guo, Tony Jun Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, and 2 more

Open-vocabulary Video Instance Segmentation (OpenVIS) can simultaneously detect, segment, and track arbitrary object categories in a video, without being constrained to categories seen during training. In this work, we propose an OpenVIS framework called InstFormer that achieves powerful open-vocabulary capability through lightweight fine-tuning on a limited-category labeled dataset. Specifically, InstFormer comes in three steps: a) Open-world Mask Proposal: we utilize a query-based transformer, which is encouraged to propose all...

10.48550/arxiv.2305.16835 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

OPENALEX - Publications

Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, and 3 more

Video amodal segmentation is a particularly challenging task in computer vision, which requires deducing the full shape of an object from its visible parts. Recently, some studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting. However, motion flow has a clear limitation due to two factors: moving cameras and object deformation. This paper presents a rethinking of previous works. We leverage supervised signals with object-centric representation...

10.1109/iccv51070.2023.00123 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Unsupervised Open-Vocabulary Object Localization in Videos

OPENALEX - Publications

Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, and 9 more

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via a slot attention approach and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to read localized semantic information from the pre-trained CLIP model. The resulting video object localization is entirely unsupervised apart from the implicit annotation contained in CLIP, and it effectively...

10.1109/iccv51070.2023.01264 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Self-supervised Amodal Video Object Segmentation

OPENALEX - Publications

Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, and 4 more

Amodal perception requires inferring the full shape of an object that is partially occluded. This task is particularly challenging on two levels: (1) it requires more information than what is contained in the instant retina or imaging sensor, (2) it is difficult to obtain enough well-annotated amodal labels for supervision. To this end, this paper develops a new framework of Self-supervised amodal Video object segmentation (SaVos). Our method efficiently leverages the visual information of video temporal sequences to infer the amodal mask of objects. The key intuition...

10.48550/arxiv.2210.12733 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

OPENALEX - Publications

Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, and 3 more

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only necessitates prior knowledge of the dataset but also...

10.48550/arxiv.2406.09196 preprint EN arXiv (Cornell University) 2024-06-13

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

OPENALEX - Publications

Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, and 2 more

In the rapidly expanding domain of web video content, the task of text-video retrieval has become increasingly critical, bridging the semantic gap between textual queries and video data. This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video, enhancing the effectiveness of text-video retrieval systems. Unlike traditional model-centric methods that focus on designing intricate cross-modal interaction mechanisms, GQE aims to expand the text queries associated with videos...

10.48550/arxiv.2408.07249 preprint EN arXiv (Cornell University) 2024-08-13

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

OPENALEX - Publications

Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, and 3 more

10.1109/cvpr52733.2024.02176 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Learning for Transductive Threshold Calibration in Open-World Recognition

OPENALEX - Publications

Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, and 3 more

10.1109/cvpr52733.2024.01618 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16