- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Robotics and Sensor-Based Localization
- Domain Adaptation and Few-Shot Learning
- Autonomous Vehicle Technology and Safety
- Adversarial Robustness in Machine Learning
- Human Pose and Action Recognition
- Neural Networks and Applications
- Advanced Vision and Imaging
- Visual Attention and Saliency Detection
- Cell Image Analysis Techniques
- Face Recognition and Analysis
- Neural Networks Stability and Synchronization
- Anomaly Detection Techniques and Applications
- Integrated Circuits and Semiconductor Failure Analysis
- Translation Studies and Practices
- Higher Education and Teaching Methods
- Video Analysis and Summarization
- Advanced Memory and Neural Computing
- Advanced Graph Neural Networks
- Image Processing Techniques and Applications
- Text and Document Classification Technologies
- Medical Image Segmentation Techniques
Rochester Institute of Technology
2021-2025
Southwest University
2019-2023
University of Michigan
2022
Baxter (United States)
2022
Michigan Department of Transportation
2022
Purdue University West Lafayette
2019-2022
Shandong Institute for Product Quality Inspection
2021
Purdue University System
2020
Tianjin Tianhe Hospital
2017-2019
Tianjin Medical University
2017-2019
Video instance segmentation (VIS) is a new and critical task in computer vision. To date, top-performing VIS methods extend the two-stage Mask R-CNN by adding a tracking branch, leaving plenty of room for improvement. In contrast, we approach this task from a new perspective and propose a one-stage spatial granularity network (SG-Net). Compared to conventional two-stage methods, SG-Net demonstrates four advantages: 1) our method has a compact architecture in which each task head (detection, segmentation, tracking) is crafted interdependently...
Video object detection is a challenging task because isolated video frames may encounter appearance deterioration, which introduces great confusion for detection. One popular solution is to exploit temporal information and enhance the per-frame representation by aggregating features from neighboring frames. Despite achieving improvements in detection, existing methods focus on the selection of higher-level frames for aggregation rather than modeling lower-level temporal relations to increase feature...
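As a rough illustration of the neighbor-frame aggregation idea this abstract refers to, the sketch below weights features from a temporal window by their similarity to the center frame. Shapes, the cosine-similarity weighting, and all names are illustrative assumptions rather than the paper's actual design.

```python
# Minimal sketch of per-frame feature enhancement by aggregating neighboring
# frames; the similarity-based weighting is an assumption, not the paper's method.
import torch
import torch.nn.functional as F

def aggregate_neighbor_features(frame_feats: torch.Tensor, center_idx: int) -> torch.Tensor:
    """frame_feats: (T, C, H, W) backbone features of a temporal window."""
    T, C, H, W = frame_feats.shape
    center = frame_feats[center_idx]                                  # (C, H, W)
    flat = frame_feats.reshape(T, C, -1)                              # (T, C, H*W)
    center_flat = center.reshape(C, -1)                               # (C, H*W)
    # Per-location cosine similarity between the center frame and each neighbor.
    sim = F.cosine_similarity(flat, center_flat.unsqueeze(0), dim=1)  # (T, H*W)
    weights = sim.softmax(dim=0).unsqueeze(1)                         # (T, 1, H*W)
    enhanced = (flat * weights).sum(dim=0)                            # (C, H*W)
    return enhanced.reshape(C, H, W)

feats = torch.randn(5, 256, 32, 32)   # 5-frame window of features
enhanced_center = aggregate_neighbor_features(feats, center_idx=2)
```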
In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs....
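A minimal sketch of multi-scale feature-map aggregation in the spirit of DenserNet: upsample maps from several backbone stages to a common resolution, concatenate them channel-wise, and L2-normalize. The stages, channel counts, and fusion choice are assumptions for illustration, not the published architecture.

```python
# Illustrative multi-level feature aggregation; stage shapes are assumptions.
import torch
import torch.nn.functional as F

def aggregate_multiscale(feat_maps):
    """feat_maps: list of (B, C_i, H_i, W_i) tensors from different CNN stages."""
    target_size = feat_maps[0].shape[-2:]        # use the finest resolution
    upsampled = [F.interpolate(f, size=target_size, mode="bilinear",
                               align_corners=False) for f in feat_maps]
    dense = torch.cat(upsampled, dim=1)          # channel-wise concatenation
    return F.normalize(dense, p=2, dim=1)        # L2-normalized dense descriptors

f1 = torch.randn(1, 256, 64, 64)    # shallow, high-resolution stage
f2 = torch.randn(1, 512, 32, 32)    # mid-level stage
f3 = torch.randn(1, 1024, 16, 16)   # deep, semantic stage
dense_map = aggregate_multiscale([f1, f2, f3])   # (1, 1792, 64, 64)
```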
Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model the global-local vision representation for sentence generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GLR framework, namely a Global-Local Representation granularity. Our approach demonstrates three advantages over prior efforts. First, we propose a simple solution, which exploits extensive...
We introduce a novel method using a new generative model that automatically learns effective representations of the target and background appearance to detect, segment and track each instance in a video sequence. Differently from current discriminative tracking-by-detection solutions, our proposed hierarchical structural embedding learning can predict more high-quality masks with accurate boundary details over the spatio-temporal space via normalizing flows. We formulate the inference procedure as an embedded...
Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustworthy matching in flow estimation by utilizing spatial self-attention and cross-attention mechanisms between adjacent...
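The snippet below sketches self-attention within one frame followed by cross-attention against the next frame, the general mechanism the abstract credits for more trustworthy matching. Token layout, dimensions, and the module itself are illustrative assumptions, not TransFlow's implementation.

```python
# Illustrative self- plus cross-attention between adjacent frame tokens.
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    """Self-attention within frame 1, then cross-attention from frame 1 to frame 2."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tok1, tok2):
        tok1 = tok1 + self.self_attn(tok1, tok1, tok1)[0]   # within-frame context
        corr = tok1 + self.cross_attn(tok1, tok2, tok2)[0]  # match against frame 2
        return corr   # correlation-aware tokens, used downstream for flow decoding

tok1 = torch.randn(1, 32 * 32, 256)  # frame-1 tokens (flattened 32x32 grid)
tok2 = torch.randn(1, 32 * 32, 256)  # frame-2 tokens
out = CrossFrameAttention()(tok1, tok2)
```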
Learning pyramidal feature representations is important for many dense prediction tasks (e.g., object detection, semantic segmentation) that demand multi-scale visual understanding. The Feature Pyramid Network (FPN) is a well-known architecture for such learning; however, intrinsic weaknesses in feature extraction and fusion impede the production of informative features. This work addresses the weaknesses of FPN through a novel tripartite feature enhanced pyramid network (TFPN), with three distinct and effective designs. First, we develop...
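For context, here is a minimal top-down FPN fusion of the kind the abstract critiques: lateral 1x1 convolutions plus upsample-and-add. TFPN's actual enhancements are not reproduced here, and the channel sizes are assumptions.

```python
# Plain FPN-style top-down fusion over three backbone stages (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the coarser level and add it to the finer one.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

c3, c4, c5 = (torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)])
p3, p4, p5 = TinyFPN()([c3, c4, c5])
```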
This paper proposes to use the three vectors in a rotation matrix as the representation for head pose estimation and develops a new neural network based on the characteristics of such a representation. We address two potential issues that exist in current works: 1. Public datasets for head pose estimation use either Euler angles or quaternions to annotate data samples. However, both of these annotations have the issue of discontinuity and thus could result in some performance drop during training. 2. Most research works report Mean Absolute Error (MAE) of Euler angles as the measurement...
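A small worked example of the vector-based idea: build a rotation matrix from Euler angles and score a prediction by the mean angle between corresponding column vectors. The rotation-order convention and the error definition below are assumptions for illustration, not the paper's exact metric.

```python
# Rotation-matrix representation and a vector-angle error, as an illustration.
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Rotation matrix from Euler angles (radians); ZYX order assumed here."""
    Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                   [np.sin(yaw),  np.cos(yaw), 0],
                   [0, 0, 1]])
    Ry = np.array([[np.cos(pitch), 0, np.sin(pitch)],
                   [0, 1, 0],
                   [-np.sin(pitch), 0, np.cos(pitch)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(roll), -np.sin(roll)],
                   [0, np.sin(roll),  np.cos(roll)]])
    return Rz @ Ry @ Rx

def mean_vector_error(R_pred, R_gt):
    """Average angle (degrees) between corresponding column vectors."""
    cosines = np.clip(np.sum(R_pred * R_gt, axis=0), -1.0, 1.0)
    return np.degrees(np.arccos(cosines)).mean()

R_gt = euler_to_rotation(0.3, -0.1, 0.05)
R_pred = euler_to_rotation(0.28, -0.12, 0.06)
print(mean_vector_error(R_pred, R_gt))
```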
Particleboard surface defect detection technology is of great significance to the automation of particleboard inspection, but current methods have disadvantages such as low accuracy and poor real-time performance. Therefore, this paper proposes an improved lightweight detection method based on You Only Look Once v5 (YOLOv5), namely PB-YOLOv5 (Particle Board-YOLOv5). Firstly, the gamma-ray transform and the image difference method are combined to deal with the uneven illumination of the acquired images, so that the illumination is well corrected. Secondly, Ghost Bottleneck...
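To make the illumination-correction step concrete, the sketch below applies a gamma transform and a simple background difference. The exact transform, parameters, and background estimate used in PB-YOLOv5 are not specified in the abstract, so treat these as stand-ins.

```python
# Stand-in preprocessing: gamma correction plus a background-difference step.
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float = 0.6) -> np.ndarray:
    """Apply a gamma transform (via lookup table) to a uint8 grayscale image."""
    lut = (np.linspace(0, 1, 256) ** gamma * 255).astype(np.uint8)
    return lut[image]

def illumination_difference(image: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Subtract an estimated illumination background and re-center around mid-gray."""
    diff = image.astype(np.int16) - background.astype(np.int16) + 128
    return np.clip(diff, 0, 255).astype(np.uint8)

board = (np.random.rand(256, 256) * 255).astype(np.uint8)     # fake board image
corrected = gamma_correct(board)
flattened = illumination_difference(corrected, background=np.full_like(board, 100))
```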
Structure information extraction refers to the task of extracting structured text fields from web pages, such as a product offer on a shopping page including title, description, brand and price. It is an important research topic which has been widely studied in document understanding and web search. Recent natural language models with sequence modeling have demonstrated state-of-the-art performance on web information extraction. However, effectively serializing tokens from unstructured web pages is challenging in practice due to a variety...
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers. Current deep models learn the classifier in a fully parametric manner, ignoring the latent data structure and lacking simplicity and explainability. DNC instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids of training samples to describe class distributions and clearly explains...
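A toy sketch of the case-based reasoning DNC describes: compute a few sub-centroids per class from training features (a short k-means pass) and label a query by its nearest sub-centroid. The clustering routine and feature dimensions are assumptions, not the paper's training procedure.

```python
# Nonparametric classification by nearest sub-centroid (illustrative only).
import torch

def compute_subcentroids(feats, labels, num_classes, k=4):
    """Cluster each class's features into k sub-centroids with a few Lloyd steps."""
    centroids = []
    for c in range(num_classes):
        x = feats[labels == c]
        centers = x[torch.randperm(len(x))[:k]].clone()
        for _ in range(10):
            assign = torch.cdist(x, centers).argmin(dim=1)
            for j in range(k):
                if (assign == j).any():
                    centers[j] = x[assign == j].mean(dim=0)
        centroids.append(centers)
    return torch.stack(centroids)                       # (num_classes, k, D)

def dnc_predict(query, centroids):
    """Predicted label = class of the nearest sub-centroid."""
    d = torch.cdist(query, centroids.flatten(0, 1))     # (N, num_classes * k)
    return d.argmin(dim=1) // centroids.shape[1]

feats, labels = torch.randn(200, 64), torch.randint(0, 5, (200,))
centroids = compute_subcentroids(feats, labels, num_classes=5)
pred = dnc_predict(torch.randn(8, 64), centroids)
```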
Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model the global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely a Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over prior efforts: 1)...
When analyzing data from in situ RNA detection technologies, cell segmentation is an essential step in identifying cell boundaries, assigning reads to cells, and studying the gene expression and morphological features of cells. We developed a deep-learning-based method, GeneSegNet, that integrates both gene expression and imaging information to perform cell segmentation. GeneSegNet also employs a recursive training strategy to deal with noisy training labels. We show that GeneSegNet significantly improves segmentation performance over existing methods that either ignore or...
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facilitating transparent and controllable image translation processes. Extensive experiments demonstrate DVP's remarkable...
Monocular Depth Estimation (MDE) plays a vital role in applications such as autonomous driving. However, various attacks target MDE models, with physical attacks posing significant threats to system security. Traditional adversarial training methods, which require ground-truth labels, are not directly applicable to MDE models that lack ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) overlook the domain knowledge of MDE, resulting in suboptimal performance. In this work, we...
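One way to read the label-free hardening idea is a self-supervised adversarial loop that perturbs the input to maximally change the model's own depth prediction, then trains for consistency. The sketch below is an assumption-laden stand-in (a plain PGD variant), not the paper's method.

```python
# Self-supervised PGD against a stand-in depth model; no ground-truth depth used.
import torch

def depth_pgd(model, image, steps=5, eps=0.03, alpha=0.01):
    """Perturb `image` to maximally change the model's own depth prediction."""
    with torch.no_grad():
        ref_depth = model(image)                       # model's clean prediction
    adv = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = (model(adv) - ref_depth).abs().mean()   # deviation from clean output
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv + alpha * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)   # project into the eps-ball
        adv = adv.clamp(0, 1).detach().requires_grad_(True)
    return adv.detach()

model = torch.nn.Sequential(torch.nn.Conv2d(3, 1, 3, padding=1))  # toy "MDE" net
clean = torch.rand(1, 3, 64, 64)
adversarial = depth_pgd(model, clean)
# Hardening step (sketch): minimize (model(adversarial) - model(clean)).abs().mean()
```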
Semi-supervised learning based on consistency learning offers significant promise for enhancing medical image segmentation. Current approaches use copy-paste as an effective data perturbation technique to facilitate weak-to-strong consistency learning. However, these techniques often lead to a decrease in the accuracy of the synthetic labels corresponding to the synthetic data and introduce excessive perturbations to the distribution of the training data. Such over-perturbation causes the data to stray from its true distribution, thereby impairing the model's...
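The copy-paste perturbation mentioned above can be sketched as pasting a random region from one image (and its label or pseudo-label) onto another. The region shape, mixing direction, and pseudo-label source here are illustrative assumptions.

```python
# Illustrative copy-paste mixing between a labeled and an unlabeled sample.
import torch

def copy_paste(img_a, lbl_a, img_b, lbl_b, ratio=0.5):
    """Paste a random rectangle of (img_a, lbl_a) onto (img_b, lbl_b)."""
    _, H, W = img_a.shape
    h, w = int(H * ratio), int(W * ratio)
    y = torch.randint(0, H - h + 1, (1,)).item()
    x = torch.randint(0, W - w + 1, (1,)).item()
    mask = torch.zeros(1, H, W)
    mask[:, y:y + h, x:x + w] = 1.0
    mixed_img = img_a * mask + img_b * (1 - mask)
    mixed_lbl = lbl_a * mask.long() + lbl_b * (1 - mask).long()
    return mixed_img, mixed_lbl

labeled_img = torch.randn(3, 256, 256)
labeled_lbl = torch.randint(0, 2, (1, 256, 256))
unlabeled_img = torch.randn(3, 256, 256)
pseudo_lbl = torch.randint(0, 2, (1, 256, 256))   # a teacher prediction stands in here
mixed_img, mixed_lbl = copy_paste(labeled_img, labeled_lbl, unlabeled_img, pseudo_lbl)
```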
We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme. Regarding queries as cluster centers, CLUSTSEG is innovative in two aspects: 1) cluster centers are initialized in heterogeneous ways so as to pointedly address task-specific demands (e.g., instance- or category-level distinctiveness), yet without modifying the architecture; 2) pixel-cluster assignment,...
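To illustrate the queries-as-cluster-centers view, the sketch below alternates soft pixel-to-center assignment with center updates, EM-style. The temperature, iteration count, and update rule are assumptions for illustration, not CLUSTSEG's cross-attention design.

```python
# EM-style alternation between pixel-cluster assignment and center updates.
import torch

def cluster_assign(pixel_feats, centers, iters=3, tau=0.1):
    """pixel_feats: (N, D) pixel embeddings; centers: (K, D) query/cluster centers."""
    for _ in range(iters):
        logits = pixel_feats @ centers.t() / tau                       # (N, K)
        assign = logits.softmax(dim=1)                                 # soft assignment
        centers = (assign.t() @ pixel_feats) / (assign.sum(dim=0, keepdim=True).t() + 1e-6)
    return assign.argmax(dim=1), centers                               # hard masks, new centers

pixels = torch.randn(64 * 64, 256)    # flattened pixel embeddings
queries = torch.randn(8, 256)         # 8 learnable queries acting as cluster centers
masks, updated_centers = cluster_assign(pixels, queries)
```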
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what those learnable prompts learn remains unexplained. In this work, we explore whether fine-tuning can benefit from knowledge-aware prompts learned during pre-training, by designing two different sets of prompts for the pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt)...
As the size of transformer-based models continues to grow, fine-tuning these large-scale pretrained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E2VPT)...
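A minimal visual-prompt-tuning sketch, assuming a frozen transformer encoder: only a handful of prepended prompt tokens and a linear head are trainable. The token layout, prompt count, and readout are illustrative, not E2VPT's specific design.

```python
# Freeze the backbone; train only prompt tokens and a classification head.
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, encoder, dim=768, num_prompts=10, num_classes=100):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                      # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):
        B = patch_tokens.shape[0]
        tokens = torch.cat([self.prompts.expand(B, -1, -1), patch_tokens], dim=1)
        feats = self.encoder(tokens)                     # (B, P + N, dim)
        return self.head(feats[:, 0])                    # classify from the first token

encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 12, batch_first=True), 2)
model = PromptedViT(encoder)
logits = model(torch.randn(4, 196, 768))                 # 14x14 patch tokens
```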
Self-supervised space-time correspondence learning utilizing unlabeled videos holds great potential in computer vision. Most existing methods rely on contrastive learning with mining negative samples or on adapting reconstruction from the image domain, which requires dense affinity across multiple frames or optical flow constraints. Moreover, video correspondence models need to uncover more inherent properties of the video, such as structural information. In this work, we propose HiGraph+, a sophisticated...
Qifan Wang, Jingang Wang, Xiaojun Quan, Fuli Feng, Zenglin Xu, Shaoliang Nie, Sinong Wang, Madian Khabsa, Hamed Firooz, Dongfang Liu. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.