Zhiwu Lu

ORCID: 0000-0003-0280-7724
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Remote-Sensing Image Classification
  • Human Pose and Action Recognition
  • Face and Expression Recognition
  • Video Analysis and Summarization
  • Bayesian Methods and Mixture Models
  • Sparse and Compressive Sensing Techniques
  • Text and Document Classification Technologies
  • Advanced Clustering Algorithms Research
  • Gaussian Processes and Bayesian Inference
  • Cancer-related molecular mechanisms research
  • Medical Image Segmentation Techniques
  • Gait Recognition and Analysis
  • Topic Modeling
  • Gene expression and cancer classification
  • Neural Networks and Applications
  • Target Tracking and Data Fusion in Sensor Networks
  • Generative Adversarial Networks and Image Synthesis
  • Image Processing Techniques and Applications
  • Music and Audio Processing
  • Machine Learning and ELM
  • Advanced Image Processing Techniques

Renmin University of China
2014-2024

Beijing Institute of Big Data Research
2021-2023

Peking University
2005-2014

Ministry of Education of the People's Republic of China
2014

King University
2013

City University of Hong Kong
2009-2011

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language. Recent debiasing methods propose to exclude the language prior during inference. However, they fail to disentangle the "good" language context from the "bad" language bias as a whole. In this paper, we investigate how to mitigate language bias in VQA. Motivated by causal effects, we propose a novel counterfactual inference framework, which enables us to capture language bias as the direct causal effect of questions on answers and reduce it by subtracting the direct language effect from the total effect. Experiments...

10.1109/cvpr46437.2021.01251 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
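The debiasing rule described in the abstract above can be illustrated with a minimal sketch: subtract a scaled language-only (direct) effect from the fused (total) effect before picking an answer. The function name, the toy logits, and the scaling factor `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def debiased_answer(fused_logits, question_only_logits, alpha=1.0):
    """Counterfactual-style debiasing sketch: subtract the direct
    (language-only) effect from the total effect before choosing an
    answer. `alpha` is a hypothetical scaling factor controlling how
    strongly the language prior is removed."""
    total_effect = np.asarray(fused_logits, dtype=float)
    direct_language_effect = np.asarray(question_only_logits, dtype=float)
    debiased = total_effect - alpha * direct_language_effect
    return int(np.argmax(debiased))

# Toy example: both the fused model and the question-only model favor
# answer 0 (a language prior); removing the language effect flips the
# decision to the more visually grounded answer 1.
fused = [2.0, 1.5, 0.1]
q_only = [1.8, 0.2, 0.1]
print(debiased_answer(fused, q_only))  # -> 1
```

With `alpha=0.0` no debiasing is applied and the biased answer 0 is returned, which shows the subtraction is doing the work.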

Abstract The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only single-cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream tasks. To achieve this goal, we propose to pre-train our model by self-supervised learning with weak semantic...

10.1038/s41467-022-30761-2 article EN cc-by Nature Communications 2022-06-02

A weakly supervised semantic segmentation (WSSS) method aims to learn a model from weak (image-level) labels as opposed to strong (pixel-level) labels. By avoiding the tedious pixel-level annotation process, it can exploit the unlimited supply of user-tagged images from media-sharing sites such as Flickr for large-scale applications. However, these `free' tags/labels are often noisy, and few existing works address the problem of learning with both weak and noisy labels. In this work, we cast WSSS into a label noise reduction problem....

10.1109/tpami.2016.2552172 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2016-04-09

Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a set of concepts ranging from object and scene to abstract concepts, and 2) how to annotate an image with the optimal number of labels. To address the first issue, we propose a novel multi-scale deep model for extracting discriminative features capable of representing a wide range of concepts. Specifically, a two-branch neural...

10.1109/tip.2018.2881928 article EN IEEE Transactions on Image Processing 2018-11-16

Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-stream methods like CLIP and ALIGN with high inference efficiency have also shown promising performance; however, they only consider instance-level alignment between the two streams (thus there is still room for improvement). To overcome these limitations, we propose a novel COllaborative Two-Stream vision-language pretraining model...

10.1109/cvpr52688.2022.01524 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
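The instance-level alignment that two-stream models rely on, as mentioned in the abstract above, reduces retrieval to a similarity lookup between independently encoded embeddings. A minimal sketch (cosine similarity over L2-normalized vectors; not the COTS model itself):

```python
import numpy as np

def instance_alignment_scores(image_embs, text_embs):
    """Instance-level alignment sketch for a two-stream model: each
    stream encodes its modality independently, and cross-modal
    retrieval reduces to cosine similarity between L2-normalized
    embeddings."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return img @ txt.T  # (num_images, num_texts) similarity matrix

rng = np.random.default_rng(0)
sims = instance_alignment_scores(rng.normal(size=(3, 8)), rng.normal(size=(5, 8)))
print(sims.shape)  # -> (3, 5)
```

Because the two encoders never attend to each other, all image and text embeddings can be precomputed offline, which is the source of the inference-efficiency advantage noted in the abstract.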

Due to the rapid technological development of various sensors, a huge volume of high spatial resolution (HSR) image data can now be acquired. How to efficiently recognize the scenes from such HSR data has become a critical task. Conventional approaches to remote sensing scene classification only utilize information from HSR images. Therefore, they always need a large amount of labeled data and cannot recognize images from an unseen class without any visual sample in the labeled data. To overcome this drawback, we propose a novel approach for recognizing...

10.1109/tgrs.2017.2689071 article EN IEEE Transactions on Geoscience and Remote Sensing 2017-04-17

Abstract Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding their functions. Despite the advances in recent decades on sequence alignment, threading, and alignment-free methods, homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the structure space into a single...

10.1093/bioinformatics/btw271 article EN cc-by-nc Bioinformatics 2016-06-11

Zero-shot learning (ZSL) is made possible by a projection function between a feature space and a semantic space (e.g., an attribute space). Key to ZSL is thus to learn a projection that is robust against the often large domain gap between the seen and unseen class domains. In this work, this is achieved by data synthesis and learning. Specifically, a novel data synthesis strategy is proposed, whereby semantic class prototypes (e.g., attribute vectors) are used to simply perturb seen-class data for generating unseen-class ones. As in any data synthesis/hallucination approach, there are ambiguities and uncertainties on how well the synthesised data can capture...

10.1109/tpami.2020.2965534 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-01-10
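The prototype-based synthesis idea in the abstract above can be sketched in a toy form: shift seen-class features by the offset between seen and unseen class prototypes to hallucinate pseudo samples for the unseen class. The additive-offset rule here is a simplified assumption for illustration, not the paper's exact perturbation strategy.

```python
import numpy as np

def synthesize_unseen(seen_features, seen_proto, unseen_proto):
    """Prototype-perturbation sketch: generate pseudo unseen-class
    samples by shifting seen-class features along the offset between
    the seen and unseen class prototypes (attribute vectors)."""
    offset = unseen_proto - seen_proto
    return seen_features + offset

# Two seen-class samples in a 2-D feature space
seen = np.array([[1.0, 2.0], [1.5, 2.5]])
synth = synthesize_unseen(seen, np.array([1.0, 2.0]), np.array([3.0, 0.0]))
print(synth)  # pseudo unseen-class samples: [[3. 0.] [3.5 0.5]]
```

The synthesised samples can then be used to train the projection function on both domains, which is what makes it robust to the seen/unseen domain gap.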

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality. Since the hierarchy of molecular knowledge is profound, even humans learn from different modalities, including both intuitive diagrams and professional texts, to assist their understanding. Inspired by this, we propose a multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data (crawled...

10.48550/arxiv.2209.05481 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable...

10.48550/arxiv.2302.06605 preprint EN other-oa arXiv (Cornell University) 2023-01-01
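The adapter idea behind parameter-efficient tuning, as described in the abstract above, can be sketched with a minimal bottleneck module: down-project, nonlinearity, up-project, residual connection, with only the small matrices tuned while the frozen backbone stays fixed. Dimensions, initialization, and class name are illustrative assumptions, not UniAdapter's actual design.

```python
import numpy as np

class BottleneckAdapter:
    """Minimal bottleneck-adapter sketch: hidden -> down-projection ->
    ReLU -> up-projection -> residual add. Only these two small
    matrices would be trained; the backbone they are inserted into
    stays frozen."""
    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero-init: adapter starts as identity

    def __call__(self, hidden):
        h = np.maximum(hidden @ self.down, 0.0)  # ReLU bottleneck
        return hidden + h @ self.up              # residual connection

x = np.ones((2, 16))
adapter = BottleneckAdapter(dim=16, bottleneck=4)
out = adapter(x)
print(out.shape)  # -> (2, 16)
```

With `dim=16, bottleneck=4` the adapter adds 2 * 16 * 4 = 128 parameters per layer, versus 256 for a single full 16 x 16 weight matrix; the gap widens rapidly at realistic hidden sizes, which is where the storage savings come from.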

Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair, and are thus difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many latest works adopt a single-tower architecture with heavy detectors, which is inefficient during the inference stage because...

10.1007/s11633-022-1386-4 article EN Machine Intelligence Research 2023-05-02

Composed Image Retrieval (CIR) aims to retrieve target images from a candidate set using a hybrid-modality query consisting of a reference image and a relative caption that describes the user intent. Recent studies attempt to utilize Vision-Language Pre-training Models (VLPMs) with various fusion strategies for addressing the task. However, these methods typically fail to simultaneously meet two key requirements of CIR: comprehensively extracting visual information and faithfully following the user intent. In this work, we propose...

10.1609/aaai.v39i7.32768 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

10.1016/j.patcog.2014.08.019 article EN Pattern Recognition 2014-08-29

This paper presents a new class of 2D string kernels, called spatial mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the image categorization problem. We first represent images as 2D sequences of visual keywords obtained by clustering all the blocks that we divide images into on a regular grid. Through decomposing each 2D sequence into two parallel 1D sequences (i.e. row-wise and column-wise ones), our kernels can then measure the similarity between images based on shared occurrences of k-length subsequences,...

10.1109/cvpr.2009.5206861 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01
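The kernel computation over 1D keyword sequences described above can be sketched in its simplest exact-match form (a spectrum-style kernel over contiguous k-mers; the paper's mismatch tolerance between subsequences is omitted here):

```python
from collections import Counter

def kmer_kernel(seq_a, seq_b, k=2):
    """String-kernel sketch: similarity of two 1D visual-keyword
    sequences as the number of shared occurrences of contiguous
    k-length subsequences (counted as a product of per-sequence
    occurrence counts, i.e. an inner product of k-mer histograms)."""
    def kmers(seq):
        return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    ca, cb = kmers(seq_a), kmers(seq_b)
    return sum(ca[m] * cb[m] for m in ca)

# Row-wise 1D sequences of visual keywords (integer cluster ids)
print(kmer_kernel([1, 2, 3, 1, 2], [2, 3, 1, 2], k=2))  # -> 4
```

In the 2D setting, one such score would be computed for the row-wise sequences and another for the column-wise ones, and the results combined into a single image-to-image similarity usable as an SVM kernel.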

This paper presents a multi-modal constraint propagation approach to exploiting pairwise constraints for constrained clustering tasks on multi-modal datasets. Pairwise constraint propagation methods have previously been designed primarily for single-modality data and cannot be directly applied to a dataset with multiple representations. In this paper, we provide an effective solution to the problem by decomposing it into a set of independent multi-graph based two-class label propagation subproblems, which are then merged into a unified framework and solved by quadratic...

10.1145/2072298.2072318 article EN Proceedings of the 19th ACM International Conference on Multimedia 2011-11-28

This paper presents a novel semi-supervised learning method which can make use of intra-image semantic context and inter-image cluster consistency for image categorization with less labeled data. The image representation is first formed with the visual keywords generated by clustering all the blocks that we divide images into. A 2D spatial Markov chain model is then proposed to capture the semantic context across these visual keywords within an image. To develop a graph-based semi-supervised approach to image categorization, we incorporate the semantic context into a kind of kernel that can be used as affinity...

10.1109/cvpr.2009.5206851 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01

This paper presents contextual kernel and spectral methods for learning the semantics of images that allow us to automatically annotate an image with keywords. First, to exploit the context of visual words within images for automatic annotation, we define a novel spatial string kernel to quantify the similarity between images. Specifically, we represent each image as a 2-D sequence of visual words and measure the similarity between two such sequences using the shared occurrences of s-length 1-D...

10.1109/tip.2010.2103082 article EN IEEE Transactions on Image Processing 2011-01-03

This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporally continuous in nature, and a representation learned by detecting continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We propose Constrained Spatiotemporal Jigsaw (CSJ), whereby 3D...

10.24963/ijcai.2021/104 article EN 2021-08-01

10.1016/j.patrec.2009.09.003 article EN Pattern Recognition Letters 2009-09-09