Jianwei Yang

ORCID: 0000-0002-2022-6002
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Image and Signal Denoising Methods
  • Medical Image Segmentation Techniques
  • Image and Object Detection Techniques
  • Advanced Neural Network Applications
  • Topic Modeling
  • Advanced Data Compression Techniques
  • Advanced Vision and Imaging
  • Railway Engineering and Dynamics
  • Digital Media Forensic Detection
  • Natural Language Processing Techniques
  • Digital Filter Design and Implementation
  • Advanced Steganography and Watermarking Techniques
  • Advanced Image Fusion Techniques
  • Optical measurement and interference techniques
  • Image Processing and 3D Reconstruction
  • COVID-19 diagnosis using AI
  • Generative Adversarial Networks and Image Synthesis
  • Face and Expression Recognition
  • Radiomics and Machine Learning in Medical Imaging
  • Chaos-based Image/Signal Encryption
  • Remote Sensing and Land Use

Xiamen University of Technology
2025

Xi'an University of Technology
2024

Hebei Eye Hospital
2024

Microsoft (United States)
2024

Nanjing University of Information Science and Technology
2010-2023

Nanyang Normal University
2009-2023

Dhurakij Pundit University
2023

Southwest Jiaotong University
2010-2023

Microsoft Research (United Kingdom)
2023

Shenzhen University
2022

Large-scale text-to-image diffusion models have made amazing advances. However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. To preserve the vast concept knowledge of the pre-trained model, we freeze all of its weights and inject the grounding information into new trainable layers via a gated...

10.1109/cvpr52729.2023.02156 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
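
The gated-injection idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the released GLIGEN code: a new trainable attention layer sees visual plus grounding tokens, and a zero-initialized tanh gate blends its output back into the frozen stream. Module names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class GatedSelfAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Gate starts at zero, so the frozen pretrained model is reproduced
        # exactly at initialization; grounding influence is learned gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, visual: torch.Tensor, grounding: torch.Tensor) -> torch.Tensor:
        # visual: (B, N, D) frozen-backbone tokens; grounding: (B, M, D) e.g. box+text embeddings
        x = self.norm(torch.cat([visual, grounding], dim=1))
        out, _ = self.attn(x, x, x)
        n = visual.shape[1]
        return visual + torch.tanh(self.gamma) * out[:, :n]  # only visual tokens are updated

# Inject between frozen transformer blocks; train only these layers.
layer = GatedSelfAttentionSketch(dim=64)
v, g = torch.randn(2, 16, 64), torch.randn(2, 4, 64)
print(layer(v, g).shape)  # torch.Size([2, 16, 64])
```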

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer,...

10.48550/arxiv.2303.05499 preprint EN other-oa arXiv (Cornell University) 2023-01-01
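
As a rough illustration of the tight vision-language fusion described above, here is a minimal sketch of one feature-enhancer-style layer with bidirectional cross-attention. This is an assumed simplification, not the official Grounding DINO implementation.

```python
import torch
import torch.nn as nn

class CrossModalityFusionSketch(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, vis, txt):
        # vis: (B, N, D) image tokens; txt: (B, L, D) text tokens (category names / phrases)
        v = vis + self.img2txt(self.norm_v(vis), txt, txt)[0]  # image attends to language
        t = txt + self.txt2img(self.norm_t(txt), vis, vis)[0]  # language attends to image
        return v, t

fusion = CrossModalityFusionSketch(dim=64)
v, t = fusion(torch.randn(2, 100, 64), torch.randn(2, 12, 64))
print(v.shape, t.shape)
```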

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Fig. 1. We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which...

10.48550/arxiv.2304.06718 preprint EN other-oa arXiv (Cornell University) 2023-01-01
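
The unification of spatial prompts can be sketched very simply: if points, boxes, and scribbles are each rasterized to a region mask, one pooling operation over the image feature map turns any of them into a prompt embedding. The function below is a hypothetical toy, not SEEM's decoder.

```python
import torch

def prompt_embedding(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feat: (D, H, W) image features; mask: (H, W) binary region from a point/box/scribble."""
    w = mask.float().flatten()              # (H*W,)
    w = w / w.sum().clamp(min=1.0)          # average pooling over the prompted region
    return feat.flatten(1) @ w              # (D,) one embedding, whatever the prompt type

feat = torch.randn(64, 32, 32)
point = torch.zeros(32, 32); point[10, 12] = 1     # a click
box = torch.zeros(32, 32);   box[4:12, 5:20] = 1   # a box
print(prompt_embedding(feat, point).shape, prompt_embedding(feat, box).shape)
```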

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is completely replaced by a focal modulation mechanism for modeling token interactions in vision. Focal modulation comprises three components: (i) hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from short to long ranges, (ii) gated aggregation to selectively gather contexts for each query token based on its content, and (iii) element-wise modulation or affine transformation to inject the aggregated context...

10.48550/arxiv.2203.11926 preprint EN other-oa arXiv (Cornell University) 2022-01-01
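
The three components map almost directly to code. The following condensed PyTorch sketch (an approximation, not the reference implementation; layer names and the gating layout are assumptions) shows hierarchical depth-wise contexts, gated aggregation, and element-wise modulation of the query.

```python
import torch
import torch.nn as nn

class FocalModulationSketch(nn.Module):
    def __init__(self, dim: int, levels: int = 3, kernel: int = 3):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        self.gates = nn.Conv2d(dim, levels + 1, 1)
        self.ctx = nn.ModuleList([
            nn.Sequential(nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim), nn.GELU())
            for _ in range(levels)
        ])
        self.h = nn.Conv2d(dim, dim, 1)        # context-to-modulator projection

    def forward(self, x):                      # x: (B, C, H, W)
        q, v, g = self.q(x), self.v(x), self.gates(x)
        agg = 0
        for lvl, conv in enumerate(self.ctx):  # (i) hierarchical contextualization
            v = conv(v)                        # receptive field grows with each level
            agg = agg + v * g[:, lvl:lvl + 1]  # (ii) gated aggregation
        glob = v.mean((2, 3), keepdim=True)    # global context as the last level
        agg = agg + glob * g[:, -1:]
        return q * self.h(agg)                 # (iii) element-wise modulation of the query

m = FocalModulationSketch(64)
print(m(torch.randn(2, 64, 16, 16)).shape)
```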

Abstract Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles 1–3. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context 4. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000...

10.1038/s41586-024-07441-w article EN cc-by Nature 2024-05-22
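
The preprocessing the abstract implies, exhaustive tiling rather than subsampling, is easy to illustrate. The sketch below is generic NumPy, not the paper's pipeline; a real whole-slide image would be streamed with a library such as OpenSlide rather than held in memory.

```python
import numpy as np

def tile_slide(slide: np.ndarray, size: int = 256):
    h, w = slide.shape[:2]
    tiles, coords = [], []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tiles.append(slide[y:y + size, x:x + size])
            coords.append((y, x))  # positions let a slide-level model keep spatial context
    return np.stack(tiles), coords

slide = np.random.randint(0, 255, (1024, 2048, 3), dtype=np.uint8)  # stand-in for a gigapixel WSI
tiles, coords = tile_slide(slide)
print(tiles.shape)  # (32, 256, 256, 3): every tile is kept, none subsampled
```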

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets. To bridge the gap of vocabulary and annotation granularity, we first introduce a pre-trained text encoder to encode all visual concepts in the two tasks and learn a common semantic space for them. This gives us reasonably good results compared with counterparts trained on each task only. To further reconcile the two tasks, we identify two discrepancies: i) task discrepancy – segmentation requires...

10.1109/iccv51070.2023.00100 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
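
The common-semantic-space idea can be shown in a few lines: labels from both tasks are embedded by one text encoder, and every region or mask embedding is classified by similarity in that shared space. The embeddings below are random stand-ins; the interface is an assumption, not OpenSeeD's API.

```python
import torch
import torch.nn.functional as F

det_labels = ["person", "car", "dog"]
seg_labels = ["person", "sky", "grass"]
vocab = sorted(set(det_labels) | set(seg_labels))   # unified concept list

torch.manual_seed(0)
text_emb = F.normalize(torch.randn(len(vocab), 64), dim=-1)  # stand-in for text-encoder outputs
region_emb = F.normalize(torch.randn(5, 64), dim=-1)         # stand-in for mask/box queries

logits = region_emb @ text_emb.T    # one semantic space supervises both tasks
print([vocab[i] for i in logits.argmax(-1).tolist()])
```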

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segmenting and recognizing anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. This allows our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training,...

10.48550/arxiv.2307.04767 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for...

10.48550/arxiv.2204.08790 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Abstract Background Verticillium wilt, caused by the fungus Verticillium dahliae, leads to significant losses in cotton yield worldwide. Biocontrol management is a promising means of suppressing verticillium wilt. The purpose of this study was to obtain and analyze endophytic bacteria with verticillium wilt-resistant activities from the roots of Gossypium barbadense ‘Xinhai15’ and to explore the interactions between the soil and plants. Results An endophytic bacterium, Bacillus sp. T6, was obtained from G. barbadense ‘Xinhai15’, which showed antagonistic abilities against V. dahliae in a bioassay...

10.1186/s12866-022-02749-x article EN cc-by BMC Microbiology 2023-01-10

The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, due to the broad concept coverage achieved via a large-scale data collection process. Alternatively, we argue that learning with external knowledge is a promising way which leverages a much more structured source of supervision and offers sample efficiency. We...

10.48550/arxiv.2204.09222 preprint EN cc-by arXiv (Cornell University) 2022-01-01
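
One concrete way to realize the external-knowledge idea is to append a dictionary gloss to each class name before text encoding. The snippet below approximates this with WordNet via NLTK; the prompt template and the choice of the first synset are illustrative assumptions, not the paper's exact recipe.

```python
from nltk.corpus import wordnet as wn  # pip install nltk; then nltk.download("wordnet")

def knowledge_prompt(class_name: str) -> str:
    synsets = wn.synsets(class_name)
    gloss = synsets[0].definition() if synsets else ""
    # Fall back to the bare name when no knowledge entry exists.
    return f"a photo of a {class_name}, which is {gloss}" if gloss else f"a photo of a {class_name}"

for name in ["airliner", "kuvasz"]:
    print(knowledge_prompt(name))
# Rare concepts gain structured context that the raw category name alone does not carry.
```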

In computer vision, great transfer learning performance has been achieved by adapting large-scale pretrained vision models (e.g., vision transformers) to downstream tasks. Common approaches for model adaptation either update all model parameters or leverage linear probes. In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task. We formulate efficient model adaptation as a subspace training problem and perform a comprehensive benchmarking over different efficient adaptation methods. We conduct an...

10.1609/aaai.v37i1.25160 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
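
A single point in that subspace-training view is easy to exhibit: freeze the pretrained weights and train only a low-rank update (LoRA-style). This sketch is one illustrative variant, not the specific set of methods benchmarked in the paper.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False        # pretrained weights stay frozen
        self.down = nn.Linear(linear.in_features, rank, bias=False)
        self.up = nn.Linear(rank, linear.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

layer = LowRankAdapter(nn.Linear(768, 768), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable params vs", 768 * 768 + 768, "for full fine-tuning")
```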

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks, e.g., alphanumerics, masks, boxes. Using the marked image as input, GPT-4V can answer questions that require visual grounding. We perform a comprehensive empirical study...

10.48550/arxiv.2310.11441 preprint EN other-oa arXiv (Cornell University) 2023-01-01
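
The marking step itself is simple to sketch: place an index at each region's centroid so the model can answer with "region 2" instead of pixel coordinates. The toy below uses Pillow and hand-made masks; the real pipeline draws richer overlays on SEEM/SAM outputs.

```python
import numpy as np
from PIL import Image, ImageDraw

def overlay_marks(image: Image.Image, masks: list) -> Image.Image:
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for idx, mask in enumerate(masks, start=1):
        ys, xs = np.nonzero(mask)
        cx, cy = int(xs.mean()), int(ys.mean())   # centroid of the region
        draw.rectangle([cx - 8, cy - 8, cx + 8, cy + 8], fill="white")
        draw.text((cx - 4, cy - 6), str(idx), fill="black")
    return out

img = Image.new("RGB", (64, 64), "gray")
m1 = np.zeros((64, 64), bool); m1[8:24, 8:24] = True
m2 = np.zeros((64, 64), bool); m2[36:60, 30:58] = True
overlay_marks(img, [m1, m2]).save("som_demo.png")
```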

The development and maturation of maize kernel involves meticulous fine gene regulation at transcriptional post-transcriptional levels, miRNAs play important roles during this process. Although a number have been identified in seed, the ones involved early grains different lines not well studied. Here, we profiled four small RNA libraries, each constructed from groups immature Zea mays inbred line Chang 7–2 collected 4–6, 7–9, 12–14, 18–23 days after pollination (DAP). A total 40 known...

10.1371/journal.pone.0153168 article EN cc-by PLoS ONE 2016-04-15

Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models is achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized models for target domains. We retrieve the most relevant image-text pairs...

10.1109/cvpr52729.2023.01454 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
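
The retrieval step can be approximated with plain cosine similarity over precomputed CLIP-style embeddings: rank the web pool against a target-domain query and keep the top-k pairs for customization. The data below is random; this is a sketch of the idea, not REACT's code.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(size=(10_000, 512)).astype(np.float32)  # web image-text pair embeddings
query = rng.normal(size=(512,)).astype(np.float32)        # a target-domain concept embedding

pool /= np.linalg.norm(pool, axis=1, keepdims=True)
query /= np.linalg.norm(query)

k = 32
scores = pool @ query
topk = np.argsort(-scores)[:k]   # most relevant pairs form the customization training set
print(topk[:5], scores[topk[0]])
```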

The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports the majority of mainstream DETR-based instance recognition algorithms, covering various fundamental tasks,...

10.48550/arxiv.2306.07265 preprint EN other-oa arXiv (Cornell University) 2023-01-01

10.1109/icassp49660.2025.10888778 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Contrastive language-image pretraining (CLIP) links vision and language modalities into a unified embedding space, yielding tremendous potential for vision-language (VL) tasks. While early concurrent works have begun to study this potential on a subset of tasks, important questions remain: 1) What is the benefit of CLIP on unstudied VL tasks? 2) Does CLIP provide benefit in low-shot or domain-shifted scenarios? 3) Can CLIP improve existing approaches without impacting inference complexity? In this work, we seek to answer these questions through...

10.48550/arxiv.2201.05729 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representations semantic-rich. In our experiments, we pre-train...

10.48550/arxiv.2112.03857 preprint EN other-oa arXiv (Cornell University) 2021-01-01
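
The reformulation that unifies the two tasks is essentially a change of classification head: region features score against word features, so a box can be supervised by a label name or a caption phrase alike. A minimal sketch with random stand-in features:

```python
import torch
import torch.nn.functional as F

regions = F.normalize(torch.randn(100, 256), dim=-1)  # detector's region/box features
words = F.normalize(torch.randn(7, 256), dim=-1)      # encoded prompt, e.g. "person. dog. umbrella."

alignment = regions @ words.T                         # (100, 7) word-region alignment scores
print(alignment.shape, alignment.max().item())
# A region "detects" whichever word it aligns with; new classes are just new words.
```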

With the increasing penetration of electric vehicles (EV) and distributed generations (DG) in distribution networks, the operation and control of distribution networks (DN) is faced with many new challenges. Considering the remarkable characteristics of a network layered by voltage level and the spatial and temporal distribution of EV charging load, a hierarchical optimization method for DN is proposed. Firstly, a prediction model of EV charging load is established, which is composed of three parts: a resident travel probability model, a vehicle mobility model, and a traffic model. Secondly,...

10.1016/j.egyr.2023.04.086 article EN cc-by-nc-nd Energy Reports 2023-04-20
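
The three-part load model invites a Monte Carlo illustration: sample arrival times and daily mileage, convert mileage to required energy, and accumulate hourly charging power. All distributions and parameters below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ev, power_kw, kwh_per_km = 1000, 7.0, 0.15
arrival_h = rng.normal(18.0, 1.5, n_ev) % 24   # resident travel: evening home arrivals
mileage_km = rng.lognormal(3.2, 0.6, n_ev)     # vehicle mobility: daily driven distance
charge_h = mileage_km * kwh_per_km / power_kw  # hours needed at rated charger power

load = np.zeros(24)
for a, d in zip(arrival_h, charge_h):
    for h in range(int(np.ceil(d))):
        load[int(a + h) % 24] += power_kw      # aggregate hourly charging load (coarse)
print("peak hour:", load.argmax(), "h, peak load:", round(load.max(), 1), "kW")
```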

The Fourier-Mellin transform (FMT) has been widely used for the extraction of rotation- and scale-invariant features. However, the affine transform is a more reasonable approximation model of real viewpoint change. Due to shearing, the integral along the angular direction in the calculation of the FMT cannot be used to extract inherent features of an image undergoing an affine transform. To eliminate the effect of shearing, whitening should be conducted on the radial direction. The FMT can hardly be modified by conventional whitening-based methods with low computational cost due...

10.1109/tip.2020.2967578 article EN IEEE Transactions on Image Processing 2020-01-01
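
For background, the baseline rotation- and scale-invariance of the FMT is straightforward to demonstrate: resample the image to log-polar coordinates, where rotation and scaling become translations, then take the FFT magnitude. The sketch below shows only this baseline, not the paper's whitening modification.

```python
import numpy as np

def fmt_features(img: np.ndarray, n_r: int = 64, n_theta: int = 64) -> np.ndarray:
    h, w = img.shape
    cy, cx = h / 2, w / 2
    r_max = min(cy, cx)
    rho = np.exp(np.linspace(0, np.log(r_max), n_r))           # log-spaced radii
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    ys = (cy + rho[:, None] * np.sin(theta)).astype(int).clip(0, h - 1)
    xs = (cx + rho[:, None] * np.cos(theta)).astype(int).clip(0, w - 1)
    logpolar = img[ys, xs]                   # rotation/scale now act as shifts here
    return np.abs(np.fft.fft2(logpolar))     # shift-invariant magnitude spectrum

img = np.zeros((128, 128)); img[40:88, 40:88] = 1.0
print(fmt_features(img).shape)
```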