- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Medical Image Segmentation Techniques
- Retinal Imaging and Analysis
- COVID-19 diagnosis using AI
- Topic Modeling
- Cancer-related molecular mechanisms research
- Advanced Memory and Neural Computing
- Digital Imaging for Blood Diseases
- Advanced Vision and Imaging
- Natural Language Processing Techniques
- Brain Tumor Detection and Classification
- Video Surveillance and Tracking Methods
- Optical Coherence Tomography Applications
- Face recognition and analysis
- Advanced Image and Video Retrieval Techniques
- Medical Imaging and Analysis
- Glaucoma and retinal disorders
- Radiomics and Machine Learning in Medical Imaging
- Computer Graphics and Visualization Techniques
- Neural dynamics and brain function
- Ferroelectric and Negative Capacitance Devices
- Neural Networks and Reservoir Computing
- Video Analysis and Summarization
Peng Cheng Laboratory
2023-2024
Peking University
2023-2024
Tencent (China)
2020-2023
Peking University Shenzhen Hospital
2023
Tencent Healthcare (China)
2022
National University of Defense Technology
2020-2021
Wuhan University
2017-2018
For clustering-guided fully unsupervised person re-identification (re-ID) methods, the quality of the pseudo labels generated by clustering directly decides model performance. To improve the quality of pseudo labels in existing methods, we propose HCT, a method that combines hierarchical clustering with hard-batch triplet loss. The key idea is to make full use of the similarity among samples in the target dataset through hierarchical clustering, and to reduce the influence of hard examples through hard-batch triplet loss, so as to generate high-quality pseudo labels and improve model performance. Specifically, (1) we use hierarchical clustering to generate pseudo labels, (2) we use PK sampling in each iteration to generate a new...
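The two ingredients named in this abstract can be sketched as follows, assuming target-dataset features have already been extracted by the backbone; the cluster count, margin, and other hyper-parameters are illustrative, not the paper's exact settings.

```python
# Minimal sketch: (1) hierarchical clustering to assign pseudo identity labels,
# (2) a batch-hard ("hard-batch") triplet loss over the resulting batches.
import torch
from sklearn.cluster import AgglomerativeClustering

def pseudo_labels_by_hierarchical_clustering(feats: torch.Tensor, num_clusters: int):
    """feats: (N, D) target features. Returns a pseudo label per sample."""
    clusterer = AgglomerativeClustering(n_clusters=num_clusters, linkage="average")
    return torch.as_tensor(clusterer.fit_predict(feats.cpu().numpy()))

def hard_batch_triplet_loss(embeddings: torch.Tensor, labels: torch.Tensor, margin: float = 0.3):
    """For each anchor, use the hardest positive (farthest same-label sample)
    and hardest negative (closest other-label sample)."""
    dist = torch.cdist(embeddings, embeddings)             # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # positive mask (includes self)
    hardest_pos = (dist * same.float()).max(dim=1).values  # farthest positive per anchor
    masked_neg = dist + same.float() * 1e6                 # exclude positives from the min
    hardest_neg = masked_neg.min(dim=1).values             # closest negative per anchor
    return torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()
```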
For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding the parameter count significantly increases training and inference costs, since all model parameters are activated for every token in the computation. In this work, we propose MoE-tuning, a novel training strategy for LVLMs that constructs a sparse model with an outrageous number of parameters but a constant computational cost, and that addresses the performance degradation typically associated with multi-modal learning and sparsity. Furthermore,...
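The sparsity idea behind this strategy, namely that only a few experts are activated per token so compute stays roughly constant as parameters grow, can be sketched with a standard top-k routed mixture-of-experts layer; expert count, hidden size, and k below are assumptions for illustration.

```python
# Minimal sketch of a sparse MoE feed-forward layer with top-k token routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 1, hidden: int = 2048):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, dim). Each token is processed by only its top-k experts,
        # so per-token compute does not grow with the total number of experts.
        gate = F.softmax(self.router(tokens), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(dim=-1)               # tokens routed to expert e
            if mask.any():
                w = weights[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(tokens[mask])
        return out
```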
Unsupervised domain adaptation (UDA), which aims to enhance the segmentation performance of deep models on unlabeled data, has recently drawn much attention. In this paper, we propose a novel UDA method (namely DLaST) for medical image segmentation via disentanglement learning and self-training. Disentanglement learning factorizes an image into domain-invariant anatomy and domain-specific modality components. To make the best of disentanglement learning, we introduce a shape constraint to boost the adaptation performance. The self-training strategy further adaptively...
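The self-training ingredient mentioned here can be sketched as confidence-thresholded pseudo-labeling on the unlabeled target images; this is a generic sketch of that step only, not the full DLaST pipeline, and the threshold value is an assumption.

```python
# Minimal sketch: generate per-pixel pseudo labels on target images, keeping only
# pixels where the current segmentation model is confident.
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_segmentation_pseudo_labels(model, target_images, conf_threshold: float = 0.9):
    logits = model(target_images)            # (B, num_classes, H, W)
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)          # per-pixel confidence and argmax label
    pseudo[conf <= conf_threshold] = 255     # 255 = ignore_index for the CE loss
    return pseudo

# Usage in the self-training round:
# loss = F.cross_entropy(model(target_images), pseudo, ignore_index=255)
```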
Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination: fabricating non-existent facts that deceive users without their awareness. Moreover, the reasons for its existence and pervasiveness remain unclear. In this paper, we demonstrate that nonsense prompts composed of random tokens can also elicit LLMs to respond with hallucinations. This phenomenon forces us to revisit hallucination...
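A probe like the one described, a "prompt" assembled from randomly sampled token ids, can be built in a few lines; the tokenizer name below is a placeholder and the sketch only constructs the nonsense prompt, leaving the choice of LLM under test open.

```python
# Minimal sketch: sample random token ids and decode them into a nonsense prompt.
import random
from transformers import AutoTokenizer

def random_token_prompt(tokenizer, length: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    ids = [rng.randrange(tokenizer.vocab_size) for _ in range(length)]
    return tokenizer.decode(ids, skip_special_tokens=True)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder tokenizer
prompt = random_token_prompt(tokenizer)
# Feed `prompt` to the LLM under test and inspect whether the reply asserts
# non-existent "facts" (the hallucination behaviour discussed above).
```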
Unsupervised domain adaptation has proven to be an effective approach for alleviating the intensive workload of manual annotation by aligning synthetic source-domain data and real-world target-domain samples. Unfortunately, mapping the target-domain distribution to the source domain unconditionally may distort the essential structural information of the target-domain data. To this end, we first propose to introduce a novel multi-anchor-based active learning strategy to assist domain adaptation for the semantic segmentation task. By innovatively adopting multiple...
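The "multiple anchors instead of a single centroid" idea can be sketched as clustering the source-domain features into several centers; the anchor count and the assumption of pre-pooled features are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch: characterize the source feature distribution with K anchors
# (cluster centers) rather than one global centroid.
import numpy as np
from sklearn.cluster import KMeans

def compute_source_anchors(source_features: np.ndarray, num_anchors: int = 10) -> np.ndarray:
    """source_features: (N, D) pooled features of source-domain images.
    Returns (num_anchors, D) anchor vectors."""
    kmeans = KMeans(n_clusters=num_anchors, n_init=10, random_state=0)
    kmeans.fit(source_features)
    return kmeans.cluster_centers_
```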
The Large Vision-Language Model (LVLM) has enhanced the performance of various downstream tasks in visual-language understanding. Most existing approaches encode images and videos into separate feature spaces, which are then fed as inputs to large language models. However, due to the lack of unified tokenization for images and videos, namely the misalignment before projection, it becomes challenging for a Large Language Model (LLM) to learn multi-modal interactions from several poor projection layers. In this work, we unify visual...
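The "unify before projection" idea can be sketched as a single projector shared by image and video tokens once both come from the same visual feature space; the encoder, dimensions, and module names below are placeholders.

```python
# Minimal sketch: one shared projector maps unified visual tokens into the LLM
# embedding space, instead of separate projection layers per modality.
import torch
import torch.nn as nn

class UnifiedVisualProjector(nn.Module):
    def __init__(self, visual_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(visual_dim, llm_dim), nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, num_tokens, visual_dim) from the shared image/video encoder
        return self.proj(visual_tokens)  # (B, num_tokens, llm_dim), prepended to text embeddings
```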
Video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking language as the bind across the different modalities, because the language modality is well explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for the other modalities with contrastive learning. As a result, all modalities are mapped to a shared...
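The training recipe stated here, a frozen language encoder plus contrastively trained modality encoders, can be sketched with a symmetric InfoNCE objective over paired embeddings; the temperature and the assumption of already-computed embeddings are illustrative.

```python
# Minimal sketch: align a trainable modality encoder's embeddings to a frozen
# language encoder's embeddings with a symmetric contrastive (InfoNCE) loss.
import torch
import torch.nn.functional as F

def bind_to_language_loss(modality_emb: torch.Tensor, text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """modality_emb, text_emb: (B, D) paired embeddings for the same B samples;
    text_emb should come from the frozen language encoder (run under torch.no_grad())."""
    m = F.normalize(modality_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = m @ t.T / temperature                      # (B, B) similarity matrix
    targets = torch.arange(m.size(0), device=m.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```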
Radiology report generation (RRG) is crucial for saving radiologists' valuable time in drafting reports, thereby increasing their work efficiency. Compared with typical methods that directly transfer image captioning technologies to RRG, our approach incorporates organ-wise priors into report generation. Specifically, in this paper, we propose Organ-aware Diagnosis (OaD) to generate diagnostic reports containing descriptions of each physiological organ. During training, we first develop a task distillation (TD)...
While recent progress in multimodal large language models tackles various modality tasks, they possess limited integration capabilities for complex multi-modality tasks, which consequently constrains the development of the field. In this work, we take the initiative to explore and propose LLMBind, a unified framework for task integration, which binds Large Language Models to corresponding pre-trained task models with task-specific tokens. Consequently, LLMBind can interpret inputs and produce outputs in versatile combinations of image,...
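The binding mechanism described, task-specific tokens that route the LLM's output to pre-trained task models, can be sketched as a small dispatcher; the token names and the expert registry below are hypothetical illustrations, not LLMBind's actual vocabulary or models.

```python
# Minimal sketch: scan LLM output for a task token and hand the remaining text
# to the matching pre-trained expert model.
from typing import Callable, Dict

EXPERTS: Dict[str, Callable[[str], object]] = {
    "<GEN_IMAGE>": lambda prompt: f"[image generated for: {prompt}]",  # e.g. a diffusion model
    "<GEN_VIDEO>": lambda prompt: f"[video generated for: {prompt}]",  # e.g. a video generator
    "<SEGMENT>":   lambda prompt: f"[mask produced for: {prompt}]",    # e.g. a segmentation model
}

def dispatch(llm_output: str):
    """Route output containing a task token to its expert; otherwise return plain text."""
    for token, expert in EXPERTS.items():
        if token in llm_output:
            task_prompt = llm_output.replace(token, "").strip()
            return expert(task_prompt)
    return llm_output
```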
Purpose: Using a deep learning (DL)-based technique, we identify risk factors for, and create a prediction model of, refractory neovascular age-related macular degeneration (nAMD) characterized by persistent disease activity (PDA) in spectral-domain optical coherence tomography (SD-OCT) images. Materials and methods: A total of 671 typical B-scans were collected from 186 eyes of patients with nAMD. Spectral images were analyzed using a classification convolutional neural network (CNN) and a fully convolutional network (FCN) algorithm to...
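The classification-CNN component mentioned here can be sketched as a small convolutional classifier over single grayscale B-scans; the architecture below is illustrative only and is not the study's actual CNN or FCN model.

```python
# Minimal sketch: a compact CNN that labels one SD-OCT B-scan (e.g. PDA vs. non-PDA).
import torch
import torch.nn as nn

class BScanClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, bscan: torch.Tensor) -> torch.Tensor:
        # bscan: (B, 1, H, W) grayscale B-scan
        return self.head(self.features(bscan).flatten(1))
```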
Unsupervised domain adaptation has been widely adopted in tasks with scarce annotated data. Unfortunately, mapping the target-domain distribution to the source domain unconditionally may distort the essential structural information of the target-domain data, leading to inferior performance. To address this issue, we first propose to introduce active sample selection to assist domain adaptation for the semantic segmentation task. By innovatively adopting multiple anchors instead of a single centroid, both the source and target domains can...
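Complementing the anchor-computation sketch given earlier, the active sample selection step can be sketched as picking the target images whose features lie farthest from every source anchor; the annotation budget and the use of plain Euclidean distance are assumptions for illustration.

```python
# Minimal sketch: rank target samples by distance to their nearest source anchor
# and select the farthest ones for manual annotation.
import numpy as np

def select_samples_for_annotation(target_features: np.ndarray,
                                  source_anchors: np.ndarray,
                                  budget: int = 100) -> np.ndarray:
    """target_features: (N, D); source_anchors: (K, D).
    Returns indices of the `budget` target samples farthest from any anchor."""
    dists = np.linalg.norm(target_features[:, None, :] - source_anchors[None, :, :], axis=-1)
    nearest = dists.min(axis=1)            # distance to the closest source anchor
    return np.argsort(-nearest)[:budget]   # most dissimilar samples first
```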
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM should not only see and understand its surroundings, but also possess human-level commonsense and make well-informed decisions for users. To guide the development of such a model, the establishment of a robust and comprehensive...