- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Face and Expression Recognition
- Advanced Multi-Objective Optimization Algorithms
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Metaheuristic Optimization Algorithms Research
- COVID-19 diagnosis using AI
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Image Retrieval and Classification Techniques
- Evolutionary Algorithms and Applications
- Remote-Sensing Image Classification
- Privacy-Preserving Technologies in Data
- Advanced Computing and Algorithms
- Speech and Audio Processing
- AI in cancer detection
- Retinal Imaging and Analysis
- Blind Source Separation Techniques
- Cryptography and Data Security
- Digital Media Forensic Detection
- Anomaly Detection Techniques and Applications
- Lung Cancer Diagnosis and Treatment
- Medical Image Segmentation Techniques
Institute of High Performance Computing
2017-2025
Agency for Science, Technology and Research
2017-2025
Sichuan University
2014-2019
University of Birmingham
2017-2019
Chengdu University
2016-2017
Cross-modal retrieval aims to enable flexible retrieval across different modalities. The core of cross-modal retrieval is how to measure the content similarity between different types of data. In this paper, we present a novel cross-modal retrieval method, called Deep Supervised Cross-modal Retrieval (DSCMR). It aims to find a common representation space, in which the samples from different modalities can be compared directly. Specifically, DSCMR minimises the discrimination loss in both the label space and the common representation space to supervise the model learning discriminative features. Furthermore, it simultaneously...
Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most existing cross-modal retrieval approaches were proposed to learn a common subspace in a joint manner, where the data from all modalities have to be involved during the whole training process. For these approaches, the optimal parameters of different modality-specific transformations are dependent on each other, and the whole model has to be retrained when handling samples from new modalities. In this paper, we present a novel cross-modal retrieval method, called Scalable Deep...
Rapid development of evolutionary algorithms in handling many-objective optimization problems requires viable methods of visualizing a high-dimensional solution set. The parallel coordinates plot, which scales well to high-dimensional data, is such a method, and has been frequently used in evolutionary many-objective optimization. However, the parallel coordinates plot is not as straightforward as the classic scatter plot to present the information contained in a solution set. In this paper, we make some observations of the parallel coordinates plot, in terms of comparing the quality of solution sets, understanding the shape and distribution of a solution set, and reflecting the relation...
In an underdetermined mixture system with n unknown sources, it is a challenging task to separate these sources from their m observed mixture signals, where m < n. By exploiting the technique of sparse coding, we propose an effective approach to discover some 1-D subspaces from the set consisting of all the time-frequency (TF) representation vectors of observed mixture signals. We show that these subspaces are associated with TF points where only a single source possesses dominant energy. By grouping the vectors in these subspaces via a hierarchical clustering algorithm, we obtain the estimation of the mixing matrix....
Recently, cross-modal retrieval is emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, not to mention the additional challenges from multiple modalities. Although crowd-sourcing annotation, e.g., Amazon's Mechanical Turk, can be utilized to mitigate the labeling cost, it leads to unavoidable noise in labels from non-expert annotating. To tackle the challenge, this paper presents a general Multi-modal Robust...
Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from the perspective of computer vision by formulating it as ranking, anchor, or regression tasks. These methods suffer from large performance degradation when localizing on long videos. In this work, we address NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network...
Deep neural networks have demonstrated impressive results in medical image analysis, but designing suitable architectures for each specific task is expertise-dependent and time-consuming. Neural architecture search (NAS) offers an effective means of discovering architectures. It has been highly successful in numerous applications, particularly in natural image classification. Yet, medical images possess unique characteristics, such as small regions and a wide variety of lesion sizes, that differentiate them from...
Cross-modal retrieval (CMR) enables flexible retrieval experience across different modalities (e.g., texts versus images), which maximally benefits us from the abundance of multimedia data. Existing deep CMR approaches commonly require a large amount of labeled data for training to achieve high performance. However, it is time-consuming and expensive to annotate the multimedia data manually. Thus, how to transfer valuable knowledge from existing annotated data to new data, especially from the known categories to new categories, becomes attractive for real-world...
Color fundus photography (CFP) and optical coherence tomography (OCT) images are two of the most widely used modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in clinical practice, few methods for automated diagnosis of eye diseases utilize correlated and complementary information from multiple modalities effectively. This paper explores how to leverage the information from CFP and OCT images to improve the automated diagnosis of retinal diseases. We propose a novel multimodal learning method, named geometric correspondence-based multimodal learning network (GeCoM-Net), to achieve the fusion...
Multi-party computation (MPC) allows distributed machine learning to be performed in a privacy-preserving manner so that end-hosts are unaware of the true models on the clients. However, the standard MPC algorithm also triggers additional communication and computation costs, due to those expensive cryptography operations and protocols. In this paper, instead of applying heavy MPC over the entire local models for secure model aggregation, we propose to encrypt the critical part of the model parameters (gradients) to reduce communication cost, while maintaining MPC's...
Pneumonia is one of the most common treatable causes of death, and early diagnosis allows for early intervention. Automated diagnosis of pneumonia can therefore improve outcomes. However, it is challenging to develop high-performance deep learning models due to the lack of well-annotated data for training. This paper proposes a novel method, called Deep Supervised Domain Adaptation (DSDA), to automatically diagnose pneumonia from chest X-ray images. Specifically, we propose to transfer the knowledge from a publicly available large-scale source dataset...
Multi-Party Computation (MPC) provides an effective cryptographic solution for distributed computing systems so that local models with sensitive information are encrypted before sending to the centralized servers for aggregation. Though direct local knowledge leakages are eliminated in MPC-based algorithms, we observe the server can still obtain local information indirectly in many scenarios, or even reveal the groundtruth images through methods like Deep Leakage from Gradients (DLG). To eliminate such possibilities and provide...
Generative adversarial networks (GANs) are a powerful generative technique but frequently face challenges with training stability. Network architecture plays a significant role in determining the final output of GANs, but designing a fine architecture demands extensive domain expertise. This paper aims to address this issue by searching for high-performance generator architectures through neural architecture search (NAS). The proposed approach, called evolutionary weight sharing generative adversarial networks (EWSGAN), is based on weight sharing and comprises two...
In high-speed free-space optical communication systems, the received laser beam must be coupled into a single-mode fiber at the input of the receiver module. However, propagation through atmospheric turbulence degrades the spatial coherence of a laser beam and poses challenges for fiber coupling. In this paper, we propose a novel method, called adaptive stochastic parallel gradient descent (ASPGD), to achieve efficient fiber coupling. To be specific, we formulate the fiber coupling problem as a model-free optimization and solve it using ASPGD in parallel. To avoid...
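For readers unfamiliar with the model-free optimization mentioned above, the following is a minimal sketch of plain (non-adaptive) stochastic parallel gradient descent, not the ASPGD method of the paper. The function names, the gain and perturbation values, and the Gaussian `toy_coupling` metric standing in for coupling efficiency are all assumptions made for illustration.

```python
import math
import random


def spgd_maximize(metric, u0, gain=20.0, delta=0.05, iters=300, seed=0):
    """Plain two-sided SPGD: perturb every control at once and follow the
    measured change in the metric. No model of the metric is required."""
    rng = random.Random(seed)
    u = list(u0)
    for _ in range(iters):
        # Random +/- delta perturbation applied to all channels in parallel.
        d = [delta if rng.random() < 0.5 else -delta for _ in u]
        j_plus = metric([ui + di for ui, di in zip(u, d)])
        j_minus = metric([ui - di for ui, di in zip(u, d)])
        dj = j_plus - j_minus
        # Move each control along its own perturbation, scaled by the change.
        u = [ui + gain * dj * di for ui, di in zip(u, d)]
    return u


def toy_coupling(u, target=(0.3, -0.2)):
    # Hypothetical stand-in for coupling efficiency: peaks at 1.0 on target.
    return math.exp(-sum((ui - ti) ** 2 for ui, ti in zip(u, target)))


if __name__ == "__main__":
    best = spgd_maximize(toy_coupling, [0.0, 0.0])
    print(best, toy_coupling(best))
```

Only two metric evaluations are needed per iteration regardless of the number of control channels, which is the usual reason for choosing this family of methods when the metric can be measured but not differentiated.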
Multimodal large language models (MLLMs) have demonstrated significant potential in medical Visual Question Answering (VQA). Yet, they remain prone to hallucinations, that is, incorrect responses that contradict input images, posing substantial risks to clinical decision-making. Detecting these hallucinations is essential for establishing trust in MLLMs among clinicians and patients, thereby enabling their real-world adoption. Current hallucination detection methods, especially semantic entropy (SE),...
Cross-modal hashing provides an efficient solution for retrieval tasks across various modalities, such as images and text. However, most existing methods are deterministic models, which overlook the reliability associated with the retrieved results. This omission renders them unreliable for determining matches between data pairs based solely on Hamming distance. To bridge the gap, in this paper, we propose a novel method called Deep Evidential Cross-modal Hashing (DECH). DECH equips hashing models with the ability to quantify the level of...
Cross-modal hashing, due to its low storage cost and high query speed, has been successfully used for similarity search in multimedia retrieval applications. It projects high-dimensional data into a shared isomorphic Hamming space with similar binary codes for semantically-similar data. In some applications, all modalities may not be obtained or trained simultaneously for some reasons, such as privacy, secret, storage limitation, and computational resource limitation. However, most existing cross-modal hashing...
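To illustrate why binary codes in a shared Hamming space give cheap similarity search, here is a minimal sketch of ranking database items by Hamming distance to a query code. The 8-bit codes and item names are invented for the example, and the learning of the hash functions themselves is not shown.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database):
    """Return (item_id, distance) pairs sorted by increasing distance."""
    scored = [(item_id, hamming_distance(query_code, code))
              for item_id, code in database]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit codes: a text query and three image items.
text_query = 0b10110010
image_db = [
    ("img_cat", 0b10110011),
    ("img_dog", 0b01001101),
    ("img_kitten", 0b10110010),
]

if __name__ == "__main__":
    for item_id, dist in rank_by_hamming(text_query, image_db):
        print(item_id, dist)
```

Each comparison is a single XOR plus a bit count, and each item needs only a few bytes, which is where the low storage cost and high query speed come from.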
Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address the challenges in this task: multi-modal representation learning, and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-modal contexts and cross-modal attentive information between video and text. Inspired by sequence labeling tasks in natural language processing, we split...
To develop artificial intelligence (AI) models for automated detection of center-involved diabetic macular edema (CI-DME) with visual impairment using color fundus photographs (CFP) and optical coherence tomography (OCT) scans. The AI effort pooled data from multi-center studies. Datasets consisted of participants with or without CI-DME, who had CFP, OCT, and best corrected visual acuity (BCVA) obtained after manifest refraction. The development dataset was from DRCR Retina Network clinical trials, external testing 1...
Deep learning models achieve remarkable accuracy in computer vision tasks yet remain vulnerable to adversarial examples, which are carefully crafted perturbations to input images that can deceive these models into making confident but incorrect predictions. This vulnerability poses significant risks in high-stakes applications such as autonomous vehicles, security surveillance, and safety-critical inspection systems. While the existing literature extensively covers adversarial attacks in image classification, comprehensive...