Liangli Zhen

ORCID: 0000-0003-0481-3298
About
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Face and Expression Recognition
  • Advanced Multi-Objective Optimization Algorithms
  • Video Analysis and Summarization
  • Human Pose and Action Recognition
  • Metaheuristic Optimization Algorithms Research
  • COVID-19 diagnosis using AI
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Image Retrieval and Classification Techniques
  • Evolutionary Algorithms and Applications
  • Remote-Sensing Image Classification
  • Privacy-Preserving Technologies in Data
  • Advanced Computing and Algorithms
  • Speech and Audio Processing
  • AI in cancer detection
  • Retinal Imaging and Analysis
  • Blind Source Separation Techniques
  • Cryptography and Data Security
  • Digital Media Forensic Detection
  • Anomaly Detection Techniques and Applications
  • Lung Cancer Diagnosis and Treatment
  • Medical Image Segmentation Techniques

Institute of High Performance Computing
2017-2025

Agency for Science, Technology and Research
2017-2025

Sichuan University
2014-2019

University of Birmingham
2017-2019

Chengdu University
2016-2017

Cross-modal retrieval aims to enable flexible retrieval across different modalities. The core of cross-modal retrieval is how to measure the content similarity between different types of data. In this paper, we present a novel method, called Deep Supervised Cross-Modal Retrieval (DSCMR). It aims to find a common representation space, in which samples from different modalities can be compared directly. Specifically, DSCMR minimises the discrimination loss in both the label space and the common representation space to supervise the model in learning discriminative features. Furthermore, it simultaneously...

10.1109/cvpr.2019.01064 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
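
A minimal PyTorch sketch of the common-representation-space idea described in the DSCMR abstract above. The encoder sizes, the shared linear classifier, and the simple MSE pairing term are illustrative assumptions, not the paper's exact architecture or loss formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceModel(nn.Module):
    """Two modality-specific encoders projecting into one shared space."""
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=256, n_classes=10):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, common_dim))
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, common_dim))
        # A single classifier shared by both modalities supervises the label space.
        self.classifier = nn.Linear(common_dim, n_classes)

    def forward(self, img, txt):
        u, v = self.img_enc(img), self.txt_enc(txt)
        return u, v, self.classifier(u), self.classifier(v)

def supervised_common_space_loss(u, v, logits_u, logits_v, labels, alpha=1.0):
    # Discrimination loss in the label space for both modalities.
    label_loss = F.cross_entropy(logits_u, labels) + F.cross_entropy(logits_v, labels)
    # Paired image/text samples should coincide in the common representation space.
    invariance_loss = F.mse_loss(u, v)
    return label_loss + alpha * invariance_loss
```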

Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most existing cross-modal retrieval approaches were proposed to learn a common subspace in a joint manner, where the data from all modalities have to be involved during the whole training process. For these approaches, the optimal parameters of the different modality-specific transformations are dependent on each other, and the model has to be retrained when handling samples from new modalities. In this paper, we present a novel method, called Scalable Deep...

10.1145/3331184.3331213 article EN 2019-07-18

Rapid development of evolutionary algorithms in handling many-objective optimization problems requires viable methods of visualizing a high-dimensional solution set. The parallel coordinates plot, which scales well to high-dimensional data, is such a method and has been frequently used in many-objective optimization. However, the parallel coordinates plot is not as straightforward as the classic scatter plot in presenting the information contained in a solution set. In this paper, we make some observations on the parallel coordinates plot, in terms of comparing the quality of solution sets, understanding the shape and distribution of a solution set, and reflecting the relation...

10.1109/mci.2017.2742869 article EN IEEE Computational Intelligence Magazine 2017-10-11
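
A short matplotlib sketch of the visualization the article discusses: each solution of a many-objective set becomes a polyline across one vertical axis per objective. The normalisation and styling choices are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def parallel_coordinates(solution_set, ax=None):
    """Plot a many-objective solution set: one polyline per solution,
    one vertical axis per (normalised) objective."""
    ax = ax or plt.gca()
    n_points, n_obj = solution_set.shape
    # Normalise each objective to [0, 1] so the axes are comparable.
    lo, hi = solution_set.min(axis=0), solution_set.max(axis=0)
    norm = (solution_set - lo) / np.where(hi > lo, hi - lo, 1.0)
    for row in norm:
        ax.plot(range(n_obj), row, color="steelblue", alpha=0.4)
    ax.set_xticks(range(n_obj))
    ax.set_xticklabels([f"f{i+1}" for i in range(n_obj)])
    ax.set_ylabel("normalised objective value")
    return ax

# Example: 100 random points of a 10-objective set.
parallel_coordinates(np.random.rand(100, 10))
plt.show()
```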

In an underdetermined mixture system with n unknown sources, it is a challenging task to separate these sources from their m observed mixture signals, where m < n. By exploiting the technique of sparse coding, we propose an effective approach to discover some 1-D subspaces from the set consisting of all the time-frequency (TF) representation vectors of the observed signals. We show that these 1-D subspaces are associated with TF points where only a single source possesses dominant energy. By grouping the vectors in these subspaces via a hierarchical clustering algorithm, we obtain an estimation of the mixing matrix...

10.1109/tnnls.2016.2610960 article EN IEEE Transactions on Neural Networks and Learning Systems 2016-10-05
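
A rough NumPy/SciPy sketch of the pipeline outlined in the abstract above: take the time-frequency vectors of the m mixtures, keep their directions, and group them with hierarchical clustering to estimate the columns of the mixing matrix. The energy threshold, subsampling, and centroid averaging are simplifying assumptions, not the paper's single-source-point detection procedure.

```python
import numpy as np
from scipy.signal import stft
from scipy.cluster.hierarchy import linkage, fcluster

def estimate_mixing_matrix(mixtures, n_sources, fs=16000, max_points=5000):
    """Estimate the m x n mixing matrix (m < n) by clustering TF-vector directions."""
    # Time-frequency representation of every mixture channel, flattened per channel.
    tf = np.stack([stft(x, fs=fs)[2].ravel() for x in mixtures])   # (m, n_tf_points)
    vecs = np.abs(tf).T                                            # one m-D vector per TF point
    vecs = vecs[vecs.sum(axis=1) > 1e-3][:max_points]              # drop near-silent points
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)            # keep directions only
    # Hierarchical clustering of the directions; ideally each cluster ~ one source.
    labels = fcluster(linkage(vecs, method="average"), n_sources, criterion="maxclust")
    cols = [vecs[labels == k].mean(axis=0) for k in range(1, n_sources + 1)]
    A_hat = np.stack(cols, axis=1)
    return A_hat / np.linalg.norm(A_hat, axis=0)                   # unit-norm columns
```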

Recently, cross-modal retrieval is emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, not to mention the additional challenges from multiple modalities. Although crowd-sourcing annotation, e.g., Amazon's Mechanical Turk, can be utilized to mitigate the labeling cost, it inevitably leads to noise in the labels from non-expert annotating. To tackle this challenge, this paper presents a general Multi-modal Robust...

10.1109/cvpr46437.2021.00536 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from the perspective of computer vision by formulating it as ranking, anchor, or regression tasks. These methods suffer from large performance degradation when localizing on long videos. In this work, we address NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network...

10.1109/tpami.2021.3060449 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01
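
A compact sketch of the span-based formulation described above: score every video clip as a possible span start and end, then pick the highest-scoring valid span. The two linear heads and the joint-probability decoding are generic span-QA components, not the paper's full network.

```python
import torch
import torch.nn as nn

class SpanPredictor(nn.Module):
    """Treat the video as a 'passage': score every clip as a possible
    span start and span end, as in span-based QA."""
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.start_head = nn.Linear(hidden_dim, 1)
        self.end_head = nn.Linear(hidden_dim, 1)

    def forward(self, fused):                      # fused: (T, hidden_dim) video-query features
        s = self.start_head(fused).squeeze(-1)     # (T,) start logits
        e = self.end_head(fused).squeeze(-1)       # (T,) end logits
        return s, e

def best_span(start_logits, end_logits):
    """Return the (start, end) clip pair with the highest joint score, start <= end."""
    p_s = start_logits.softmax(-1)
    p_e = end_logits.softmax(-1)
    scores = p_s.unsqueeze(1) * p_e.unsqueeze(0)   # (T, T) joint scores
    scores = torch.triu(scores)                    # enforce start <= end
    idx = scores.argmax()
    return divmod(idx.item(), scores.size(1))
```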

Deep neural networks have demonstrated impressive results in medical image analysis, but designing suitable architectures for each specific task is expertise-dependent and time-consuming. Neural architecture search (NAS) offers an effective means of discovering such architectures. It has been highly successful in numerous applications, particularly natural image classification. Yet, medical images possess unique characteristics, such as small regions of interest and a wide variety of lesion sizes, that differentiate them from...

10.1109/tevc.2024.3352641 article EN IEEE Transactions on Evolutionary Computation 2024-01-11

Cross-modal retrieval (CMR) enables a flexible retrieval experience across different modalities (e.g., texts versus images), which lets us benefit maximally from the abundance of multimedia data. Existing deep CMR approaches commonly require a large amount of labeled data for training to achieve high performance. However, it is time-consuming and expensive to annotate multimodal data manually. Thus, how to transfer valuable knowledge from existing annotated data to new data, especially from known categories to new categories, becomes attractive for real-world...

10.1109/tnnls.2020.3029181 article EN IEEE Transactions on Neural Networks and Learning Systems 2020-10-22

10.1016/j.autcon.2020.103509 article EN publisher-specific-oa Automation in Construction 2020-12-17

Color fundus photography (CFP) and optical coherence tomography (OCT) images are two of the most widely used modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in practice, few methods for automated diagnosis of eye diseases utilize the correlated and complementary information from multiple modalities effectively. This paper explores how to leverage CFP and OCT images to improve automated diagnosis. We propose a novel learning method, named geometric correspondence-based network (GeCoM-Net), to achieve the fusion...

10.1109/tmi.2024.3352602 article EN IEEE Transactions on Medical Imaging 2024-01-11

Multi-party computation (MPC) allows distributed machine learning to be performed in a privacy-preserving manner so that end-hosts are unaware of the true models on the clients. However, the standard MPC algorithm also triggers additional communication and computation costs, due to its expensive cryptography operations and protocols. In this paper, instead of applying heavy MPC over the entire local models for secure model aggregation, we propose to encrypt only the critical part of the model (gradient) parameters to reduce the cost, while maintaining MPC's...

10.1109/ccgrid51090.2021.00101 article EN 2021-05-01
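
A small NumPy sketch of the idea of encrypting only the critical part of the gradients: split each gradient into a small top-magnitude slice destined for the expensive secure-aggregation path and a plaintext remainder. The top-k magnitude criterion and the 10% ratio are illustrative assumptions, not the paper's selection rule.

```python
import numpy as np

def split_gradients_for_mpc(grad, critical_ratio=0.1):
    """Split a gradient array into a 'critical' slice for MPC and a plaintext remainder."""
    flat = grad.ravel()
    k = max(1, int(critical_ratio * flat.size))
    critical_idx = np.argsort(np.abs(flat))[-k:]       # largest-magnitude entries
    critical_part = flat[critical_idx]                 # -> secure aggregation (MPC path)
    plain_part = np.delete(flat, critical_idx)         # -> cheap plaintext aggregation
    return critical_idx, critical_part, plain_part
```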

Pneumonia is one of the most common treatable causes of death, and early diagnosis allows for early intervention. Automated diagnosis of pneumonia can therefore improve outcomes. However, it is challenging to develop high-performance deep learning models due to the lack of well-annotated data for training. This paper proposes a novel method, called Deep Supervised Domain Adaptation (DSDA), to automatically diagnose pneumonia from chest X-ray images. Specifically, we propose to transfer the knowledge from a publicly available large-scale source dataset...

10.1109/jbhi.2021.3100119 article EN IEEE Journal of Biomedical and Health Informatics 2021-07-27

Multi-Party Computation (MPC) provides an effective cryptographic solution for distributed computing systems so that local models with sensitive information are encrypted before being sent to the centralized servers for aggregation. Though direct knowledge leakages are eliminated in MPC-based algorithms, we observe that the server can still obtain knowledge indirectly in many scenarios, or even reveal the ground-truth images through methods like Deep Leakage from Gradients (DLG). To eliminate such possibilities and provide...

10.1109/tbdata.2022.3208736 article EN IEEE Transactions on Big Data 2022-09-22
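
A hedged PyTorch sketch of the DLG-style leakage mentioned above: the server optimises a dummy image and soft label so that their gradients match the observed ones. Shapes, optimiser settings, and the soft-label trick are assumptions for illustration, not this paper's defence.

```python
import torch
import torch.nn.functional as F

def dlg_reconstruct(model, observed_grads, x_shape, n_classes, steps=300):
    """Optimise a dummy (image, label) pair whose gradients match the observed gradients."""
    dummy_x = torch.randn(x_shape, requires_grad=True)          # x_shape includes batch dim
    dummy_y = torch.randn(1, n_classes, requires_grad=True)     # soft label logits
    opt = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        opt.zero_grad()
        # cross_entropy with probability targets needs a recent PyTorch (>= 1.10).
        loss = F.cross_entropy(model(dummy_x), dummy_y.softmax(-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        grad_diff = sum(((g - og) ** 2).sum() for g, og in zip(grads, observed_grads))
        grad_diff.backward()
        return grad_diff

    for _ in range(steps):
        opt.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```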

Generative adversarial networks (GANs) are a powerful generative technique but frequently face challenges with training stability. Network architecture plays a significant role in determining the final output of GANs, and designing a fine architecture demands extensive domain expertise. This paper aims to address this issue by searching for high-performance generator architectures through neural architecture search (NAS). The proposed approach, called evolutionary weight sharing GAN (EWSGAN), is based on weight sharing and comprises two...

10.1109/tevc.2023.3338371 article EN IEEE Transactions on Evolutionary Computation 2023-12-01

In high-speed free-space optical communication systems, the received laser beam must be coupled into a single-mode fiber at the input of the receiver module. However, propagation through atmospheric turbulence degrades the spatial coherence of the beam and poses challenges for efficient coupling. In this paper, we propose a novel method, called adaptive stochastic parallel gradient descent (ASPGD), to achieve efficient fiber coupling. To be specific, we formulate the coupling problem as a model-free optimization problem and solve it using ASPGD in parallel. To avoid...

10.1364/oe.390762 article EN cc-by Optics Express 2020-04-06
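
A minimal sketch of the plain SPGD loop underlying the method above; the adaptive gain and momentum that distinguish ASPGD are omitted, and `coupling_efficiency` is a hypothetical measurement callback (e.g., the metric read from a power meter).

```python
import numpy as np

def spgd_optimize(coupling_efficiency, u0, gain=0.5, sigma=0.05, iters=500):
    """Stochastic parallel gradient descent: perturb all control parameters at once
    (e.g., deformable-mirror voltages) and let the measured metric change drive the update."""
    u = np.array(u0, dtype=float)
    for _ in range(iters):
        delta = sigma * np.random.choice([-1.0, 1.0], size=u.shape)   # Bernoulli perturbation
        dJ = coupling_efficiency(u + delta) - coupling_efficiency(u - delta)
        u += gain * dJ * delta        # parallel, model-free gradient estimate
    return u
```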

Multimodal large language models (MLLMs) have demonstrated significant potential in medical Visual Question Answering (VQA). Yet, they remain prone to hallucinations, i.e., incorrect responses that contradict the input images, posing substantial risks to clinical decision-making. Detecting these hallucinations is essential for establishing trust in MLLMs among clinicians and patients, thereby enabling their real-world adoption. Current hallucination detection methods, especially semantic entropy (SE),...

10.48550/arxiv.2503.20504 preprint EN arXiv (Cornell University) 2025-03-26
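
A small sketch of the semantic-entropy baseline mentioned in the abstract: sample several answers, merge the ones that mean the same thing into clusters, and compute the entropy over cluster frequencies. The greedy clustering and the `same_meaning` bidirectional-entailment check are placeholders/assumptions.

```python
import numpy as np

def semantic_entropy(responses, same_meaning):
    """Entropy over semantic-equivalence clusters of sampled responses.
    `same_meaning(a, b)` is an assumed callback (e.g., an NLI entailment check)."""
    clusters = []
    for r in responses:
        for c in clusters:
            if same_meaning(r, c[0]):   # greedy assignment to the first matching cluster
                c.append(r)
                break
        else:
            clusters.append([r])
    p = np.array([len(c) for c in clusters], dtype=float) / len(responses)
    return float(-(p * np.log(p)).sum())
```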

Cross-modal hashing provides an efficient solution for retrieval tasks across various modalities, such as images and text. However, most existing methods are deterministic models, which overlook the reliability associated with the retrieved results. This omission renders them unreliable in determining matches between data pairs based solely on the Hamming distance. To bridge this gap, in this paper, we propose a novel method called Deep Evidential Cross-modal Hashing (DECH). It equips hashing models with the ability to quantify the level of...

10.1609/aaai.v39i17.34043 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Cross-modal hashing, due to its low storage cost and high query speed, has been successfully used for similarity search in multimedia retrieval applications. It projects high-dimensional data into a shared isomorphic Hamming space with similar binary codes for semantically similar data. In some applications, all modalities may not be obtained or trained simultaneously for reasons such as privacy, secrecy, storage limitations, or computational resource limitations. However, most existing cross-modal hashing...

10.1145/3343031.3351078 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15
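
A tiny NumPy sketch of the Hamming-distance retrieval step common to the two hashing papers above: rank database codes of the other modality by Hamming distance to the query code. The code length and the random example data are arbitrary.

```python
import numpy as np

def hamming_retrieve(query_code, db_codes, top_k=10):
    """Rank database items by Hamming distance to a query binary code (0/1 arrays)."""
    dists = (query_code[None, :] != db_codes).sum(axis=1)   # Hamming distances
    order = np.argsort(dists)
    return order[:top_k], dists[order[:top_k]]

# Example with random 64-bit codes: query from one modality, database from another.
rng = np.random.default_rng(0)
query = rng.integers(0, 2, 64)
database = rng.integers(0, 2, (1000, 64))
idx, d = hamming_retrieve(query, database)
```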

Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address the challenges in this task: multi-modal representation learning and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-modal contexts and cross-modal attentive information between video and text. Inspired by sequence labeling tasks in natural language processing, we split...

10.18653/v1/2021.findings-acl.69 preprint EN cc-by 2021-01-01

To develop artificial intelligence (AI) models for automated detection of center-involved diabetic macular edema (CI-DME) with visual impairment using color fundus photographs (CFP) and optical coherence tomography (OCT) scans. This AI effort pooled data from multi-center studies. Datasets consisted of participants with or without CI-DME who had CFP, OCT, and best corrected visual acuity (BCVA) obtained after manifest refraction. The development dataset was from DRCR Retina Network clinical trials, and external testing dataset 1...

10.1016/j.oret.2025.04.016 article EN cc-by-nc-nd Ophthalmology Retina 2025-04-01

Deep learning models achieve remarkable accuracy in computer vision tasks yet remain vulnerable to adversarial examples: carefully crafted perturbations of input images that can deceive these models into making confident but incorrect predictions. This vulnerability poses significant risks in high-stakes applications such as autonomous vehicles, security surveillance, and safety-critical inspection systems. While the existing literature extensively covers adversarial attacks on image classification, comprehensive...

10.1109/tnnls.2025.3561225 article EN IEEE Transactions on Neural Networks and Learning Systems 2025-01-01
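
As a concrete illustration of the vulnerability the survey discusses, here is a minimal FGSM-style perturbation in PyTorch. FGSM is a standard attack shown only for illustration; it is not presented as a method from the survey itself, and the epsilon budget is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """One-step fast gradient sign method: nudge the image along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()      # small, bounded perturbation
    return x_adv.clamp(0, 1).detach()        # keep pixels in valid range
```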