- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- Image Retrieval and Classification Techniques
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Advanced Malware Detection Techniques
- Text and Document Classification Technologies
- Topic Modeling
- Bacillus and Francisella bacterial research
- Advanced Text Analysis Techniques
- Fault Detection and Control Systems
- Mental Health via Writing
- Robotics and Sensor-Based Localization
- Advanced Graph Neural Networks
- Video Analysis and Summarization
- Generative Adversarial Networks and Image Synthesis
- Face and Expression Recognition
- Service-Oriented Architecture and Web Services
- Security and Verification in Computing
- Machine Learning in Healthcare
- Integrated Circuits and Semiconductor Failure Analysis
Sichuan University
2019-2024
Chengdu University
2020-2023
Cross-modal hashing (CMH) has gained much attention due to its effectiveness and efficiency in facilitating retrieval between different modalities. However, most existing methods overlook the hierarchical structural information of the data and often learn a single-layer hash function that directly transforms cross-modal data into common low-dimensional hash codes in one step. This sudden drop in dimension and the huge semantic gap can cause a loss of discriminative information. To this end, we adopt a coarse-to-fine...
With the development of video networks, image set classification (ISC) has received a lot of attention and can be used for various practical applications, such as video-based recognition, action recognition, and so on. Although existing ISC methods have obtained promising performance, they often have extremely high complexity. Due to its superiority in storage space and complexity cost, learning to hash becomes a powerful solution scheme. However, existing hashing methods ignore the complex structural information and hierarchical semantics of the original features....
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e., partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge number of multimedia data (e.g., the Conceptual Captions dataset) are collected from the Internet, and thus it is inevitable to wrongly treat some irrelevant cross-modal pairs as matched. Undoubtedly, such a PMP problem will remarkably degrade the retrieval performance. To tackle this problem, we derive a unified theoretical Robust Cross-modal Learning framework (RCL)...
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a rising theme with broad application prospects. Given a sketch image as the query, the goal of ZS-SBIR is to correctly retrieve semantically similar images under the zero-shot scenario. The key is to project images from the photo and sketch domains into a shared space, where the domain gap and the semantic gap are effectively bridged. Most previous studies have approached ZS-SBIR as a classification problem and used a classification loss to obtain discriminative features. However, these methods do not explicitly encourage...
Cross-modal hashing, due to its low storage cost and high query speed, has been successfully used for similarity search in multimedia retrieval applications. It projects high-dimensional data into a shared isomorphic Hamming space with similar binary codes for semantically similar data. In some applications, all modalities may not be obtained or trained simultaneously for reasons such as privacy, secrecy, storage limitation, and computational resource limitation. However, most existing cross-modal hashing...
This paper presents a novel method for supervised multi-view representation learning, which projects multiple views into a latent common space while preserving the discrimination and intrinsic structure of each view. Specifically, an a priori discriminant similarity graph is first constructed based on the labels and pairwise relationships of the inputs. Then, view-specific networks progressively map the inputs to common representations whose affinity approximates the constructed graph. To achieve consistency, discrimination, cross-view...
In many computer vision applications, an object can be represented by multiple different views. Due to the heterogeneous gap triggered by the views' inconsistent distributions, it is challenging to exploit these multiview data for cross-view retrieval and classification. Motivated by the fact that both labeled and unlabeled data can enhance the relations among views, this article proposes a deep learning framework called deep semisupervised classes- and correlation-collapsed (DSC...
An Electronic Health Record (EHR) is the digital form of patient visits, containing various medical data, including diagnosis, treatment, and lab events. Representation learning of EHR with deep learning methods has been beneficial for patient-related prediction tasks. Recently, studies have focused on revealing the inherent graph structure between medical events in EHR. Graph neural networks (GNNs) are prevalent and perform well. However, the relationships between medical events must be marked, which is complicated and time-consuming. Most research works...
Document-level event extraction (DEE) aims to extract structured event information from a document. Previous document-level event extraction methods relied on trigger annotation, which was very expensive and time-consuming. In addition, entity representation plays an important role in the overall task, but we found that previous work could not effectively use abbreviations and coreference information, which limited the representation ability of event-related entities in documents. Based on the above two aspects, we propose a new model named TFECI. In our...
Visual grounding aims at localizing objects in images using natural language expressions. This task can be challenging when there are significant differences between the distributions of the training and testing sets. Existing methods tend to focus excessively on the training sets, which could lead to overfitting, especially in small-sample scenarios. To address this issue, in this letter, we present a novel meta-learning-based framework, called MetaVG, for visual grounding. Our approach leverages bi-level optimization...