- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Face Recognition and Analysis
- Image Retrieval and Classification Techniques
- Video Analysis and Summarization
- Recommender Systems and Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Topic Modeling
- 3D Shape Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Adversarial Robustness in Machine Learning
- Advanced Neural Network Applications
- Text and Document Classification Technologies
- Natural Language Processing Techniques
- Handwritten Text Recognition Techniques
- Advanced Image Processing Techniques
- Caching and Content Delivery
- Sentiment Analysis and Opinion Mining
- Digital Media Forensic Detection
- Speech and Audio Processing
Harbin Institute of Technology, 2023-2025
Monash University, 2019-2024
Shenzhen Institute of Information Technology, 2024
Guiyang Medical University, 2024
Nanning Normal University, 2015-2023
Australian Regenerative Medicine Institute, 2023
Peng Cheng Laboratory, 2023
Singapore-HUJ Alliance for Research and Enterprise, 2019-2020
Beijing Fengtai Hospital, 2011
Southerners on New Ground, 1996
The prevailing characteristics of micro-videos result in less descriptive power for each modality. The micro-video representations proposed by several pioneering efforts are limited to implicitly exploring the consistency between different modality information and ignore the complementarity. In this paper, we focus on how to explicitly separate the consistent features and the complementary features from the mixed information and harness their combination to improve the expressiveness of each modality. Toward this end, we present a neural multimodal cooperative learning (NMCL)...
In recent years, remarkable progress in zero-shot learning (ZSL) has been achieved by generative adversarial networks (GANs). To compensate for the lack of training samples in ZSL, a surge of GAN architectures has been developed by human experts through trial-and-error testing. Despite their efficacy, however, there is still no guarantee that these hand-crafted models can consistently achieve good performance across diversified datasets or scenarios. Accordingly, in this paper, we turn to neural architecture...
With the explosive growth of multimedia content, multimedia retrieval is facing unprecedented challenges in both storage cost and retrieval speed. Hashing techniques can project high-dimensional data into compact binary hash codes. With them, the most time-consuming semantic similarity computation during the retrieval process can be significantly accelerated with fast Hamming distance computation, and meanwhile the storage cost can be reduced greatly by the binary embedding. In light of this, multi-modal hashing has recently received considerable attention to support large-scale...
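The retrieval speed-up described in this abstract comes from comparing binary codes with XOR and a bit count instead of floating-point similarity. The following is a minimal sketch of that generic idea only, not the method of the paper: the sign-based `binarize` step and all function names are illustrative assumptions, since real hashing methods learn their hash functions.

```python
def binarize(vector):
    """Assumed toy hash function: one bit per dimension, 1 if the value is non-negative."""
    return [1 if x >= 0 else 0 for x in vector]


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int (first bit is most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def hamming_distance(a, b):
    """Number of differing bits between two packed codes: XOR, then count the ones."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest in Hamming distance."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Hypothetical 4-dimensional features standing in for real multimedia embeddings.
query_code = pack_bits(binarize([0.9, -0.2, 0.4, -0.7]))
database_codes = [
    pack_bits(binarize(v))
    for v in ([0.8, -0.1, 0.5, -0.3], [-0.5, 0.6, -0.4, 0.2], [0.7, 0.3, 0.1, -0.9])
]
ranking = rank_by_hamming(query_code, database_codes)  # [0, 2, 1]
```

Each code here occupies four bits instead of four floats, which illustrates the storage saving the abstract mentions alongside the faster distance computation.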
Session-based recommendation (SBR) has drawn increasing research attention in recent years, due to its great practical value in exploiting only the limited user behavior history of the current session. The key to SBR is to accurately infer the anonymous user's purpose in a session, which is typically represented as a session embedding, and then match it with the item embeddings for next-item prediction. Existing methods learn the session embedding at the item level, namely, by aggregating the embeddings of items with or without assigned weights for the items. However, they ignore the fact that...
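The item-level pipeline this abstract attributes to existing methods (aggregate the item embeddings of a session, then match the result against all item embeddings) can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: the two-dimensional embeddings, the `catalog` dictionary, and the dot-product matching score are all hypothetical choices.

```python
def aggregate(item_vectors, weights=None):
    """Session embedding as a weighted sum of item embeddings (uniform mean if no weights)."""
    n = len(item_vectors)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(item_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, item_vectors)) for d in range(dim)]


def score_items(session_vector, catalog):
    """Dot product between the session embedding and every item embedding."""
    return {
        item: sum(s * x for s, x in zip(session_vector, vector))
        for item, vector in catalog.items()
    }


def recommend(session_items, catalog, weights=None):
    """Return the catalog item whose embedding best matches the session embedding."""
    session_vector = aggregate([catalog[i] for i in session_items], weights)
    scores = score_items(session_vector, catalog)
    return max(scores, key=scores.get)


# Hypothetical item embeddings; a trained model would learn these.
catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
next_item = recommend(["a", "c"], catalog)
```

Passing `weights` corresponds to the "with assigned weights" variant mentioned in the abstract, while omitting it gives plain mean pooling.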
Unsupervised cross-modal hashing has attracted considerable attention to support large-scale cross-modal retrieval. Although promising progress has been made so far, existing methods still suffer from limited capability in excavating and preserving the intrinsic multi-modal semantics. In this paper, we propose a Correlation-Identity Reconstruction Hashing (CIRH) method to alleviate this challenging problem. We develop a new unsupervised deep cross-modal hash learning framework to model and preserve the heterogeneous correlation...
Action recognition is a popular research topic in the computer vision and machine learning domains. Although many action recognition methods have been proposed, only a few researchers have focused on cross-domain few-shot action recognition, which must often be performed in real security surveillance. Since the problems of action recognition, domain adaptation, and few-shot learning need to be solved simultaneously, the cross-domain few-shot action recognition task is a challenging problem. To solve these issues, in this work, we develop a novel end-to-end pairwise attentive adversarial spatiotemporal network (PASTN)...
Linmei Hu, Luhao Zhang, Chuan Shi, Liqiang Nie, Weili Guan, Cheng Yang. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Automatic image captioning performs the cross-modal conversion from visual content to natural language text. Involving computer vision (CV) and natural language processing (NLP), it has become one of the most sophisticated research issues in the artificial-intelligence area. Based on deep neural networks, the neural image caption (NIC) model has achieved remarkable performance in image captioning, yet there still remain some essential challenges, such as the deviation between the descriptive sentences generated by the model and the intrinsic content expressed by the image, the low...
Temporal action localization is currently an active research topic in computer vision and machine learning due to its usage in smart surveillance. It is a challenging problem since the categories of the actions must be classified in untrimmed videos and the start and end of the actions need to be accurately found. Although many temporal action localization methods have been proposed, they require substantial amounts of computational resources for the training and inference processes. To solve these issues, in this work, a novel temporal-aware relation and attention network...
Fashion Compatibility Modeling (FCM) is a new yet challenging task, which aims to automatically assess the matching degree among a set of complementary items. Most existing methods evaluate fashion compatibility from a common perspective, but overlook the user's personal preference. Inspired by this, a few pioneers study Personalized Fashion Compatibility Modeling (PFCM). Despite their significance, these PFCM methods mainly concentrate on the user and item entities, as well as their interactions, but ignore the attribute entities, which contain rich semantics. To address...
Graphs are widely used to model various practical applications. In recent years, graph convolution networks (GCNs) have attracted increasing attention due to the extension of the convolution operation from traditional grid data to graph data. However, the representation ability of current GCNs is undoubtedly limited because existing work fails to consider feature interactions. Toward this end, we propose a Dual Feature Interaction-based GCN. Specifically, it models the feature interaction in the aspects of 1) node features, where we use Newton's...
Personalized outfit recommendation, which aims to recommend outfits to a given user according to his/her preference, has gained increasing research attention due to its economic value. Nevertheless, the majority of existing methods mainly focus on improving recommendation effectiveness, while overlooking recommendation efficiency. Inspired by this, we devise a novel bi-directional heterogeneous graph hashing scheme, called BiHGH, towards efficient personalized outfit recommendation. In particular, this scheme consists of three...
Fashion compatibility modeling, which is used to estimate the matching degree of a given set of fashion items, has received increasing attention in recent years. However, existing studies often fail to fully leverage multimodal information or ignore the semantic guidance of clothing categories in elevating the reliability of the multimodal information. In this paper, we propose a fashion compatibility modeling approach with a category-aware multimodal attention network, termed FCM-CMAN. In FCM-CMAN, we focus on enriching and aggregating the multimodal representations of fashion items by means of the dynamic...
With the rapid development of science and technology and the better living standard of people, the Internet, with its low cost, large information source, and speed, plays an important role in common people's lives. The Internet brings a lot of convenience while at the same time it causes a series of social problems. Under the new situation, the teen addiction index is getting higher and higher. Solving the problem requires every side to intervene.
Finding tampered regions in images is a common research topic in machine learning and computer vision. Although many image manipulation location algorithms have been proposed, most of them only focus on RGB images with different color spaces, and the frequency information that contains potential tampering clues is often ignored. Moreover, among the manipulation operations, splicing and copy-move are two frequently used methods, but as their characteristics are quite different, specific methods are individually designed for...
Fashion Compatibility Modeling (FCM), which aims to automatically evaluate whether a given set of fashion items makes a compatible outfit, has attracted increasing research attention. Recent studies have demonstrated the benefits of conducting item representation disentanglement towards FCM. Although these efforts have achieved prominent progress, they still perform unsatisfactorily, as they mainly investigate the visual content of fashion items, while overlooking the semantic attributes of items (e.g., color and pattern), which could...
Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity for transfer learning. Recently, learnable prompts have achieved state-of-the-art performance, but they are prone to overfitting to seen classes and fail to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. Our approach takes inspiration from human intelligence, in which external knowledge is usually incorporated into recognizing...
Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversarial examples can be deliberately transferred to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study to investigate the adversarial transferability of recent VLP models. We observe that existing methods exhibit much lower transferability, compared to their strong attack performance in white-box settings. The transferability degradation is partly...
Cloth-changing person re-identification is a subject closer to the real world, which focuses on solving the problem of person re-identification after pedestrians change clothes. The primary challenge in this field is to overcome the complex interplay between intra-class and inter-class variations and to identify features that remain unaffected by changes in appearance. Sufficient data collection for model training would significantly aid in addressing this problem. However, it is challenging to gather diverse datasets in practice. Current methods focus...
The incorporation of high-resolution visual input equips multimodal large language models (MLLMs) with enhanced perception capabilities for real-world tasks. However, most existing MLLMs rely on a cropping-based approach to process images, which leads to fragmented encoding and a sharp increase in redundant tokens. To tackle these issues, we propose the FALCON model. FALCON introduces a novel register technique to simultaneously: 1) Eliminate redundant tokens at the stage of encoding. To directly address the redundancy present...