Weili Guan

ORCID: 0000-0002-5658-5509
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Face recognition and analysis
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Recommender Systems and Techniques
  • Domain Adaptation and Few-Shot Learning
  • Advanced Graph Neural Networks
  • Topic Modeling
  • 3D Shape Modeling and Analysis
  • Anomaly Detection Techniques and Applications
  • Advanced Vision and Imaging
  • Adversarial Robustness in Machine Learning
  • Advanced Neural Network Applications
  • Text and Document Classification Technologies
  • Natural Language Processing Techniques
  • Handwritten Text Recognition Techniques
  • Advanced Image Processing Techniques
  • Caching and Content Delivery
  • Sentiment Analysis and Opinion Mining
  • Digital Media Forensic Detection
  • Speech and Audio Processing

Harbin Institute of Technology
2023-2025

Monash University
2019-2024

Shenzhen Institute of Information Technology
2024

Guiyang Medical University
2024

Nanning Normal University
2015-2023

Australian Regenerative Medicine Institute
2023

Peng Cheng Laboratory
2023

Singapore-HUJ Alliance for Research and Enterprise
2019-2020

Beijing Fengtai Hospital
2011

Southerners on New Ground
1996

The prevailing characteristics of micro-videos result in the less descriptive power of each modality. The micro-video representations that several pioneer efforts proposed are limited to implicitly exploring the consistency between different modality information, but ignore the complementarity. In this paper, we focus on how to explicitly separate the consistent features and the complementary features from the mixed information and harness their combination to improve the expressiveness of each modality. Toward this end, we present a neural multimodal cooperative learning (NMCL)...

10.1109/tip.2019.2923608 article EN IEEE Transactions on Image Processing 2019-07-12

In recent years, remarkable progress in zero-shot learning (ZSL) has been achieved by generative adversarial networks (GAN). To compensate for the lack of training samples in ZSL, a surge of GAN architectures have been developed by human experts through trial-and-error testing. Despite their efficacy, however, there is still no guarantee that these hand-crafted models can consistently achieve good performance across diversified datasets or scenarios. Accordingly, in this paper, we turn to neural architecture...

10.1109/tpami.2021.3127346 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-11-11

With the explosive growth of multimedia contents, multimedia retrieval is facing unprecedented challenges on both storage cost and retrieval speed. Hashing techniques can project high-dimensional data into compact binary hash codes. With them, the most time-consuming semantic similarity computation during the retrieval process can be significantly accelerated with fast Hamming distance computation, and meanwhile the storage cost can be reduced greatly by the binary embedding. In light of this, multi-modal hashing has recently received considerable attention to support large-scale...

10.1109/tkde.2023.3282921 article EN IEEE Transactions on Knowledge and Data Engineering 2023-06-05
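
The retrieval idea in the abstract above (binary hash codes compared by Hamming distance) can be illustrated with a minimal sketch. It does not reproduce the paper's method: it assumes a plain random-hyperplane sign hash, and every name in it (`sign_hash`, `hamming`, the 16-dimensional toy features, the 32-bit code length) is hypothetical and chosen only for illustration.

```python
import random


def sign_hash(vector, hyperplanes):
    """Pack one sign bit per hyperplane into a single integer hash code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


# Toy data (assumption): 200 random 16-d feature vectors, 32-bit codes.
rng = random.Random(0)
DIM, BITS = 16, 32
hyperplanes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
database = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
codes = [sign_hash(vec, hyperplanes) for vec in database]

# Rank the whole database against one stored item by Hamming distance.
query = codes[7]
ranked = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
```

Each comparison here is one XOR plus a bit count on a 32-bit integer, which is where the speed and storage savings mentioned in the abstract come from; a learned multi-modal hash function would replace `sign_hash`.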

10.1016/j.ipm.2019.102178 article EN Information Processing & Management 2019-12-12

Session-based recommendation (SBR) has drawn increasing research attention in recent years, due to its great practical value in exploiting only the limited user behavior history in the current session. The key of SBR is to accurately infer the anonymous user's purpose in a session, which is typically represented as a session embedding, and then match it with the item embeddings for the next item prediction. Existing methods typically learn the session embedding at the item level, namely, aggregating the embeddings of items with or without assigned attention weights to items. However, they ignore the fact that...

10.1109/tkde.2022.3208782 article EN IEEE Transactions on Knowledge and Data Engineering 2022-09-22
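
The item-level baseline that the abstract above describes (aggregate item embeddings into a session embedding, then match it against item embeddings) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's model: the two-dimensional embeddings, the item names, and the helper names `session_embedding` and `rank_next_items` are all hypothetical, and uniform weights stand in for learned attention weights.

```python
def session_embedding(item_ids, item_embeddings, weights=None):
    """Aggregate the embeddings of the session's items into one vector.

    With weights=None every item counts equally (mean pooling); passing
    weights mimics attention-weighted aggregation.
    """
    if weights is None:
        weights = [1.0 / len(item_ids)] * len(item_ids)
    dim = len(next(iter(item_embeddings.values())))
    session = [0.0] * dim
    for item_id, weight in zip(item_ids, weights):
        for k, value in enumerate(item_embeddings[item_id]):
            session[k] += weight * value
    return session


def rank_next_items(session, item_embeddings, exclude=()):
    """Score candidate items by dot product with the session embedding."""
    scores = {
        item: sum(s * e for s, e in zip(session, emb))
        for item, emb in item_embeddings.items()
        if item not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical item embeddings (assumption): first axis ~ "clothing".
item_embeddings = {
    "shirt": [1.0, 0.0],
    "jeans": [0.8, 0.2],
    "sneakers": [0.6, 0.4],
    "blender": [0.0, 1.0],
}
session = session_embedding(["shirt", "jeans"], item_embeddings)
ranking = rank_next_items(session, item_embeddings, exclude={"shirt", "jeans"})
```

In this toy session the clothing item ranks ahead of the unrelated one because the pooled session vector lies close to the clothing axis.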

Unsupervised cross-modal hashing has attracted considerable attention to support large-scale cross-modal retrieval. Although promising progress has been made so far, existing methods still suffer from limited capability in excavating and preserving the intrinsic multi-modal semantics. In this paper, we propose a Correlation-Identity Reconstruction Hashing (CIRH) method to alleviate this challenging problem. We develop a new unsupervised deep cross-modal hash learning framework to model and preserve the heterogeneous multi-modal correlation...

10.1109/tkde.2022.3218656 article EN IEEE Transactions on Knowledge and Data Engineering 2022-11-04

Action recognition is a popular research topic in the computer vision and machine learning domains. Although many action recognition methods have been proposed, only a few researchers have focused on cross-domain few-shot action recognition, which must often be performed in real security surveillance. Since the problems of action recognition, domain adaptation, and few-shot learning need to be solved simultaneously, the cross-domain few-shot action recognition task is a challenging problem. To solve these issues, in this work, we develop a novel end-to-end pairwise attentive adversarial spatiotemporal network (PASTN)...

10.1109/tip.2020.3038372 article EN IEEE Transactions on Image Processing 2020-11-25

Linmei Hu, Luhao Zhang, Chuan Shi, Liqiang Nie, Weili Guan, Cheng Yang. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1395 article EN cc-by 2019-01-01

Automatic image captioning is to conduct the cross-modal conversion from image visual content to natural language text. Involving computer vision (CV) and natural language processing (NLP), it has become one of the most sophisticated research issues in the artificial-intelligence area. Based on the deep neural network, the neural image caption (NIC) model has achieved remarkable performance in image captioning, yet there still remain some essential challenges, such as the deviation between descriptive sentences generated by the model and the intrinsic content expressed by the image, the low...

10.1109/tcyb.2020.2997034 article EN IEEE Transactions on Cybernetics 2020-06-22

Temporal action localization is currently an active research topic in computer vision and machine learning due to its usage in smart surveillance. It is a challenging problem since the categories of the actions must be classified in untrimmed videos and the start and end of the actions need to be accurately found. Although many temporal action localization methods have been proposed, they require substantial amounts of computational resources for the training and inference processes. To solve these issues, in this work, a novel temporal-aware relation and attention network...

10.1109/tip.2022.3182866 article EN IEEE Transactions on Image Processing 2022-01-01

Fashion Compatibility Modeling (FCM) is a new yet challenging task, which aims to automatically assess the matching degree among a set of complementary items. Most existing methods evaluate fashion compatibility from the common perspective, but overlook the user's personal preference. Inspired by this, a few pioneers study Personalized Fashion Compatibility Modeling (PFCM). Despite their significance, these PFCM methods mainly concentrate on the user and item entities, as well as their interactions, but ignore the attribute entities, which contain rich semantics. To address...

10.1145/3477495.3532038 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06

Graphs are widely used to model various practical applications. In recent years, graph convolution networks (GCNs) have attracted increasing attention due to the extension of the convolution operation from traditional grid data to graph data. However, the representation ability of current GCNs is undoubtedly limited because existing work fails to consider feature interactions. Toward this end, we propose a Dual Feature Interaction-based GCN. Specifically, it models the feature interaction in the aspects of 1) node features where we use Newton's...

10.1109/tkde.2022.3220789 article EN IEEE Transactions on Knowledge and Data Engineering 2022-11-09

Personalized outfit recommendation, which aims to recommend outfits to a given user according to his/her preference, has gained increasing research attention due to its economic value. Nevertheless, the majority of existing methods mainly focus on improving the recommendation effectiveness, while overlooking the recommendation efficiency. Inspired by this, we devise a novel bi-directional heterogeneous graph hashing scheme, called BiHGH, towards efficient personalized outfit recommendation. In particular, this scheme consists of three...

10.1145/3503161.3548020 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Fashion compatibility modeling, which is used to estimate the matching degree of a given set of fashion items, has received increasing attention in recent years. However, existing studies often fail to fully leverage multimodal information or ignore the semantic guidance of clothing categories in elevating the reliability of multimodal information. In this paper, we propose a fashion compatibility modeling approach with a category-aware multimodal attention network, termed FCM-CMAN. In FCM-CMAN, we focus on enriching and aggregating multimodal representations of fashion items by means of the dynamic...

10.1109/tmm.2023.3246796 article EN IEEE Transactions on Multimedia 2023-01-01

With the rapid development of science and technology and the better living standard of people, the Internet, with its features of low cost, large information source and fast speed, plays an important role in common people's life. The Internet brings a lot of convenience, while at the same time it causes a series of social problems. Under the new situation, the teen Internet addiction index is higher and higher. Solving the problem needs every side to intervene.

10.2991/nceece-15.2016.51 article EN cc-by-nc 2016-01-01

Finding tampered regions in images is a common research topic in machine learning and computer vision. Although many image manipulation location algorithms have been proposed, most of them only focus on RGB images with different color spaces, and the frequency information that contains the potential tampering clues is often ignored. Moreover, among the manipulation operations, splicing and copy-move are two frequently used methods, but as their characteristics are quite different, specific methods are individually designed for...

10.1109/tkde.2022.3187091 article EN IEEE Transactions on Knowledge and Data Engineering 2022-01-01

Fashion Compatibility Modeling (FCM), which aims to automatically evaluate whether a given set of fashion items makes a compatible outfit, has attracted increasing research attention. Recent studies have demonstrated the benefits of conducting item representation disentanglement towards FCM. Although these efforts have achieved prominent progress, they still perform unsatisfactorily, as they mainly investigate the visual content of fashion items, while overlooking the semantic attributes of items (e.g., color and pattern), which could...

10.1109/tip.2022.3187290 article EN IEEE Transactions on Image Processing 2022-01-01

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity for transfer learning. Recently, learnable prompts have achieved state-of-the-art performance, but they are prone to overfit to seen classes, failing to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. Our approach takes inspiration from human intelligence, in which external knowledge is usually incorporated into recognizing...

10.1109/iccv51070.2023.01436 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversaries can be deliberately transferred to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study to investigate the adversarial transferability of recent VLP models. We observe that existing methods exhibit much lower transferability, compared to the strong attack performance in white-box settings. The transferability degradation is partly...

10.1109/iccv51070.2023.00016 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Cloth-changing person re-identification is a subject closer to the real world, which focuses on solving the problem of person re-identification after pedestrians change clothes. The primary challenge in this field is to overcome the complex interplay between intra-class and inter-class variations and to identify features that remain unaffected by changes in appearance. Sufficient data collection for model training would significantly aid in addressing this problem. However, it is challenging to gather diverse datasets in practice. Current methods focus...

10.1109/tip.2025.3531217 article EN IEEE Transactions on Image Processing 2025-01-01

The incorporation of high-resolution visual input equips multimodal large language models (MLLMs) with enhanced visual perception capabilities for real-world tasks. However, most existing high-resolution MLLMs rely on a cropping-based approach to process images, which leads to fragmented visual encoding and a sharp increase in redundant tokens. To tackle these issues, we propose the FALCON model. FALCON introduces a novel visual register technique to simultaneously: 1) Eliminate redundant tokens at the stage of visual encoding. To directly address the visual redundancy present...

10.48550/arxiv.2501.16297 preprint EN arXiv (Cornell University) 2025-01-27