- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Video Analysis and Summarization
- Image Retrieval and Classification Techniques
- Advanced Steganography and Watermarking Techniques
- Digital Media Forensic Detection
- Multimodal Machine Learning Applications
- 3D Shape Modeling and Analysis
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Advanced Vision and Imaging
- Glioma Diagnosis and Treatment
- Image Enhancement Techniques
- Visual Attention and Saliency Detection
- Anomaly Detection Techniques and Applications
- Text and Document Classification Technologies
- Face and Expression Recognition
- Domain Adaptation and Few-Shot Learning
- Image Processing Techniques and Applications
- Chaos-based Image/Signal Encryption
- Image Processing and 3D Reconstruction
- Cell Image Analysis Techniques
- Image and Video Quality Assessment
- Advanced Neural Network Applications
- Affiliated Hospital of Jiangsu University (2015-2025)
- Tianjin University (2016-2025)
- Wayne State University (2023-2025)
- Fujian Agriculture and Forestry University (2025)
- Xijing Hospital (2024-2025)
- Air Force Medical University (2024-2025)
- Jiangsu University (2015-2025)
- Henan University (2025)
- Kaohsiung Chang Gung Memorial Hospital (2022-2024)
- Chang Gung University (2022-2024)
This paper proposes a hierarchical clustering multi-task learning (HC-MTL) method for joint human action grouping and recognition. Specifically, we formulate the objective function as a group-wise least square loss regularized by low rank and sparsity with respect to two latent variables, the model parameters and the grouping information, for joint optimization. To handle this non-convex optimization, we decompose it into sub-tasks, including task relatedness discovery. First, we convert it to a convex formulation by fixing the grouping information. The new formulation focuses on...
Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off between effectively learning new concepts and avoiding catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts, but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them...
Compared with conventional targets, small objects often face challenges such as smaller size, lower resolution, weaker contrast, and more background interference, making their detection difficult. To address this issue, this paper proposes an improved small object detection method based on the YOLO11 model, PC-YOLO11s. The core innovation of PC-YOLO11s lies in the optimization of the network structure, which includes the following aspects: Firstly, PC-YOLO11s has adjusted the hierarchical structure and added a P2 layer specifically for small object detection....
Multi-view matching is an important but challenging task in view-based 3D model retrieval. To address this challenge, we propose an original multi-modal clique graph (MCG) matching method in this paper. We systematically present a method for MCG generation, in which the graph is composed of cliques, which consist of neighbor nodes in the feature space, and hyper-edges that link pairwise cliques. Moreover, we propose an image set-based clique/edgewise similarity measure to address the issue of the set-to-set distance measure, which is the core problem in matching. The proposed MCG provides the following...
Recently, a prevailing trend in user generated content (UGC) on social media sites is the emergence of micro-videos. Micro-videos afford many potential opportunities ranging from network caching to online advertising, yet little effort has been dedicated to research on micro-video understanding. In this paper, we focus on popularity prediction of micro-videos by presenting a novel low-rank multi-view embedding learning framework. We name it transductive low-rank multi-view regression (TLRMVR), and it is capable of boosting...
Recent progress in using recurrent neural networks (RNNs) for video description has attracted increasing interest, due to their capability to encode a sequence of frames for caption generation. While existing methods have studied various features (e.g., CNN, 3D CNN, and semantic attributes) for visual encoding, the representation and fusion of heterogeneous information from multi-modal spaces have not been fully explored. Considering that different modalities are often asynchronous, frame-level fusion (e.g., concatenation and linear fusion)...
This paper proposes a unified framework for multiple/single-view human action recognition. First, we propose the hierarchical partwise bag-of-words representation, which encodes both local and global visual saliency based on the body structure cue. Then, we formulate recognition as a part-regularized multitask structural learning (MTSL) problem, which has two advantages for model learning and feature selection: 1) preserving the consistency between body-based classification and part-based classification with complementary information among...
Human action recognition is an active research area in both the computer vision and machine learning communities. In the past decades, the problem has evolved from the conventional single-view learning problem to cross-view learning, cross-domain learning, and multitask learning, where a large number of algorithms have been proposed in the literature. Despite having a large number of datasets, most of them are designed for a subset of the four learning problems, and comparisons between algorithms can be further limited by variances within datasets, experimental configurations, and other factors. To the best of our...
Image captioning is one of the most challenging tasks in AI because it requires an understanding of both complex visuals and natural language. Because image captioning is essentially a sequential prediction task, recent advances have used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based methods rely primarily on a single policy network and reward function, an approach that is not well matched to the multi-level (word and sentence) and multi-modal (vision and language) nature...
View-based 3-D model retrieval is one of the most important techniques in numerous applications of computer vision. While many methods have been proposed in recent years, to the best of our knowledge, there is no benchmark to evaluate the state-of-the-art methods. To tackle this problem, we systematically investigate and evaluate the related methods by: 1) proposing a clique graph-based method and 2) reimplementing six representative methods. Moreover, we concurrently evaluate both hand-crafted visual features and deep features on four popular datasets (NTU60, NTU216,...
Image and text matching plays a crucial role in bridging the cross-modal gap between vision and language, and has achieved great progress due to deep learning. However, existing methods still suffer from the long-tail problem, where only a small proportion of the data contains highly frequent semantics and the long tail is made up of rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle this issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) are proposed...
Cross-Domain Recommendation (CDR) has been proven to effectively alleviate the data sparsity problem in Recommender Systems (RS). Recent CDR methods often disentangle user features into domain-invariant and domain-specific features for efficient cross-domain knowledge transfer. Despite showcasing robust performance, three crucial aspects remain unexplored in existing disentangled CDR approaches: i) the significance and nuances of interaction behaviors are ignored when generating features; ii) irrelevant individual...
Background: Mounting evidence shows that the ubiquitin‒proteasome pathway plays a pivotal role in tumor progression. The expression of 26S proteasome non-ATPase regulatory subunit 9 (PSMD9) is correlated with recurrence and radiotherapy resistance in several tumor types. However, the mechanism of PSMD9 in hepatocellular carcinoma (HCC) progression remains largely unclear. Methods: PSMD9 was identified as a prognosis-related biomarker for HCC based on analysis of clinical characteristics and RNA-seq data from The Cancer...
To study the factors affecting Penaeus vannamei production in small-scale greenhouse ponds, four ponds in Jiangmen, Guangdong Province, China were selected. This study investigated the variation characteristics of bacterial communities and pathogens in pond water and shrimp intestines, as well as water quality during the culture stage. Multivariate linear regression equations were used to analyse the potential factors affecting production. The nitrite concentration reached its peak in the mid-culture stage, with a maximum of 16.3 mg·L⁻¹, whereas total nitrogen...
A copy-move attack is a special type of image forgery, in which a part of a digital image is copied and pasted to another part of the same image in order to cover an important feature. This paper describes a new blind forensics approach for detecting copy-move forgery. Our technique works by first applying the DWT (Discrete Wavelet Transform) to the input image to yield a reduced dimension representation. Then the phase correlation is computed to estimate the spatial offset between the copied region and the pasted region. The copy-move regions can then be easily located by the idea of pixel-matching, shifting the input image according...
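The phase correlation step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes NumPy and two equally sized grayscale arrays related by a circular shift; the function name `phase_correlation_offset` is hypothetical, and the sketch covers neither the DWT reduction nor the pixel-matching stage, so it is not the paper's implementation.

```python
import numpy as np

def phase_correlation_offset(a, b):
    """Estimate the integer shift (dy, dx) such that b ~ np.roll(a, (dy, dx), axis=(0, 1)).

    Illustrative helper (hypothetical name), assuming a and b are 2-D arrays
    of the same shape related by a circular translation.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    # Cross-power spectrum; normalizing keeps only the phase difference.
    cross = fb * np.conj(fa)
    cross /= np.maximum(np.abs(cross), 1e-12)  # guard against division by zero
    # The inverse transform of a pure phase ramp is a peak at the shift.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices above half the size to negative shifts.
    dy, dx = (int(p) if p <= s // 2 else int(p) - s
              for p, s in zip(peak, corr.shape))
    return dy, dx
```

Calling the function on an image and a circularly shifted copy of it, for example one built with `np.roll`, returns the applied shift. Real forged images are not exact circular shifts, so in practice the correlation peak would be weaker and would likely need thresholding.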
With the rapid development of wireless sensor networks and the continuous improvement of their key technologies, the concept of the Internet of Things has been encouraged and extended due to its wide applications in scenarios such as smart homes and healthcare. Under this background, human activity recognition has drawn great attention in recent years. In this paper, we present a discriminant approach to recognize daily human activities recorded through an accelerometer sensor. In the proposed approach, we first use the S transform (ST) to extract features, then...
Remembering images is an innate human capability. Camera images are captured by different people under varying environmental conditions, which leads to highly diverse image memorability scores. However, the factors that make images more or less memorable are unclear, and it remains unknown how we can accurately predict image memorability using such factors. In this paper, we propose a novel framework called multiview transfer learning from external sources (MTLES) to predict image memorability. In this framework, we simultaneously leverage different types of visual...