NFDI4DS | UHH-SEMS - Publication Details

Xuanhan Wang

ORCID: 0000-0002-3881-9658

About

Contact & Profiles

A5078148851

Research Areas

Human Pose and Action Recognition
Multimodal Machine Learning Applications
Advanced Neural Network Applications
Video Surveillance and Tracking Methods
Anomaly Detection Techniques and Applications
Gait Recognition and Analysis
Salivary Gland Tumors Diagnosis and Treatment
Adversarial Robustness in Machine Learning
Advanced Image and Video Retrieval Techniques
Domain Adaptation and Few-Shot Learning
Stroke Rehabilitation and Recovery
Oral Health Pathology and Treatment
Generative Adversarial Networks and Image Synthesis
Topic Modeling
Salivary Gland Disorders and Functions
Natural Language Processing Techniques
Hand Gesture Recognition Systems
Video Analysis and Summarization
Lipid metabolism and disorders
Reproductive System and Pregnancy
Human Motion and Animation
Advanced Vision and Imaging
AI in cancer detection
Context-Aware Activity Recognition Systems
Cancer-related molecular mechanisms research

Yangzhou University
2022-2024

University of Electronic Science and Technology of China
2016-2023

Beyond Frame-level CNN: Saliency-Aware 3-D CNN With LSTM for Video Action Recognition

OPENALEX - Publications

Xuanhan Wang, Lianli Gao, Jingkuan Song, Hengtao Shen

Human activity recognition in videos with convolutional neural network (CNN) features has received increasing attention in multimedia understanding. Taking videos as a sequence of frames, a new record was recently set on several benchmark datasets by feeding frame-level CNN features to a long short-term memory (LSTM) model for video recognition. This recurrent model-based visual pipeline is a natural choice for perceptual problems with time-varying input or sequential outputs. However, the above-mentioned pipeline takes frame-level CNN features as input to the LSTM, which...

10.1109/lsp.2016.2611485 article EN IEEE Signal Processing Letters 2016-09-20

Two-Stream 3-D convNet Fusion for Action Recognition in Videos With Arbitrary Size and Length

Xuanhan Wang, Lianli Gao, Peng Wang, Xiaoshuai Sun, Xianglong Liu

3-D convolutional neural networks (3-D-convNets) have been very recently proposed for action recognition in videos, and promising results are achieved. However, existing 3-D-convNets have two "artificial" requirements that may reduce the quality of video analysis: 1) they require a fixed-sized (e.g., 112 $\times$ 112) input video; and 2) most require fixed-length input (i.e., video shots with a fixed number of frames). To tackle these issues, we propose an end-to-end pipeline named Two-stream 3-D-convNet Fusion,...

10.1109/tmm.2017.2749159 article EN IEEE Transactions on Multimedia 2017-09-04

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, and 3 more

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding. However, current models are stuck in common predicates, e.g., "on" and "at", rather than informative ones, e.g., "standing on" and "looking at", resulting in the loss of precise information and overall performance. If a model only uses "stone on road" rather than "blocking" to describe an image, it is easy to misunderstand the scene. We argue that this...

10.1109/iccv48922.2021.01607 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Learnable Aggregating Net with Diversity Learning for Video Question Answering

Xiangpeng Li, Lianli Gao, Xuanhan Wang, Wu Liu, Xing Xu, and 2 more

Video visual question answering (V-VQA) remains challenging at the intersection of vision and language, where it requires joint comprehension of the video and the natural language question. The Image-Question co-attention mechanism, which aims at generating a spatial map highlighting image regions relevant to the question and vice versa, has obtained impressive results. Despite the success, simply applying it to V-VQA results in unsatisfactory performance due to the complexity and temporal nature of videos. In this paper, we proposed a novel architecture,...

10.1145/3343031.3350971 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

Deep appearance and motion learning for egocentric activity recognition

Xuanhan Wang, Lianli Gao, Jingkuan Song, Xiantong Zhen, Nicu Sebe, and 1 more

10.1016/j.neucom.2017.08.063 article EN Neurocomputing 2017-09-08

Fused GRU with semantic-temporal attention for video captioning

Lianli Gao, Xuanhan Wang, Jingkuan Song, Yang Liu

10.1016/j.neucom.2018.06.096 article EN Neurocomputing 2019-07-18

Skeleton-based Action Recognition via Adaptive Cross-Form Learning

Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song

Skeleton-based action recognition aims to project skeleton sequences to action categories, where skeleton sequences are derived from multiple forms of pre-detected points. Compared with earlier methods that focus on exploring single-form skeletons via Graph Convolutional Networks (GCNs), existing methods tend to improve GCNs by leveraging multi-form skeletons due to their complementary cues. However, these methods (either adapting the structure of GCNs or model ensemble) require the co-existence of all forms of skeletons during both training and inference stages, while a typical...

10.1145/3503161.3547811 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

MKE-GCN: Multi-Modal Knowledge Embedded Graph Convolutional Network for Skeleton-Based Action Recognition in the Wild

Sen Yang, Xuanhan Wang, Lianli Gao, Jingkuan Song

Graph convolutional networks (GCNs), which model human body skeletons as several spatial-temporal graphs, have been widely used and have become a key to representative feature extraction. However, existing methods have limitations in recognizing actions in the wild, where skeletons are captured from real-world scenes with diversified view-points, obvious motion blurs, complex interactions, and fast varying resolutions of the human body. In this paper, we propose Multi-modal Knowledge Embedded Graph Convolutional Network...

10.1109/icme52920.2022.9859787 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18

KTN: Knowledge Transfer Network for Learning Multiperson 2D-3D Correspondences

Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang

Human densepose estimation, aiming at establishing dense correspondences between 2D pixels of the human body and a 3D template, is a key technique in enabling machines to have an understanding of people in images. It still poses several challenges due to practical scenarios where real-world scenes are complex and only partial annotations are available, leading to incomplete or false estimations. In this work, we present a novel framework to detect the densepose of multiple people in an image. The proposed method, which we refer to as Knowledge Transfer...

10.1109/tcsvt.2022.3181604 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-06-09

The causal effects of age at menarche, age at first live birth, and estradiol levels on systemic lupus erythematosus: A two-sample Mendelian randomization analysis

Y Zhang, Yuxuan Fang, Nan Xu, Longlong Tian, Xingxing Min, and 6 more

To determine whether age at menarche (AAM), age at first live birth (AFB), and estradiol levels are causally correlated with the development of systemic lupus erythematosus (SLE). A two-sample Mendelian randomization (MR) analysis was performed after data were collected from a dataset of genome-wide association studies (GWASs) related to SLE (as outcome), and open access databases were used to find statistics on AAM, AFB, and estradiol levels (as exposure). In our study, a negative causal correlation between AAM and SLE was confirmed by MR (MR egger: beta = 0.116,...

10.1177/09612033231180358 article EN Lupus 2023-05-29

RSGNet: Relation based Skeleton Graph Network for Crowded Scenes Pose Estimation

Yan Dai, Xuanhan Wang, Lianli Gao, Jingkuan Song, Heng Tao Shen

Despite the recent great progress on multi-person pose estimation, existing solutions still remain challenging under the condition of "crowded scenes", where RGB images capture complex real-world scenes with highly-overlapped people, severe occlusions, and diverse postures. In this work, we focus on two main problems: 1) how to design an effective pipeline for crowded scenes pose estimation; and 2) how to equip this pipeline with the ability of relation modeling for interference resolving. To tackle these problems, we propose a new pipeline named Relation based...

10.1609/aaai.v35i2.16206 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

ReSParser: Fully Convolutional Multiple Human Parsing With Representative Sets

Yan Dai, Xiaojia Chen, Xuanhan Wang, Minghui Pang, Lianli Gao, and 1 more

Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the two sub-tasks by adopting a two-stage strategy, which regards MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-then-grouping) paradigm. However, the strong dependence between the two sub-tasks limits the potential of the two-stage method, since it often requires qualified prior predictions. Besides, isolated models responsible for the two sub-tasks bring significant...

10.1109/tmm.2023.3281070 article EN IEEE Transactions on Multimedia 2023-05-29

KTN: Knowledge Transfer Network for Multi-person DensePose Estimation

Xuanhan Wang, Lianli Gao, Jingkuan Song, Heng Tao Shen

In this paper, we address the multi-person densepose estimation problem, which aims at learning dense correspondences between 2D pixels of the human body and a 3D surface. It still poses several challenges due to real-world scenes with scale variations, occlusion, and insufficient annotations. In particular, we address two main problems: 1) how to design a simple yet effective pipeline for densepose estimation; and 2) how to equip this pipeline with the ability of handling the issues of limited annotations and class-imbalanced labels. To tackle these problems, we develop a novel...

10.1145/3394171.3414014 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Semantic-aware Transfer with Instance-adaptive Parsing for Crowded Scenes Pose Estimation

Xuanhan Wang, Lianli Gao, Yan Dai, Yixuan Zhou, Jingkuan Song

Crowded scenes human pose estimation remains challenging, as it requires joint comprehension of multiple persons and their keypoints in a highly complex scenario. The top-down mechanism, which is a detect-then-estimate pipeline, has become the mainstream solution for general pose estimation and has obtained impressive progress. However, simply applying this mechanism to crowded scenes results in unsatisfactory performance due to several issues, in particular involving missing keypoints in crowds and ambiguous labeling during training. To tackle the above two...

10.1145/3474085.3475233 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

KE-RCNN: Unifying Knowledge-Based Reasoning Into Part-Level Attribute Parsing

Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, and 1 more

Part-level attribute parsing is a fundamental but challenging task, which requires region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector, in which attributes of body parts are identified from localwise part boxes. However, localwise part boxes with limited visual clues (i.e., part appearance only) lead to unsatisfying results, since attributes are highly dependent on comprehensive...

10.1109/tcyb.2022.3209653 article EN IEEE Transactions on Cybernetics 2022-11-03

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song

High-resolution representation is necessary for human pose estimation to achieve high performance, and the ensuing problem is high computational complexity. In particular, predominant methods estimate human joints by 2D single-peak heatmaps. Each heatmap can be horizontally and vertically projected to, and reconstructed by, a pair of 1D heat vectors. Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1 x 1) convolution that is the main bottleneck...

10.1109/icme52920.2022.9859751 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18
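
The observation in the X-HRNet abstract above, that a single-peak 2D heatmap can be projected onto and rebuilt from a pair of 1D heat vectors, can be illustrated with a small sketch. This is not the SUSA module or any code from the paper. The Gaussian peak shape, the max-projection, and every function name below are assumptions chosen only for illustration.

```python
import math

def gaussian_1d(size, center, sigma):
    """Unnormalized 1D Gaussian with peak value 1.0 at `center` (assumed peak shape)."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(size)]

def outer(col_vec, row_vec):
    """Outer product: result[i][j] = col_vec[i] * row_vec[j]."""
    return [[c * r for r in row_vec] for c in col_vec]

def project_max(heatmap):
    """Project a 2D heatmap onto each axis by taking the maximum along the other axis."""
    vertical = [max(row) for row in heatmap]          # one value per row (y axis)
    horizontal = [max(col) for col in zip(*heatmap)]  # one value per column (x axis)
    return vertical, horizontal

# Hypothetical 8 x 10 heatmap with a single joint at row 3, column 6.
height, width = 8, 10
joint_y, joint_x = 3, 6
heatmap = outer(gaussian_1d(height, joint_y, 1.5), gaussian_1d(width, joint_x, 1.5))

# Project to two 1D heat vectors, then rebuild the 2D heatmap from them.
vec_y, vec_x = project_max(heatmap)
rebuilt = outer(vec_y, vec_x)

max_err = max(abs(a - b) for ra, rb in zip(heatmap, rebuilt) for a, b in zip(ra, rb))
peak = (vec_y.index(max(vec_y)), vec_x.index(max(vec_x)))
print(max_err, peak)
```

Under these assumptions the 8 x 10 map (80 values) is carried by 8 + 10 = 18 values, and the joint location is the pair of 1D peak positions. The reconstruction is exact here only because a separable Gaussian with unit peak was assumed; heatmaps that are not separable would be recovered only approximately.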

AMANet: Adaptive Multi-Path Aggregation for Learning Human 2D-3D Correspondences

Xuanhan Wang, Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

Learning human 2D-3D correspondences aims to map all human 2D pixels to a 3D human template, namely human densepose estimation, involving surface patch recognition (i.e., Index-to-Patch (I)) and regression of patch-specific UV coordinates. Despite recent progress, it remains challenging, especially under the condition of “in the wild”, where RGB images capture real-world scenes with backgrounds, occlusions, scale variations, and postural diversity. In this paper, we address three vital problems in this task: 1) how to perceive...

10.1109/tmm.2021.3135145 article EN IEEE Transactions on Multimedia 2021-12-14

Overcoming Data Deficiency for Multi-Person Pose Estimation

Yan Dai, Xuanhan Wang, Lianli Gao, Jingkuan Song, Feng Zheng, and 1 more

Building multi-person pose estimation (MPPE) models that can handle complex foreground and uncommon scenes is an important challenge in computer vision. Aside from designing novel models, strengthening training data is a promising direction but remains largely unexploited for the MPPE task. In this article, we systematically identify the key deficiencies of existing datasets that prevent the power of well-designed models from being fully exploited, and propose the corresponding solutions. Specifically, we find that traditional...

10.1109/tnnls.2023.3244957 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-05-11

Value of magnetic resonance imaging and sialography of the parotid gland for diagnosis of primary Sjögren syndrome

Yujun Rao, Nan Xu, Yongbin Zhang, Yuxuan Fang, Longlong Tian, and 9 more

Aim: To evaluate the utility of magnetic resonance imaging (MRI) and magnetic resonance sialography (MRS) for the diagnosis of primary Sjögren syndrome (pSS), singly or integrated with the 2016 American College of Rheumatology (ACR)/European League Against Rheumatic Diseases (EULAR) classification criteria. Methods: The diagnostic efficiencies of MRI, MRS, and labial salivary gland biopsy (LSGB) were evaluated. The prediction model was established by multivariate analysis. Finally, the performance of the ACR/EULAR criteria was evaluated after...

10.1111/1756-185x.14528 article EN International Journal of Rheumatic Diseases 2022-12-11

ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation

Kai Xing, Tao Li, Xuanhan Wang

Temporal action proposal generation aims to localize temporal segments of human activities in videos. Current boundary-based methods can generate proposals with precise boundaries but often suffer from the inferior quality of the confidence scores used for proposal retrieving. In this article, we propose an effective and end-to-end method, named ProposalVLAD with Proposal-Intra Exploring Network (PVPI-Net). We first propose a ProposalVLAD module to dynamically generate global features of the entire video, and then combine the global and local features into the final feature...

10.1145/3571747 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-11-24

Non-invasive imaging for predicting labial salivary gland biopsy outcomes in patients with suspected primary Sjögren syndrome

Nan Xu, Xuanhan Wang, Tiantian Dai, Nianxing Liu, Yimin Ding, and 5 more

10.1007/s10067-024-06949-w article EN Clinical Rheumatology 2024-04-03

CPI-Parser: Integrating Causal Properties into Multiple Human Parsing

Xuanhan Wang, Xiaojia Chen, Lianli Gao, Jingkuan Song, Hengtao Shen

Existing methods of multiple human parsing (MHP) apply deep models to learn instance-level representations for segmenting each person into non-overlapped body parts. However, the learned representations often contain many spurious correlations that degrade model generalization, leading the learned models to be vulnerable to visually contextual variations in images (e.g., unseen image styles/external interventions). To tackle this, we present a causal property integrated parsing model, termed CPI-Parser, which is driven by fundamental causal principles...

10.1109/tip.2024.3469579 article EN IEEE Transactions on Image Processing 2024-01-01

Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

Y. Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose Generalized Adversarial attacKER (GAKer), which is able to construct adversarial examples to any target class. The core idea behind GAKer...

10.48550/arxiv.2407.12292 preprint EN arXiv (Cornell University) 2024-07-16

Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model

Qiujing Lu, Meng Ma, X. C. Dai, Xuanhan Wang, Shuo Feng

To guarantee the safety and reliability of autonomous vehicle (AV) systems, corner cases play a crucial role in exploring the system's behavior under rare and challenging conditions within simulation environments. However, current approaches often fall short in meeting diverse testing needs and struggle to generalize to novel, high-risk scenarios that closely mirror real-world conditions. To tackle this challenge, we present AutoScenario, a multimodal Large Language Model (LLM)-based framework for realistic corner case...

10.48550/arxiv.2412.00243 preprint EN arXiv (Cornell University) 2024-11-29