Xun Jiang

ORCID: 0000-0003-2209-651X
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Advanced Computational Techniques and Applications
  • Sentiment Analysis and Opinion Mining
  • Music and Audio Processing
  • Image Retrieval and Classification Techniques
  • Anomaly Detection Techniques and Applications
  • Data Mining Algorithms and Applications
  • Time Series Analysis and Forecasting
  • Scientific Computing and Data Management
  • Data Analysis with R
  • Technology and Security Systems
  • Embedded Systems and FPGA Design
  • Layered Double Hydroxides Synthesis and Applications
  • Research Data Management Practices
  • Embedded Systems Design Techniques
  • Software Engineering Techniques and Practices
  • Advanced Text Analysis Techniques
  • Quantum Computing Algorithms and Architecture
  • Diabetes Treatment and Management
  • Advanced Materials and Mechanics
  • Dynamics and Control of Mechanical Systems
  • Advanced Software Engineering Methodologies

University of Electronic Science and Technology of China
2012-2025

Collaborative Innovation Center of Advanced Microstructures
2024

Nanjing University
2012-2024

Liaoning University
2024

NARI Group (China)
2023

Amgen (United States)
2021-2022

Yale University
2021

Zhejiang Ocean University
2019

Chongqing University of Posts and Telecommunications
2017

University of Connecticut
2013

The materials discovery process can be significantly expedited and simplified if we learn effectively from available knowledge and data. In the present contribution, we show that efficient and accurate prediction of a diverse set of properties of material systems is possible by employing machine (or statistical) learning methods trained on quantum mechanical computations in combination with notions of chemical similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us to discover...

10.1038/srep02810 article EN cc-by Scientific Reports 2013-09-30

Video events grounding aims at retrieving the most relevant moments from an untrimmed video in terms of a given natural language query. Most previous works focus on Video Sentence Grounding (VSG), which localizes a moment with a sentence query. Recently, researchers extended this task to Video Paragraph Grounding (VPG) by retrieving multiple events with a paragraph. However, we find that existing VPG methods may not perform well on context modeling and rely heavily on video-paragraph annotations. To tackle this problem, we propose a novel method termed Semi-supervised...

10.1109/cvpr52688.2022.00250 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

10.1109/cvpr52733.2024.02538 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

The Weakly-Supervised Audio-Visual Video Parsing (AVVP) task aims to parse a video into temporal segments and predict their event categories in terms of modalities, labeling them as either audible, visible, or both. Since the temporal boundaries and modality annotations are not provided, and only video-level labels are available, this task is more challenging than conventional video understanding tasks. Most previous works attempt to analyze videos by jointly modeling the audio and video data and then learning information from the segment-level...

10.1145/3503161.3548309 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

The Composed Query-Based Image Retrieval (CQBIR) task aims to precisely obtain the preserved and modified parts, based on multi-grained semantics learned from the composed query. Since the query includes a reference image and a modification text, not just a single modality, this task is more challenging than general retrieval tasks. Most previous methods attempt to learn the preserved and modified parts via different attention modules and fuse...

10.1109/tcsvt.2023.3306738 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-08-21

10.1109/icassp49660.2025.10889129 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Artificial intelligence, particularly language models (LMs), is reshaping research paradigms across scientific domains. In the fields of chemistry and pharmacy, chemical language models (CLMs) have achieved remarkable success in two-dimensional (2D) molecular modeling tasks by leveraging one-dimensional (1D) representations of molecules, such as SMILES and SELFIES. However, extending these successes to three-dimensional (3D) molecular modeling remains a significant challenge, largely due to the absence of effective 1D representations for capturing 3D...

10.1101/2025.05.07.652440 preprint EN cc-by-nc-nd 2025-05-12

Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an untrimmed video with natural language queries. Existing VMR methods suffer from two defects: (1) massive and expensive temporal annotations are required to obtain satisfying performance; (2) complicated cross-modal interaction modules are deployed, which lead to high computational cost and low efficiency in the retrieval process. To address these issues, we propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR), which balances accuracy,...

10.1145/3581783.3612394 article EN 2023-10-26

Text-based video retrieval (TVR) is a well-studied task aimed at retrieving relevant videos from a large collection in response to a given text query. Most existing TVR works assume that videos are already trimmed and fully relevant to the query, thus ignoring the most common real-world scenario, in which videos are untrimmed and contain massive irrelevant content. Moreover, as users' queries describe only events rather than complete videos, it is also more practical to provide specific moments rather than a video list. In this paper, we introduce a challenging but realistic task called...

10.1145/3581783.3612349 article EN 2023-10-26

Temporal language grounding (TLG) is one of the most challenging cross-modal video understanding tasks, which aims at retrieving the most relevant segment from an untrimmed video according to a natural language sentence. Existing methods can be separated into two dominant types: 1) proposal-based and 2) proposal-free methods, where the former conduct contextual interactions and the latter localize timestamps flexibly. However, the constant-scale candidates in proposal-based methods limit localization precision and bring extra computational costs. In...

10.1109/tnnls.2022.3211850 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-11-03
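The claim that constant-scale candidates limit localization precision can be illustrated with temporal IoU (tIoU), the overlap measure commonly used to score grounding predictions. The following is a minimal Python sketch, not the paper's method: it assumes fixed-length sliding-window proposals as a stand-in for proposal-based candidates, and the function names, the 30 s video, the 8 s window, and the 4 s stride are all hypothetical choices.

```python
def temporal_iou(a, b):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def sliding_window_proposals(duration, window, stride):
    """Hypothetical fixed-scale candidates: windows of length `window` every `stride` s."""
    proposals = []
    start = 0.0
    while start + window <= duration + 1e-9:
        proposals.append((start, start + window))
        start += stride
    return proposals


def best_iou(gt, proposals):
    """Best tIoU any candidate can reach against the ground-truth moment."""
    return max(temporal_iou(gt, p) for p in proposals)


if __name__ == "__main__":
    gt = (3.0, 7.5)  # hypothetical 4.5 s ground-truth moment
    props = sliding_window_proposals(30.0, 8.0, 4.0)
    print(f"best fixed-scale tIoU: {best_iou(gt, props):.4f}")  # 0.5625
    print(f"exact flexible prediction tIoU: {temporal_iou(gt, gt):.4f}")  # 1.0000
```

Because every candidate is 8 s long, none can match the 4.5 s moment, so the best achievable tIoU is capped at 0.5625; a flexible, proposal-free prediction of the exact boundaries would score 1.0.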

In this article, we study the challenging cross-modal image retrieval task, Composed Query-Based Image Retrieval (CQBIR), in which the query is not a single text query but a composed query, i.e., a reference image and a modification text. Compared with conventional image-text retrieval, CQBIR is more challenging, as it requires properly preserving and modifying a specific image region according to multi-level semantic information learned from the multi-modal query. Most recent works focus on extracting preserved and modified information and compositing them into a unified...

10.1145/3639469 article EN ACM Transactions on Multimedia Computing, Communications, and Applications 2024-01-09

Multimodal Sentiment Analysis (MSA) aims at teaching computers or robots to understand human sentiment with diverse multimodal signals, including audio, vision, and text. Current MSA approaches primarily concentrate on devising fusion strategies for multimodal signals, trying to learn better joint representations. However, employing them directly is not appropriate, since psychological states are fuzzy and cannot be categorized easily, which undermines the effectiveness of existing methods. In this paper, we regard...

10.1109/tfuzz.2024.3405541 article EN IEEE Transactions on Fuzzy Systems 2024-01-01

Reproducible document standards, like R Markdown, facilitate the programmatic creation of documents whose content is itself programmatically generated. While programmatic content alone may not be sufficient for a rendered document, since it does not include prose (content generated by an author to provide context, narrative, etc.), programmatic generation can provide substantial efficiencies in structuring and constructing documents. This paper explores reproducible document generation by distinguishing components that can be created by computational means from those requiring...

10.18637/jss.v103.i08 article EN cc-by Journal of Statistical Software 2022-01-01

Video Paragraph Grounding (VPG) aims at retrieving multiple relevant moments from an untrimmed video with a given natural language paragraph query. However, the complex query brings more challenges to multimodal fusion and context modeling, which limits the performance of existing VPG methods. To this end, we propose a novel framework for VPG in this paper, termed Graph-based Transformer with Language Reconstruction (GTLR). It consists of three components: (1) a Multimodal Graph Encoder conducting graph reasoning...

10.1109/icme52920.2022.9859847 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18

Compositional temporal grounding (CTG) aims to localize the most relevant segment from an untrimmed video based on a given natural language sentence, and test samples for this task contain novel components not seen in training. However, existing CTG methods suffer from two shortcomings: (1) most adopt transformers to model global information only, thus failing to balance long-range perception and regional representation of sequences; (2) due to the lack of alignment between videos and sentences at a fine-grained level, the model's...

10.1145/3652583.3658113 article EN 2024-05-30

10.1109/ipdpsw63119.2024.00184 article EN 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2024-05-27