Xuanya Li

ORCID: 0000-0002-2227-207X
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • 3D Shape Modeling and Analysis
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Advanced Neural Network Applications
  • Complex Network Analysis Techniques
  • Medical Image Segmentation Techniques
  • AI in cancer detection
  • Robotics and Sensor-Based Localization
  • 3D Surveying and Cultural Heritage
  • Image Retrieval and Classification Techniques
  • Radiomics and Machine Learning in Medical Imaging
  • Topic Modeling
  • Sentiment Analysis and Opinion Mining
  • Image Processing and 3D Reconstruction
  • COVID-19 diagnosis using AI
  • Imbalanced Data Classification Techniques
  • Advanced Graph Neural Networks
  • Opinion Dynamics and Social Influence
  • Medical Imaging and Analysis
  • Misinformation and Its Impacts
  • Video Analysis and Summarization
  • Digital Imaging for Blood Diseases
  • Image and Video Quality Assessment

Baidu (China)
2020-2024

China Electronics Technology Group Corporation
2023

Hangzhou Dianzi University
2021

Chinese Academy of Sciences
2014

Institute of Information Engineering
2014

Image-text retrieval is a fundamental and vital task in multimedia and has received growing attention since it connects heterogeneous data. Previous methods that perform well on image-text retrieval mainly focus on the interaction between image regions and text words. But these approaches lack joint exploration of the characteristics and contexts of regions and words, which will cause semantic confusion of similar objects and loss of contextual understanding. To address these issues, a dual-level representation enhancement network (DREN) is proposed to...

10.1109/tcsvt.2022.3182426 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-06-13

Image and text matching plays a crucial role in bridging the cross-modal gap between vision and language, and has achieved great progress due to deep learning. However, existing methods still suffer from the long-tail problem, where only a small proportion contains highly frequent semantics and the long tail is constructed by rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle this issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) are proposed...

10.1109/tcsvt.2023.3254530 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-03-09

Image and sentence matching is a critical task to bridge the visual and textual discrepancy due to heterogeneous modalities. Great progress has been made by exploring the coarse-grained relationships between images and sentences or the fine-grained relationships between regions and words. However, how to fully excavate and exploit the corresponding relations between these two modalities is still challenging. In this work, we propose a novel Multi-scale Fine-grained Alignments Network (MFA), which can effectively explore multi-scale visual-textual...

10.1109/tmm.2021.3128744 article EN IEEE Transactions on Multimedia 2021-11-17

Multimodal emotion recognition in conversations (ERC) aims to identify the emotional state of constituent utterances expressed by multiple speakers in a dialogue from multimodal data. Existing ERC approaches focus on modeling the global context and neglect to mine the characteristic information corresponding to the same speaker. Additionally, different modalities exhibit commonality and diversity for emotional expression. The commonality and diversity compensate for each other but are not effectively exploited in previous works. To tackle these issues, we...

10.1109/tcsvt.2023.3273577 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-05-08

Optical coherence tomography angiography (OCTA) has been widely used in ophthalmology in recent years due to its non-invasive nature and high resolution. In OCTA images, two biomarkers are extremely important for clinical diagnosis, i.e., the foveal avascular zone (FAZ) and the retinal vessel (RV), and the RV is an implicit constraint on the FAZ position. In previous studies, the segmentation of the FAZ and the RV is naturally separated, which undoubtedly leads to the omission of such constraints between them. In this paper, we propose a joint framework...

10.1109/tim.2022.3193188 article EN IEEE Transactions on Instrumentation and Measurement 2022-01-01

10.1016/j.ipm.2020.102432 article EN Information Processing & Management 2020-11-23

Under the heavy burden of managing an increasing number of 3D models, the topic of 2D image-based 3D model retrieval, which organizes unlabeled 3D models based on abundant knowledge learned from labeled 2D images, has drawn attention. However, prior methods are limited in aligning the two domains semantically at corresponding categories due to the lack of label information in the unlabeled domain. To this end, this paper proposes an improved semantic representation learning by multiple clustering approach, which improves the reliability of pseudo labels for the unlabeled models so as to achieve...

10.4018/ijswis.297033 article EN International Journal on Semantic Web and Information Systems 2022-02-16

Image-text retrieval, as a fundamental task in the cross-modal field, aims to explore the relationship between visual and textual modalities. Recent methods address this task only by learning conceptual and syntactical correspondences between fragments, but these correspondences inevitably contain noise without considering external knowledge. To solve this issue, we propose a novel Commonsense-Guided...

10.1109/tmm.2023.3289753 article EN IEEE Transactions on Multimedia 2023-06-30

2D image-based 3D shape retrieval (2D-to-3D) investigates the problem of matching the relevant 3D shapes from a gallery dataset when given a query image. Recently, adversarial training and environmental style transfer learning have been successfully applied to this task and achieved state-of-the-art performance. However, there still exist two problems. First, previous works only concentrate on the connection between the label and the representation, where the unique visual characteristics of each instance are paid less...

10.1145/3394171.3413631 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Interfacial architecture of the nanofillers is a critical factor to achieve desirable dielectric properties in polymer nanocomposites. However, a basic understanding of the role of interfacial polarization and crystallization in energy storage remains scarce. Herein, we synthesized core–shell aromatic polythiourea@BaTiO3 nanoparticles (ArPTU@BT NPs) as nanofillers to prepare ferroelectric nanocomposites. Remarkably, direct detections of the morphology, polarization, and mechanism of the nanocomposites were revealed by a combination of atomic force microscopy...

10.1021/acsaem.0c02396 article EN ACS Applied Energy Materials 2021-01-08

Image–text retrieval is a vital task in computer vision and has received growing attention, since it connects cross-modality data. It comes with the critical challenges of learning unified representations and eliminating the large gap between the visual and textual domains. Over the past few decades, although many works have made significant progress in image–text retrieval, they are still confronted with the challenge of incomplete text descriptions of images, i.e., how to fully learn the correlations between relevant region–word pairs...

10.1145/3572844 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-11-23

Tax evasion usually refers to the false declaration of taxpayers to reduce their tax obligations; this type of behavior leads to the loss of taxes and damage to the fair principle of taxation. Tax evasion detection plays a crucial role in reducing tax revenue loss. Currently, efficient auditing methods mainly include traditional data-mining-oriented methods, which cannot be well adapted to the increasingly complicated transaction relationships between taxpayers. Driven by this requirement, recent studies have been conducted by establishing a network...

10.1109/compsac48688.2020.00039 article EN 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC) 2020-07-01

Recent advances in 3-D sensors and 3-D modeling have led to the availability of massive amounts of 3-D data. It is too onerous and time consuming to manually label a large number of 3-D objects in real applications. In this article, we address this issue by transferring the knowledge from existing labeled data (e.g., annotated 2-D images or 3-D objects) to unlabeled 3-D objects. Specifically, we propose a domain-adversarial guided siamese network (DAGSN) for unsupervised cross-domain 3-D object retrieval (CD3DOR). It is mainly composed of three key modules:...

10.1109/tcyb.2021.3139927 article EN IEEE Transactions on Cybernetics 2022-01-25

Existing research on the 2D image-based 3D model retrieval task focuses on learning transferable representations directly to narrow the domain discrepancy. However, this is not easy to achieve in practice due to the significant variations across the two domains. In addition, some methods design a discriminator to distinguish features arising from the source or target domains for learning, which will lead to an unexpected deterioration of feature discriminability. To settle these problems, we propose to jointly learn transferable and discriminative...

10.1109/tcsvt.2022.3168967 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-04-20

Recent advances in 3D modeling software and 3D capture devices contribute to the availability of large-scale 3D objects. Together with the prevalence of deep neural networks (DNNs), DNN-based 3D object retrieval systems are widely applied, especially by inputting 2D images to retrieve 3D objects. Although DNNs have been shown to be vulnerable to adversarial attacks in classification, the vulnerability of the 3D object retrieval system remains under-explored. In this paper, we formulate the problem of attacking the feature extractors in the image-based 3D object retrieval system. Specifically,...

10.1109/tmm.2022.3186740 article EN IEEE Transactions on Multimedia 2022-06-27

Unsupervised 2D image-based 3D shape retrieval aims to match similar unlabeled 3D shapes when given a labeled 2D sample. Although a lot of methods have made a certain degree of progress, the performance of this task is still restricted due to the lack of target labels, resulting in a tremendous domain gap. In this paper, we aim to explore discriminative representation and facilitate the procedure of domain adaptation by taking full advantage of multi-view information. To achieve the above goals, we propose an effective self-supervised auxiliary...

10.1109/tcsvt.2022.3191761 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-07-18