Xun Yang

ORCID: 0000-0003-0201-1638
Research Areas
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Video Analysis and Summarization
  • Video Surveillance and Tracking Methods
  • Wireless Networks and Protocols
  • Anomaly Detection Techniques and Applications
  • Recommender Systems and Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Wireless Network Optimization
  • Wireless Communication Networks Research
  • Gait Recognition and Analysis
  • Image Retrieval and Classification Techniques
  • Advanced MIMO Systems Optimization
  • Face Recognition and Analysis
  • Indoor and Outdoor Localization Technologies
  • 3D Shape Modeling and Analysis
  • IPv6, Mobility, Handover, Networks, Security
  • Topic Modeling
  • Consumer Market Behavior and Pricing
  • Intraocular Surgery and Lenses
  • Intelligent Tutoring Systems and Adaptive Learning
  • Stochastic Processes and Financial Applications
  • Traumatic Ocular and Foreign Body Injuries

University of Science and Technology of China
2022-2025

Zhejiang University of Science and Technology
2023-2024

Zhejiang Cancer Hospital
2022-2024

Chinese Academy of Sciences
2024

Shanghai Jiao Tong University
2024

Zhengzhou University
2022-2024

Soochow University
2019-2023

Chongqing Dazu District People's Hospital
2023

Chongqing University
2023

University of Chinese Academy of Sciences
2022-2023

Despite the promising progress made in recent years, person re-identification remains a challenging task due to the complex variations in human appearances from different camera views. This paper presents a logistic discriminant metric learning method for this problem. Unlike most existing algorithms, it exploits both original data and auxiliary data during training, which is motivated by a new machine learning paradigm: learning using privileged information. Such information is a kind of knowledge only available...

10.1109/tip.2017.2765836 article EN IEEE Transactions on Image Processing 2017-10-23

This paper attacks the challenging problem of video retrieval by text. In such a paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, effective sequence-to-sequence cross-modal matching is crucial. To that end, the two modalities need to be first encoded into real-valued vectors and then projected into a common space. In this paper we achieve this by proposing a dual deep encoding...

10.1109/tpami.2021.3059295 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query. Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions. Despite their effectiveness, current models mostly exploit dataset biases while ignoring the video content, thus leading to poor generalizability. We argue that the issue is caused by a hidden confounder in VMR, i.e., the temporal location of moments, which spuriously correlates the model input and prediction. How to design...

10.1145/3404835.3462823 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

The task of video moment retrieval (VMR) is to retrieve the specific moment from an untrimmed video, according to a textual query. It is a challenging task that requires effective modeling of the complex cross-modal matching relationship. Recent efforts primarily model the cross-modal interactions by hand-crafted network architectures. Despite their effectiveness, they rely heavily on expert experience to select architectures and have numerous hyperparameters that need to be carefully tuned, which significantly limit their applications in real-world...

10.1109/tip.2022.3140611 article EN IEEE Transactions on Image Processing 2022-01-01

This paper studies an intelligent reflecting surface (IRS)-assisted integrated sensing and communication (ISAC) system, in which one IRS with a uniform linear array (ULA) is deployed to not only assist the wireless communication from a multi-antenna base station (BS) to a single-antenna communication user (CU), but also create virtual line-of-sight (LoS) links for sensing potential targets at areas where the LoS links are blocked. We consider that the BS transmits combined information and sensing signals for ISAC. Under this setup, we jointly optimize the transmit beamforming...

10.1109/wcnc51071.2022.9771801 article EN 2022 IEEE Wireless Communications and Networking Conference (WCNC) 2022-04-10

Despite the promising progress made in recent years, person reidentification (re-ID) remains a challenging task due to the complex variations in human appearances from different camera views. This paper proposes to tackle this task by jointly learning the feature representation and distance metric in an end-to-end manner. Existing deep learning-based re-ID methods usually encounter the following two weaknesses: 1) most works based on pairwise or triplet constraints often suffer from slow convergence and poor local optima,...

10.1109/tnnls.2018.2861991 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-08-24

Understanding the objects and the relations between them is indispensable to fine-grained video content analysis, which is widely studied in recent research works in multimedia and computer vision. However, existing works are limited to evaluating with either small datasets or indirect metrics, such as the performance over images. The underlying reason is that the construction of a large-scale video dataset with dense annotation is tricky and costly. In this paper, we address several main issues in annotating user-generated videos, and propose an...

10.1145/3323873.3325056 article EN 2019-06-05

The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm with simple queries, which are usually ineffective for complex queries that carry far more semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map the queries and videos into a shared embedding space where semantically-similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic...

10.1145/3397271.3401151 article EN 2020-07-25

Despite the promising progress made in recent years, person re-identification (re-ID) remains a challenging task due to the complex variations in human appearances from different camera views. For this problem, a large variety of algorithms have been developed in the fully supervised setting, requiring access to a large amount of labeled training data. However, the main bottleneck for fully supervised re-ID is the limited availability of labeled training samples. To address this problem, we propose a self-trained subspace learning paradigm that effectively utilizes both labeled and...

10.1145/3089249 article EN ACM Transactions on Multimedia Computing Communications and Applications 2017-06-28

Real-time bidding (RTB) is an important mechanism in online display advertising, where a proper bid for each page view plays an essential role in good marketing results. Budget constrained bidding is a typical scenario in RTB where the advertisers hope to maximize the total value of the winning impressions under a pre-set budget constraint. However, the optimal bidding strategy is hard to derive due to the complexity and volatility of the auction environment. To address these challenges, in this paper, we formulate budget constrained bidding as a Markov Decision Process and propose...

10.1145/3269206.3271748 preprint EN 2018-10-17

Saliency detection has recently received increasing research interest on using high-dimensional datasets beyond two-dimensional images. Despite the many available capturing devices and algorithms, there still exists a wide spectrum of challenges that need to be addressed to achieve accurate saliency detection. Inspired by the success of the light-field technique, in this article, we propose a new computational scheme to detect salient regions by integrating multiple visual cues from light-field images. First, saliency prior maps are...

10.1145/3107956 article EN ACM Transactions on Multimedia Computing Communications and Applications 2017-07-27

Understanding the mix-and-match relationships of fashion items receives increasing attention in the fashion industry. Existing methods have primarily utilized the visual content to learn the visual compatibility and performed matching in a latent space. Despite their effectiveness, these methods work like a black box and cannot reveal the reasons that two items match well. The rich attributes associated with fashion items, e.g., off-shoulder dress and skinny jean, which describe the semantics of items in a human-interpretable way, have largely been ignored.

10.1145/3331184.3331242 article EN 2019-07-18

Identifying mix-and-match relationships between fashion items is an urgent task in a fashion e-commerce recommender system. It will significantly enhance user experience and satisfaction. However, due to the challenges of inferring the rich yet complicated set of compatibility patterns in a large corpus of fashion items, this task is still underexplored. Inspired by the recent advances in multirelational knowledge representation learning and deep neural networks, this paper proposes a novel Translation-based Neural Fashion Compatibility Modeling...

10.1609/aaai.v33i01.3301403 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

In the next generation of wireless networks, more applications will emerge, covering virtual reality movies, augmented reality, holographic three-dimensional telepresence, haptic telemedicine and so on, which require the provisioning of high bandwidth efficiency and low latency services. In order to better support the aforementioned services, novel distributed channel access (DCA) schemes are necessary. Therefore, we propose a new MAC protocol, QMIX-advanced Listen-Before-Talk (QLBT), based on the cutting-edge...

10.1109/jsac.2022.3143251 article EN IEEE Journal on Selected Areas in Communications 2022-01-14

This article tackles the challenging yet important task of Visual Grounding (VG), which aims to localize a visual region in the given image referred to by a natural language query. Existing efforts on the VG task are twofold: (1) two-stage methods first extract region proposals and then rank them according to their similarities with the referring expression, which usually leads to suboptimal results due to the quality of the proposals; (2) one-stage methods predict all the possible coordinates of the target region online by leveraging modern object detection architectures,...

10.1145/3587251 article EN ACM Transactions on Multimedia Computing Communications and Applications 2023-03-09

Effectively summarizing and re-expressing video content by natural languages in a more human-like fashion is one of the key topics in the field of multimedia content understanding. Despite the good progress made in recent years, existing efforts usually overlooked the emotions in user-generated videos, thus making the generated sentence a bit boring and soulless. To fill the research gap, this paper presents a novel emotional video captioning framework in which we design a Vision-based Emotion Interpretation Network to effectively capture the emotions conveyed...

10.1109/tip.2024.3359045 article EN IEEE Transactions on Image Processing 2024-01-01

A sensitive, flexible, and low false alarm rate X‐ray detector is crucial for medical diagnosis, industrial inspection, and scientific research. However, most semiconductors for X‐ray detectors are susceptible to interference from ambient light, and their high thickness hinders their application in wearable electronics. Herein, a flexible visible‐blind and ultraviolet‐blind X‐ray detector based on an Indium‐doped Gallium oxide (Ga₂O₃:In) single microwire is prepared. Joint experiment-theory characterizations reveal that the Ga...

10.1002/adma.202404656 article EN Advanced Materials 2024-08-19

Real-Time Bidding (RTB) is an important paradigm in display advertising, where advertisers utilize extended information and algorithms served by Demand Side Platforms (DSPs) to improve advertising performance. A common problem for DSPs is to help advertisers gain as much value as possible with budget constraints. However, advertisers would routinely add certain key performance indicator (KPI) constraints that the advertising campaign must meet due to practical reasons. In this paper, we study the common case where advertisers aim to maximize the quantity of conversions, and set...

10.1145/3292500.3330681 preprint EN 2019-07-25

Video Visual Relation Detection (VidVRD) aims to semantically describe the dynamic interactions across visual concepts localized in a video in the form of (subject, predicate, object). It can help to mitigate the semantic gap between vision and language in video understanding, thus receiving increasing attention in multimedia communities. Existing efforts primarily leverage multimodal/spatio-temporal feature fusion to augment the representation of object trajectories as well as their interactions, and formulate the prediction of predicates as a multi-class...

10.1145/3474085.3475540 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to tackling this challenging task, they only studied limited fashion elements with highly seasonal or simple patterns, which could hardly reveal the real fashion trends. Towards insightful fashion trend forecasting, this work focuses on investigating fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and...

10.1145/3372278.3390677 article EN 2020-06-02

Cognitive diagnosis is a fundamental yet critical research task in the field of intelligent education, which aims to discover the proficiency level of different students on specific knowledge concepts. Despite the effectiveness of existing efforts, previous methods always considered the mastery level of the students as a whole, so they still suffer from the Long Tail Effect. A large number of students who have sparse interaction records are usually wrongly diagnosed during inference. To relieve the situation, we proposed a Self-supervised Cognitive Diagnosis...

10.1609/aaai.v37i1.25082 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Crowd density estimation has gained significant research interest owing to its potential in various industries and social applications. Therefore, this paper proposes a multistyle joint-perception network based on a knowledge distillation-trained student (MJPNet-S*) for drone-based red–green–blue, thermal/depth (RGB-T/D) crowd density estimation tasks. To provide superior accuracy and efficiency, a novel trimodal working module effectively combines the modalities to facilitate comprehensive extraction and utilization. A...

10.1109/jiot.2024.3369642 article EN IEEE Internet of Things Journal 2024-02-26