NFDI4DS | UHH-SEMS - Publication Details

Yanxiong Li

ORCID: 0000-0003-4362-1125


About

Contact & Profiles

OpenAlex ID: A5070863631

Research Areas

Speech and Audio Processing
Music and Audio Processing
Speech Recognition and Synthesis
Diverse Musicological Studies
Music Technology and Sound Studies
Anomaly Detection Techniques and Applications
High voltage insulation and dielectric phenomena
Image and Signal Denoising Methods
Digital Media Forensic Detection
Human Pose and Action Recognition
Advanced Algorithms and Applications
Power Transformer Diagnostics and Insulation
Biometric Identification and Security
Natural Language Processing Techniques
Stochastic processes and financial applications
Video Analysis and Summarization
Advanced Neural Network Applications
Advanced Computational Techniques and Applications
Image Enhancement Techniques
Capital Investment and Risk Analysis
Thermal Analysis in Power Transmission
Advanced Steganography and Watermarking Techniques
Acoustic Wave Phenomena Research
Advanced Image Processing Techniques
Domain Adaptation and Few-Shot Learning

South China University of Technology
2016-2025

Wuchang University of Technology
2015

Stockholm University
2015

China University of Technology
2014

Xi'an Jiaotong University
2012-2013

Guangzhou Education Bureau
2013

City University of Hong Kong
2009

Anomalous Sound Detection Using Deep Audio Representation and a BLSTM Network for Audio Surveillance of Roads

OPENALEX - Publications

Yanxiong Li, Xianku Li, Yuhan Zhang, Mingle Liu, Wucheng Wang

Surveillance systems based on image analysis can automatically detect road accidents to ensure a quick intervention by rescue teams. However, in some situations the visual information is insufficiently reliable, whereas the use of a sound detector can greatly improve the overall reliability of the surveillance system. In this paper, we focus on detecting two classes of anomalous sounds for audio surveillance of roads, i.e., tire skidding and car crashes, whose occurrences are evident acoustic indications of accidents or disruptions. The proposed...

10.1109/access.2018.2872931 article EN cc-by-nc-nd IEEE Access 2018-01-01

Violence Detection in Videos Based on Fusing Visual and Audio Information

OPENALEX - Publications

Wenfeng Pang, Qianhua He, Yongjian Hu, Yanxiong Li

Determining whether given video frames contain violent content is a basic problem in violence detection. Visual and audio information are both useful for detecting violence in video and are usually complementary; however, violence detection studies that focus on fusing visual and audio information are relatively rare. Therefore, we explored methods for fusing visual and audio information. We proposed a neural network containing three modules for fusing multimodal information: 1) an attention module that utilizes weighted features to generate effective features based on the mutual guidance between...

10.1109/icassp39728.2021.9413686 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks

OPENALEX - Publications

Yanxiong Li, Mingle Liu, Konstantinos Drossos, Tuomas Virtanen

Convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance for sound event detection (SED). In this paper, we propose to use a dilated CRNN, namely a CRNN with a dilated convolutional kernel, as the classifier for the task of SED. We investigate the effectiveness of dilation operations, which provide the CRNN with expanded receptive fields to capture long temporal context without increasing the number of the CRNN's parameters. Compared to the baseline CRNN, the proposed dilated CRNN obtains a maximum increase of 1.9%, 6.3% and 2.5% in F1 score and a maximum decrease of 1.7%,...

10.1109/icassp40776.2020.9054433 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
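The claim that dilation expands the receptive field without adding parameters can be checked with simple arithmetic. The sketch below assumes a stack of stride-1 1-D convolutions along the time axis with equal input and output channels; the kernel size, dilation rates, channel count and helper names are illustrative assumptions, not the configuration used in the paper.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 convolutions."""
    # Each layer with kernel size k and dilation d widens the field by (k - 1) * d frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def conv_weights(kernel_size: int, channels: int, num_layers: int) -> int:
    """Weight count of the stack (biases ignored).

    The dilation rate does not appear in this formula, so dilating the
    kernels adds no parameters.
    """
    return num_layers * kernel_size * channels * channels


plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
print(plain, dilated)  # 7 15
print(conv_weights(3, 64, 3))  # 36864 for either stack
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the weight count stays linear in depth.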

Sound Event Detection with Depthwise Separable and Dilated Convolutions

OPENALEX - Publications

Konstantinos Drossos, Stylianos Ioannis Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen

State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, and then recurrent neural networks (RNNs) to model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total number of parameters of the SED method, which amounts to a couple of millions. Additionally, the usually long sequences that are used as an input to an SED method, along with the employment of an RNN, introduce implications...

10.1109/ijcnn48605.2020.9207532 article EN 2020 International Joint Conference on Neural Networks (IJCNN) 2020-07-01
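Depthwise separable convolutions reduce the parameter count by factoring a standard convolution into a per-channel (depthwise) spatial filter followed by a 1x1 (pointwise) convolution that mixes channels. The count below ignores biases, and the layer sizes (3x3 kernel, 64 input and 128 output channels) are illustrative assumptions rather than the configuration used in the paper.

```python
def standard_conv_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    # One k_h x k_w filter for every (input channel, output channel) pair.
    return k_h * k_w * c_in * c_out


def separable_conv_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    depthwise = k_h * k_w * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise


std = standard_conv_params(3, 3, 64, 128)
sep = separable_conv_params(3, 3, 64, 128)
print(std, sep, round(std / sep, 1))  # 73728 8768 8.4
```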

Fully Few-shot Class-incremental Audio Classification with Adaptive Improvement of Stability and Plasticity

OPENALEX - Publications

Yongjie Si, Yanxiong Li, Jiaxin Tan, Guoqing Chen, Qianqian Li and 1 more

In the problem of Few-shot Class-incremental Audio Classification (FCAC), training samples per class in the base session are required to be abundant. However, in many scenarios it is difficult to collect abundant training samples because of data scarcity and high collection cost. In this paper, we explore a new FCAC problem, namely Fully FCAC (FFCAC), in which the training samples for all classes in both the base and incremental sessions are few. Moreover, we propose an FFCAC method that adaptively improves the model's stability on seen classes and its plasticity on unseen classes. The model consists of an...

10.1109/taslpro.2025.3527147 article EN IEEE Transactions on Audio Speech and Language Processing 2025-01-01

Cross-Domain Few-Shot Open-Set Keyword Spotting Using Keyword Adaptation and Prototype Reprojection

OPENALEX - Publications

Mingru Yang, Qianhua He, Jinxin Huang, Yongqiang Chen, Zunxian Liu and 1 more

10.1109/icassp49660.2025.10888051 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Deep Enhancement Spotting Network for Low-complexity Keyword Spotting in Noisy Environments

OPENALEX - Publications

Yongqiang Chen, Qianhua He, Yanxiong Li, Zunxian Liu, Mingru Yang and 1 more

10.1109/icassp49660.2025.10890683 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Low-complexity speaker embedding module with feature segmentation, transformation and reconstruction for few-shot speaker identification

OPENALEX - Publications

Yanxiong Li, Qisheng Huang, Xiaofen Xing, Xiangmin Xu

10.1016/j.eswa.2025.127542 article EN Expert Systems with Applications 2025-04-01

Predicting skeleton trajectories using a Skeleton-Transformer for video anomaly detection

OPENALEX - Publications

Wenfeng Pang, Qianhua He, Yanxiong Li

10.1007/s00530-022-00915-9 article EN Multimedia Systems 2022-03-30

Acoustic Scene Classification Using Deep Audio Feature and BLSTM Network

OPENALEX - Publications

Yanxiong Li, Xianku Li, Yuhan Zhang, Wucheng Wang, Mingle Liu and 1 more

Although acoustic scene classification has received great attention from researchers in the field of audio signal processing, it is still a challenging and unsolved task to date. In this paper, we present our work for the Detection and Classification of Acoustic Scenes and Events 2017 challenge, i.e., the DCASE2017 challenge, using Deep Audio Feature (DAF) representation as the feature and a Bidirectional Long Short-Term Memory (BLSTM) network as the classifier for acoustic scene classification. We first use a deep neural network to generate the DAF from Mel...

10.1109/icalip.2018.8455765 article EN 2018 International Conference on Audio, Language and Image Processing (ICALIP) 2018-07-01

Mobile Phone Clustering From Speech Recordings Using Deep Representation and Spectral Clustering

OPENALEX - Publications

Yanxiong Li, Zhang Xue, Xianku Li, Yuhan Zhang, Jichen Yang and 1 more

Considerable attention has been paid to acquisition device recognition over the past decade in the forensic community, especially in digital image forensics. In contrast, mobile phone clustering from speech recordings is a new problem that aims to merge the speech recordings acquired by the same mobile phone into a single cluster, without prior information about the mobile phones and without training classifiers in advance. In this paper, we propose a method for mobile phone clustering that uses deep representation as the feature and the spectral clustering algorithm. The deep representation learned by an auto-encoder network representing...

10.1109/tifs.2017.2774505 article EN IEEE Transactions on Information Forensics and Security 2017-11-16

Acoustic Scene Clustering Using Joint Optimization of Deep Embedding Learning and Clustering Iteration

OPENALEX - Publications

Yanxiong Li, Mingle Liu, Wucheng Wang, Yuhan Zhang, Qianhua He

Recent efforts in the audio signal processing community have been made on acoustic scene classification. In contrast, few studies have been conducted on acoustic scene clustering, which is a newly emerging problem. Acoustic scene clustering aims at merging the audio recordings of the same class of acoustic scene into a single cluster without using prior information and without training classifiers. In this study, we propose a method for acoustic scene clustering that jointly optimizes the procedures of feature learning and clustering iteration. In the proposed method, the learned deep embedding is extracted from a deep convolutional...

10.1109/tmm.2019.2947199 article EN IEEE Transactions on Multimedia 2019-10-14

Few-shot class-incremental audio classification via discriminative prototype learning

OPENALEX - Publications

Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao

10.1016/j.eswa.2023.120044 article EN Expert Systems with Applications 2023-04-05

The influence of temperature on water treeing in polyethylene

OPENALEX - Publications

Jinfeng Wang, Xiaoquan Zheng, Yanxiong Li, Jiang Wu

The influence of temperature on water treeing in polyethylene (PE) is investigated in this paper. Low-density PE (LDPE) and peroxide cross-linked PE (XLPE) are chosen as the test materials. The liquid needle electrode method is used for water tree development at 20, 40, 60 and 80 °C, and a metallographic microscope is used to observe the water tree morphology. The sizes and initiation rate of the water trees are measured. The results obtained from the experiments indicate that water treeing is significantly influenced by temperature. The initiation rates first decrease and then increase as the temperature rises. The water trees also exhibit an...

10.1109/tdei.2013.6508757 article EN IEEE Transactions on Dielectrics and Electrical Insulation 2013-04-01

Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection

OPENALEX - Publications

Yanxiong Li, Zhang Xue, Hai Jin, Xianku Li, Qin Wang and 2 more

10.1007/s11042-016-4332-z article EN Multimedia Tools and Applications 2017-01-06

Few-Shot Speaker Identification Using Lightweight Prototypical Network With Feature Grouping and Interaction

OPENALEX - Publications

Yanxiong Li, Hao Chen, Wenchang Cao, Qisheng Huang, Qianhua He

Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose an FSSI method using a lightweight prototypical network, with the final goal of implementing FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping, which reduces the memory requirement and computational complexity, and feature interaction, which enhances...

10.1109/tmm.2023.3253301 article EN IEEE Transactions on Multimedia 2023-01-01
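In a prototypical network, each speaker's prototype is the mean of the embeddings of its few support samples, and a query is assigned to the nearest prototype. The sketch below shows only that classification step, using squared Euclidean distance and plain lists; the embedding module described in the abstract is replaced by hand-written toy vectors, and all names and values are hypothetical.

```python
from math import fsum


def prototypes(support: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the support embeddings of each class into one prototype vector."""
    protos = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        protos[label] = [fsum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return protos


def classify(query: list[float], protos: dict[str, list[float]]) -> str:
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return fsum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Toy 2-D "embeddings": two support samples per speaker (hypothetical values).
support = {
    "speaker_a": [[1.0, 0.0], [0.8, 0.2]],
    "speaker_b": [[0.0, 1.0], [0.2, 0.9]],
}
protos = prototypes(support)
print(classify([0.7, 0.3], protos))  # speaker_a
```

Because new speakers only require computing new class means, the classifier itself has no trainable weights; in a lightweight design such as the one described above, the model size and cost are dominated by the embedding module.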

Deep mutual attention network for acoustic scene classification

OPENALEX - Publications

Wei Xie, Qianhua He, Zitong Yu, Yanxiong Li

10.1016/j.dsp.2022.103450 article EN Digital Signal Processing 2022-01-29

Speaker verification using attentive multi-scale convolutional recurrent network

OPENALEX - Publications

Yanxiong Li, Zhongjie Jiang, Wenchang Cao, Qisheng Huang

10.1016/j.asoc.2022.109291 article EN Applied Soft Computing 2022-07-11

Cancelable Voiceprint Templates Based on Knowledge Signatures

OPENALEX - Publications

Wenhua Xu, Qianhua He, Yanxiong Li, Tao Li

Biometric-based authentication can provide a strong safety guarantee of user identity, but it creates other concerns pertaining to template security. How to prevent the templates from being abused and how to protect users' privacy are important problems. This paper proposes an approach based on knowledge signatures to solve these problems, which provides a cancelable template. Only a non-invertible transformed version of the biometrics is stored in the server, and the original data cannot be obtained, so users' privacy is well protected...

10.1109/isecs.2008.100 article EN International Symposium on Electronic Commerce and Security 2008-01-01

Source cell phone matching from speech recordings by sparse representation and KISS metric

OPENALEX - Publications

Ling Zou, Qianhua He, Jichen Yang, Yanxiong Li

Source recording device matching from two speech recordings is a new and important problem in digital media forensics. It aims to answer the question of whether or not two speech recordings were recorded by the same device. In this study, we propose a source cell phone matching scheme. The Gaussian supervector (GSV) based on Mel-frequency cepstral coefficients (MFCCs) is extracted and sparsely represented with respect to a dictionary learned by the K-SVD algorithm. The reduced-dimensional sparse representation coefficient is utilized to characterize the intrinsic...

10.1109/icassp.2016.7472043 article EN 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016-03-01

Speaker Clustering by Co-Optimizing Deep Representation Learning and Cluster Estimation

OPENALEX - Publications

Yanxiong Li, Wucheng Wang, Mingle Liu, Zhongjie Jiang, Qianhua He

Speaker clustering is the task of merging speech segments uttered by the same speaker into a single cluster, and it is an effective tool for easing the management of massive amounts of audio documents. In this paper, we present work on co-optimizing the two main steps of speaker clustering, namely feature learning and cluster estimation. In our method, the deep representation is learned by a deep convolutional autoencoder network (DCAN), while cluster estimation is realized by a softmax layer that is combined with the DCAN. We devise an integrated loss function...

10.1109/tmm.2020.3024667 article EN IEEE Transactions on Multimedia 2020-09-21

Characteristics-based effective applause detection for meeting speech

OPENALEX - Publications

Yanxiong Li, Qianhua He, Sam Kwong, Tao Li, Jichen Yang

10.1016/j.sigpro.2009.03.001 article EN Signal Processing 2009-03-11

Revisiting Label Smoothing Regularization with Knowledge Distillation

OPENALEX - Publications

Jiyue Wang, Pei Zhang, Qianhua He, Yanxiong Li, Yongjian Hu

Label Smoothing Regularization (LSR) is a widely used tool for improving the generalization of classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on the correlation between LSR and Knowledge Distillation (KD), which transfers knowledge from a teacher model to a lightweight student model by penalizing the Kullback–Leibler divergence between their outputs. Based on this observation, a Teacher-free KD (Tf-KD) method was proposed in previous work. Instead of a real teacher model, a handcrafted...

10.3390/app11104699 article EN cc-by Applied Sciences 2021-05-20
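In one common (uniform) form, label smoothing replaces the one-hot target for K classes with (1 - ε)·onehot + ε/K, so the true class keeps 1 - ε + ε/K and every other class receives ε/K. The sketch below builds such a target and evaluates the cross-entropy against a softmax output; ε = 0.1 and the logits are illustrative assumptions, and the handcrafted teacher distribution studied in this paper is not reproduced here.

```python
import math


def smooth_labels(target: int, num_classes: int, epsilon: float) -> list[float]:
    """Uniform label smoothing: (1 - epsilon) * one-hot + epsilon / num_classes."""
    off_value = epsilon / num_classes
    return [1.0 - epsilon + off_value if k == target else off_value for k in range(num_classes)]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_dist: list[float], probs: list[float]) -> float:
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs))


y = smooth_labels(target=2, num_classes=4, epsilon=0.1)
print(y)  # values approximately 0.025, 0.025, 0.925, 0.025
loss = cross_entropy(y, softmax([0.5, 0.1, 2.0, -1.0]))
print(loss)
```

Minimizing this loss is equivalent, up to a constant, to minimizing the KL divergence from the smoothed target to the model output, which is the link to KD that the abstract refers to: the uniform ε/K term plays the role of a fixed "teacher" distribution.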

Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting

OPENALEX - Publications

Yanxiong Li, Jialong Li, Yongjie Si, Jiaxin Tan, Qianhua He

Few-shot Class-incremental Audio Classification (FCAC) is the task of continuously identifying incremental classes with only a few training samples after the model has been trained on base classes with abundant training samples. The key to solving the FCAC problem is to ensure that the model has good stability (without forgetting old classes) and strong plasticity (without overfitting new classes). In this paper, we propose an FCAC method that is able to adaptively mitigate the model's forgetting of old classes and overfitting of new classes. Our method consists of an embedding extractor and an expandable classifier. The former is a residual network backbone...

10.1109/taslp.2024.3385287 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

OPENALEX - Publications

Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si and 1 more

This work is an improved version of the system that we submitted to Task 1 of the DCASE2023 challenge. We propose a method for low-complexity acoustic scene classification using a parallel attention-convolution network, which consists of four modules: pre-processing, fusion, and global and local contextual information extraction. The proposed network is computationally efficient in capturing global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive...

10.21437/interspeech.2024-591 article EN Interspeech 2024 2024-09-01
