Yanxiong Li

ORCID: 0000-0003-4362-1125
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Diverse Musicological Studies
  • Music Technology and Sound Studies
  • Anomaly Detection Techniques and Applications
  • High voltage insulation and dielectric phenomena
  • Image and Signal Denoising Methods
  • Digital Media Forensic Detection
  • Human Pose and Action Recognition
  • Advanced Algorithms and Applications
  • Power Transformer Diagnostics and Insulation
  • Biometric Identification and Security
  • Natural Language Processing Techniques
  • Stochastic processes and financial applications
  • Video Analysis and Summarization
  • Advanced Neural Network Applications
  • Advanced Computational Techniques and Applications
  • Image Enhancement Techniques
  • Capital Investment and Risk Analysis
  • Thermal Analysis in Power Transmission
  • Advanced Steganography and Watermarking Techniques
  • Acoustic Wave Phenomena Research
  • Advanced Image Processing Techniques
  • Domain Adaptation and Few-Shot Learning

South China University of Technology
2016-2025

Wuchang University of Technology
2015

Stockholm University
2015

China University of Technology
2014

Xi'an Jiaotong University
2012-2013

Guangzhou Education Bureau
2013

City University of Hong Kong
2009

Surveillance systems based on image analysis can automatically detect road accidents to ensure a quick intervention by rescue teams. However, in some situations the visual information is insufficiently reliable, whereas the use of a sound detector can greatly improve the overall reliability of the surveillance system. In this paper, we focus on detecting two classes of anomalous sounds for audio surveillance of roads, i.e., tire skidding and car crash, whose occurrences are an evident acoustic indication of accidents or disruptions. The proposed...

10.1109/access.2018.2872931 article EN cc-by-nc-nd IEEE Access 2018-01-01

Determining whether given video frames contain violent content is a basic problem in violence detection. Visual and audio information are useful for detecting violence included in a video and are usually complementary; however, violence detection studies focusing on fusing visual and audio information are relatively rare. Therefore, we explored methods for fusing visual and audio information. We proposed a neural network containing three modules for fusing multimodal information: 1) an attention module for utilizing weighted features to generate effective features based on the mutual guidance between...

10.1109/icassp39728.2021.9413686 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance for sound event detection (SED). In this paper, we propose to use a dilated CRNN, namely a CRNN with a dilated convolutional kernel, as the classifier for the task of SED. We investigate the effectiveness of dilation operations, which provide expanded receptive fields to capture long temporal context without increasing the amount of the CRNN's parameters. Compared to the baseline CRNN, the dilated CRNN obtains a maximum increase of 1.9%, 6.3% and 2.5% at F1 score and a decrease of 1.7%,...

10.1109/icassp40776.2020.9054433 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
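The claim in the abstract above, that dilation expands the receptive field without adding parameters, can be checked with simple arithmetic. The sketch below is illustrative only: it assumes stride-1 one-dimensional convolutions along the time axis, and the kernel size, channel counts and dilation schedule are hypothetical values, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv1d_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of one 1-D convolution; dilation does not appear."""
    return kernel_size * in_channels * out_channels + out_channels


# Hypothetical three-layer stacks with kernel size 3.
plain = receptive_field(3, [1, 1, 1])     # no dilation
dilated = receptive_field(3, [1, 2, 4])   # growing dilation
params_per_layer = conv1d_params(3, 64, 64)

print(plain, dilated, params_per_layer)
```

Both stacks use the same number of weights per layer, because the parameter count depends only on kernel size and channel counts; only the spacing of the kernel taps changes.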

State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, and then recurrent neural networks (RNNs) to model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total amount of parameters of the SED method, which is a couple of millions. Additionally, the long sequences that are used as an input to the SED method, along with the employment of an RNN, introduce implications...

10.1109/ijcnn48605.2020.9207532 article EN 2020 International Joint Conference on Neural Networks (IJCNN) 2020-07-01

In the problem of Few-shot Class-incremental Audio Classification (FCAC), training samples per class in the base session are required to be abundant. However, in many scenarios, it is difficult to collect abundant training samples because of data scarcity and high collection cost. In this paper, we explore a new FCAC problem, namely Fully Few-shot Class-incremental Audio Classification (FFCAC), in which training samples for all classes in both base and incremental sessions are few. Moreover, we propose an FFCAC method by adaptively improving the model's stability for seen classes and plasticity for unseen classes. The model consists of an...

10.1109/taslpro.2025.3527147 article EN IEEE Transactions on Audio Speech and Language Processing 2025-01-01

10.1109/icassp49660.2025.10888051 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10890683 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Although acoustic scene classification has received great attention from researchers in the field of audio signal processing, it is still a challenging and unsolved task to date. In this paper, we present our work for the challenge of Detection and Classification of Acoustic Scenes and Events 2017, i.e., the DCASE2017 challenge, using a feature of Deep Audio Feature (DAF) for acoustic scene representation and a classifier of Bidirectional Long Short Term Memory (BLSTM) network for acoustic scene classification. We first use a deep neural network to generate the DAF from Mel...

10.1109/icalip.2018.8455765 article EN 2018-07-01

Considerable attention has been paid to acquisition device recognition over the past decade in the forensic community, especially in digital image forensics. In contrast, acquisition device clustering from speech recordings is a new problem that aims to merge the recordings acquired by the same device into a single cluster without having prior information about the recordings and without training classifiers in advance. In this paper, we propose a method for mobile phone clustering from speech recordings using a deep representation feature and a spectral clustering algorithm. The deep representation is learned by an auto-encoder network for representing...

10.1109/tifs.2017.2774505 article EN IEEE Transactions on Information Forensics and Security 2017-11-16

Recent efforts have been made on acoustic scene classification in the audio signal processing community. In contrast, few studies have been conducted on acoustic scene clustering, which is a newly emerging problem. Acoustic scene clustering aims at merging the audio recordings of the same class of acoustic scene into a single cluster without using prior information and training classifiers. In this study, we propose a method for acoustic scene clustering that jointly optimizes the procedures of feature learning and clustering iteration. In the proposed method, the learned feature is a deep embedding that is extracted from a convolutional...

10.1109/tmm.2019.2947199 article EN IEEE Transactions on Multimedia 2019-10-14

The influence of temperature on water treeing in polyethylene (PE) is investigated in this paper. Low-density polyethylene (LDPE) and peroxide cross-linked polyethylene (XLPE) are chosen as the test materials. The liquid needle electrode method is used for water tree development at 20, 40, 60 and 80 °C, and a metallographic microscope is used to observe the water tree morphology. The sizes and initiation rate of the water trees are measured. The results obtained from the experiments indicate that water treeing is significantly influenced by temperature. The initiation rates decrease first, and then increase as the temperature rises. The sizes also exhibit an...

10.1109/tdei.2013.6508757 article EN IEEE Transactions on Dielectrics and Electrical Insulation 2013-04-01

Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose an FSSI method using a lightweight prototypical network with the final goal to implement the FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping for reducing the memory requirement and computational complexity, and feature interaction for enhancing...

10.1109/tmm.2023.3253301 article EN IEEE Transactions on Multimedia 2023-01-01
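For readers unfamiliar with prototypical networks, the generic classification step they rely on can be sketched as follows. This is a minimal illustration of the standard nearest-prototype rule only, not the lightweight network from the abstract above: the embeddings are assumed to come from some already trained embedding module, and the speaker labels and 2-D vectors are made up for the example.

```python
from math import dist


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of equal-length vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        n = len(vectors)
        prototypes[label] = [sum(column) / n for column in zip(*vectors)]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Hypothetical 2-D embeddings for two speakers, two support samples each.
support = {
    "speaker_a": [[0.0, 0.0], [0.0, 2.0]],
    "speaker_b": [[10.0, 10.0], [12.0, 10.0]],
}
prototypes = class_prototypes(support)
print(classify([1.0, 1.0], prototypes))
print(classify([9.0, 9.0], prototypes))
```

Because each class is summarized by a single mean vector, adding a new speaker only requires a handful of enrolled utterances and no retraining of the classifier, which is what makes the approach attractive in few-shot settings.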

Biometric-based authentication can provide a strong safety guarantee of user identity, but creates other concerns pertaining to template security. How to prevent the templates from being abused, and how to protect users' privacy well, are important problems. This paper proposes an approach based on knowledge signatures to solve these problems, which provides a cancelable template. Only a non-invertible transformed version of the biometrics is stored in the server, and the original data can't be obtained, so privacy is protected well....

10.1109/isecs.2008.100 article EN International Symposium on Electronic Commerce and Security 2008-01-01

Source recording device matching from two speech recordings is a new and important problem of digital media forensics. It aims to answer the question of whether or not the two speech recordings are recorded by the same device. In this study, we propose a source cell phone matching scheme. The Gaussian supervector (GSV) based on Mel-frequency cepstral coefficients (MFCCs) is extracted and sparsely represented with respect to a dictionary learned by the K-SVD algorithm. The reduced-dimensional representation coefficient is utilized to characterize the intrinsic...

10.1109/icassp.2016.7472043 article EN 2016-03-01

Speaker clustering is a task to merge speech segments uttered by the same speaker into a single cluster, which is an effective tool for alleviating the management of massive amounts of audio documents. In this paper, we present our work on co-optimizing the two main steps of speaker clustering, namely, feature learning and cluster estimation. In our method, the deep representation feature is learned by a deep convolutional autoencoder network (DCAN), while cluster estimation is realized by a softmax layer that is combined with the DCAN. We devise an integrated loss function...

10.1109/tmm.2020.3024667 article EN IEEE Transactions on Multimedia 2020-09-21

Label Smoothing Regularization (LSR) is a widely used tool to generalize classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on the correlation between LSR and Knowledge Distillation (KD), which transfers the knowledge from a teacher model to a lightweight student model by penalizing their outputs' Kullback–Leibler divergence. Based on this observation, a Teacher-free Knowledge Distillation (Tf-KD) method was proposed in previous work. Instead of a real teacher model, a handcrafted...

10.3390/app11104699 article EN cc-by Applied Sciences 2021-05-20
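The label smoothing step described in the abstract above can be written in a few lines. The sketch below uses the common uniform-smoothing form, in which each one-hot target is mixed with a uniform distribution over the classes; the value epsilon = 0.1 is an illustrative assumption, not a setting taken from the paper, and the Kullback–Leibler helper is included only to show the quantity that knowledge distillation penalizes.

```python
from math import log


def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * y + epsilon / K per class."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; terms with p_i = 0 add 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical 4-class example with the true class at index 2.
target = smooth_labels([0, 0, 1, 0], epsilon=0.1)
uniform = [0.25, 0.25, 0.25, 0.25]

print(target)
print(kl_divergence(target, uniform))
```

The smoothed target still sums to one, so it can be used directly in place of the one-hot vector in a cross-entropy or KL-based loss.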

Few-shot Class-incremental Audio Classification (FCAC) is a task to continuously identify incremental classes with only few training samples after training the model on base classes with abundant training samples. The key to solving the FCAC problem is to ensure that the model has good stability (without forgetting old classes) and strong plasticity (without overfitting new classes). In this paper, we propose a method which is able to adaptively mitigate the model's forgetting of old classes and overfitting of new classes. Our model consists of an embedding extractor and an expandable classifier. The former is the backbone of a residual network...

10.1109/taslp.2024.3385287 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

This work is an improved system that we submitted to task 1 of the DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive...

10.21437/interspeech.2024-591 article EN Interspeech 2024 2024-09-01