- Speech and Audio Processing
- Music and Audio Processing
- Speech Recognition and Synthesis
- Diverse Musicological Studies
- Music Technology and Sound Studies
- Anomaly Detection Techniques and Applications
- High voltage insulation and dielectric phenomena
- Image and Signal Denoising Methods
- Digital Media Forensic Detection
- Human Pose and Action Recognition
- Advanced Algorithms and Applications
- Power Transformer Diagnostics and Insulation
- Biometric Identification and Security
- Natural Language Processing Techniques
- Stochastic processes and financial applications
- Video Analysis and Summarization
- Advanced Neural Network Applications
- Advanced Computational Techniques and Applications
- Image Enhancement Techniques
- Capital Investment and Risk Analysis
- Thermal Analysis in Power Transmission
- Advanced Steganography and Watermarking Techniques
- Acoustic Wave Phenomena Research
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
South China University of Technology
2016-2025
Wuchang University of Technology
2015
Stockholm University
2015
China University of Technology
2014
Xi'an Jiaotong University
2012-2013
Guangzhou Education Bureau
2013
City University of Hong Kong
2009
Surveillance systems based on image analysis can automatically detect road accidents to ensure a quick intervention by rescue teams. However, in some situations, the visual information is insufficiently reliable, whereas the use of a sound detector can greatly improve the overall reliability of the surveillance system. In this paper, we focus on detecting two classes of anomalous sounds for audio surveillance of roads, i.e., tire skidding and car crash, whose occurrences are an evident acoustic indication of accidents or disruptions. The proposed...
Determining whether given video frames contain violent content is a basic problem in violence detection. Visual and audio information are both useful for detecting violence included in a video and are usually complementary; however, violence detection studies focusing on fusing visual and audio information are relatively rare. Therefore, we explored methods for fusing visual and audio information. We proposed a neural network containing three modules for fusing multimodal information: 1) an attention module utilizing weighted features to generate effective features based on the mutual guidance between...
Convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance for sound event detection (SED). In this paper, we propose to use a dilated CRNN, namely a CRNN with a dilated convolutional kernel, as the classifier for the task of SED. We investigate the effectiveness of dilation operations, which provide expanded receptive fields to capture long temporal context without increasing the amount of the CRNN's parameters. Compared with the baseline CRNN, the dilated CRNN obtains a maximum increase of 1.9%, 6.3% and 2.5% in F1 score and a decrease of 1.7%,...
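The claim in the abstract above, that dilation expands the receptive field without adding parameters, can be checked with a minimal sketch. It assumes a stack of stride-1 one-dimensional convolutions; the kernel size of 3, the dilation rates 1, 2, 4 and the 64 channels are illustrative values rather than the paper's configuration, and `receptive_field` and `weight_count` are hypothetical helper names.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation frames.
        rf += (kernel_size - 1) * d
    return rf


def weight_count(kernel_size, channels, num_layers):
    """Weights of the stack (biases ignored); the dilation rate does not enter."""
    return num_layers * kernel_size * channels * channels


# Illustrative values only (assumptions, not the paper's configuration).
plain = receptive_field(3, [1, 1, 1])      # three ordinary layers
dilated = receptive_field(3, [1, 2, 4])    # same layers with growing dilation
weights = weight_count(3, 64, 3)           # identical for both stacks
```

Under these assumptions the dilated stack covers roughly twice as many frames as the plain one while `weight_count` returns the same value for both, because the dilation rate only changes the spacing of the kernel taps.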
State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, and then recurrent neural networks (RNNs) to model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total amount of parameters of the SED method, which is typically a couple of million. Additionally, the long sequences that are used as an input to the SED method, along with the employment of an RNN, introduce implications...
In the problem of Few-shot Class-incremental Audio Classification (FCAC), training samples per class in the base session are required to be abundant. However, in many scenarios, it is difficult to collect abundant training samples because of data scarcity and high collection cost. In this paper, we explore a new FCAC problem, namely Fully Few-shot Class-incremental Audio Classification (FFCAC), in which training samples for all classes in both the base and incremental sessions are few. Moreover, we propose an FFCAC method by adaptively improving the model's stability for seen classes and plasticity for unseen classes. The model consists of an...
Although acoustic scene classification has received great attention from researchers in the field of audio signal processing, it is still a challenging and unsolved task to date. In this paper, we present our work for the challenge of Detection and Classification of Acoustic Scenes and Events 2017, i.e., the DCASE2017 challenge, using a Deep Audio Feature (DAF) for acoustic scene representation and a Bidirectional Long Short Term Memory (BLSTM) network as the classifier for acoustic scene classification. We first use a deep neural network to generate the DAF from Mel...
Considerable attention has been paid to acquisition device recognition over the past decade in the forensic community, especially in digital image forensics. In contrast, acquisition device clustering from speech recordings is a new problem that aims to merge the recordings acquired by the same device into a single cluster without having prior information about the recordings and without training classifiers in advance. In this paper, we propose a method for mobile phone clustering using a deep representation feature and a spectral clustering algorithm. The feature is learned by an auto-encoder network for representing...
Recent efforts have been made on acoustic scene classification in the audio signal processing community. In contrast, few studies have been conducted on acoustic scene clustering, which is a newly emerging problem. Acoustic scene clustering aims at merging the recordings of the same class of acoustic scene into a single cluster without using prior information and without training classifiers. In this study, we propose a method for acoustic scene clustering that jointly optimizes the procedures of feature learning and clustering iteration. In the proposed method, the learned feature is a deep embedding extracted from a convolutional...
The influence of temperature on water treeing in polyethylene (PE) is investigated in this paper. Low-density polyethylene (LDPE) and peroxide cross-linked polyethylene (XLPE) are chosen as the test materials. The liquid needle electrode method is used for water tree development at 20, 40, 60 and 80 °C, and a metallographic microscope is used to observe the water tree morphology. The sizes and initiation rates of the water trees are measured. The results obtained from the experiments indicate that water treeing is significantly influenced by temperature. The initiation rates decrease first, and then increase as the temperature rises. The sizes also exhibit an...
Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose an FSSI method using a lightweight prototypical network, with the final goal of implementing FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping for reducing the memory requirement and computational complexity, and feature interaction for enhancing...
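For readers unfamiliar with prototypical networks, the classification rule they rely on can be sketched as follows. This is only the generic nearest-prototype step, assuming embeddings have already been computed; it does not model the lightweight embedding module described in the abstract above. The 2-D vectors, speaker labels, and the function names `prototypes` and `classify` are made up for illustration.

```python
import math


def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def classify(query, protos):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Toy 2-D embeddings standing in for speaker embeddings (made-up numbers).
support = {
    "speaker_a": [[0.0, 0.0], [0.2, 0.0]],
    "speaker_b": [[1.0, 1.0], [1.0, 0.8]],
}
protos = prototypes(support)
predicted = classify([0.9, 0.7], protos)
```

Because each class is summarized by a single averaged vector, adding a new speaker only requires a few enrolled utterances and no retraining of the classifier, which is what makes the approach suitable for the few-shot setting.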
Biometric-based authentication can provide a strong safety guarantee of user identity, but creates other concerns pertaining to template security. How to prevent the templates from being abused and how to protect users' privacy well are important problems. This paper proposes an approach based on knowledge signatures to solve these problems, which provides a cancelable template. Only a non-invertible transformed version of the biometrics is stored in the server, and the original biometric data cannot be obtained, so the users' privacy is protected well....
Source recording device matching from two speech recordings is a new and important problem of digital media forensics. It aims to answer the question of whether or not two speech recordings are recorded by the same device. In this study, we propose a source cell phone matching scheme. The Gaussian supervector (GSV) based on Mel-frequency cepstral coefficients (MFCCs) is extracted and sparsely represented with respect to a dictionary learned by the K-SVD algorithm. The reduced-dimensional representation coefficient is utilized to characterize the intrinsic...
Speaker clustering is a task to merge speech segments uttered by the same speaker into a single cluster, which is an effective tool for alleviating the management of massive amounts of audio documents. In this paper, we present our work on co-optimizing the two main steps of speaker clustering, namely, feature learning and cluster estimation. In our method, the deep representation feature is learned by a deep convolutional autoencoder network (DCAN), while cluster estimation is realized by a softmax layer that is combined with the DCAN. We devise an integrated loss function...
Label Smoothing Regularization (LSR) is a widely used tool to generalize classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on the correlation between LSR and Knowledge Distillation (KD), which transfers the knowledge from a teacher model to a lightweight student model by penalizing their outputs' Kullback–Leibler divergence. Based on this observation, a Teacher-free Knowledge Distillation (Tf-KD) method was proposed in previous work. Instead of a real teacher model, a handcrafted...
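The two ingredients named in the abstract above, smoothed labels and a Kullback–Leibler penalty between two output distributions, can be sketched in a few lines. This uses the standard uniform-smoothing form of LSR and is not the Tf-KD method itself; the smoothing factor 0.1, the four-class setup, and the `student` probabilities are assumptions chosen for illustration.

```python
import math


def smooth_labels(true_class, num_classes, epsilon):
    """Mix the one-hot target with a uniform distribution (standard LSR)."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


# Illustrative values only (assumptions, not taken from the paper).
target = smooth_labels(true_class=1, num_classes=4, epsilon=0.1)
student = [0.1, 0.7, 0.1, 0.1]  # hypothetical softmax output of a student
loss = kl_divergence(target, student)
```

Seen this way, the smoothed label plays the role that a teacher's output distribution plays in KD: both are soft targets that the student's output is pulled toward.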
Few-shot Class-incremental Audio Classification (FCAC) is a task to continuously identify incremental classes with only a few training samples after training the model on base classes with abundant training samples. The key to solving the FCAC problem is to ensure that the model has good stability (without forgetting base classes) and strong plasticity (without overfitting to incremental classes). In this paper, we propose an FCAC method which is able to adaptively mitigate the model's forgetting of base classes and overfitting to incremental classes. Our model consists of an embedding extractor and an expandable classifier. The former is the backbone of a residual network...
This work is an improved version of the system that we submitted to task 1 of the DCASE2023 challenge. We propose a method for low-complexity acoustic scene classification using a parallel attention-convolution network, which consists of four modules: pre-processing, fusion, global contextual information extraction, and local contextual information extraction. The proposed network is computationally efficient in capturing global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, adaptive...