- Music and Audio Processing
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Music Technology and Sound Studies
- Topic Modeling
- Advanced Adaptive Filtering Techniques
- Water Quality Monitoring Technologies
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Speech and dialogue systems
- Fish Ecology and Management Studies
- Indoor and Outdoor Localization Technologies
- Blind Source Separation Techniques
- Conducting polymers and applications
- Text and Document Classification Technologies
- Hearing Loss and Rehabilitation
- Perovskite Materials and Applications
- Video Surveillance and Tracking Methods
- Organic Light-Emitting Diodes Research
- Advanced Algorithms and Applications
- Chalcogenide Semiconductor Thin Films
- Smart Agriculture and AI
- Advanced Cellulose Research Studies
- Powder Metallurgy Techniques and Materials
University of Surrey
2021-2025
The Synergetic Innovation Center for Advanced Materials
2023
Nanjing Tech University
2023
Qingdao University
2022
Shanghai Institute of Technology
2022
Beijing University of Posts and Telecommunications
2019
Digital aquaculture leverages advanced technologies and data‐driven methods, providing substantial benefits over traditional practices. This article presents a comprehensive review of three interconnected digital tasks, namely fish tracking, counting, and behaviour analysis, using a novel unified approach. Unlike previous reviews, which focused on single modalities or individual tasks, we analyse vision‐based (i.e., image‐ and video‐based), acoustic‐based, and biosensor‐based methods across all tasks....
Although the power conversion efficiency of perovskite solar cells continues to set new records, it is still far from the theoretical Shockley-Queisser limit. Two major issues need to be addressed, disordered crystallization and unbalanced interface charge extraction, both of which limit further improvements in device efficiency. Herein, we develop a thermally polymerized additive as a polymer template in the film, which can form monolithic grains and a unique "Mortise-Tenon" structure after spin-coating the hole-transport...
In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query (e.g., "a man tells a joke followed by people laughing"). A unique challenge in LASS is associated with the complexity of the natural language description and its relation to the audio sources. To address this issue, we propose LASS-Net, an end-to-end neural network that learns to jointly process acoustic and linguistic information and to separate the target source that is consistent with the language query from the audio mixture. We evaluate the performance of our...
Deep generative models have recently achieved impressive performance in speech and music synthesis. However, compared to the generation of those domain-specific sounds, generating general sounds (such as sirens and gunshots) has received less attention, despite their wide applications. In previous work, the SampleRNN method was considered for sound generation in the time domain. However, SampleRNN is potentially limited in capturing long-range dependencies within sounds, as it only back-propagates through a limited number of samples. In this work, we propose a method for generating sounds via neural...
Audio captioning aims at using language to describe the content of an audio clip. Existing systems are generally based on an encoder-decoder architecture, in which acoustic information is extracted by an encoder and then a decoder is used to generate the captions. Training such a system often encounters the problem of data scarcity. Transferring knowledge from pre-trained models such as Pre-trained Audio Neural Networks (PANNs) has recently emerged as a useful method to mitigate this issue. However, there has been less attention on exploiting...
Automated audio captioning aims to use natural language to describe the content of audio data. This paper presents an audio captioning system with an encoder-decoder architecture, where the decoder predicts words based on audio features extracted by the encoder. To improve the proposed system, transfer learning from either an upstream audio-related task or a large in-domain dataset is introduced to mitigate the problem induced by data scarcity. Besides, evaluation metrics are incorporated into the optimization of the model with reinforcement learning, which helps...
Thousands of ionic liquids (ILs) with the potential to efficiently dissolve hemicellulose were screened by COSMO-RS, and the best model was constructed and verified. This screening method will play an important role in sustainable development.
Hybrid halide perovskite solar cells (PSCs) have emerged as the next-generation photovoltaic technology. Compared to stable silicon cells, PSCs are easy to process but readily form defects/traps during thin-film fabrication from solution. To passivate these defects, which have been considered the origin of PSC instability, numerous large-sized organic cations (LSOCs) have been applied via post-treatment methods. Unfortunately, along with passivating the defects, LSOCs could also react with the regular phases and...
Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules, suffering from poor generalization capability or robustness. In this paper, we propose a generative zero-shot prompt learning framework for cross-domain slot filling, improving both generalization and robustness over previous work. Besides, we introduce a novel inverse prompting strategy to distinguish...
Recently, the development of deep learning has boosted object detection for remote sensing images. The existing methods can be divided into two types. The region-based methods, represented by Faster R-CNN, achieve high accuracy. However, their computational cost is massive due to the Convolutional Neural Network (CNN) backbones, which limits their efficiency. The regression-based methods, such as YOLO and Single Shot MultiBox Detector (SSD), are advantageous in speed, while their accuracy is not satisfactory. To meet the increasing...
Audio-visual tracking of multiple speakers requires estimating the state (e.g. velocity and location) of each speaker by leveraging information from both audio and visual modalities. Estimating the number of speakers and their states jointly remains a challenging problem. We propose an Audio-Visual Poisson Multi-Bernoulli Mixture Filter (AV-PMBM) that can not only predict the number of speakers but also give an accurate estimation of their states. A novel sound source localization technique based on DOA and a deep learning object detector provide reliable...
Acoustic scene classification (ASC) aims to classify an audio clip based on the characteristics of the recording environment. In this regard, deep learning approaches have emerged as a useful tool for ASC problems. Conventional approaches to improving classification accuracy include integrating auxiliary methods such as attention mechanisms, pre-trained models and ensembles of multiple sub-networks. However, due to the complexity of audio clips captured from different environments, it is difficult to distinguish their categories without using any...
Particle filters (PFs) have been widely used in speaker tracking due to their capability in modeling a non-linear process or a non-Gaussian environment. However, particle filters are limited by several issues. For example, pre-defined handcrafted measurements are often used, which can limit the model performance. In addition, the transition and update models are often preset, which makes the PF less flexible to be adapted to different scenarios. To address these issues, we propose an end-to-end differentiable particle filter framework by employing...
Fish feeding intensity assessment (FFIA) aims to evaluate the change of fish appetite during the feeding process, which is potentially useful in industrial aquaculture. Previous methods are mainly based on computer vision techniques. However, these methods are limited by water refraction and uneven illumination. In this paper, we introduce a new approach for FFIA using audio. We create a new audio dataset for FFIA, namely AFFIA3K, which contains 3000 labelled audio clips of different feeding intensities (None, Weak, Medium, Strong). We present a deep learning...
In real dialogue scenarios, as there are unknown input noises in the utterances, existing supervised slot filling models often perform poorly in practical applications. Even though there are some studies on noise-robust models, these works are only evaluated on rule-based synthetic datasets, which is limiting, making it difficult to promote the research of noise-robust methods. In this paper, we introduce a noise robustness evaluation dataset named Noise-SF for the slot filling task. The proposed dataset contains five types of human-annotated noise, and all...
Multi-domain text classification can automatically classify texts in various scenarios. Due to the diversity of human languages, texts with the same label in different domains may differ greatly, which brings challenges to multi-domain text classification. Current advanced methods use the private-shared paradigm, capturing domain-shared features with a shared encoder and training a private encoder for each domain to extract domain-specific features. However, in realistic scenarios, these methods suffer from inefficiency as new domains are...
Most existing slot filling models tend to memorize inherent patterns of entities and corresponding contexts from training data. However, these models can lead to system failure or undesirable outputs when being exposed to spoken language perturbation or variation in practice. We propose a perturbed semantic structure awareness transferring method for training perturbation-robust slot filling models. Specifically, we introduce two MLM-based training strategies to respectively learn contextual semantic structure and word distribution from an unsupervised corpus. Then,...
Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio-visual (AV)-SELD works have been published, and most employ vision via face/object bounding boxes or human pose keypoints. In contrast, we explore the integration of audio and visual feature embeddings extracted with pre-trained deep networks. For the visual modality, we tested ResNet50 and Inflated 3D...
Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm that allows the user to input text to describe the sound event, and the SEL model can predict the location of the related sound event. The proposed task presents a more user-friendly way for human-computer interaction. We provide...
Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its...
Cascaded speech-to-speech translation systems often suffer from the error accumulation problem and high latency, which is a result of cascaded modules whose inference delays accumulate. In this paper, we propose a transducer-based speech translation model that outputs discrete speech tokens in a low-latency streaming fashion. This approach eliminates the need for generating text output first, followed by machine translation (MT) and text-to-speech (TTS) systems. The produced speech tokens can be directly used to generate a speech signal with low latency...