- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Fault Detection and Control Systems
- Acoustic Wave Resonator Technologies
- Advanced MRI Techniques and Applications
- Speech and dialogue systems
- Neural Networks and Applications
- Blind Source Separation Techniques
- Microfluidic and Bio-sensing Technologies
- Software Testing and Debugging Techniques
- Underwater Acoustics Research
- Vehicle License Plate Recognition
- Radiation Detection and Scintillator Technologies
- Consumer Market Behavior and Pricing
- Consumer Retail Behavior Studies
- Handwritten Text Recognition Techniques
- Innovation and Socioeconomic Development
- Robotics and Automated Systems
- Ultrasonics and Acoustic Wave Propagation
- Digital Marketing and Social Media
- Sensor Technology and Measurement Systems
- Image Processing and 3D Reconstruction
- Mechanical and Optical Resonators
Yunnan Normal University
2024
Adam Smith Institute
2022-2023
University of Glasgow
2022-2023
Tsinghua University
2019-2022
National Engineering Research Center for Information Technology in Agriculture
2021
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to fine-tuned models in Kaldi but with a much simpler pipeline. Compared to existing non-modularized models, CAT performs...
Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step to advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed networks with learned, task-specific architectures. In contrast to early computationally demanding NAS methods, recent gradient-based methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly improve efficiency. In this paper, we make two contributions. First,...
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data. However, improving robustness, including achieving equally good performance on diverse speakers and accents, is still a challenging problem. In particular, children speech recognition (CSR) lags behind because 1) the language characteristics of children's voice are substantially different from those of adults and 2) a sizable open dataset for CSR is not available in the research community. To address these problems, we launch...
In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random field (CRF), particularly with connectionist temporal classification (CTC) inspired state topology. CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow for CRF-based end-to-end recognition. Evaluation results on Chinese and English benchmarks such as Switchboard and Aishell...
The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing in multilingual and crosslingual speech recognition methods for low-resourced languages. A drawback suffered by previous methods using PFs is that the acoustic-to-PF extraction in a bottom-up way is itself difficult. In this paper, we propose to join phonology driven phone embedding (top-down) and deep neural network (DNN) based acoustic feature extraction (bottom-up)...
With the upgrading of people’s consumption patterns, the omni-channel supply chain becomes the mainstream form of e-commerce platform enterprise development. Aiming at two different enterprises, we construct an evolutionary game model for the enterprises’ “online+offline” construction strategy by self-building or cooperating with brick-and-mortar stores. It is based on the Stackelberg and Cournot competition model, combined with pricing strategy, using the theory of perfect rationality and bounded rationality, combing...
Recycling channel construction plays an important role in the development of closed-loop supply chains. In particular, the emergence of online recycling channels has made up for the shortcomings of traditional recycling channels with poor information and limited markets. This paper constructs an evolutionary game model to investigate the cooperation between manufacturers and e-commerce platforms with or without government intervention. The result shows that whether the enterprise actively participates in cooperation depends on the actual cost...
Attention-based encoder-decoder, e.g. transformer and its variants, generates the output sequence in an autoregressive (AR) manner. Despite its superior performance, the AR model is computationally inefficient, as its generation requires as many iterations as the output length. In this paper, we propose Paraformer-v2, an improved version of Paraformer, for fast, accurate, and noise-robust non-autoregressive speech recognition. We use a CTC module to extract the token embeddings, as an alternative to the continuous integrate-and-fire in Paraformer....
In this study, we delve into the efficacy of transformers within pre-trained language models (PLMs) when repurposed as encoders for Automatic Speech Recognition (ASR). Our underlying hypothesis posits that, despite being initially trained on text-based corpora, these transformers possess a remarkable capacity to extract effective features from the input sequence. This inherent capability, we argue, is transferrable to speech data, thereby augmenting the acoustic modeling ability of ASR. Through rigorous empirical...
The acoustophoresis separation technique has attracted great attention due to its superior properties, such as biocompatibility, non-contact, label-free and high-efficiency. In this paper, the separation of particles based on motion modes via tilt angle standing surface acoustic wave (TaSSAW) driven by a unidirectional transducer is developed theoretically. It is verified that the designed electrode width controlled transducers are effective to improve the field intensity and the radiation force in the channel. The results show...
Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling. Nevertheless, the full-sequence attention mechanism is non-streamable and computationally expensive, thus requiring modifications, such as chunking and caching, for efficient streaming ASR. In this paper, we propose to apply RWKV, a variant of linear transformer, to streaming ASR. RWKV combines the superior performance of transformers and the inference efficiency of RNNs, which is well-suited for scenarios where the budget for latency and memory...
Recently, the end-to-end training approach for multi-channel ASR has shown its effectiveness, which usually consists of a beamforming front-end and a recognition back-end. However, training becomes more difficult due to the integration of multiple modules, particularly considering that multi-channel speech data recorded in real environments are limited in size. This raises the demand to exploit single-channel data for multi-channel ASR. In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel ASR, namely back-end...
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive temporal modeling in speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the sampling locations with additional offsets and learn the offsets automatically based on the ASR criterion, without...
Recently, recurrent neural network transducer (RNN-T) gains increasing popularity due to its natural streaming capability as well as superior performance. Nevertheless, RNN-T training requires large time and computation resources, as RNN-T loss calculation is slow and consumes a lot of memory. Another limitation of RNN-T is that it tends to access more contexts for better performance, thus leading to higher emission latency in streaming ASR. In this paper, we propose boundary-aware transducer (BAT) for memory-efficient and low-latency ASR. In BAT, the lattice...
In a speech recognition system, voice activity detection (VAD) is a crucial frontend module. Addressing the issues of poor noise robustness in traditional binary VAD systems based on DFSMN, the paper further proposes semantic multi-task learning with improved models for real-time and offline systems, to meet specific application requirements. Evaluations on internal datasets show that, compared to the DFSMN-based system, the RWKV-based system achieves relative decreases in CER of 7.0% and in DCF of 26.1%, and a relative improvement in NRR of 19.2%. Similarly, when SAN-M...