- Speech and Audio Processing
- Advanced Adaptive Filtering Techniques
- Music and Audio Processing
- Speech Recognition and Synthesis
- Hearing Loss and Rehabilitation
- Advanced Data Compression Techniques
- Radio Frequency Integrated Circuit Design
- Blind Source Separation Techniques
- Microwave Engineering and Waveguides
- Underwater Acoustics Research
- Photonic and Optical Devices
- Image and Signal Denoising Methods
- Acoustic Wave Phenomena Research
- Underwater Vehicles and Communication Systems
- Digital Media Forensic Detection
- Optical Network Technologies
- Advanced Power Amplifier Design
- Indoor and Outdoor Localization Technologies
- Human auditory perception and evaluation
- Advanced Photonic Communication Systems
- Advanced Fiber Laser Technologies
- Geophysical Methods and Applications
- Music Technology and Sound Studies
- Advanced Sensor and Control Systems
- Natural Language Processing Techniques
Beijing University of Technology
2016-2025
China Academy of Engineering Physics
2020-2024
University of Electronic Science and Technology of China
2017
To address the challenges of poor representation capability and low data utilization rate in end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR). It adopts a multi-task paradigm for training. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model's characterization via the multi-scale feature fusion module. Moreover, we apply effectively exploit...
A speech enhancement approach using adaptive wavelet threshold and spectral subtraction in the wavelet domain is proposed in this paper. First, in order to maintain linear phase and prevent aliasing of the reconstructed signals, the noisy signals are decomposed by the un-decimated bi-orthogonal wavelet packet. Second, spectral subtraction is used for reducing noise from the two low-frequency sub-bands, which avoids excessive distortion of components caused by de-noising. Next, an improved wavelet shrinkage algorithm is adopted for the other high-frequency sub-bands. This updated...
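The two noise-reduction building blocks named in this abstract can be illustrated with a minimal sketch. It assumes textbook power spectral subtraction with a spectral floor and standard soft thresholding; the paper's adaptive threshold and improved shrinkage rule are not given in the abstract, so the function names, the `floor` parameter, and the sample values below are hypothetical stand-ins.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Subtract a noise power estimate per bin, keeping a small spectral floor.

    Assumption: plain power subtraction; `floor` limits over-subtraction.
    """
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]


def soft_threshold(coeffs, threshold):
    """Shrink each wavelet coefficient toward zero by `threshold`.

    Assumption: standard soft thresholding, not the paper's improved rule.
    """
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0.0:
            out.append(0.0)  # coefficient treated as noise
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical sub-band data for illustration only.
low_band_power = spectral_subtract([4.0, 1.0], [1.0, 2.0])
high_band_coeffs = soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0)
print(low_band_power, high_band_coeffs)
```

In this sketch the subtraction would be applied to the low-frequency sub-bands and the thresholding to the high-frequency ones, mirroring the split described in the abstract.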
The performance of the existing speech enhancement algorithms is not ideal in low signal-to-noise ratio (SNR) and non-stationary noise environments. In order to resolve this problem, a novel speech enhancement algorithm based on multi-feature and adaptive mask with deep learning is presented in this paper. First, we construct a new feature called multi-resolution auditory cepstral coefficient (MRACC). This feature, which is extracted from four cochleagrams of different resolutions, can capture the local information and spectrotemporal context and reduce...
The intelligibility, definition, and comfort of digital hearing aids are reduced because the existing loudness compensation method (LCM) distorts the speech formants. In order to solve these problems, an LCM based on the human auditory model is proposed in this paper. This method adopts gammatone filter banks, which can simulate the auditory model of the human ear cochlea. The input signal is divided into 32 frequency bands by the gammatone filter banks, and then each band is compressed or amplified in accordance with the curve of the hearing impaired. Experimental...
The recognition precision of the existing auditory scene recognition algorithms is relatively satisfactory, but they can only be applied to several noise scenarios and cannot meet the performance requirements of digital hearing aids in complex environments. In order to solve the above problems, an algorithm based on a multi-feature weighted minimum distance classifier is proposed in this paper. In the algorithm, speech endpoint detection based on band-partitioning spectral entropy and energy is used to divide the noisy speech into speech segments and noise segments. Then...
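As a rough illustration of the band-partitioning spectral entropy feature mentioned in this abstract, the sketch below groups a frame's power spectrum into equal-width bands and computes the entropy of the normalized band energies. The equal-width partition, the function name, and the sample spectra are assumptions; the abstract does not specify the paper's exact partitioning or how entropy and energy are combined.

```python
import math


def band_spectral_entropy(power_spectrum, num_bands):
    """Entropy (in nats) of the band-energy distribution of one frame.

    Assumption: equal-width bands; trailing bins that do not fill a
    complete band are ignored.
    """
    band_size = len(power_spectrum) // num_bands
    band_energy = [
        sum(power_spectrum[b * band_size:(b + 1) * band_size])
        for b in range(num_bands)
    ]
    total = sum(band_energy)
    if total <= 0.0:
        return 0.0  # silent frame: no distribution to measure
    entropy = 0.0
    for energy in band_energy:
        if energy > 0.0:
            p = energy / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical frames: a flat (noise-like) and a peaky (tonal) spectrum.
flat_entropy = band_spectral_entropy([1.0] * 8, 4)
peaky_entropy = band_spectral_entropy([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0], 4)
print(flat_entropy, peaky_entropy)
```

A flat spectrum gives the maximum entropy for the chosen number of bands, while energy concentrated in one band gives zero, which is why the measure can help separate noise-like frames from frames with strong spectral structure.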
In this paper, by using the cyclostationary properties of the speech signal, a voice activity detection (VAD) algorithm based on the cyclic cumulant is proposed. The proposed scheme employs the third-order cyclic cumulant of the LPC residual signal. Analytical expressions for the short-term third-order cyclic cumulant are derived assuming a sinusoidal model. The matrix pencil (MP) method is adopted to estimate the frequencies of the harmonic signal contained in the LPC residual, which are used in the cyclic cumulant. Then defined and construct VAD variation. The test results show that the proposed algorithm gives better performance than the G.729B VAD.
A Carrier Suppressed Single Sideband (CS-SSB) modulation decoupling control technique based on a Dual Parallel Mach-Zehnder Modulator (DPMZM) is proposed to solve the problem that the bias point of the DPMZM leads to degradation of the signal under the influence of ambient temperature, fiber coupling loss, or other factors. This technique introduces three non-disturbing dithering voltages at the end of the modulator and detects the power ratio of the optical output to realize CS-SSB modulation. Experimental results show that the technique can effectively stabilize Optical...
This paper presents a 0.1-6 GHz digitally controlled variable gain low noise MMIC amplifier using 0.15 μm GaAs pHEMT technology. The VGA is composed of a three-stage common source low noise amplifier (LNA) cascade and an attenuator. proposed both by LNA provides high which has step 0.5 dB 2-bit attenuator adjusts the 4-bit according to requirement. To reduce chip size and improve input P1 dB, the design novelty utilizes parallel pHEMTs as switches to control the gate bias path and extract voltage from a series resistor divider circuit...
Most of the current pitch detection algorithms cannot work well in high noise environments. For this reason, a pitch detection algorithm for noisy speech signals based on pre-filtering and weighted wavelet coefficients is proposed. Firstly, the noisy speech signals are pre-filtered. Secondly, the pre-filtered signals are decomposed by the quadratic spline wavelet. Thirdly, the wavelet coefficients of three consecutive scales are weighted to emphasize the sharp change points. Fourthly, the candidate pitch periods are extracted from the weighted signals. Finally, the pitch period is calculated by the autocorrelation function....
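The final step of this abstract, computing the pitch period from the autocorrelation function, can be sketched as follows. This is a minimal illustration only: it skips the pre-filtering and wavelet weighting stages, uses an unnormalized autocorrelation, and the function name, lag range, and synthetic test signal are assumptions rather than details from the paper.

```python
import math


def autocorr_pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    Assumption: unnormalized autocorrelation over one analysis frame.
    """
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Hypothetical voiced frame: 100 Hz fundamental plus its second harmonic,
# sampled at 8 kHz, so one pitch period spans 80 samples.
sample_rate = 8000
f0 = 100.0
signal = [
    math.sin(2 * math.pi * f0 * n / sample_rate)
    + 0.5 * math.sin(4 * math.pi * f0 * n / sample_rate)
    for n in range(800)
]
period = autocorr_pitch_period(signal, 20, 160)
print(period, sample_rate / period)
```

Restricting the search to a plausible lag range keeps the zero-lag peak out of the search and limits octave errors, at the cost of needing a rough prior on the speaker's pitch range.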
An LP (Linear Prediction) spectrum modification method is proposed based on linear extrapolation for different signal-to-noise ratios under the white noise environment. A direct modification of the noisy LP spectrum is employed, since the autocorrelation coefficients of the original speech are not available in some practical applications. The suppression rule is applied to the high-frequency region of the spectrum, while no modification is made in the low-frequency region. The experimental results with ITU-T G.722.2 demonstrate that, compared with the reference methods, the proposed method could provide...
A compressed domain speech enhancement method based on the joint modification of adaptive and algebraic codebook gains for the codec ITU-T G.722.2 is proposed in this paper. First, the power of the excitation signal corresponding to noise is estimated by minimum statistics. Then the decision-directed approach is used to get an estimate of the a priori SNR. And the gain is modified by multiplying by a Wiener-type factor. In order to solve the problem of loss in the voiced segment, got keeping equal scaled version noisy one. The result of the performance evaluation...
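The decision-directed a priori SNR estimate and the Wiener-type factor named in this abstract can be sketched generically. The code below works on per-frame powers rather than on G.722.2 codebook gains, takes the noise power as given instead of estimating it by minimum statistics, and uses a smoothing constant `alpha` of 0.98; all of these, along with the function name, are assumptions for illustration.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98):
    """Per-frame Wiener-type gains from a decision-directed a priori SNR.

    Assumptions: noise_power entries are positive and already estimated;
    alpha weights the previous frame's clean-power estimate.
    """
    gains = []
    prev_clean_power = 0.0
    for y_pow, n_pow in zip(noisy_power, noise_power):
        posteriori_snr = y_pow / n_pow
        # Blend the previous clean estimate with the current ML estimate.
        priori_snr = (alpha * prev_clean_power / n_pow
                      + (1.0 - alpha) * max(posteriori_snr - 1.0, 0.0))
        gain = priori_snr / (1.0 + priori_snr)  # Wiener-type factor
        gains.append(gain)
        prev_clean_power = (gain ** 2) * y_pow
    return gains


# Hypothetical frames: noise only, then a sustained high-SNR segment.
noise_only = decision_directed_gains([1.0, 1.0], [1.0, 1.0])
speech = decision_directed_gains([100.0] * 5, [1.0] * 5)
print(noise_only, speech)
```

The recursion makes the gain rise gradually at speech onsets and stay near zero in noise-only frames, which is the smoothing behavior the decision-directed approach is usually chosen for.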
In this paper, a compressive sampling method of MLT coefficients, which is used for extracting stereo information, is adopted based on principal component analysis (PCA) and Modulated Lapped Transform (MLT). With this method, an embedded variable bit-rate speech and audio coding algorithm is proposed in this paper. In the codec, the signal sampled at 32 kHz or 16 kHz can be coded in terms of scalable bit rates, and the structure of the bit-stream is divided into several layers. The core codec is ITU-T G.729.1, which processes the mono signal with 7 kHz bandwidth. Besides there...
A novel deep learning (DL) method is proposed for binaural sound source localization with low SNR. Firstly, the binaural signals are decomposed into several channels by using a Gammatone filter. Secondly, 4 feature parameters of the Head-related Transfer Function, interaural time difference (ITD), interaural coherence (IC), interaural level difference (ILD), and interaural phase difference (IPD), are extracted. Thirdly, ITD and IC go through a Deep Belief Network (DBN) to determine the quadrant and reduce the positioning range. Then, ITD, IC, ILD, and IPD go through a Deep Neural Network (DNN) to obtain...
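One of the four features listed in this abstract, the interaural time difference, is commonly estimated from the peak of the cross-correlation between the two ear signals. The sketch below shows that generic approach for a single channel; the abstract does not say how the paper computes ITD, so the function name, the lag convention, and the test signals are assumptions.

```python
def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the ITD in seconds from the cross-correlation peak.

    Assumption: a positive result means the right-ear signal lags the left.
    """
    n = len(left)
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += left[i] * right[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / sample_rate


# Hypothetical ear signals: the right channel is the left one delayed by
# 3 samples at a 16 kHz sampling rate.
sample_rate = 16000
left = [0.0] * 10 + [1.0, -2.0, 3.0, 1.0] + [0.0] * 10
right = [0.0] * 3 + left[:-3]
itd = estimate_itd(left, right, sample_rate, max_lag=8)
print(itd)
```

In a pipeline like the one described, this estimate would be computed per Gammatone channel and passed on with the other cues as network input.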