- Speech and Audio Processing
- Music and Audio Processing
- Parallel Computing and Optimization Techniques
- Advancements in Semiconductor Devices and Circuit Design
- Indoor and Outdoor Localization Technologies
- Hearing Loss and Rehabilitation
- Speech Recognition and Synthesis
- Underwater Acoustics Research
- Advanced Algorithms and Applications
- Analog and Mixed-Signal Circuit Design
- Acoustic Wave Phenomena Research
- Advanced Data Storage Technologies
- Advancements in PLL and VCO Technologies
- Distributed Systems and Fault Tolerance
- Semiconductor Materials and Devices
- Advanced Adaptive Filtering Techniques
- Silicon Carbide Semiconductor Technologies
- Hydrological Forecasting Using AI
- Flood Risk Assessment and Management
- Advanced Sensor and Control Systems
- Digital Filter Design and Implementation
- Interconnection Networks and Systems
- Video Surveillance and Tracking Methods
- Smart Grid and Power Systems
- Advanced Wireless Communication Techniques
Beijing Microelectronics Technology Institute
1998-2024
Westlake University
2021-2024
North China University of Technology
2008-2024
Southwestern University of Finance and Economics
2024
Hubei Zhongshan Hospital
2024
Wuhan University
2024
Sichuan Agricultural University
2024
Hangzhou Dianzi University
2023
Peking University
2002-2022
Shandong University of Technology
2022
Graph convolutional networks have been widely used for skeleton-based action recognition due to their excellent modeling ability of non-Euclidean data. As the graph convolution is a local operation, it can only utilize short-range joint dependencies and short-term trajectory but fails to directly model the distant joint relations and long-range temporal information that are vital to distinguishing various actions. To solve this problem, we present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module to enrich...
Most binaural speech source localization models perform poorly in unprecedentedly noisy and reverberant situations. Here, this issue is approached by modelling a multiscale dilated convolutional neural network (CNN). The time-related cross-correlation function (CCF) and the energy-related interaural level differences (ILD) are preprocessed in separate branches of the network. The CNN can encode discriminative representations for the CCF and ILD, respectively. After encoding, the individual representations are fused to map the direction....
This article proposes a deep neural network (DNN)-based direct-path relative transfer function (DP-RTF) enhancement method for robust direction of arrival (DOA) estimation in noisy and reverberant environments. The DP-RTF refers to the ratio between the direct-path acoustic transfer functions of the two microphone channels. First, the complex-valued DP-RTF is decomposed into the inter-channel intensity difference and sinusoidal functions of the inter-channel phase difference in the time-frequency domain. Then, the features from a series of temporal context frames are utilized to train a DNN...
This paper addresses the problem of multiple sound source counting and localization in adverse acoustic environments, using microphone array recordings. The proposed time-frequency (TF) wise spatial spectrum clustering based method contains two stages. First, given the received sensor signals, the correlation matrix is computed and denoised in the TF domain. The TF-wise spatial spectrum is estimated based on the signal subspace information, and further enhanced by an exponential transform, which can increase the reliability of the presence possibility...
Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Though the DP-RTF fully encodes the sound spatial cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes to learn the DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of the DP-RTF. It consists of a branched...
This research conducted quasi-experiments in four middle schools to evaluate the long-term effects of an intelligent web-based English instruction system, Computer Simulation in Educational Communication (CSIEC), on students' academic attainment. The analysis of regular examination scores and vocabulary tests validates the positive impact of CSIEC, which, in most cases, is statistically significant. The reliability of the analysis is ensured by the spectrum of students, from Grade 1 to 3 in three junior high schools and Grade 2 in one senior high school, teachers with...
Multiple moving sound source localization in real-world scenarios remains a challenging issue due to interaction between sources, time-varying trajectories, distorted spatial cues, etc. In this work, we propose to use deep learning techniques to learn the competing and time-varying direct-path phase differences for localizing multiple moving sound sources. A causal convolutional recurrent neural network is designed to extract the direct-path phase difference sequence from the signals of each microphone pair. To avoid the assignment ambiguity problem...
As a result of climate change and rapid urbanization, urban waterlogging, commonly caused by rainstorms, is becoming more frequent and severe in developing countries. Urban waterlogging sometimes results in significant financial losses as well as human casualties. Accurate waterlogging depth prediction is critical for early warning systems and emergency response. However, the existing hydrological models need to obtain abundant data, and the model construction is complicated. The technologies based on object detection are highly dependent on image...
Multiple sound source localization in wireless acoustic sensor networks (WASNs) is a challenging problem. Although compressive sensing based methods have shown effectiveness in uncorrelated source localization, their performance degrades significantly when they are used to locate multiple speech sources. To this end, we propose a method based on time difference of arrival (TDOA) clustering and the multi-path matching pursuit algorithm. First, the TDOAs are calculated locally in time-frequency (TF) bins...
Unmanned Aerial Vehicles (UAVs) capture aerial photographs with a wide viewing angle, variable backgrounds, and high-speed motion imaging. Object detection in UAV images is challenging due to significant changes in object scale, small and mutually occluded objects, and a lack of feature information. Conventional algorithms have poor real-time performance and accuracy in this field. The YOLO algorithm is prone to high false and omission rates for objects in complex scenes, leading to low accuracy. To address...
Audio-visual speaker tracking in 3D space is a challenging problem. Although the classical particle filter based methods have shown effectiveness in audio-visual tracking, the performance degrades considerably when the measurements are disturbed by noise. To this end, a novel two-layer particle filter is proposed for audio-visual speaker tracking. Firstly, two groups of particles, which are generated from the audio and video streams respectively, are propagated independently in the audio layer and the visual layer. Then, the audio and visual likelihoods are combined by an adaptive sigmoid function, which can...
Terahertz (THz) nondestructive testing (NDT) technology has been increasingly applied to the internal defect detection of composite materials. However, the THz image is affected by background noise and power limitation, leading to poor image quality. The recognition rate based on traditional machine vision algorithms is not high. The above methods are usually unable to determine surface defects in a timely and accurate manner. In this paper, we propose a method to detect defects in composite materials using terahertz images faster...
Various time-frequency (T-F) masks are being applied to sound source localization tasks. Moreover, deep learning has dramatically advanced T-F mask estimation. However, existing masks are usually designed for speech separation tasks and are suitable only for single-channel signals. A novel complex-valued T-F mask is proposed that preserves the head-related transfer function (HRTF) and is customized for binaural sound source localization. In addition, because the convolutional neural network exploited to estimate the mask takes spectral information as input...
Defective wafer pattern recognition is important for quality control and yield enhancement in semiconductor fabrication systems. The collected wafer maps are usually imbalanced, which may degrade the performance of the classifier. In this paper, a focal auxiliary classifier generative adversarial network (FAC-GAN) for defective wafer pattern recognition with imbalanced data is proposed. FAC-GAN is composed of an AC-GAN with a modified loss for generation and a deep neural network. The proposed method is measured on the real-world wafer map dataset "WM-811k", and it outperforms SVM and CNN.
Water level prediction in large dammed rivers is an important task for flood control, hydropower generation, and ecological protection. The variations of water levels are traditionally simulated based on hydrological models. Recently, most studies have begun applying deep learning (DL) models as an alternative method for forecasting the dynamics of water levels. However, it is still challenging to directly apply DL models to simultaneous prediction across multiple sites. This study attempts to develop a hybrid framework by combining...
Harmful algal blooms (HABs) have been deteriorating global water bodies, and the accurate prediction of their dynamics using modelling methods is a challenging research area. High-frequency monitoring and deep learning technology have opened up new horizons for HAB forecasting. However, the non-stationary and stochastic process behind HABs largely limits the performance of the early warning of blooms. Through an analysis of published literature, we found that decomposition methods are widely used in time-series prediction of hydrological processes....
Lip-reading methods and fusion strategy are crucial for audio-visual speech recognition. In recent years, most approaches involve two separate audio and visual streams with early or late fusion strategies. Such a single-stage fusion method may fail to guarantee the integrity and representativeness of the fusion information simultaneously. This paper extends the traditional network to a two-step feature fusion network by adding an AV-EFF stream to the baseline model. This method can learn the fusion information of different stages, preserving the original features as much as possible and ensuring...
The fusion of audio and visual modalities is an important stage of audio-visual speech recognition (AVSR), which is generally approached through feature fusion or decision fusion. Feature fusion can exploit the covariations between features from different modalities effectively, whereas decision fusion shows robustness in capturing an optimal combination of multimodality. In this work, to take full advantage of the complementarity of the two fusion strategies and address the challenge of inherent ambiguity in noisy environments, we propose a novel hybrid fusion based AVSR method with...
Lipreading is an important component of audio-visual speech recognition. However, lips are usually modeled as a whole in lipreading, which ignores that each part of the lips focuses on different characteristics of the mouth, so the overall model cannot fit perfectly. Besides, features based on the lips vary a lot across speakers, which means training databases need to contain as many speakers as possible. In this paper, a part-based lipreading (PBL) method is proposed to deal with the mismatch between separate parts of the lips, and also the excessive...
Background: A motor is a device that converts electrical energy into mechanical energy. It is one of the most widely used types of equipment. Its running state directly affects the performance of machinery. Keywords: Alternating current motor, fault diagnosis, feature extraction, improved particle swarm optimization, support vector machine, wavelet packet.
This paper proposes a novel cross-correlation function (CCF) extraction method based on a convolutional neural network (CNN) for time difference of arrival (TDOA) estimation or further direction of arrival (DOA) estimation. The CNN is utilized to learn the relationship between the localization features and the pre-processed waveform signal, which may include not only the source signal but also background noise and reverberation. In contrast to many previous sound source localization approaches, the proposed method focuses on spatial feature extraction. Two kinds of outputs,...
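
For orientation, the classical baseline that CCF-based TDOA estimation builds on can be sketched as follows. This is a minimal illustration of picking the peak of a plain time-domain cross-correlation between two microphone signals, not the CNN-based extraction method described in the abstract above. The function names `estimate_tdoa` and `tdoa_to_doa`, the far-field two-microphone geometry, and the default speed of sound are assumptions made for the example.

```python
import numpy as np


def estimate_tdoa(x1, x2, fs, max_lag=None):
    """Estimate the delay of x2 relative to x1 (in seconds) from the CCF peak.

    Hypothetical helper: uses a plain cross-correlation, with no weighting
    and no learned feature extraction.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Full cross-correlation: ccf[i] = sum_n x2[n + k] * x1[n],
    # where k = lags[i] runs from -(len(x1) - 1) to len(x2) - 1.
    ccf = np.correlate(x2, x1, mode="full")
    lags = np.arange(-(len(x1) - 1), len(x2))
    if max_lag is not None:
        # Restrict the search to physically plausible lags (in samples).
        keep = np.abs(lags) <= max_lag
        ccf, lags = ccf[keep], lags[keep]
    return lags[np.argmax(ccf)] / fs


def tdoa_to_doa(tdoa, mic_distance, speed_of_sound=343.0):
    """Map a TDOA to a far-field DOA angle in radians (assumed geometry)."""
    ratio = np.clip(speed_of_sound * tdoa / mic_distance, -1.0, 1.0)
    return np.arcsin(ratio)
```

With noise and reverberation the plain CCF peak becomes unreliable, which is the situation that learned or weighted CCF features are meant to address.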