- Speech and Audio Processing
- Speech Recognition and Synthesis
- Advanced Adaptive Filtering Techniques
- Music and Audio Processing
- Image and Signal Denoising Methods
- Infrastructure Maintenance and Monitoring
- Multilingual Education and Policy
- Photovoltaic System Optimization Techniques
- 3D Surveying and Cultural Heritage
- Hearing Loss and Rehabilitation
- Emotion and Mood Recognition
- Solar Thermal and Photovoltaic Systems
- Speech and Dialogue Systems
- Solar Radiation and Photovoltaics
- Geotechnical Engineering and Analysis
- Second Language Learning and Teaching
- EFL/ESL Teaching and Learning
University of Science and Technology of China
2021-2025
We propose a viseme subword modeling (VSM) approach to improve the generalizability and interpretability of deep neural network-based lip reading. A comprehensive analysis of preliminary experimental results reveals the complementary nature of the conventional end-to-end (E2E) and proposed VSM frameworks, especially concerning speaker head movements. To increase lip reading accuracy, we propose hybrid viseme subwords and end-to-end modeling (HVSEM), which exploits the strengths of both approaches through multitask learning. As an extension...
A multi-level distortion measure (MLDM) is proposed as an objective to optimize deep neural network-based speech enhancement (SE) in both audio-only and audio-visual scenarios. The aim is to achieve simultaneous performance improvements in speech quality, intelligibility, and recognition error reductions. Moreover, a comprehensive correlation analysis shows that these three evaluation metrics exhibit high Pearson correlation coefficient (PCC) values with commonly used optimization objectives: the mean squared error between the ideal...
In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also used to optimize the whole process. Finally, by...
The exploration of language skills in language models (LMs) has always been one of the central goals of mechanistic interpretability. However, existing circuit analyses often fall short in representing the full functional scope of these models, primarily due to the exclusion of Feed-Forward layers. Additionally, isolating the effect of a single language skill from a text, which inherently involves multiple entangled skills, poses a significant challenge. To address these gaps, we introduce a novel concept, Memory Circuit, a minimum unit that fully and...