Hang Chen

ORCID: 0000-0002-0904-8946
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Advanced Adaptive Filtering Techniques
  • Music and Audio Processing
  • Image and Signal Denoising Methods
  • Infrastructure Maintenance and Monitoring
  • Multilingual Education and Policy
  • Photovoltaic System Optimization Techniques
  • 3D Surveying and Cultural Heritage
  • Hearing Loss and Rehabilitation
  • Emotion and Mood Recognition
  • Solar Thermal and Photovoltaic Systems
  • Speech and Dialogue Systems
  • Solar Radiation and Photovoltaics
  • Geotechnical Engineering and Analysis
  • Second Language Learning and Teaching
  • EFL/ESL Teaching and Learning

University of Science and Technology of China
2021-2025

We propose a viseme subword modeling (VSM) approach to improve the generalizability and interpretability of deep neural network based lip reading. A comprehensive analysis of preliminary experimental results reveals the complementary nature of the conventional end-to-end (E2E) and proposed VSM frameworks, especially concerning speaker head movements. To increase lip reading accuracy, we propose hybrid viseme subword and end-to-end modeling (HVSEM), which exploits the strengths of both approaches through multitask learning. As an extension...

10.1109/tmm.2024.3390148 article EN IEEE Transactions on Multimedia 2024-01-01

A multi-level distortion measure (MLDM) is proposed as an objective to optimize deep neural network-based speech enhancement (SE) in both audio-only and audio-visual scenarios. The aim is to achieve simultaneous performance improvements in speech quality, intelligibility, and recognition error reductions. Moreover, a comprehensive correlation analysis shows that these three evaluation metrics exhibit high Pearson correlation coefficient (PCC) values with commonly used optimization objectives: the mean squared error between ideal...

10.1109/taslp.2024.3393732 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by...

10.1145/3581783.3612859 preprint EN 2023-10-26

The exploration of language skills in language models (LMs) has always been one of the central goals of mechanistic interpretability. However, existing circuit analyses often fall short in representing the full functional scope of these models, primarily due to the exclusion of Feed-Forward layers. Additionally, isolating the effect of a single language skill from a text, which inherently involves multiple entangled skills, poses a significant challenge. To address these gaps, we introduce a novel concept, Memory Circuit, a minimum unit that fully and...

10.48550/arxiv.2410.01334 preprint EN arXiv (Cornell University) 2024-10-02