NFDI4DS | UHH-SEMS - Publication Details

Guanglai Gao

ORCID: 0009-0005-5513-1192

About

Contact & Profiles

OpenAlex author ID: A5076174513

Research Areas

Natural Language Processing Techniques
Speech Recognition and Synthesis
Topic Modeling
Speech and Audio Processing
Image Retrieval and Classification Techniques
Handwritten Text Recognition Techniques
Music and Audio Processing
Advanced Image and Video Retrieval Techniques
Image Processing and 3D Reconstruction
Web Data Mining and Analysis
Multimodal Machine Learning Applications
Advanced Computational Techniques and Applications
Speech and dialogue systems
Emotion and Mood Recognition
Text and Document Classification Technologies
Advanced Graph Neural Networks
Advanced Text Analysis Techniques
Educational Technology and Assessment
Neural Networks and Applications
Sentiment Analysis and Opinion Mining
Face and Expression Recognition
Linguistics and Cultural Studies
Advanced Adaptive Filtering Techniques
Digital Media Forensic Detection
Data Management and Algorithms

Inner Mongolia University
2016-2025

National University of Mongolia
2017-2021

University of Delaware
2009

Louisiana State University
2009

Inner Mongolia University of Technology
2009

Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training

OPENALEX - Publications

Rui Liu Yifan Hu Haolin Zuo Zhaojie Luo Longbiao Wang and 1 more

Text-to-Speech (TTS) aims to convert the input text to a human-like voice. With the development of deep learning, encoder-decoder based TTS models achieve superior performance, in terms of naturalness, in mainstream languages such as Chinese and English. Note that the linguistic information learning capability of the encoder is key. However, for low-resource agglutinative languages, the scale...

10.1109/taslp.2023.3348762 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Expressive TTS Training With Frame and Style Reconstruction Loss

OPENALEX - Publications

Rui Liu Berrak Şişman Guanglai Gao Haizhou Li

We propose a novel training strategy for a Tacotron-based text-to-speech (TTS) system that improves the speech styling at utterance level. One of the key challenges in prosody modeling is the lack of reference, which makes explicit modeling difficult. The proposed technique doesn't require prosody annotations from the training data. It doesn't attempt to model prosody explicitly either, but rather encodes the association between input text and its prosody styles using a Tacotron-based TTS framework. This study marks a departure from the style token paradigm, where prosody is explicitly modeled by a bank of prosody embeddings....

10.1109/taslp.2021.3076369 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2021-01-01

Exploiting Modality-Invariant Feature for Robust Multimodal Emotion Recognition with Missing Modalities

OPENALEX - Publications

Haolin Zuo Rui Liu Jinming Zhao Guanglai Gao Haizhou Li

Multimodal emotion recognition leverages complementary information across modalities to gain performance. However, we cannot guarantee that the data of all modalities are always present in practice. In studies that predict the missing modalities, the inherent difference between heterogeneous modalities, namely the modality gap, presents a challenge. To address this, we propose to use invariant features for a missing modality imagination network (IF-MMIN), which includes two novel mechanisms: 1) an invariant feature learning strategy that is based on the central moment...

10.1109/icassp49357.2023.10095836 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Training Supervised Speech Separation System to Improve STOI and PESQ Directly

OPENALEX - Publications

Hui Zhang Xueliang Zhang Guanglai Gao

Supervised speech separation methods train a learning machine to cast the noisy speech to the target clean speech. Most of them use the mean-square error (MSE) as the loss function. However, MSE is not a perfect choice because it doesn't match human auditory perception. Short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) are closely related to human auditory perception and widely used in speech separation research as evaluation criteria. Therefore, STOI and PESQ may be better choices for the loss function. However, they are nondifferentiable functions which cannot...

10.1109/icassp.2018.8461965 article EN 2018-04-01

Teacher-Student Training For Robust Tacotron-Based TTS

OPENALEX - Publications

Rui Liu Berrak Şişman Jingdong Li Feilong Bao Guanglai Gao and 1 more

While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways, the exposure bias problem in autoregressive models remains an issue to be resolved. The exposure bias problem arises from the mismatch between the training and inference processes, which results in unpredictable performance for out-of-domain test data at run-time. To overcome this, we propose a teacher-student training scheme for Tacotron-based TTS by introducing a distillation loss function in addition to the feature loss function. We first train...

10.1109/icassp40776.2020.9054681 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Multi-space channel representation learning for mono-to-binaural conversion based audio deepfake detection

OPENALEX - Publications

Rui Liu Jinhua Zhang Guanglai Gao

10.1016/j.inffus.2024.102257 article EN Information Fusion 2024-01-21

Controllable Accented Text-to-Speech Synthesis With Fine and Coarse-Grained Intensity Rendering

OPENALEX - Publications

Rui Liu Berrak Şişman Guanglai Gao Haizhou Li

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1), which is challenging as L2 is different from L1 in terms of both phonetic rendering and prosody pattern (pitch, energy, duration variance, etc.). Accented TTS has several significant real-world applications, such as language learning and preserving and documenting endangered languages and dialects, that make it an important area of research and development. Moreover, changing intensity any conversational AI...

10.1109/taslp.2024.3378110 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Local and Global Structure-Aware Contrastive Framework for Entity alignment

OPENALEX - Publications

Cunda Wang Weihua Wang Qiuyu Liang Guanglai Gao

10.1016/j.neucom.2025.129445 article EN Neurocomputing 2025-01-01

Structural-Aware Disentangled Learning with CLIP for Hyperbolic Zero-Shot Sketch-Based Image Retrieval*

OPENALEX - Publications

Qing Zhang Jing Zhang Feilong Bao Xiangdong Su Guanglai Gao

10.1109/icassp49660.2025.10890204 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Dynamic Structure Hypergraph for Document-level Event Extraction

OPENALEX - Publications

Qianqian Ren Weihua Wang Jie Yu Guanglai Gao

10.1109/icassp49660.2025.10889135 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Listening and seeing again: Generative error correction for audio-visual speech recognition

OPENALEX - Publications

Rui Liu Hongyu Yuan Guanglai Gao Haizhou Li

10.1016/j.inffus.2025.103077 article EN Information Fusion 2025-03-01

SSAN: A Symbol Spatial-Aware Network for Handwritten Mathematical Expression Recognition

OPENALEX - Publications

Haoran Zhang Xiangdong Su Zheng-Wei Zhou Guanglai Gao

The great challenge of handwritten mathematical expression recognition (HMER) is the complex structures of expressions, which are directly related to the symbol spatial positions. Existing HMER methods typically employ attention mechanisms in the decoder of their models to implicitly perceive the symbol positions, or use counting and tree-based strategies to model the symbol relation. However, these methods still cannot effectively capture the structural information of formulas, thus negatively impacting the decoding of HMER. To deal with this problem...

10.1609/aaai.v39i21.34396 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

A keyword retrieval system for historical Mongolian document images

OPENALEX - Publications

Hongxi Wei Guanglai Gao

10.1007/s10032-013-0203-6 article EN International Journal on Document Analysis and Recognition (IJDAR) 2013-02-25

Fractal property of generalized M-set with rational number exponent

OPENALEX - Publications

Shuai Liu Xiaochun Cheng Caihe Lan Weina Fu Jiantao Zhou and 2 more

The dynamic system described by f_c(z) = z^2 + c is called the Mandelbrot set (M-set), which is important for fractal and chaos theories due to its simple expression and complex structure. The system f_c(z) = z^k + c is called the generalized M set (k–M set). This paper proposes a new theory to compute the higher and lower bounds of the generalized M set while the exponent k is rational, and proves relevant properties, such as that the set could cover the whole number plane when k < 1, boundary ranges from circle with radius 1 infinite large. explores characteristics set, k–M determined k, p/q, where p...

10.1016/j.amc.2013.06.096 article EN cc-by Applied Mathematics and Computation 2013-08-07

A Pairwise Algorithm Using the Deep Stacking Network for Speech Separation and Pitch Estimation

OPENALEX - Publications

Xueliang Zhang Hui Zhang Shuai Nie Guanglai Gao Wenju Liu

Speech separation and pitch estimation in noisy conditions are considered to be a "chicken-and-egg" problem. On one hand, pitch information is an important cue for speech separation. On the other hand, speech separation makes pitch estimation easier when background noise is removed. In this paper, we propose a supervised learning architecture to solve these two problems iteratively. The proposed algorithm is based on the deep stacking network (DSN), which provides a method of stacking simple processing modules to build deep architectures. Each module is a classifier whose target...

10.1109/taslp.2016.2540805 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-03-10

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis

OPENALEX - Publications

Rui Liu Berrak Şişman Feilong Bao Jichen Yang Guanglai Gao and 1 more

Prosodic phrasing is an important factor that affects naturalness and intelligibility in text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing when large text and speech corpora are available. However, for low-resource languages such as Mongolian, prosodic phrasing remains a challenge for various reasons. First, the database suitable for system training is limited. Second, word composition knowledge that is prosody-informing has not been used in prosodic phrase modeling. To address these problems, in this article, we...

10.1109/taslp.2020.3040523 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2020-11-25

TeAST: Temporal Knowledge Graph Embedding via Archimedean Spiral Timeline

OPENALEX - Publications

Jiang Li Xiangdong Su Guanglai Gao

Temporal knowledge graph embedding (TKGE) models are commonly utilized to infer the missing facts and facilitate reasoning and decision-making in temporal knowledge graph based systems. However, existing methods fuse temporal information into entities, potentially leading to the evolution of entity information and limiting the link prediction performance of TKG. Meanwhile, current TKGE models often lack the ability to simultaneously model important relation patterns and provide interpretability, which hinders their effectiveness and potential applications. To address...

10.18653/v1/2023.acl-long.862 article EN cc-by 2023-01-01

L$^2$GC: Lorentzian Linear Graph Convolutional Networks For Node Classification

OPENALEX - Publications

Qiuyu Liang Weihua Wang Feilong Bao Guanglai Gao

Linear Graph Convolutional Networks (GCNs) are used to classify the nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, which do not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets modeled as graphs. In this paper, we attempt to introduce hyperbolic space into linear GCN and propose a novel framework for Lorentzian linear GCN. Specifically, we map the learned features of nodes into hyperbolic space, then feature transformation underlying...

10.48550/arxiv.2403.06064 preprint EN arXiv (Cornell University) 2024-03-09

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

OPENALEX - Publications

Rui Liu Jinhua Zhang Guanglai Gao Haizhou Li

10.21437/interspeech.2023-2335 article EN Interspeech 2023 2023-08-14

Sub-Band Knowledge Distillation Framework for Speech Enhancement

OPENALEX - Publications

Xiang Hao Shixue Wen Xiangdong Su Yun Liu Guanglai Gao and 1 more

In single-channel speech enhancement, methods based on full-band spectral features have been widely studied. However, only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. Specifically, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. These teacher models are dedicated to processing their own sub-bands. Next, under the teacher models'...

10.21437/interspeech.2020-1539 article EN Interspeech 2020 2020-10-25

Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model

OPENALEX - Publications

Rui Liu Feilong Bao Guanglai Gao Hui Zhang Yonghe Wang

10.21437/interspeech.2018-1706 article EN Interspeech 2018 2018-08-28

Learning Morpheme Representation for Mongolian Named Entity Recognition

OPENALEX - Publications

Weihua Wang Feilong Bao Guanglai Gao

10.1007/s11063-019-10044-6 article EN Neural Processing Letters 2019-05-02

A knowledge-based recognition system for historical Mongolian documents

OPENALEX - Publications

Xiangdong Su Guanglai Gao Hongxi Wei Feilong Bao

10.1007/s10032-016-0267-1 article EN International Journal on Document Analysis and Recognition (IJDAR) 2016-04-25

Expressive TTS Training with Frame and Style Reconstruction Loss

OPENALEX - Publications

Rui Liu Berrak Şişman Guanglai Gao Haizhou Li

We propose a novel training strategy for a Tacotron-based text-to-speech (TTS) system to improve the expressiveness of speech. One of the key challenges in prosody modeling is the lack of reference, which makes explicit modeling difficult. The proposed technique doesn't require prosody annotations from the training data. It doesn't attempt to model prosody explicitly either, but rather encodes the association between input text and its prosody styles using a Tacotron-based TTS framework. Our idea marks a departure from the style token paradigm, where prosody is explicitly modeled by a bank of prosody embeddings. The proposed technique adopts a combination...

10.48550/arxiv.2008.01490 preprint EN other-oa arXiv (Cornell University) 2020-01-01
