Quoc Truong

ORCID: 0000-0003-1472-1370
Research Areas
  • Speech Recognition and Synthesis
  • Natural Language Processing Techniques
  • Topic Modeling
  • Speech and Dialogue Systems
  • Speech and Audio Processing
  • Language, Metaphor, and Cognition
  • Music and Audio Processing
  • Emotion and Mood Recognition
  • Advanced Algorithms and Applications
  • Microplastics and Plastic Pollution
  • Translation Studies and Practices
  • Recycling and Waste Management Techniques
  • Diabetes Management and Education
  • Subtitles and Audiovisual Media
  • Health and Wellbeing Research

Bình Dương University
2025

Nara Institute of Science and Technology
2014-2019

In recent years, studies on automatic speech recognition (ASR) have shown outstanding results that reach human parity on short segments. However, there are still difficulties in standardizing the output of ASR, such as capitalization and punctuation restoration for long-speech transcription. These problems obstruct readers from understanding transcripts semantically and also degrade natural language processing models for NER, POS tagging, and semantic parsing. In this paper, we propose a restoration method based on a Transformer with chunk merging...
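The chunk-and-merge idea described in the abstract can be pictured as splitting a long token stream into overlapping chunks, restoring each chunk with the model, then stitching the outputs back together. This is a minimal sketch of the split/merge bookkeeping only; the function names and the policy of trusting the earlier chunk in each overlap region are assumptions, not the paper's exact algorithm.

```python
def split_chunks(tokens, size=8, overlap=2):
    """Split a long token stream into overlapping fixed-size chunks."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

def merge_chunks(chunks, overlap=2):
    """Merge per-chunk outputs, keeping the earlier chunk's tokens
    in each overlap region."""
    merged = list(chunks[0])
    for chunk in chunks[1:]:
        merged.extend(chunk[overlap:])
    return merged

tokens = "hello how are you doing today my friend".split()
chunks = split_chunks(tokens, size=4, overlap=2)
assert merge_chunks(chunks, overlap=2) == tokens
```

In a real pipeline each chunk would be passed through the restoration model before merging; the overlap gives the model context at chunk boundaries.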

10.1109/o-cocosda46868.2019.9041202 article EN 2019-10-01

Objectives: This study evaluated the quality of life and associated factors among patients with type 2 diabetes mellitus at My Phuoc Hospital in 2024. Subjects and methods: A cross-sectional design was conducted, and the Vietnamese version of the Asian Diabetes Quality of Life questionnaire (AsianDQOL) was used for data collection from 151 participants. Results: The mean AsianDQOL score of respondents was 55.2 (SD 15.4). The majority had a moderate quality of life (60.9%). At the same time, 6.6% had a good quality of life, while the rate of participants with a poor quality of life was 32.5%. The average...

10.51298/vmj.v550i1.13869 article EN Tạp chí Y học Việt Nam 2025-04-29

Speech-to-speech translation (S2ST) is a technology that translates speech across languages and can remove barriers in cross-lingual communication. In conventional S2ST systems, the linguistic meaning of speech is translated, but paralinguistic information conveying other features, such as emotion or emphasis, is ignored. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that accurately translate emphasis using all acoustic aspects of speech. First,...
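One way to picture an emphasis-translation component like the one described above is as word-level emphasis weights carried from source to target words through a word alignment. This is an illustrative sketch under that assumption; the function name, the (src_idx, tgt_idx) alignment format, and the max-pooling choice are hypothetical, not the paper's exact formulation.

```python
def transfer_emphasis(src_weights, alignment, tgt_len):
    """Map word-level emphasis weights from source to target words
    through a word alignment given as (src_idx, tgt_idx) pairs.
    When several source words align to one target word, keep the max."""
    tgt_weights = [0.0] * tgt_len
    for s, t in alignment:
        tgt_weights[t] = max(tgt_weights[t], src_weights[s])
    return tgt_weights

# Emphasis on source word 1 follows its aligned target word (index 2)
# after reordering during translation.
weights = transfer_emphasis([0.1, 0.9, 0.2], [(0, 0), (1, 2), (2, 1)], 3)
assert weights == [0.1, 0.2, 0.9]
```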

10.1109/taslp.2016.2643280 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-12-21

Language documentation begins by gathering speech. Manual or automatic transcription at the word level is typically not possible because of the absence of an orthography or prior lexicon, and though manual phonemic transcription is possible, it is prohibitively slow. On the other hand, translations of the minority language into a major language are more easily acquired. We propose a method to harness such translations to improve phoneme recognition. The method assumes no prior lexicon or translation model, instead learning them from phoneme lattices over the speech being transcribed. Experiments...

10.18653/v1/d16-1263 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Speech-to-speech translation (S2ST) systems are capable of breaking language barriers in cross-lingual communication by translating speech across languages. Recent studies have introduced many improvements that allow existing S2ST systems to handle not only linguistic meaning but also paralinguistic information such as emphasis, by proposing additional estimation and translation components. However, the approach previously used for these components is not optimal for sequence tasks and easily fails to capture long-term dependencies between words and emphasis levels. It also requires...

10.1109/taslp.2018.2846402 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2018-06-11

In speech, emphasis is an important type of paralinguistic information that helps convey the focus of an utterance, new information, and emotion. If emphasis can be incorporated into a speech-to-speech (S2S) translation system, it will be possible to convey this information across the language barrier. However, previous related work focuses only on particular prosodic features, such as F0, or works with extremely small vocabularies, such as 10 digits. In this paper, we describe an S2S translation method able to translate emphasis between languages considering multiple features...

10.21437/interspeech.2015-727 article EN Interspeech 2015 2015-09-06

As the number of Japanese-English bilingual speakers continues to increase, code-switching phenomena also occur more frequently. The units and locations of switches may vary widely, from a single word to whole phrases (beyond the length of loanword units). Therefore, speech recognition systems must be developed that can handle not only Japanese or English but also code-switching. Consequently, a large-scale database is required for model training. But collecting natural conversation dialogue data in both...
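The switch units and locations the abstract mentions can be made concrete with a toy detector: assign each token a language by script, then report the indices where the language changes. This is only an illustrative sketch; real code-switching databases annotate switches manually, and script-based language ID fails on romanized Japanese.

```python
def token_language(token):
    """Crude per-token language ID: 'ja' if the token contains
    kana (U+3040-U+30FF) or CJK ideographs (U+4E00-U+9FFF)."""
    for ch in token:
        if '\u3040' <= ch <= '\u30ff' or '\u4e00' <= ch <= '\u9fff':
            return 'ja'
    return 'en'

def switch_points(tokens):
    """Indices where the language changes between adjacent tokens."""
    langs = [token_language(t) for t in tokens]
    return [i for i in range(1, len(tokens)) if langs[i] != langs[i - 1]]

# Switch into Japanese at index 2 and back to English at index 3.
assert switch_points(['I', 'ate', '寿司', 'today']) == [2, 3]
```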

10.1109/icsda.2018.8693044 article EN 2018-05-01

The Multi-Genre Broadcast challenge is an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop. This paper presents NAIST's contribution to the premiere of this challenge. The presented speech-to-text system for English makes use of various front-ends (e.g., MFCC, i-vector, FBANK), DNN acoustic models, and several language models for decoding and rescoring (N-gram, RNNLM). Subsets of the training data with varying sizes were evaluated with respect to overall quality. Two speech segmentation systems were developed...

10.1109/asru.2015.7404858 article EN 2015-12-01

Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications, such as controlling robots, call center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability. The models were trained on large amounts of training data and performed well under many environments. With regard to Vietnamese, there have been studies improving existing systems; however, most of them were conducted on a small scale...

10.15625/1813-9663/34/4/13165 article EN Journal of Computer Science and Cybernetics 2019-01-30

Since paralinguistic aspects must be considered to understand speech, we construct a deep learning framework that utilizes multi-modal features to simultaneously recognize both speakers and emotions. There are three kinds of feature modalities: acoustic, lexical, and facial. To fuse the features from multiple modalities, we experimented with three methods: majority voting, concatenation, and hierarchical fusion. The recognition was done on a TV-series dataset that simulates actual conversations.
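Two of the fusion strategies named above can be sketched in a few lines: majority voting fuses the per-modality *decisions* (late fusion), while concatenation fuses the per-modality *feature vectors* before classification (early fusion). This is a minimal illustration of the two schemes, not the paper's network architecture.

```python
from collections import Counter

def majority_vote(labels):
    """Late fusion: each modality votes with its own predicted label;
    the most common label wins."""
    return Counter(labels).most_common(1)[0][0]

def concat_fusion(acoustic, lexical, facial):
    """Early fusion: concatenate per-modality feature vectors into
    one joint vector fed to a single classifier."""
    return acoustic + lexical + facial

assert majority_vote(['happy', 'happy', 'sad']) == 'happy'
assert concat_fusion([0.1, 0.2], [1.0], [0.5, 0.5]) == [0.1, 0.2, 1.0, 0.5, 0.5]
```

Hierarchical fusion, the third method, would combine modalities in stages (e.g., acoustic+lexical first, then facial) rather than all at once.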

10.1109/icsda.2018.8693020 article EN 2018-05-01

This paper presents an approach using a Multi-Space Distribution Hidden Markov Model (MSD-HMM) for Vietnamese speech recognition. An MSD-HMM prototype with four independent streams is proposed for modeling phonemes with embedded tonal information corresponding to their syllables. These phonemes are built by adding a tone symbol to each phoneme of the syllables, based on the International Phonetic Alphabet (IPA). This improves accuracy by 2.49% compared with the baseline system. A feature extraction process suitable for this model is also described. The result shows...
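The tone-embedding step described above, attaching the syllable's tone symbol to each of its phonemes, can be sketched as a trivial relabeling that turns a toneless phoneme set into a tone-dependent one. The `_<tone>` suffix notation here is an assumption for illustration; the paper's actual tone-symbol scheme may differ.

```python
def tonalize(phonemes, tone):
    """Attach a syllable's tone id to each of its phonemes, producing
    tone-dependent phoneme labels for acoustic modeling."""
    return [f"{p}_{tone}" for p in phonemes]

# A hypothetical Vietnamese syllable with tone id 2: every phoneme
# in the syllable carries the same tone marker.
assert tonalize(["m", "a"], 2) == ["m_2", "a_2"]
```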

10.15625/1813-9663/30/1/3553 article EN Journal of Computer Science and Cybernetics 2014-04-16

This paper proposes a method to train Weighted Finite State Transducer (WFST) based structural classifiers using deep neural network (DNN) acoustic features and recurrent neural network (RNN) language models for speech recognition. Structural classification is an effective approach to achieve highly accurate recognition of structured data, in which the classifier is optimized to maximize discriminative performance over different kinds of features. A WFST-based classifier, which can integrate acoustic, pronunciation, and embedded language models, is composed...

10.1109/icassp.2015.7178914 article EN 2015-04-01

Emphasis is an important factor of human speech that helps convey emotion and the focused information of utterances. Recently, studies have been conducted on speech-to-speech translation to preserve emphasis from the source language to the target language. However, since different cultures have various ways of expressing emphasis, just considering acoustic-to-acoustic feature mapping may not always reflect the experiences of users. On the other hand, emphasis can be expressed at different levels in both text and speech. Thus, it remains unclear how we...

10.1109/slt.2018.8639641 article EN 2018 IEEE Spoken Language Technology Workshop (SLT) 2018-12-01

This paper presents a high-quality Vietnamese speech corpus that can be used for analyzing Vietnamese speech characteristics as well as for building speech synthesis models. The corpus consists of 5400 clean-speech utterances spoken by 12 speakers, including 6 males and 6 females. It is designed with phonetic balance in mind so that it suits synthesis and, especially, adaptation approaches. Specifically, all speakers utter a common dataset containing 250 sentences. To increase the variety of context, each speaker also utters another 200 non-shared, phonetically balanced...

10.48550/arxiv.1904.05569 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Speech-to-speech (S2S) translation [10] is gradually starting to break down the language barrier, bringing opportunities for people to understand each other while using different languages. However, one of the limitations of current S2S systems is that they usually do not translate paralinguistic information included in the input speech. Among the various types of paralinguistic information, we focus on emphasis, a type used to convey the focus of a sentence or the emotion of the speaker, and highly useful in communication. This paper describes the collection of an...

10.1109/icsda.2014.7051424 article EN 2014-09-01

Inverse text normalization (ITN) is the task that transforms text in spoken form into written form. While automatic speech recognition (ASR) produces spoken-form output, humans and natural language understanding systems prefer to consume written form. ITN generally deals with semiotic phrases (e.g., numbers, dates, times). However, there is a lack of studies dealing with phonetized phrases, which appear in ASR's output when it handles unseen data (foreign-named entities, domain names), although these exist in the same form in text. The reason is that they are infinite...
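The semiotic-phrase side of ITN can be illustrated with the simplest possible case: rewriting runs of spoken digit words into a written-form digit string. This toy sketch covers only English cardinal digits; production ITN systems typically use grammars or neural taggers over many semiotic classes, and the function name here is an assumption.

```python
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def itn_digits(spoken):
    """Rewrite each maximal run of spoken digit words into a single
    written-form digit string, leaving other tokens untouched."""
    out, run = [], []
    for tok in spoken.split():
        if tok in WORD_TO_DIGIT:
            run.append(WORD_TO_DIGIT[tok])
        else:
            if run:
                out.append("".join(run))
                run = []
            out.append(tok)
    if run:
        out.append("".join(run))
    return " ".join(out)

assert itn_digits("call nine one one now") == "call 911 now"
```

The phonetized phrases the abstract highlights (foreign names, domain names spelled out by the ASR) are exactly what such closed-vocabulary rules cannot cover, which motivates the paper.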

10.1109/icassp49357.2023.10094599 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05