- Natural Language Processing Techniques
- Speech Recognition and Synthesis
- Topic Modeling
- Speech and dialogue systems
- Speech and Audio Processing
- Biomedical Text Mining and Ontologies
- Music and Audio Processing
- Phonetics and Phonology Research
- Multi-Agent Systems and Negotiation
- Advanced Data Compression Techniques
- Text and Document Classification Technologies
- Text Readability and Simplification
- Machine Learning in Healthcare
- Blind Source Separation Techniques
- Multimodal Machine Learning Applications
- Misinformation and Its Impacts
- Semantic Web and Ontologies
- Intelligent Tutoring Systems and Adaptive Learning
- Advanced Text Analysis Techniques
- Geographic Information Systems Studies
- Infant Health and Development
- Software Engineering Research
- Language Development and Disorders
- Expert finding and Q&A systems
- Context-Aware Activity Recognition Systems
The Ohio State University
2016-2025
Nationwide Children's Hospital
2018-2020
Amazon (United States)
2018-2019
University of Science and Technology of China
2019
University of Udine
2018-2019
University of Cambridge
2018-2019
Middle East Technical University
2018-2019
University of Illinois Urbana-Champaign
2018-2019
Delft University of Technology
2018-2019
The University of Tokyo
2018-2019
Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, in particular whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced, lenited one (e.g., ðə, ðɪt, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample...
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content, using a text-based algorithm as a feature, and about form, using linguistic and acoustic cues to topic shifts extracted from speech. The algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. A significant error reduction is obtained by combining the two knowledge sources.
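The lexical-cohesion component can be illustrated with a minimal TextTiling-style sketch: compare word-count vectors of adjacent windows of sentences and treat dips in cosine similarity as candidate topic boundaries. This is an illustrative reconstruction with toy data, not the paper's actual feature set or decision-rule combiner.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def cohesion_scores(sentences, window=2):
    """Similarity between the word counts of adjacent sentence windows;
    low scores suggest candidate topic boundaries."""
    scores = []
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in s.lower().split())
        right = Counter(w for s in sentences[i:i + window] for w in s.lower().split())
        scores.append(cosine(left, right))
    return scores

sents = ["the cat sat on the mat", "the cat chased a mouse",
         "a mouse ran from the cat", "stocks fell sharply today",
         "markets dropped on the news", "investors sold stocks"]
scores = cohesion_scores(sents, window=2)
boundary = scores.index(min(scores)) + 2  # sentence index of weakest cohesion
```

On this toy input the weakest cohesion falls between the animal sentences and the finance sentences, i.e. a boundary before sentence index 3.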
Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources describing the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging that requires no ancillary resources such as parallel corpora. The proposed model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent...
We report progress in the development of a measure of speaking rate that is computed from the acoustic signal. The newest form of our analysis incorporates multiple estimates of rate; besides the spectral moment of the full-band energy envelope that we have previously reported, we also use a pointwise correlation between pairs of compressed sub-band energy envelopes. The complete measure, called mrate, has been compared to a reference syllable rate derived from a manually transcribed subset of the Switchboard database, and it correlates with this reference significantly more highly than...
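The spectral-moment component of such a rate estimate can be sketched on synthetic data: square the signal, smooth it to get an energy envelope, and take the first spectral moment of the envelope spectrum in a low-frequency band. The sample rate, window length, and 16 Hz band below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

fs = 1000                                   # toy sample rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
syll_rate = 4.0                             # 4 "syllables" per second
carrier = np.sin(2 * np.pi * 200 * t)
signal = 0.5 * (1 + np.sin(2 * np.pi * syll_rate * t)) * carrier

# Full-band energy envelope: squared signal, low-passed by a moving average
energy = signal ** 2
win = int(0.05 * fs)                        # 50 ms smoothing window
envelope = np.convolve(energy, np.ones(win) / win, mode="same")
envelope -= envelope.mean()                 # drop the DC component

# First spectral moment of the envelope spectrum below 16 Hz
spec = np.abs(np.fft.rfft(envelope)) ** 2
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)
band = freqs <= 16.0
rate_estimate = np.sum(freqs[band] * spec[band]) / np.sum(spec[band])
```

For this amplitude-modulated test tone, the moment lands near the true 4 Hz modulation rate (slightly above, since the squared envelope also contains an 8 Hz harmonic).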
The causes of pronunciation reduction in 8458 occurrences of ten frequent English function words in a four-hour sample of conversations from the Switchboard corpus were examined. Using ordinary linear and logistic regression models, we examined the length of the words, the form of their vowel (basic, full, or reduced), and final obstruent deletion. For all of these measures, we found strong, independent effects of speaking rate, predictability, the following word, and planning-problem disfluencies. The results bear on issues in speech recognition, models...
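A logistic regression of this kind can be sketched on synthetic data: code vowel reduction as a binary outcome and fit it against speaking rate and predictability by gradient ascent on the log-likelihood. The variables, coefficients, and data below are invented for illustration; they are not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
rate = rng.normal(5.0, 1.0, n)      # speaking rate, syllables/sec (synthetic)
predict = rng.normal(0.0, 1.0, n)   # centered log predictability (synthetic)

# Ground truth for the simulation: faster, more predictable speech reduces more
logit_true = -1.0 + 0.6 * (rate - 5.0) + 0.8 * predict
reduced = (rng.random(n) < 1 / (1 + np.exp(-logit_true))).astype(float)

# Fit by plain gradient ascent on the logistic log-likelihood
X = np.column_stack([np.ones(n), rate - rate.mean(), predict - predict.mean()])
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (reduced - p) / n
```

The fitted slopes `w[1]` and `w[2]` recover the simulated positive effects of rate and predictability, which is the kind of independent-effect evidence the regression analysis provides.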
Automatic speech recognition (ASR) systems suffer from performance degradation under noisy and reverberant conditions. In this work, we explore a deep neural network (DNN) based approach for spectral feature mapping from corrupted to clean speech. The DNN substantially reduces interference and produces estimated clean features for ASR training and decoding. We experiment with several different approaches and demonstrate that DNNs trained to predict log filterbank coefficients or the spectrogram directly can be extremely effective....
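The core idea of spectral feature mapping can be sketched with a tiny one-hidden-layer network trained by full-batch gradient descent to map corrupted feature vectors back to clean ones. Gaussian vectors stand in for log filterbank features here; the architecture and data are illustrative assumptions, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 4000, 10, 32
clean = rng.normal(0.0, 1.0, (n, d))           # stand-in for clean log filterbanks
noisy = clean + rng.normal(0.0, 1.0, (n, d))   # additive corruption

W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)
lr = 0.1
for _ in range(1000):                          # full-batch gradient descent on MSE
    z = noisy @ W1 + b1
    a = np.maximum(z, 0.0)                     # ReLU hidden layer
    out = a @ W2 + b2
    g = 2.0 * (out - clean) / n                # dLoss/dout
    gW2, gb2 = a.T @ g, g.sum(0)
    gz = (g @ W2.T) * (z > 0)                  # backprop through ReLU
    gW1, gb1 = noisy.T @ gz, gz.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse_noisy = np.mean((noisy - clean) ** 2)      # baseline: use corrupted features as-is
mse_mapped = np.mean((out - clean) ** 2)       # after the learned mapping
```

Even this toy mapper cuts the feature error well below the corrupted baseline, which is the effect the ASR front-end exploits.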
In this paper, we take a step towards jointly modeling automatic speech recognition (STT) and speech synthesis (TTS) in a fully non-autoregressive way. We develop a novel multimodal framework capable of handling the speech and text modalities as input, either individually or together. The proposed model can also be trained with unpaired speech and text data owing to its multimodal nature. We further propose an iterative refinement strategy to improve the STT and TTS performance of our model, such that the partial hypothesis at the output is fed back to the model, thus iteratively...
We propose the Inference Knowledge Graph, a novel approach that remaps existing, large-scale, semantic knowledge graphs into Markov Random Fields in order to create user goal tracking models that could form part of a spoken dialog system. Since knowledge graphs include both entities and their attributes, the proposed method merges dialog state tracking of attributes and database lookup to fulfill users' requests in one single unified step. Using a graph that contains all businesses in Bellevue, WA, extracted from Microsoft Satori, we...
Introduction: Practicing taking a medical history using standardized patients is an essential component of medical school curricula. Recent advances in technology now allow for newer approaches to practicing and assessing communication skills. We describe herein a virtual standardized patient (VSP) system that allows students to practice their history-taking skills and receive immediate feedback. Methods: Our VSPs consist of artificially intelligent, emotionally responsive 3D characters which communicate with students in natural language. The...
Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health records of patients. This is a very time-consuming and exhausting task. This process can be expedited if the coordinators' efforts are directed toward the specific parts of the text that are relevant to the eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable...
In this paper, we propose to improve end-to-end (E2E) spoken language understanding (SLU) in an RNN transducer model (RNN-T) by incorporating a joint self-conditioned CTC automatic speech recognition (ASR) objective. Our proposed model is akin to an E2E differentiable cascaded model which performs ASR and SLU sequentially, ensuring that the SLU task is conditioned on the ASR task through self-conditioning. This novel joint modeling improves performance significantly over using the SLU objective alone. We further improve it by aligning acoustic embeddings...
In this paper we describe how discriminative training can be applied to language models for speech recognition. Language models are important for guiding the recognition search, particularly in compensating for mistakes in acoustic decoding. A frequently used measure of language model quality is perplexity; however, what matters more for accurate decoding is not necessarily having the maximum-likelihood hypothesis, but rather the best separation of the correct string from competing, acoustically confusible hypotheses. Discriminative training can help improve language models for this purpose...
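The separation objective can be illustrated with a structured-perceptron n-best reranker: whenever the current model prefers an incorrect hypothesis over the reference, move the weights toward the reference's features and away from the competitor's. This is a standard discriminative reranking sketch with toy data, not necessarily the training criterion used in the paper.

```python
from collections import Counter

def feats(hyp, base_score):
    """Unigram counts plus the baseline (acoustic + LM) score as one feature."""
    f = Counter(hyp.split())
    f["__base__"] = base_score
    return f

def score(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def perceptron(nbest_lists, epochs=5):
    """Each item: (reference, [(hypothesis, baseline_score), ...])."""
    w = {"__base__": 1.0}
    for _ in range(epochs):
        for ref, hyps in nbest_lists:
            best = max(hyps, key=lambda h: score(w, feats(*h)))
            if best[0] != ref:                      # model-best is wrong: update
                oracle = next(h for h in hyps if h[0] == ref)
                for k, v in feats(*oracle).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feats(*best).items():
                    w[k] = w.get(k, 0.0) - v
    return w

data = [("recognize speech",
         [("recognize speech", -1.2), ("wreck a nice beach", -1.0)]),
        ("the dog barked",
         [("the dog barked", -2.1), ("the dug barked", -2.0)])]
w = perceptron(data)
top = [max(h, key=lambda x: score(w, feats(*x)))[0] for _, h in data]
```

After training, both lists rerank so the correct string beats its acoustically confusible competitor, even though the competitor had the higher baseline score.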
Automatic Speech Attribute Transcription (ASAT), an ITR project sponsored under NSF grant IIS-04-27113, is a cross-institute effort involving the Georgia Institute of Technology, The Ohio State University, the University of California at Berkeley, and Rutgers University. The project approaches speech recognition from a more linguistic perspective: unlike traditional ASR systems, humans detect acoustic and auditory cues, weigh and combine them to form theories, and then process these cognitive hypotheses until...
Conditional random fields (CRFs) are a statistical framework that has recently gained popularity in both the automatic speech recognition (ASR) and natural language processing communities because the assumptions made in predicting sequences of labels differ in nature from those of the more traditional hidden Markov model (HMM). In the ASR community, CRFs have been employed in a manner similar to HMMs, using sufficient statistics of the input data to compute the probability of a label given the acoustic input. In this paper, we explore...
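The key computation behind a linear-chain CRF is the forward algorithm for the log partition function, which normalizes any label sequence's score into a probability. A minimal sketch with random toy potentials (not the acoustic features of the paper):

```python
import itertools
import numpy as np

def forward_logZ(unary, trans):
    """Log partition function of a linear-chain CRF.
    unary: (T, L) per-position label scores; trans: (L, L) transition scores."""
    T, L = unary.shape
    alpha = unary[0].copy()
    for t in range(1, T):
        m = alpha[:, None] + trans              # (prev_label, cur_label)
        mmax = m.max(axis=0)                    # stabilized log-sum-exp over prev
        alpha = unary[t] + mmax + np.log(np.exp(m - mmax).sum(axis=0))
    amax = alpha.max()
    return amax + np.log(np.exp(alpha - amax).sum())

def seq_logprob(unary, trans, labels):
    """Log-probability of one label sequence under the CRF."""
    s = unary[0, labels[0]]
    for t in range(1, len(labels)):
        s += trans[labels[t - 1], labels[t]] + unary[t, labels[t]]
    return s - forward_logZ(unary, trans)

rng = np.random.default_rng(2)
unary = rng.normal(size=(4, 3))                 # 4 positions, 3 labels
trans = rng.normal(size=(3, 3))
# Sanity check: probabilities over all 3**4 sequences must sum to 1
total = sum(np.exp(seq_logprob(unary, trans, list(s)))
            for s in itertools.product(range(3), repeat=4))
```

The exhaustive sum equaling 1 confirms the forward recursion computes the correct normalizer, which is what makes globally normalized training tractable.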
Automatic Speech Recognition systems suffer from severe performance degradation in the presence of a myriad of complicating factors such as noise, reverberation, multiple speech sources, recording devices, etc. Previous challenges have sparked much innovation when it comes to designing systems capable of handling these complications. In this spirit, the CHiME-3 challenge presents system builders with the task of recognizing speech in a real-world noisy setting wherein speakers talk to an array of 6 microphones on a tablet. In order to address...
Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and a large vocabulary dataset. In fact, this approach, which we term direct masking, performs comparably to two previously proposed feature...
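Direct masking itself is simple to sketch: build an ideal binary mask that keeps time-frequency units where the local SNR is positive, and multiply the mixture spectrogram by it. The toy magnitude spectrograms below are random stand-ins for real speech and noise.

```python
import numpy as np

rng = np.random.default_rng(3)
frames, bins = 100, 64
speech = np.abs(rng.normal(0, 2.0, (frames, bins)))  # toy "clean" magnitudes
noise = np.abs(rng.normal(0, 1.0, (frames, bins)))
mixture = speech + noise                              # crude additive mixing

# Ideal binary mask: keep units where speech dominates (local SNR > 0 dB)
ibm = (speech > noise).astype(float)
masked = mixture * ibm        # direct masking: zero out noise-dominated units

def err(x):
    """Mean squared deviation from the clean magnitudes."""
    return np.mean((x - speech) ** 2)
```

Zeroing a noise-dominated unit replaces a large noise error with a smaller missing-speech error, so the masked spectrogram sits closer to the clean target than the raw mixture does; the point of the paper is that recognizers can tolerate those zeroed units directly.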
Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated resources or large knowledge graph...
The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients by applying natural language processing techniques to clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to that of the top performing system (F1=92.8),...
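The regular-expression portion of such a system can be sketched as a small dictionary of patterns mapped to risk-factor categories. The patterns and note below are invented examples, and the real system also drew on UMLS concepts and handled negation ("denies smoking"), which this sketch omits.

```python
import re

# Toy rules: category -> pattern over clinical text (illustrative only)
RULES = {
    "hypertension": re.compile(r"\b(hypertension|htn|high blood pressure)\b", re.I),
    "smoker": re.compile(r"\b(smok(es|er|ing)|tobacco use)\b", re.I),
    "obese": re.compile(r"\bobes(e|ity)\b|\bbmi\s*(of\s*)?(3[0-9]|[4-9][0-9])\b", re.I),
}

def risk_factors(note: str):
    """Return the sorted list of risk-factor categories mentioned in a note."""
    return sorted(k for k, rx in RULES.items() if rx.search(note))

note = "Pt with HTN, BMI 34; current smoker per RN, counseled on tobacco use."
found = risk_factors(note)
```

Abbreviations ("HTN"), numeric thresholds ("BMI 34"), and lexical variants ("smoker") all fire here, which is why regex rules remain competitive on this task when curated carefully.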