NFDI4DS | UHH-SEMS - Publication Details

Andrew Rosenberg

ORCID: 0000-0003-1780-4390

About

Contact & Profiles

A5102902866

Research Areas

Speech Recognition and Synthesis
Natural Language Processing Techniques
Speech and dialogue systems
Topic Modeling
Phonetics and Phonology Research
Speech and Audio Processing
Music and Audio Processing
Deception detection and forensic psychology
Sentiment Analysis and Opinion Mining
User Authentication and Security Systems
Evolutionary Algorithms and Applications
Text and Document Classification Technologies
Linguistic Variation and Morphology
Language, Metaphor, and Cognition
Metaheuristic Optimization Algorithms Research
Advanced Text Analysis Techniques
Digital Communication and Language
Video Analysis and Summarization
Cardiac, Anesthesia and Surgical Outcomes
Humor Studies and Applications
Meta-analysis and systematic reviews
Algorithms and Data Compression
Advanced Multi-Objective Optimization Algorithms
Complex Network Analysis Techniques
Handwritten Text Recognition Techniques

Google (United States)
2019-2025

NYU Langone Health
2024

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

American Jewish Committee
2023

University of Michigan
2004-2019

New York University
2019

The Graduate Center, CUNY
2011-2018

IBM (United States)
2016-2018

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

OPENALEX - Publications

Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhifeng Chen and 4 more

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages, e.g. English and Mandarin. Critical to achieving this result are: 1. using a phonemic input representation to encourage sharing of model capacity across languages, and 2. incorporating...

10.21437/interspeech.2019-2668 article EN Interspeech 2022 2019-09-13

AuToBI - a tool for automatic ToBI annotation

Andrew Rosenberg

This paper describes the AuToBI tool for automatic generation of hypothesized ToBI labels. While research on automatic prosodic annotation has been conducted for many years, AuToBI represents the first publicly available tool to automatically detect and classify the breaks and tones that make up the ToBI annotation standard. This paper describes the feature extraction routines as well as the classifiers used to detect and classify the prosodic events of the ToBI standard. Additionally, we report performance evaluating models trained on the Boston Directions Corpus on the Columbia Games Corpus. By evaluating on distinct speakers, domains and recording conditions, this...

10.21437/interspeech.2010-71 article EN Interspeech 2022 2010-09-26

Charisma perception from text and speech

Andrew Rosenberg Julia Hirschberg

10.1016/j.specom.2008.11.001 article EN Speech Communication 2008-11-20

Speech Recognition with Augmented Synthesized Speech

Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Jia Ye Pedro J. Moreno and 2 more

Recent success of the Tacotron speech synthesis architecture and its variants in producing natural sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers. The architecture can learn latent embedding spaces of prosody, speaker and style variations derived from input acoustic representations thereby allowing for manipulation of the synthesized speech. In this paper, we evaluate the feasibility of enhancing speech recognition...

10.1109/asru46091.2019.9003990 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019-12-01

End-to-End ASR-Free Keyword Search From Speech

Kartik Audhkhasi Andrew Rosenberg Abhinav Sethy Bhuvana Ramabhadran Brian Kingsbury

End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three...

10.1109/jstsp.2017.2759726 article EN publisher-specific-oa IEEE Journal of Selected Topics in Signal Processing 2017-10-05

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior

Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen and 3 more

Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features at each input token (e.g., phonemes). However, generating samples with the standard VAE prior often results in unnatural and discontinuous speech, with dramatic prosodic variation between tokens. This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples....

10.1109/icassp40776.2020.9053436 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

Xuesong Yang Kartik Audhkhasi Andrew Rosenberg Samuel Thomas Bhuvana Ramabhadran and 1 more

The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to deal with multiple accents involves pooling data from several accents during training and building a single model in multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternate model where we jointly learn an accent classifier and an acoustic model. Experiments on the American English Wall...

10.1109/icassp.2018.8462557 preprint EN 2018-04-01

Improving Speech Recognition Using Consistent Predictions on Synthesized Speech

Gary Wang Andrew Rosenberg Zhehuai Chen Yu Zhang Bhuvana Ramabhadran and 2 more

Speech synthesis has advanced to the point of being close to indistinguishable from human speech. However, efforts to train speech recognition systems on synthesized utterances have not been able to show that synthesized data can be effectively used to augment or replace human speech. In this work, we demonstrate that promoting consistent predictions in response to real and synthesized speech enables significantly improved speech recognition performance. We also find that training on 460 hours of LibriSpeech augmented with 500 hours of transcripts (without audio) performance is within 0.2% WER...

10.1109/icassp40776.2020.9053831 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Acoustic/prosodic and lexical correlates of charismatic speech

Andrew Rosenberg Julia Hirschberg

Charisma, the ability to command authority on the basis of personal qualities, is more difficult to define than to identify. How do charismatic leaders such as Fidel Castro or Pope John Paul II attract and retain their followers? We present results of an analysis of subjective ratings of charisma from a corpus of American political speech. We identify associations between charisma ratings and other attributes. We also examine the acoustic/prosodic and lexical features of this speech and correlate these with charisma ratings.

10.21437/interspeech.2005-329 article EN Interspeech 2022 2005-09-04

Classifying skewed data: importance weighting to optimize average recall

Andrew Rosenberg

Promoted in part by its use in the Interspeech Challenges 2009-2012, Average Recall has emerged as an attractive evaluation measure of classifier performance where the data has a skewed class distribution. In this paper, we show that importance weighting can be used to optimize Average Recall directly. We compare this approach to sampling techniques that have been previously used to classify skewed data. We demonstrate this approach on the Interspeech 2009 Emotion Challenge tasks, and prosodic analysis tasks.

10.21437/interspeech.2012-131 article EN Interspeech 2022 2012-09-09

End-to-end speech recognition and keyword search on low-resource languages

Andrew Rosenberg Kartik Audhkhasi Abhinav Sethy Bhuvana Ramabhadran Michael Picheny

In recent years, so-called "end-to-end" speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard, TIMIT, etc.) in high-resource languages like English and Mandarin. In this work, we investigate the use of Connectionist Temporal Classification (CTC)...

10.1109/icassp.2017.7953164 article EN 2017-03-01

Cross-Cultural Production and Detection of Deception from Speech

Sarah Ita Levitan Guozhen An Mandi Wang Gideon Mendels Julia Hirschberg and 2 more

Detecting deception from different dimensions of human behavior has been a major goal of research in psychology and computational linguistics for some years and is currently of considerable interest to military and law enforcement agencies. However, relatively little work has been done to develop automatic methods to detect deception from spoken language or to compare deception detection and production between cultures. We present results of experiments on a new corpus of deceptive and non-deceptive speech, collected from native speakers of Standard American English...

10.1145/2823465.2823468 article EN 2015-11-09

Utilizing linguistically enhanced keystroke dynamics to predict typist cognition and demographics

David Guy Brizan Adam Goodkind Patrick Koch Kiran S. Balagani Vir V. Phoha and 1 more

10.1016/j.ijhcs.2015.04.005 article EN International Journal of Human-Computer Studies 2015-05-21

Knowledge distillation across ensembles of multilingual models for low-resource languages

Jia Cui Brian Kingsbury Bhuvana Ramabhadran George Saon Tom Sercu and 4 more

This paper investigates the effectiveness of knowledge distillation in the context of multilingual models. We show that with knowledge distillation, Long Short-Term Memory (LSTM) models can be used to train standard feed-forward Deep Neural Network (DNN) models for a variety of low-resource languages. We then examine how the agreement between the teacher's best labels and the original labels affects the student model's performance. Next, we show that knowledge distillation can be easily applied to semi-supervised learning to improve model performance. We also propose a promising data selection method to filter...

10.1109/icassp.2017.7953073 article EN 2017-03-01

Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores

Andrew Rosenberg Bhuvana Ramabhadran

10.21437/interspeech.2017-479 article EN Interspeech 2022 2017-08-16

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

Takaaki Saeki Gary Wang Nobuyuki Morioka Isaac Elias Kyle Kastner and 5 more

Collecting high-quality studio recordings of audio is challenging, which limits the language coverage of text-to-speech (TTS) systems. This paper proposes a framework for scaling a multilingual TTS model to 100+ languages using found data without supervision. The proposed framework combines speech-text encoder pretraining with unsupervised training using untranscribed speech and unspoken text data sources, thereby leveraging massively multilingual joint speech and text representation learning. Without any transcribed speech in a new language, this TTS model can...

10.1109/icassp48485.2024.10448074 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Optimal Head Rotation for Internal Jugular Vein Cannulation When Relying on External Landmarks

Jeremy A. Lieberman Kayode Williams Andrew Rosenberg

In Brief External anatomic landmarks have traditionally been used to approximate the location of neck blood vessels to optimize central venous cannulation of the internal jugular vein (IJV) while avoiding the common carotid artery (CCA). Head rotation affects vessel orientation, but most landmark techniques do not specify its optimal degree. We simulated catheter insertion via both an anterior and a central approach to the right IJV using an ultrasound probe held in the manner of a syringe and needle in 49 volunteers. Increased head rotation from...

10.1213/01.ane.0000132908.77111.ca article EN Anesthesia & Analgesia 2004-09-22

V-Measure: A conditional entropy-based external cluster evaluation

Julia Hirschberg Andrew Rosenberg

10.7916/d80v8n84 article EN 2007-01-01

Speech segmentation and spoken document processing

Mari Ostendorf Benoît Favre Ralph Grishman Dilek Hakkani‐Tür Mary P. Harper and 12 more

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken rather than written language input. A key challenge in moving from text-based documents to such spoken documents is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating impact on several language processing tasks. The results also show a need for optimizing...

10.1109/msp.2008.918023 article EN IEEE Signal Processing Magazine 2008-04-23

Detecting pitch accents at the word, syllable and vowel level

Andrew Rosenberg Julia Hirschberg

The automatic identification of prosodic events such as pitch accent in English has long been a topic of interest to speech researchers, with applications to a variety of spoken language processing tasks. However, much remains to be understood about the best methods for obtaining high accuracy detection. We describe experiments examining the optimal domain for analysis. Specifically, we compare pitch accent identification at the syllable, vowel or word level as domains for the analysis of acoustic indicators of accent. Our results indicate that a word-based...

10.3115/1620853.1620878 article EN 2009-01-01

The most influential articles in critical care medicine

Andrew Rosenberg Ravi Tripathi James M. Blum

10.1016/j.jcrc.2008.12.010 article EN Journal of Critical Care 2009-03-30

A bibliometric search of citation classics in anesthesiology

Ravi Tripathi James M. Blum Thomas J. Papadimos Andrew Rosenberg

Articles cited counts are catalogued and help identify landmark papers. This study provides citation classics of the anesthesiology literature using the framework of subspecialties to provide a review of well-developed areas of research in anesthesiology. A comprehensive list of the most-cited articles in anesthesia was compiled using a bibliometric database and general search terms such as "anesthesia" as well as subspecialty-specific terms. Queries were reviewed for relevance to practice, categorized by subspecialty, and ranked according...

10.1186/1471-2253-11-24 article EN cc-by BMC Anesthesiology 2011-12-01

Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features

Guozhen An David Guy Brizan Min Ma Michelle Morales Ali Syed and 1 more

10.21437/interspeech.2015-185 article EN Interspeech 2022 2015-09-06
