- Speech and Audio Processing
- Speech Recognition and Synthesis
- Advanced Adaptive Filtering Techniques
- Music and Audio Processing
- Phonetics and Phonology Research
- Voice and Speech Disorders
- Blind Source Separation Techniques
- Image and Signal Denoising Methods
- Advanced Data Compression Techniques
- Hearing Loss and Rehabilitation
- Emotion and Mood Recognition
- EEG and Brain-Computer Interfaces
- Stuttering Research and Treatment
- Infant Health and Development
- Digital Filter Design and Implementation
- Underwater Acoustics Research
- Speech and Dialogue Systems
- Autism Spectrum Disorder Research
- Structural Health Monitoring Techniques
- Traumatic Brain Injury Research
- Neuroscience and Music Perception
- Ultrasonics and Acoustic Wave Propagation
- Language Development and Disorders
- Neural Networks and Applications
- Neural Dynamics and Brain Function
MIT Lincoln Laboratory
2016-2025
Massachusetts Institute of Technology
2011-2024
Harvard University
2018-2024
Google (United States)
2010-2022
Harvard University Press
2022
Spaulding Rehabilitation Hospital
2020
Boston University
2020
United States International Trade Commission
1983
A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track, a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This applied sine-wave...
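The peak-picking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dft` and `pick_peaks`, the rectangular (unwindowed) frame, the magnitude threshold, and the bin-centred test frequencies are all assumptions chosen to keep the example short and exact.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one analysis frame."""
    n = len(frame)
    return [
        sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def pick_peaks(frame, threshold=1e-6):
    """Return amplitude, frequency and phase at local maxima of |DFT|.

    Only bins strictly between DC and Nyquist are searched. The threshold
    is a hypothetical guard against round-off-level "peaks".
    """
    n = len(frame)
    spectrum = dft(frame)
    mags = [abs(v) for v in spectrum]
    peaks = []
    for k in range(1, n // 2):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_max and mags[k] > threshold:
            peaks.append({
                "bin": k,
                "freq": 2.0 * math.pi * k / n,      # radians per sample
                "amp": 2.0 * mags[k] / n,           # sine-wave amplitude
                "phase": cmath.phase(spectrum[k]),  # radians
            })
    return peaks


# Two cosines placed exactly on DFT bins 5 and 12 (an idealised frame).
N = 64
frame = [
    1.0 * math.cos(2 * math.pi * 5 * i / N)
    + 0.5 * math.cos(2 * math.pi * 12 * i / N)
    for i in range(N)
]
peaks = pick_peaks(frame)
```

In practice a tapered analysis window and frequencies that fall between bins make the peaks broader, so the amplitude scaling used here holds only for this idealised frame.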
An efficient solution to the fundamental problem of estimating the time-varying amplitude envelope and instantaneous frequency of a real-valued signal that has both an AM and FM structure is provided. Nonlinear combinations of outputs from the energy operator are used to separate its output product into AM and FM components. The theoretical analysis is done first for continuous-time signals. Then several algorithms are developed and compared for discrete-time AM-FM signals. These separation algorithms are used to search for modulations in speech resonances, which are modeled...
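As a rough illustration of energy separation in discrete time, the sketch below uses one commonly cited variant, usually called DESA-2. The abstract does not say which algorithms were compared, so the choice of DESA-2, the function names, and the test signal are assumptions. For a constant-amplitude, constant-frequency cosine the estimates are exact up to round-off.

```python
import math


def teager(prev, cur, nxt):
    """Discrete energy operator: x[n]^2 - x[n-1] * x[n+1]."""
    return cur * cur - prev * nxt


def desa2(x, n):
    """Estimate (amplitude, frequency in rad/sample) at sample n.

    Requires 2 <= n <= len(x) - 3. Uses the symmetric difference
    y[n] = x[n+1] - x[n-1] as the second signal fed to the operator.
    """
    psi_x = teager(x[n - 1], x[n], x[n + 1])
    y_prev = x[n] - x[n - 2]
    y_cur = x[n + 1] - x[n - 1]
    y_next = x[n + 2] - x[n]
    psi_y = teager(y_prev, y_cur, y_next)
    # Clamp to the valid acos domain to absorb floating-point round-off.
    arg = max(-1.0, min(1.0, 1.0 - psi_y / (2.0 * psi_x)))
    omega = 0.5 * math.acos(arg)
    amp = 2.0 * psi_x / math.sqrt(psi_y)
    return amp, omega


# Hypothetical test signal: amplitude 1.5, frequency 0.3 rad/sample.
signal = [1.5 * math.cos(0.3 * i + 0.4) for i in range(32)]
amp_est, omega_est = desa2(signal, 10)
```

The frequency estimate is unambiguous only below a quarter of the sampling rate (omega < pi/2), and the division assumes the operator output is nonzero at the chosen sample.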
It is shown that the nonlinear energy-tracking signal operator $\Psi(x) = (dx/dt)^2 - x\,d^2x/dt^2$ and its discrete-time counterpart can estimate the AM and FM modulating signals. Specifically, $\Psi$ can approximately estimate the amplitude envelope of AM signals and the instantaneous frequency of FM signals. Bounds are derived for the approximation errors, which are negligible under general realistic conditions. These results, coupled with the simplicity of $\Psi$, establish the usefulness of the energy operator for AM and FM signal demodulation. These ideas are then extended to a more general class of sine waves...
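A short worked case shows why the operator tracks "energy". The constant-amplitude, constant-frequency cosine below is an illustrative assumption, not a signal taken from the paper.

```latex
x(t) = A\cos(\omega t + \phi), \qquad
\frac{dx}{dt} = -A\omega\sin(\omega t + \phi), \qquad
\frac{d^2x}{dt^2} = -A\omega^2\cos(\omega t + \phi)

\Psi(x) = \left(\frac{dx}{dt}\right)^2 - x\,\frac{d^2x}{dt^2}
        = A^2\omega^2\sin^2(\omega t + \phi) + A^2\omega^2\cos^2(\omega t + \phi)
        = A^2\omega^2
```

The output depends only on the product of amplitude and frequency. When either one varies slowly, the same expression holds approximately with the time-varying amplitude or frequency substituted in.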
Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing, and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived...
An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations in the flow, from which model parameters are obtained. The glottal flow derivative is estimated using an inverse filter determined within a time interval of vocal-fold closure that is identified through differences in formant frequency modulation during the open...
This paper discusses the design of two-dimensional (2-D) linear-phase FIR digital filters by transformations of one-dimensional (1-D) filters, using a technique first presented by McClellan. His original transformations are generalized and several algorithms for the design of transformations are presented. Examples are included to demonstrate the versatility of the method.
In this paper, a signal is shown to be uniquely represented by the magnitude of its short-time Fourier transform (STFT) under mild restrictions on the signal and the analysis window of the STFT. Furthermore, various algorithms are developed which reconstruct a signal from appropriate samples of the STFT magnitude. Several algorithms can also be used to obtain signal estimates from the processed STFT magnitude, which generally does not have a valid short-time structure. These algorithms are successfully applied to time-scale modification and noise reduction problems in speech processing. Finally, results...
The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. A time-scale modification system that preserves this shape-invariance property during voicing is developed. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its ability to perform time-varying rates of change. Extensions...
This paper develops a multiband or wavelet approach for capturing the AM-FM components of modulated signals immersed in noise. The technique utilizes the recently-popularized nonlinear energy operator $\Psi(s) = (\dot{s})^2 - s\ddot{s}$ to isolate the AM-FM energy, and an energy separation algorithm (ESA) to extract the instantaneous amplitudes and frequencies. It is demonstrated that the performance of the energy operator/ESA approach is vastly improved if the signal is first filtered through a bank of bandpass filters, and at each instant analyzed (via the ESA) using the dominant local...
In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale...
In this paper a new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is perceptually...
In Major Depressive Disorder (MDD), neurophysiologic changes can alter motor control [1, 2] and therefore alter speech production by influencing the characteristics of the vocal source, tract, and prosodics. Clinically, many of these characteristics are associated with psychomotor retardation, where a patient shows sluggishness and motor disorder in vocal articulation, affecting coordination across multiple aspects of production [3, 4]. In this paper, we exploit such effects by selecting features that reflect changes in coordination of vocal tract motion associated with MDD. Specifically, we investigate...
Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction...
A hypothesis in characterizing human depression is that change in the brain's basal ganglia results in a decline of motor coordination [6][8][14]. Such a neuro-physiological change may therefore affect laryngeal control and dynamics. Under this hypothesis, toward the goal of objective monitoring of depression severity, we investigate vocal-source biomarkers for depression; specifically, source features that relate to precision in motor control, including vocal-fold shimmer and jitter, degree of aspiration, fundamental frequency dynamics,...
Goal: We propose a speech modeling and signal-processing framework to detect and track COVID-19 through asymptomatic and symptomatic stages. Methods: The approach is based on complexity of neuromotor coordination across speech subsystems involved in respiration, phonation and articulation, motivated by the distinct nature of COVID-19 involving lower (i.e., bronchial, diaphragm, lower tracheal) versus upper (i.e., laryngeal, pharyngeal, oral and nasal) respiratory tract inflammation, as well as by the growing evidence of the virus' neurological...
When exposed to continuous high-level noise, cochlear neurons are more susceptible to damage than hair cells (HCs): exposures causing temporary threshold shifts (TTS) without permanent HC damage can destroy ribbon synapses, permanently silencing the cochlear neurons they formerly activated. While this "hidden hearing loss" has little effect on thresholds in quiet, the neural degeneration degrades hearing in noise and may be an important elicitor of tinnitus. Similar sensory pathologies are seen after blast injury, even if permanent threshold shift (PTS) is...
Goal: The aim of the study herein reported was to review mobile health (mHealth) technologies and explore their use to monitor and mitigate the effects of the COVID-19 pandemic. Methods: A Task Force was assembled by recruiting individuals with expertise in electronic Patient-Reported Outcomes (ePRO), wearable sensors, and digital contact tracing technologies. Its members collected and discussed available information and summarized it in a series of reports. Results: The Task Force identified technologies that could be deployed in response to the COVID-19 pandemic and would likely...
Purpose: Over the past decade, the signal processing and machine learning literature has demonstrated notable advancements in automated speech processing with the use of artificial intelligence for medical assessment and monitoring (e.g., depression, dementia, and Parkinson's disease, among others). Meanwhile, the clinical speech literature has identified several interpretable, theoretically motivated measures that are sensitive to abnormalities in the cognitive, linguistic, affective, motoric, and anatomical domains. Both fields have, thus,...
In this paper, we develop iterative algorithms for reconstructing a minimum phase sequence from the phase or magnitude of its Fourier transform. These iterative solutions involve repeatedly imposing a causality constraint in the time domain and incorporating the known phase or magnitude function in the frequency domain. This approach is the basis of a new means of computing the Hilbert transform of the log-magnitude or phase of the Fourier transform of a minimum phase sequence which does not require phase unwrapping. Finally, we discuss the potential use of this computation in determining samples of the unwrapped phase of a mixed phase sequence.
Iterative algorithms for signal reconstruction from partial time- and frequency-domain knowledge have proven useful in a number of application areas. In this paper, a general convergence proof, applicable to a general class of such iterative algorithms, is presented. The proof relies on the concept of a nonexpansive mapping in both the time and frequency domains. Two examples studied in detail are time-limited extrapolation (equivalently, band-limited extrapolation) and phase-only signal reconstruction. The proof of convergence for the phase-only iteration is a new result obtained by...
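The band-limited extrapolation example mentioned above can be sketched as an alternating iteration: impose the band limit in the frequency domain, then restore the known samples in the time domain. The code below is a minimal illustration in the style usually attributed to Papoulis and Gerchberg, not the paper's own procedure. The function names, the DFT-based band limit, the signal, and the iteration count are all assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT (forward) or inverse DFT with 1/N scaling."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                    for m in range(n))
        for k in range(n)
    ]


def band_limit(x, cutoff):
    """Zero every DFT bin whose index magnitude exceeds cutoff."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if min(k, n - k) > cutoff:
            spec[k] = 0.0
    # The retained bins are symmetric, so a real input stays real.
    return [v.real for v in dft(spec, inverse=True)]


def extrapolate(observed, known, cutoff, iterations=50):
    """Alternate the frequency-domain and time-domain constraints."""
    x = list(observed)
    for _ in range(iterations):
        x = band_limit(x, cutoff)
        x = [observed[i] if i in known else x[i] for i in range(len(x))]
    return x


def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical band-limited signal, observed only on samples 8..23.
N, cutoff = 32, 3
truth = [math.cos(2 * math.pi * 2 * i / N) + 0.5 * math.sin(2 * math.pi * 3 * i / N)
         for i in range(N)]
known = set(range(8, 24))
observed = [truth[i] if i in known else 0.0 for i in range(N)]
estimate = extrapolate(observed, known, cutoff)
initial_err = rms_error(observed, truth)
final_err = rms_error(estimate, truth)
```

Each step is a projection onto a constraint set that contains the true signal, so the distance to the true signal cannot grow from one pass to the next. This is the nonexpansive behaviour the convergence argument relies on. The rate of improvement can be slow and depends on how much of the signal is observed.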
In prior work, a manually derived measure of vocal fold vibratory phase asymmetry correlated to varying degrees with visual judgments made from laryngeal high-speed videoendoscopy (HSV) recordings. This investigation extended this work by establishing an automated HSV-based framework to quantify 3 categories of vocal fold vibratory asymmetry.