Jiarui Hai

ORCID: 0000-0001-9968-7372
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Blind Source Separation Techniques
  • Artificial Intelligence in Healthcare
  • Digital Mental Health Interventions
  • Mental Health via Writing
  • Hearing Loss and Rehabilitation
  • Misinformation and Its Impacts
  • Machine Learning in Healthcare
  • Vaccine Coverage and Hesitancy
  • Health Literacy and Information Accessibility
  • Mental Health Research Topics
  • Opinion Dynamics and Social Influence
  • Meteorological Phenomena and Simulations
  • Electronic Health Records Systems
  • Complex Network Analysis Techniques
  • Artificial Intelligence in Healthcare and Education
  • COVID-19 epidemiological studies
  • Sentiment Analysis and Opinion Mining
  • Flood Risk Assessment and Management
  • Underwater Acoustics Research
  • Influenza Virus Research Studies
  • Advanced Text Analysis Techniques
  • Precipitation Measurement and Analysis

Johns Hopkins University
2022-2025

Tsinghua University
2021-2023

Institute of Atmospheric Physics
2023

Chinese Academy of Sciences
2023

University of Chinese Academy of Sciences
2023

Kuaishou (China)
2022

Objective: To develop and apply a natural language processing (NLP)-based approach to analyze public sentiments on social media and their geographic pattern in the United States toward coronavirus disease 2019 (COVID-19) vaccination. We also aim to provide insights to facilitate the understanding of public attitudes and concerns regarding COVID-19 vaccination. Methods: We collected Tweet posts by residents in the United States after the dissemination of the COVID-19 vaccine. We performed sentiment analysis based on Bidirectional Encoder Representations from Transformers...

10.1093/jamiaopen/ooad023 article EN cc-by JAMIA Open 2023-04-06

Objective: This scoping review aims to identify and understand the role of artificial intelligence in the application of integrated electronic health records (EHRs) and patient-generated health data (PGHD) in health care, including clinical decision support, health care quality, and patient safety. We focused on the application that combined PGHD and EHR data, and we investigated the role of artificial intelligence (AI) in health care. Methods: We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to search articles in six databases: PubMed, Embase, Web of Science,...

10.1101/2024.05.01.24306690 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2024-05-03

10.1109/icassp49660.2025.10890066 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10889119 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

The emerging health technologies and digital services provide effective ways of collecting health information and gathering patient-generated health data (PGHD), which provide a more holistic view of a patient's quality of life over time, increase visibility into a patient's adherence to a treatment plan or study protocol, and enable timely intervention before a costly care episode. Through a national cross-sectional survey in the United States, we aimed to describe and compare the characteristics of populations with and without mental health issues (depression or anxiety...

10.2196/30898 article EN cc-by Journal of Medical Internet Research 2022-04-29

This study aims to propose a novel approach for enhancing clinical prediction models by combining structured and unstructured data with multimodal data fusion. We presented a comprehensive framework that integrated multimodal data sources, including textual clinical notes, electronic health records (EHRs), and relevant data from National Electronic Injury Surveillance System (NEISS) datasets. We proposed a hybrid fusion method, which incorporated a state-of-the-art pre-trained language model, to integrate text with EHR and other data, thereby capturing...

10.1101/2023.08.24.23294597 preprint EN cc-by-nc medRxiv (Cold Spring Harbor Laboratory) 2023-08-25

Auditory Attention Decoding (AAD) algorithms play a crucial role in isolating desired sound sources within challenging acoustic environments directly from brain activity. Although recent research has shown promise in AAD using shallow representations such as auditory envelope and spectrogram, there has been limited exploration of deep Self-Supervised (SS) representations on a larger scale. In this study, we undertake a comprehensive investigation into the performance of linear decoders across 12 deep and 2 shallow representations,...

10.1109/icassp48485.2024.10448271 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Latent diffusion models have shown promising results in text-to-audio (T2A) generation tasks, yet previous models have encountered difficulties in generation quality, computational cost, diffusion sampling, and data preparation. In this paper, we introduce EzAudio, a transformer-based T2A diffusion model, to handle these challenges. Our approach includes several key innovations: (1) We build the T2A model on the latent space of a 1D waveform Variational Autoencoder (VAE), avoiding the complexities of handling 2D spectrogram representations and using an...

10.48550/arxiv.2409.10819 preprint EN arXiv (Cornell University) 2024-09-16

Precipitation nowcasting is a crucial element in current weather service systems. Data-driven methods have proven highly advantageous, due to their flexibility in utilizing detailed initial hydrometeor observations, and their capability to approximate meteorological dynamics effectively given sufficient training data. However, data-driven methods often encounter severe approximation/optimization errors, rendering their predictions and associated uncertainty estimates unreliable. Here we develop a probabilistic diffusion...

10.22541/essoar.169945499.97460779/v1 preprint EN Authorea (Authorea) 2023-11-08

Social network data often contain missing values because of the sensitive nature of the information collected and the dependency among the network actors. As a response, network imputation methods including simple ones constructed from network structural characteristics and more complicated model-based ones have been developed. Although past studies have explored the influence of missing data on social networks and the effectiveness of imputation procedures in many missing data conditions, the current study aims to evaluate a more extensive set of eight network imputation techniques (i.e., null-tie, Reconstruction, Preferential...

10.6339/22-jds1045 article EN cc-by Journal of Data Science 2022-04-20

Common target sound extraction (TSE) approaches primarily relied on discriminative approaches in order to separate the target sound while minimizing interference from the unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, a generative method based on diffusion probabilistic modeling (DPM) for Target Sound Extraction (TSE), to achieve both cleaner target renderings as well as improved separability from unwanted sounds. The technique also tackles the noise floor of DPM by introducing a correction method for noise schedules and...

10.1109/icassp48485.2024.10447219 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noises and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss to streamline the diffusion model's reverse process into a single step...

10.48550/arxiv.2406.07461 preprint EN arXiv (Cornell University) 2024-06-11

Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of usage in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. Our paper presents two major contributions to VC technology: (1) DreamVoiceDB, a robust dataset of voice timbre annotations for 900 speakers...

10.48550/arxiv.2406.16314 preprint EN arXiv (Cornell University) 2024-06-24

Generative voice technologies are rapidly evolving, offering opportunities for more personalized and inclusive experiences. Traditional one-shot voice conversion (VC) requires a target recording during inference, limiting ease of usage in generating desired voice timbres. Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. Our paper presents two major contributions to VC technology: (1) DreamVoiceDB, a robust dataset of voice timbre annotations for 900 speakers...

10.21437/interspeech.2024-1432 article EN Interspeech 2024 2024-09-01

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noises and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss to streamline the diffusion model's reverse process into a single step...

10.21437/interspeech.2024-327 article EN Interspeech 2024 2024-09-01

In this paper, we introduce SoloAudio, a novel diffusion-based generative model for target sound extraction (TSE). Our approach trains latent diffusion models on audio, replacing the previous U-Net backbone with a skip-connected Transformer that operates on latent features. SoloAudio supports both audio-oriented and language-oriented TSE by utilizing a CLAP model as the feature extractor for target sounds. Furthermore, SoloAudio leverages synthetic audio generated by state-of-the-art text-to-audio models for training, demonstrating strong...

10.48550/arxiv.2409.08425 preprint EN arXiv (Cornell University) 2024-09-12

In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis. SSR-Speech is built on a Transformer decoder and incorporates classifier-free guidance to enhance the stability of the generation process. A watermark Encodec is proposed to embed frame-level watermarks into the edited regions of the speech so that which parts were edited can be detected. In addition, the waveform reconstruction leverages the original unedited speech segments,...

10.48550/arxiv.2409.07556 preprint EN arXiv (Cornell University) 2024-09-11

Objective: To develop and apply a natural language processing (NLP)-based approach to analyze public sentiments on social media and their geographic pattern in the United States toward COVID-19 vaccination. We also provide insights to facilitate the understanding of public attitudes and concerns regarding COVID-19 vaccination. Methods: We collected Tweet posts by residents in the United States after the official dissemination of the COVID-19 vaccine. We performed sentiment analysis based on Bidirectional Encoder Representations from Transformers (BERT) and qualitative content...

10.1101/2022.08.26.22279278 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2022-08-30

Objective: To describe and compare the characteristics of the population with and without mental health issues (depression or anxiety disorder), including physical health, sleep, and alcohol use. We also examined the patterns of social networking service use, patient-generated health data on digital platforms, and health information sharing attitudes and activities. Methods: We drew data from the National Cancer Institute’s 2019 Health Information National Trends Survey (HINTS). Participants were divided into two groups by mental health status. Then, we...

10.1101/2021.06.11.21258777 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2021-06-18

Background: The emerging health technologies and digital services provide effective ways of collecting health information and gathering patient-generated health data (PGHD), which provide a more holistic view of a patient’s quality of life over time, increase visibility into a patient’s adherence to a treatment plan or study protocol, and enable timely intervention before a costly care episode. Objective: Through a national cross-sectional survey in the United States, we aimed to describe and compare...

10.2196/preprints.30898 preprint EN 2021-06-02

Pitch correction is the process of adjusting the original pitch of a recording or live performance in order to fit it to a specific key or to match a target pitch profile. Pitch correction systems typically consist of several stages: pitch estimation, pitch curve modification, and resynthesis of the audio with the modified pitch curve. Unfortunately, the process often leads to significant artifacts that degrade the overall quality of the modified audio, rendering it unnatural and unpleasant. In this work, we introduce Diff-Pitcher...

10.1109/waspaa58266.2023.10248127 article EN 2023-09-15

Common target sound extraction (TSE) approaches primarily relied on discriminative approaches in order to separate the target sound while minimizing interference from the unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, a first generative method based on diffusion probabilistic modeling (DPM) for target sound extraction, to achieve both cleaner target renderings as well as improved separability from unwanted sounds. The technique also tackles common background noise issues with DPM by introducing a correction method for noise schedules and...

10.48550/arxiv.2310.04567 preprint EN public-domain arXiv (Cornell University) 2023-01-01

Auditory Attention Decoding (AAD) algorithms play a crucial role in isolating desired sound sources within challenging acoustic environments directly from brain activity. Although recent research has shown promise in AAD using shallow representations such as auditory envelope and spectrogram, there has been limited exploration of deep Self-Supervised (SS) representations on a larger scale. In this study, we undertake a comprehensive investigation into the performance of linear decoders across 12 deep and 2 shallow representations,...

10.48550/arxiv.2311.00814 preprint EN public-domain arXiv (Cornell University) 2023-01-01

Sentiment analysis has traditionally leveraged information from text data. More recently, it has become increasingly clear that multimodal data provides a rich space to drastically boost the interpretation of human sentiments by harnessing information across multiple modalities. In this study, we incorporate pre-trained feature extractors and propose a multitask training strategy to improve the modality representations for Multimodal Sentiment Analysis (MSA). The experimental results on the CH-SIMS v2 dataset demonstrate superior...

10.1109/asru57964.2023.10389694 article EN 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023-12-16