- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Diverse Musicological Studies
- Natural Language Processing Techniques
- Music Technology and Sound Studies
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Emotion and Mood Recognition
- Asian Culture and Media Studies
- Face recognition and analysis
- Human Motion and Animation
- Nasal Surgery and Airway Studies
- Handwritten Text Recognition Techniques
- Advancements in Battery Materials
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Speech and dialogue systems
- Digital Media Forensic Detection
- AI in cancer detection
- Data Analysis with R
- Advanced Battery Materials and Technologies
- Scientific Computing and Data Management
- Advanced Neural Network Applications
- Research Data Management Practices
Ping An (China), 2021-2025
Shenzhen Technology University, 2021-2025
Lanzhou University, 2015-2024
Chinese Academy of Medical Sciences & Peking Union Medical College, 2022-2024
Lamar University, 2021-2024
Jinling Institute of Technology, 2024
Ningxia University, 2021-2023
Foundation for Biomedical Research, 2023
Committee on Publication Ethics, 2023
Wuhan University of Science and Technology, 2023
Voice Conversion (VC) refers to changing the timbre of speech while retaining the discourse content. Recently, many works have focused on disentangle-based learning techniques to separate the timbre and linguistic content information from the speech signal. Once this separation succeeds, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework based on vector quantization voice conversion (VQVC) and AutoVC, called AVQVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The result shows that this approach has...
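The vector quantization step that VQVC-style models build on can be sketched as a nearest-codeword lookup. This is a minimal illustration in plain Python, not the AVQVC implementation: the function names `quantize` and `residuals`, the toy codebook, and the reading that quantized frames approximate content while the residual carries the remaining (for example speaker-related) information are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Map every frame to its nearest codeword (hypothetical helper).

    Returns the chosen codeword indices and the quantized frames.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Pick the codeword with the smallest squared distance to this frame.
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def residuals(frames: Sequence[Vector],
              quantized: Sequence[Vector]) -> List[List[float]]:
    """Per-frame difference between the continuous and quantized vectors.

    Assumption for illustration: this residual holds what the discrete
    codes discard, such as speaker-dependent detail.
    """
    return [[x - q for x, q in zip(f, qf)] for f, qf in zip(frames, quantized)]


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0]]   # toy values, not learned
    toy_frames = [[0.1, -0.1], [0.9, 1.2]]
    idx, q = quantize(toy_frames, toy_codebook)
    print(idx, q, residuals(toy_frames, q))
```

In a trained model the codebook is learned jointly with an encoder and decoder; here it is fixed by hand so the lookup itself is easy to follow.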
Vibration-induced blade High Cycle Fatigue (HCF) seriously affects the safe operation of turbomachinery, especially aero-engines. Thus, it is crucial to identify the blade vibration parameters and then evaluate the dynamic stress amplitude. The Blade Tip Timing (BTT) method is one promising way to solve these problems. However, it needs a high-resolution Once Per Revolution (OPR) signal, which is difficult to get. Here, a Coupled Vibration Analysis (CVA) method for identifying the blade vibration parameters by BTT without an OPR signal is proposed. The method assumes that every real blade has its own...
Voice Conversion (VC) aims to convert the style of a source speaker, such as timbre and pitch, to the style of any target speaker while preserving the linguistic content. However, the ground truth of the converted speech does not exist in a non-parallel VC scenario, which induces a train-inference mismatch problem. Moreover, existing methods still have inaccurate pitch and low adaptation quality, and there is a significant disparity between the source and target domains. As a result, the models tend to generate speech with hoarseness, posing challenges in achieving...
The effect of the process aid “OPS” on the rheological properties of hydroxyl-terminated polybutadiene propellant slurry was investigated by formulating different components of the high-solid-content slurry, and the change in slurry viscosity with shear rate, the surface morphology of the solid-phase particles, and the contact angle of the relevant interfaces were characterized. The results showed that the polyalkene polyamine surfactant OPS could significantly reduce the apparent viscosity of the slurry and enhance its rheological properties, with up to a 30% reduction, achieved by adjusting the interfacial aluminum...
The metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a concert in a virtual hall, and users can quickly identify the real singer behind the virtual idol through singer identification. Most singer identification methods are processed using frame-level features. However, besides the singer's timbre, a music frame includes music information such as melodiousness, rhythm, and tonality. This means that the music information is noise when frame-level features are used to identify singers. In this paper, instead of only the frame-level features, we...
The any-to-any voice conversion problem aims to convert voices for source and target speakers that are outside the training data. Previous works widely utilize disentangle-based models. The disentangle-based model assumes that speech consists of content and speaker style information and aims to untangle them to change the style information for conversion. Previous works focus on reducing the dimension of the speech to get the content information. But the size is hard to determine, which leads to the untangle overlapping problem. We propose the Disentangled Representation Voice Conversion (DRVC) model to address the issue. The DRVC model is an end-to-end self-supervised...
Existing emotional speech synthesis methods often utilize an utterance-level style embedding extracted from reference audio, neglecting the inherent multi-scale property of speech prosody. We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels. Specifically, our proposed approach integrates the utterance-level emotion embedding extracted by SER with the fine-grained frame-level emotion embedding obtained from SED. These embeddings are used to condition the reverse process of the denoising diffusion...
Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio has the potential to represent content well. Besides, speaker-style modeling with pre-trained models makes the process more complex. To tackle these issues, we introduce a new method named "CTVC" which utilizes disentangled contrastive learning and time-invariant...
The lithium-ion battery is widely utilized in space applications because of its significant performance advantages. The safety and reliability of the lithium-ion battery are critical for spacecraft. It is essential to assess the degradation and estimate the state of the battery. Meanwhile, as a brand new terminology in Cyber-physical Systems (CPS), Digital Twin is widely used in the smart manufacturing industry due to its advantages in real-time performance, stability, and reliability. Thus, Digital Twin can be utilized in the lithium-ion battery pack to ensure its safety and reliability. So far, Digital Twin has not been utilized in space applications for battery management and assessment. As...
Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains singing voices. This task plays a very important role in vocal-related music information retrieval tasks, such as singer identification. Although humans can easily distinguish between singing and nonsinging parts, it is still difficult for machines to do so. Most existing methods focus on feature engineering with classifiers, which rely on the experience of the algorithm designer. In recent years, deep...
Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, various instruments and singing voices are mixed according to harmonic structure, making it hard to identify the fundamental frequency (F0) of a singing voice. Therefore, reducing the interference of the accompaniment is beneficial to pitch estimation of the singing voice. In this paper, we first adopted a high-resolution network (HRNet) to separate vocals from polyphonic music, then designed an encoder-decoder network to estimate...
Multi-speaker text-to-speech (TTS) using only a small amount of adaptation data is a challenge in practical applications. To address that, we propose a zero-shot multi-speaker TTS, named nnSpeech, that can synthesize a new speaker's voice without fine-tuning and using only one adaptation utterance. Compared with using a speaker representation module to extract the characteristics of new speakers, our method is based on a speaker-guided conditional variational autoencoder and can generate a latent variable Z, which contains both speaker characteristics and content information. The latent variable Z...
Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected. In this paper, we propose QI-TTS, which aims to better transfer and control intonation to further deliver the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor to extract style embeddings from two different levels. While the sentence level represents emotion, the final syllable level represents intonation. For intonation control, we use relative...