- Speech Recognition and Synthesis
- Music and Audio Processing
- Speech and Audio Processing
- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems
- Diverse Musicological Studies
- Music Technology and Sound Studies
- Cancer Cells and Metastasis
- Genetic factors in colorectal cancer
- Cancer-related molecular mechanisms research
- Acupuncture Treatment Research Studies
- Bone Metabolism and Diseases
- Emotion and Mood Recognition
- Adversarial Robustness in Machine Learning
- Drug-Induced Hepatotoxicity and Protection
- Effects of Radiation Exposure
- Sirtuins and Resveratrol in Medicine
- TGF-β signaling in diseases
- Pelvic floor disorders treatments
- Mesenchymal stem cell research
- Renal and related cancers
- Greenhouse Technology and Climate Control
- Music Therapy and Health
- COVID-19 diagnosis using AI
Agency for Science, Technology and Research
2024-2025
Institute for Infocomm Research
2024-2025
National University of Singapore
2007-2024
Anhui University of Traditional Chinese Medicine
2024
Shanxi Academy of Medical Sciences
2023
Shanxi Medical University
2023
East China Normal University
2022
Tianjin University of Traditional Chinese Medicine
2021-2022
Beijing Technology and Business University
2020
Fudan University
2019-2020
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied singing vocals mixed with instrumental music, representing important components in polyphonic music. In a traditional lyrics transcription task, we first extract singing vocals from polyphonic music and then transcribe the lyrics from the resulting singing vocals, where the two steps are optimized independently. In this paper, we propose novel end-to-end network architectures designed to disentangle lyrics from chords in polyphonic music for effective lyrics transcription in a single...
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs, thereby enabling models to learn continuously across a sequence of tasks while mitigating forgetting. While existing studies primarily focus on the integration and utilization of multi-modal information for MMCIL, a critical challenge remains: the issue of missing modalities during incremental learning phases. This oversight can exacerbate severe forgetting and significantly impair model performance. To bridge...
Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solutions. Employing conventional data augmentation for enhancing the noise robustness of summarization models is not feasible either due to...
Background: Dachshund homologue 1 (DACH1) is highly expressed in LGR5+ intestinal stem cells and colorectal tumours. However, the roles of DACH1 in cell stemness and tumorigenesis remain largely undefined. Methods: We used immunohistochemistry, western blotting and quantitative real-time PCR to analyse DACH1 expression in colorectal cancer (CRC) samples. CRISPR/Cas9 gene editing, lentiviral vector-mediated overexpression and shRNA-mediated knockdown were utilized to modulate DACH1 expression in cell lines and organoids. An organoid-based functional model was...
Chest computed tomography (CT) scanning is one of the most important technologies for COVID-19 diagnosis and disease monitoring, particularly for early detection of coronavirus. Recent advancements in computer vision motivate more concerted efforts in developing AI-driven diagnostic tools to accommodate the enormous demands for COVID-19 diagnostic tests globally. To help alleviate burdens on medical systems, we develop a lesion-attention deep neural network (LA-DNN) to predict COVID-19 positive or negative with a richly annotated...
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e. a singing vocal extraction front end, followed by a lyrics transcriber back end, where the front end and back end are trained separately. Such a two-step pipeline suffers from both imperfect vocal extraction and mismatch between front end and back end. In this work, we propose a novel end-to-end integrated fine-tuning framework, that we call PoLyScriber, to globally optimize the vocal extractor front end and lyrics transcriber back end for lyrics transcription in polyphonic music. The experimental results...
Self-supervised pre-training has been successful in both text and speech processing. Speech and text offer different but complementary information. The question is whether we are able to perform a speech-text joint pre-training on unpaired speech and text. In this paper, we take the idea of self-supervised pre-training one step further and propose token2vec, a novel joint pre-training framework for unpaired speech and text based on discrete representations of speech. Specifically, we introduce two modality-specific tokenizers for speech and text. Based on these tokenizers, we convert speech/text sequences into token...
Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the singing style vary across music genres, such as pop, metal, and hip hop, which affects lyrics intelligibility of the song in different ways. In this work, we propose to transcribe the lyrics of polyphonic music using a novel genre-conditioned network. The proposed network adopts pre-trained model parameters, and incorporates genre adapters between layers to capture genre peculiarities for lyrics-genre pairs, thereby...
Accurate alignment between a singing signal and its spoken lyrics at frame-level is imperative to several applications in singing processing. As the acoustic characteristics of speech and singing signals differ significantly, finding the temporal alignment between them is not easy. In this paper, we study speech and singing signals to identify their common properties and facilitate the temporal alignment. We observe that: (i) the excitation source in the human voice production mechanism largely varies with speaking and singing, and (ii) for the same linguistic content, speech and singing present very different formant patterns....
Speech-to-Singing (STS) conversion aims at converting one's reading speech into his/her singing vocal. The prior work was mainly focused on transforming the prosody of speech to singing; however, there exist prominent differences between speech spectra and singing spectra, which need to be transformed as well. In this paper, we propose to make use of parallel multi-speaker speak-sing data to develop a speaker-independent spectral mapping model, which is conditioned on an i-vector to generate the target speaker/singer identity. The model is therefore called...
Epithelial regeneration is critical for barrier maintenance and organ function after intestinal injury, although the repair mechanisms are unclear. Here, we found that Bach2 deficiency promotes intestinal epithelial cell proliferation during homeostasis. Moreover, genetic inactivation of Bach2 in mouse intestinal epithelium facilitated crypt regeneration after irradiation, resulting in a reduction in mortality. RNA-sequencing analysis of isolated crypts revealed altered expression of numerous genes, including those regulating double-strand break...
Estrogen is very important to the differentiation of B lymphocytes; B lymphopoiesis induced by OVX was supposedly involved in osteoporosis. But the effects of B lymphocytes on the osteogenic differentiation of bone mesenchymal stem cells (BMSCs) are not clear. In this study, we detected the bone quality and the loss of trabecular bone by an electronic universal material testing machine and microcomputed tomography (micro-CT) in splenectomized-ovariectomy (SPX-OVX) rats. Additionally, changes in the cells (B lymphocytes, CD4+ and CD8+ T lymphocytes, and macrophages) in bone marrow were...
The current lyrics transcription approaches heavily rely on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive. How to benefit from unlabeled data and alleviate the limited data problem has not been explored for lyrics transcription. We propose the first semi-supervised lyrics transcription paradigm, Self-Transcriber, by leveraging unlabeled data using self-training with noisy student augmentation. We attempt to demonstrate the possibility of lyrics transcription with a small amount of labeled data. Self-Transcriber generates pseudo labels using a teacher...
Acute liver failure (ALF) is a serious disease that is difficult to treat owing to its unclear pathogenesis. This study aimed to investigate the roles and molecular mechanisms of calycosin (CA) in ALF.
Although speech-to-singing (STS) voice conversion has been widely studied, a large database for this task has not been constructed yet. We present a new Spoken Lyrics and Singing (SLS) corpus developed at NUS-HLT that can be useful for STS. In this work, the details of the corpus are reported. The corpus contains 3,058 utterances of 90 English songs from 10 professional singers, collected in a recording studio environment. The spoken lyrics corresponding to the songs are also recorded to create the database, which we refer to as the SLS corpus. A comparison of singing...