Xinjian Li

ORCID: 0000-0003-4585-159X
Research Areas
  • Speech Recognition and Synthesis
  • Natural Language Processing Techniques
  • Speech and Audio Processing
  • Music and Audio Processing
  • Topic Modeling
  • Speech and dialogue systems
  • Adversarial Robustness in Machine Learning
  • Anomaly Detection Techniques and Applications
  • Digital Media Forensic Detection
  • Advanced Neural Network Applications
  • Chaos-based Image/Signal Encryption
  • Decision-Making and Behavioral Economics
  • Stochastic Gradient Optimization Techniques
  • Neural and Behavioral Psychology Studies
  • Functional Brain Connectivity Studies
  • Caching and Content Delivery
  • Voice and Speech Disorders
  • Recommender Systems and Techniques
  • Domain Adaptation and Few-Shot Learning
  • Video Analysis and Summarization
  • Advanced Authentication Protocols Security
  • Advanced Graph Neural Networks
  • Service and Product Innovation
  • Text Readability and Simplification
  • Machine Learning and ELM

China Tobacco
2023-2025

Carnegie Mellon University
2018-2024

Google (United States)
2023

Tencent (China)
2022

Allen Institute for Brain Science
2020-2022

Georgia Institute of Technology
2021

University of Pittsburgh
2021

City, University of London
2021

SIL International
2020

Johns Hopkins University
2020

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This can lead to performance degradation when combining a variety of training languages, as identically annotated phonemes can correspond to several different...

10.1109/icassp40776.2020.9054362 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

In humans, risk attitude is highly context-dependent, varying with wealth levels or for different potential outcomes, such as gains or losses. These behavioral effects have been modelled using prospect theory, with the key assumption that humans represent the value of each available option asymmetrically as a gain or a loss relative to a reference point. It remains unknown how these computations are implemented at the neuronal level. Here we show that macaques, like humans, change their risk attitude across wealth levels and gain/loss contexts...

10.1038/s41467-022-28278-9 article EN cc-by Nature Communications 2022-02-07

Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness,...

10.1109/asru57964.2023.10389676 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023-12-16

Auscultation of the heart is a widely studied technique, which requires precise hearing from practitioners as a means of distinguishing subtle differences in heart-beat rhythm. This technique is popular due to its non-invasive nature, and can be an early diagnosis aid for a range of cardiac conditions. Machine listening approaches can support this process, monitoring continuously and allowing for a representation of both mild and chronic heart conditions. Despite this potential, relevant databases and benchmark studies are scarce. In this paper, we...

10.1109/jbhi.2019.2955281 article EN IEEE Journal of Biomedical and Health Informatics 2019-11-22

Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years. Many end-to-end systems have been proposed and show advantages over conventional cascade systems, which are often composed of recognition, translation and synthesis sub-systems. However, most end-to-end systems still rely on intermediate textual supervision during training, which makes it infeasible to work for languages without written forms. In this work, we propose a novel model, Textless Translatotron, which is based on Translatotron 2 [1], for training an...

10.1109/icassp49357.2023.10096797 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

The enumeration reduction algorithm is a practical method for solving the shortest vector problem (SVP) in lattice theory, and it plays an important role in analyzing the security of the lattice-based post-quantum cryptography NTRU (Number Theory Research Unit), but there is still the problem of low efficiency in high-dimensional NTRU. Based on the symplectic properties of NTRU lattices, a new upper boundary A of the initial search space is analyzed. A^new is 2.4 times smaller than A^Schnorr in the ENUM...

10.1117/12.3045735 article EN other-oa 2025-01-15

Automatic phonemic transcription tools are useful for low-resource language documentation. However, due to the lack of training sets, only a tiny fraction of languages have phonemic transcription tools. Fortunately, multilingual acoustic modeling provides a solution given limited audio training data. A more challenging problem is to build phonemic transcribers for languages with zero training data. The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes. In this work, we address this problem by adopting...

10.1609/aaai.v34i05.6341 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first...

10.24963/ijcai.2023/575 article EN 2023-08-01

Neural text-to-speech (TTS) systems have made significant progress in generating natural synthetic speech. However, neural TTS requires large amounts of paired training data, which limits its applicability to a small number of resource-rich languages. Previous work on low-resource TTS has addressed the data hungriness based on transfer learning from a multilingual model to low-resource languages, but it still relies heavily on the availability of paired data for the target languages. In this paper, we propose a text-inductive language adaptation framework...

10.1109/taslp.2024.3369537 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

Cancer is a horrible disease and a major cause of death in the world. Early detection and diagnosis can help doctors save lives. Many computer-aided techniques use image processing to do cancer detection and obtain considerable achievements. In this paper, we propose a novel ResNet-based deep learning network to identify metastatic cancer from scan images. Furthermore, we apply Test Time Augmentation to make our model more robust and improve accuracy. The results of experiments on a slightly modified version of the PatchCamelyon (PCam)...

10.1109/iccece51280.2021.9342346 article EN 2021-01-15

Developing a practical speech recognizer for a low resource language is challenging, not only because of the (potentially unknown) properties of the language, but also because test data may not be from the same domain as the available training data. In this paper, we focus on the latter challenge, i.e. domain mismatch, for systems trained using a sequence-based criterion. We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language. In our example, we use...

10.1109/slt.2018.8639569 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2018-12-01

Multilingual acoustic models have been successfully applied to low-resource speech recognition. Most existing works have combined many small corpora together, and pretrained a multilingual model by sampling from each corpus uniformly. The model is eventually fine-tuned on each target corpus. This approach, however, fails to exploit the relatedness and similarity among corpora in the training set. For example, the target corpus might benefit more from a corpus in the same domain or a corpus from a close language. In this work, we propose a simple but useful sampling strategy to take advantage of...

10.21437/interspeech.2019-3052 article EN Interspeech 2022 2019-09-13

Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe. Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). 2023.

10.18653/v1/2023.iwslt-1.20 article EN cc-by 2023-01-01

Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word detection to respond to people's commands, which could potentially be vulnerable to audio adversarial examples. In this work, we target our attack on the wake-word detection system, jamming the model with some inconspicuous background music to deactivate the VAs while our audio adversary is present. We implemented an emulated wake-word detection system of Amazon Alexa based on recent publications. We validated our models against the real Alexa in terms of wake-word detection accuracy. Then we computed our audio adversaries with consideration...

10.48550/arxiv.1911.00126 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Grapheme-to-Phoneme (G2P) has many applications in NLP and speech fields. Most existing work focuses heavily on languages with abundant training datasets, which limits the scope of target languages to fewer than 100 languages. This work attempts to apply zero-shot learning to approximate G2P models for all low-resource and endangered languages in Glottolog (about 8k languages). For any unseen target language, we first build the phylogenetic tree (i.e. language family tree) to identify the top-k nearest languages for which we have training sets. Then we run models of those languages to obtain a hypothesis...

10.18653/v1/2022.findings-acl.166 article EN cc-by Findings of the Association for Computational Linguistics: ACL 2022 2022-01-01

10.3785/j.issn.1008-973x.2009.12.019 article EN Journal of ZheJiang University (Engineering Science) 2010-01-16

Phone recognition is one of the most important tasks in the field of multilingual speech recognition, especially for low-resource languages whose orthographies are not available. However, most speech recognition datasets so far only focus on high-resource languages, and there are very few datasets available for low-resource languages with detailed phone annotation. In this work, we present a large multilingual phonetic dataset, which is preprocessed and aligned from the UCLA phonetic dataset. The dataset contains around 100 low-resource languages and 7000 utterances in total. This dataset would provide an ideal...

10.1109/icassp39728.2021.9413720 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

With recent advancements in language technologies, humans are now speaking to devices. Increasing the reach of spoken language technologies requires building systems in local languages. A major bottleneck here is the underlying data-intensive parts that make up such systems, including automatic speech recognition (ASR) systems that require large amounts of labelled data. With the aim of aiding development of spoken dialog systems in low resourced languages, we propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification...

10.1109/icassp39728.2021.9415112 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from phonological context. While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription. AlloVera allows the training of speech recognition models that output phonetic transcriptions in the International Phonetic Alphabet (IPA),...

10.48550/arxiv.2004.08031 preprint EN other-oa arXiv (Cornell University) 2020-01-01

There have been many studies on improving the efficiency of shared learning in Multi-Task Learning (MTL). Previous works focused on the "micro" sharing perspective for a small number of tasks, while in Recommender Systems (RS) and many other AI applications, we often need to model a large number of tasks. For example, when using MTL to model various user behaviors in RS, if we differentiate new users and new items from old ones, the number of tasks will increase exponentially with multidimensional relations. This work proposes a Multi-Faceted Hierarchical...

10.1145/3511808.3557140 article EN Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022-10-16