- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Speech Recognition and Synthesis
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- Ferroelectric and Negative Capacitance Devices
- Advanced Sensor and Control Systems
- Educational Technology and Pedagogy
- Target Tracking and Data Fusion in Sensor Networks
- Advanced Computational Techniques and Applications
- Translation Studies and Practices
- Legal Issues in Education
- Music and Audio Processing
- Time Series Analysis and Forecasting
- Advanced Algorithms and Applications
- Advanced Decision-Making Techniques
- Academic Integrity and Plagiarism
- Neural Networks and Applications
- Linguistics and Terminology Studies
- Gaussian Processes and Bayesian Inference
Gansu Institute of Political Science and Law
2014-2023
Xidian University
2023
Peking University
2021
Zhejiang University
2020
Microsoft Research (United Kingdom)
2020
Beijing University of Chemical Technology
2010
Michigan State University
2010
Technical University of Darmstadt
2009
In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in the source language to text in the target language concurrently. SimulSpeech consists of a speech encoder, a speech segmenter and a text decoder, where 1) the segmenter builds upon the encoder and leverages a connectionist temporal classification (CTC) loss to split the input streaming speech in real time, 2) the encoder-decoder attention adopts a wait-k strategy for simultaneous translation. SimulSpeech is more challenging than previous cascaded systems (with simultaneous automatic speech recognition (ASR)...
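The wait-k strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard wait-k rule (the t-th target token may attend to the first k + t - 1 source units) and assumes the source has already been split into units; it is not SimulSpeech's implementation, and the function names are hypothetical.

```python
from typing import List


def wait_k_visible_source(k: int, target_step: int, source_len: int) -> int:
    """Number of source units visible when emitting target token `target_step` (1-indexed).

    Assumption: the standard wait-k rule, min(k + t - 1, source_len).
    """
    if k < 1 or target_step < 1 or source_len < 0:
        raise ValueError("k and target_step must be >= 1, source_len >= 0")
    return min(k + target_step - 1, source_len)


def wait_k_mask(k: int, source_len: int, target_len: int) -> List[List[bool]]:
    """Boolean attention mask: mask[t][s] is True if target step t+1 may attend to source unit s."""
    return [
        [s < wait_k_visible_source(k, t, source_len) for s in range(source_len)]
        for t in range(1, target_len + 1)
    ]


if __name__ == "__main__":
    # Print the mask row by row: '1' marks a visible source unit.
    for row in wait_k_mask(k=2, source_len=5, target_len=4):
        print("".join("1" if visible else "0" for visible in row))
```

A smaller k lowers latency because decoding starts after fewer source units, at the price of less context per target token.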
Recent years have witnessed rapid advancements in the safety alignments of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to human values, their practical application is still hindered by high annotation costs and incomplete human alignments. Besides, the intrinsic human values within training corpora have not been fully...
Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT). Since AT and NAT can share model structure and AT is an easier task than NAT due to the explicit dependency on previous target-side tokens, a natural idea is to gradually shift the model training from the easier AT task to the harder NAT task. To smooth the shift from AT training to NAT training, in this paper, we introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT contains a hyperparameter k, and each k value defines a SAT task with different degrees...
The English grammatical error correction system is suitable for the learning environment, with the goal of accurately correcting errors in learners' writing. However, false corrections are often generated in practical applications, and many errors cannot be corrected, thus misleading learners. The quality estimation model is beneficial to ensure that learners obtain accurate grammatical error correction results and avoid misleading sentences caused by false corrections. Grammatical error correction models can generate multiple hypotheses with higher quality, but existing quality estimation models do not...
In this paper, we present a new verification style reading comprehension dataset named VGaokao from Chinese Language tests of Gaokao. Different from existing efforts, the new dataset is originally designed for native speakers’ evaluation, thus requiring more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence with a novel query updating mechanism and adaptively distills supportive evidence,...
Large-scale corpora play a vital role in the construction of large language models (LLMs). However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data. To improve the accessibility of these languages, we present MC^2, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus so far. It encompasses four underrepresented languages, i.e., Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script. Notably,...
Recent studies have uncovered that language model distillation is less effective when facing a large capacity gap between the teacher and the student, and introduced teacher assistant-based distillation to bridge the gap. As a connection, the scale and the performance of the teacher assistant are of vital importance to bring the knowledge from the teacher to the student. However, existing teacher assistant-based methods require maximally many trials before scheduling an optimal teacher assistant. To this end, we propose a minimal distillation schedule (MiniDisc) for scheduling the optimal teacher assistant in minimally one trial. In particular, motivated by the finding...
Based on glowworm swarm optimization (GSO) and the BP neural network (BPNN), an algorithm for GSO-optimized BPNN (GSOBPNN) is proposed. In the algorithm, GSO is used to generate better initial thresholds and weights so as to compensate for the random defects of BPNN, so that BPNN converges faster and has greater learning ability. The efficiency of the proposed prediction method is tested by simulation of the chaotic time series generated by the Lorenz system. The simulation results show that the proposed method has higher forecasting accuracy compared with...
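As a hedged illustration of the benchmark named in this abstract, the sketch below generates a chaotic series from the Lorenz system and cuts it into input/target windows for one-step-ahead prediction. The parameter values (sigma = 10, rho = 28, beta = 8/3), the step size, the initial state, and the window width are the classic textbook choices and are assumptions here, since the abstract does not state them; the GSO and BPNN parts are not shown.

```python
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz_deriv(s: State, sigma: float = 10.0, rho: float = 28.0,
                 beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the Lorenz equations (classic parameters, assumed)."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float) -> State:
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base: State, d: State, scale: float) -> State:
        return tuple(b + scale * di for b, di in zip(base, d))

    k1 = lorenz_deriv(s)
    k2 = lorenz_deriv(shifted(s, k1, dt / 2))
    k3 = lorenz_deriv(shifted(s, k2, dt / 2))
    k4 = lorenz_deriv(shifted(s, k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def lorenz_x_series(n_steps: int, dt: float = 0.01,
                    start: State = (1.0, 1.0, 1.0)) -> List[float]:
    """Return the x-component of the trajectory after each of n_steps steps."""
    s = start
    xs = []
    for _ in range(n_steps):
        s = rk4_step(s, dt)
        xs.append(s[0])
    return xs


def sliding_windows(series: List[float], width: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `width` values with the value that follows it."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]


if __name__ == "__main__":
    xs = lorenz_x_series(500)
    pairs = sliding_windows(xs, width=4)
    print(len(xs), len(pairs))
```

Each (inputs, target) pair could then serve as one training example for a predictor such as a BPNN.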
Natural language inference (NLI) has the intention to infer a hypothesis from a premise, and strictly faithful results depend on neural networks with anti-interference ability. To improve the stability of the inference process, we initialize and optimize adversarial examples based on both distance minimization and embedding similarity maximization, where examples outside the region are usually constructed with small perturbations. In specific, the ideal candidate set of alternative words is obtained by efficient pruning, and the adversarial example is forced to lie close...
With the construction and promotion of Chinese national outstanding courses, high-quality network curricula are in urgent need. This article discusses the application of the network platform of the national outstanding course "Polymer Physics" in teaching practice, which includes syllabus design, statistical analysis of different modules, and improvements for the future.
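Existing speech to speech translation systems heavily rely on the text of the target language: they usually translate the source language either to target text and then synthesize target speech from the text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phoneme available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from the target discrete tokens with an inverter. We propose a method called...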
Example pipeline of region velocity estimation and visualization with the steady-state model and the dynamical model using the EM algorithm in R, with Seurat to pretreat scRNA-seq data
While technology has made information readily available to university students, many of them have no sound understanding of how to use the sources properly, especially ESL students (Löfström & Kupila, 2013). When they use others’ ideas, text, or work without crediting the sources, they may commit either intentional or involuntary plagiarism (Camara et al., 2017). When they reuse a submitted assignment for another course improperly, they may commit self-plagiarism (APA Style, 2019). However, rather than simply punishing plagiarism,...