Chen Zhang

ORCID: 0000-0002-4485-8434
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Speech Recognition and Synthesis
  • Speech and Dialogue Systems
  • Multimodal Machine Learning Applications
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Sensor and Control Systems
  • Educational Technology and Pedagogy
  • Target Tracking and Data Fusion in Sensor Networks
  • Advanced Computational Techniques and Applications
  • Translation Studies and Practices
  • Legal Issues in Education
  • Music and Audio Processing
  • Time Series Analysis and Forecasting
  • Advanced Algorithms and Applications
  • Advanced Decision-Making Techniques
  • Academic Integrity and Plagiarism
  • Neural Networks and Applications
  • Linguistics and Terminology Studies
  • Gaussian Processes and Bayesian Inference

Gansu Institute of Political Science and Law
2014-2023

Xidian University
2023

Peking University
2021

Zhejiang University
2020

Microsoft Research (United Kingdom)
2020

Beijing University of Chemical Technology
2010

Michigan State University
2010

Technical University of Darmstadt
2009

In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in the source language to text in the target language concurrently. SimulSpeech consists of a speech encoder, a speech segmenter and a text decoder, where 1) the segmenter builds upon the encoder and leverages a connectionist temporal classification (CTC) loss to split the input streaming speech in real time, 2) the encoder-decoder attention adopts a wait-k strategy for simultaneous translation. SimulSpeech is more challenging than previous cascaded systems (with simultaneous automatic speech recognition (ASR)...

10.18653/v1/2020.acl-main.350 article EN cc-by 2020-01-01
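
The wait-k strategy named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used schedule in which the decoder, when emitting target token t (1-indexed), may attend to the first min(k + t - 1, source length) source segments. The function names `wait_k_visible` and `wait_k_mask` are hypothetical and are not taken from the SimulSpeech paper; this is a generic illustration, not the authors' implementation.

```python
def wait_k_visible(t, k, src_len):
    """Number of source segments visible when emitting target token t.

    Assumed schedule (generic wait-k, not the paper's code):
    wait for k segments first, then read one more per emitted token.
    """
    if t < 1 or k < 1 or src_len < 0:
        raise ValueError("t and k must be >= 1 and src_len must be >= 0")
    return min(k + t - 1, src_len)


def wait_k_mask(tgt_len, src_len, k):
    """Boolean attention mask: mask[t-1][j] is True if target step t may
    attend to source segment j (0-indexed)."""
    return [
        [j < wait_k_visible(t, k, src_len) for j in range(src_len)]
        for t in range(1, tgt_len + 1)
    ]


if __name__ == "__main__":
    # Print the mask for 3 target tokens, 5 source segments, k = 2.
    for row in wait_k_mask(tgt_len=3, src_len=5, k=2):
        print("".join("x" if visible else "." for visible in row))
```

A larger k gives the decoder more source context per step at the price of higher latency; k = 1 starts translating after the first segment.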

Recent years have witnessed rapid advancements in the safety alignments of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to values, their practical application is still hindered by high annotation costs and incomplete alignments. Besides, the intrinsic values within training corpora have not been fully...

10.1609/aaai.v39i26.34957 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT). Since AT and NAT can share the model structure and AT is an easier task than NAT due to the explicit dependency on previous target-side tokens, a natural idea is to gradually shift the model training from the easier AT task to the harder NAT task. To smooth the shift from AT training to NAT training, in this paper, we introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT contains a hyperparameter k, and each k value defines different degrees...

10.24963/ijcai.2020/534 preprint EN 2020-07-01

The English grammatical error correction system is suitable for the learning environment, with the goal of accurately correcting errors in learners' writing. However, false corrections are often generated in practical applications, and many errors cannot be corrected, thus misleading learners. A quality estimation model is beneficial to ensure that learners obtain accurate results and avoid misleading sentences caused by false corrections. Grammatical error correction models can generate multiple hypotheses of higher quality, but existing models do not...

10.1109/access.2023.3239693 article EN cc-by-nc-nd IEEE Access 2023-01-01

In this paper, we present a new verification style reading comprehension dataset named VGaokao from the Chinese Language tests of Gaokao. Different from existing efforts, the dataset is originally designed for native speakers’ evaluation, thus requiring more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence with a query updating mechanism and adaptively distills supportive evidence,...

10.18653/v1/2021.findings-emnlp.255 preprint EN cc-by 2021-01-01

Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT). Since AT and NAT can share the model structure and AT is an easier task than NAT due to the explicit dependency on previous target-side tokens, a natural idea is to gradually shift the model training from the easier AT task to the harder NAT task. To smooth the shift from AT training to NAT training, in this paper, we introduce semi-autoregressive translation (SAT) as intermediate tasks. SAT contains a hyperparameter k, and each k value defines different degrees...

10.48550/arxiv.2007.08772 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Large-scale corpora play a vital role in the construction of large language models (LLMs). However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data. To improve the accessibility of these languages, we present MC^2, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus so far. It encompasses four underrepresented languages, i.e., Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script. Notably,...

10.48550/arxiv.2311.08348 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Recent studies have uncovered that language model distillation is less effective when facing a large capacity gap between the teacher and the student, and have introduced assistant-based distillation to bridge the gap. As a connection, the scale and the performance of the assistant are of vital importance in bringing knowledge from the teacher to the student. However, existing methods require maximally many trials before scheduling an optimal assistant. To this end, we propose a minimal distillation schedule (MiniDisc) for scheduling the optimal assistant in minimally one trial. In particular, motivated by the finding...

10.48550/arxiv.2205.14570 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Based on glowworm swarm optimization (GSO) and the BP neural network (BPNN), an algorithm for a BP neural network optimized by glowworm swarm optimization (GSOBPNN) is proposed. In the algorithm, GSO is used to generate better initial thresholds and weights for the network so as to compensate for the random defects of BPNN, giving BPNN faster convergence and greater learning ability. The efficiency of the proposed prediction method is tested by simulation on the chaotic time series generated by the Lorenz system. The simulation results show that the proposed method has higher forecasting accuracy compared with...

10.4028/www.scientific.net/amm.513-517.2412 article EN Applied Mechanics and Materials 2014-02-06
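
As an illustration of the test data described in the abstract above, the following minimal sketch generates a chaotic time series from the Lorenz system and slices it into input/target pairs for one-step-ahead prediction. The parameter values (sigma = 10, rho = 28, beta = 8/3), the step size, the initial state, the fourth-order Runge-Kutta integrator, and all function names are assumptions chosen for the example; the paper's own settings are not given here, and the GSO and BPNN parts are not reproduced.

```python
def lorenz_derivatives(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameters assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_derivatives(state)
    k2 = lorenz_derivatives(shifted(state, k1, dt / 2))
    k3 = lorenz_derivatives(shifted(state, k2, dt / 2))
    k4 = lorenz_derivatives(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def lorenz_series(n_steps, dt=0.01, initial=(1.0, 1.0, 1.0)):
    """Return the x-component of the trajectory as a scalar time series."""
    state = initial
    xs = []
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        xs.append(state[0])
    return xs


def sliding_windows(series, window):
    """Pair each run of `window` consecutive values with the value after it."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


if __name__ == "__main__":
    series = lorenz_series(1000)
    pairs = sliding_windows(series, window=4)
    print(len(series), len(pairs))
```

The resulting pairs could serve as training samples for any one-step-ahead predictor, such as a small feed-forward network.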

Natural language inference (NLI) aims to infer a hypothesis from a premise, and strictly faithful results depend on neural networks with anti-interference ability. To improve the stability of the inference process, we initialize and optimize adversarial examples based on both distance minimization and embedding similarity maximization, where examples outside the region are usually constructed with small perturbations. Specifically, the ideal candidate set of alternative words is obtained by efficient pruning, and the example is forced to lie close...

10.2139/ssrn.4453305 preprint EN 2023-01-01

With the construction and promotion of Chinese national outstanding courses, high-quality network curricula are urgently needed. This article discusses the application of a network platform in the teaching practice of the course "Polymer Physics", including syllabus design, statistical analysis of different modulus, and future improvements.

10.1109/icee.2010.1383 article EN International Conference on E-Business and E-Government 2010-05-01

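Existing speech to speech translation systems heavily rely on the text of the target language: they usually translate the source language either to target text and then synthesize target speech from the text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phoneme available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from the discrete tokens with an inverter. We propose a method called...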

10.48550/arxiv.2006.07926 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Example pipeline of region velocity estimation and visualization with the steady-state model and the dynamical model using the EM algorithm in R, with Seurat to pretreat scRNA-seq data

10.17504/protocols.io.b8kbrusn preprint EN 2022-05-02

In this paper, we present a new verification style reading comprehension dataset named VGaokao from the Chinese Language tests of Gaokao. Different from existing efforts, the dataset is originally designed for native speakers' evaluation, thus requiring more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence with a query updating mechanism and adaptively distills supportive evidence,...

10.48550/arxiv.2109.05149 preprint EN cc-by arXiv (Cornell University) 2021-01-01

While technology has made information readily available to university students, many of them, especially ESL students, have no sound understanding of how to use the sources properly (Löfström & Kupila, 2013). When they use others’ ideas, text, or work without crediting the sources, they may commit either intentional or involuntary plagiarism (Camara et al., 2017). When they reuse a submitted assignment for another course improperly, they may commit self-plagiarism (APA Style, 2019). However, rather than simply punishing plagiarism,...

10.55016/ojs/cpai.v4i2.74168 article EN Canadian Perspectives on Academic Integrity 2021-12-30