Yanfei Jiang

ORCID: 0009-0001-2817-1018
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Text Readability and Simplification
  • Software System Performance and Reliability
  • Speech Recognition and Synthesis
  • Biomedical Text Mining and Ontologies
  • Software Engineering Research
  • Translation Studies and Practices
  • Advanced Text Analysis Techniques
  • Real-time simulation and control systems
  • Semantic Web and Ontologies
  • Medical Imaging and Analysis
  • Software Reliability and Analysis Research
  • Music and Audio Processing
  • Handwritten Text Recognition Techniques
  • Network Security and Intrusion Detection
  • Robotics and Automated Systems
  • Software Engineering Techniques and Practices
  • Distributed and Parallel Computing Systems
  • Software Testing and Debugging Techniques
  • Multimodal Machine Learning Applications
  • Data Quality and Management

Huawei Technologies (China)
2022-2024

Huawei Technologies (United States)
2024

Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce Knowledge-Prompted Estimator (KPE), a CoT method...

10.23919/icact60172.2024.10471974 article EN 2024-02-04

Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of the results hinders analysts' understanding of system status and their ability to take appropriate actions. Moreover, these methods require substantial...

10.1145/3643916.3644408 article EN 2024-04-15

Zhanglin Wu, Zongyao Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, Xiaoyu Chen, Zhiqiang Rao, Zhengzhe Yu, Jinlong Yang, Shaojun Yuhao Xie, Bin Jiawei Zheng, Ming Zhu, Lizhi Lei, Hao Yanfei Jiang. Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). 2023.

10.18653/v1/2023.iwslt-1.13 article EN cc-by 2023-01-01

Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of the results hinders analysts' understanding of system status and their ability to take appropriate actions. Moreover, these methods require substantial...

10.48550/arxiv.2308.07610 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Zhanglin Wu, Daimeng Wei, Zongyao Li, Zhengzhe Yu, Shaojun Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Yuhao Xie, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.16 article EN cc-by 2023-01-01

Zhanglin Wu, Yilun Liu, Min Zhang, Xiaofeng Zhao, Junhao Zhu, Ming Xiaosong Qiao, Jingfei Ma Miaomiao, Zhao Yanqing, Song Peng, Shimin Tao, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.70 article EN cc-by 2023-01-01

Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce Knowledge-Prompted Estimator (KPE), a CoT method...

10.48550/arxiv.2306.07486 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Automated log analysis with AI technologies is commonly used in network, system, and service operation and maintenance to ensure reliability and quality assurance. Log parsing serves as an essential primary stage of log analysis, where unstructured logs are transformed into structured data to facilitate subsequent downstream analysis. However, traditional parsing algorithms designed for single-domain processing struggle to handle the challenges posed by multi-source inputs, leading to a decline in accuracy. Adapting these...

10.1109/tnsm.2023.3329144 article EN IEEE Transactions on Network and Service Management 2023-11-01

Machine Translation Evaluation is critical to MT research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and efficient evaluation method is necessary. Many researchers have raised questions about currently available metrics from various perspectives and proposed suggestions accordingly. However, to our knowledge, few have analyzed the difficulty level of the source sentence and its influence on evaluation results. This paper presents HW-TSC’s submission to the WMT23 MT Test Suites shared task. We...

10.18653/v1/2023.wmt-1.22 article EN cc-by 2023-01-01

ChatGPT has shown promising results for Machine Translation (MT). However, whether it is comparable to standard translation models and performs well in specific domains remains an open question. In this paper, we conduct human evaluations of its performance in three domains using the Direct Assessment (DA) method. The evaluation result shows that ChatGPT as a whole achieves performance comparable with standard models, especially in the general domain. However, ChatGPT's performance is inferior in terms of translating domain-specific terminologies, and appears to be...

10.26615/issn.2683-0078.2023_023 article EN 2023-01-01

Named Entity Recognition (NER) from speech is usually implemented through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR output. In this paper, we incorporate pinyin (the spelled sounds of Chinese characters) into NER from speech, aiming to improve the performance of the two steps. First, we take the pretrained model ChineseBERT to embed...

10.1109/apsipaasc58517.2023.10317287 article EN 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2023-10-31

Zhanglin Wu, Daimeng Wei, Zongyao Li, Zhengzhe Yu, Shaojun Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Yuhao Xie, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.27 article EN cc-by 2023-01-01

Cross-lingual word embedding alignment is critically important for reference-free machine translation evaluation, where source texts are directly compared with system translations. In this paper, it is revealed that multilingual knowledge distillation of sentence embeddings could achieve this alignment implicitly. A simplified analysis is given to explain the reason for the implicit alignment. According to the analysis, it can be deduced that using the last-layer embeddings of the distilled student model will have the best effect, which is also validated by...

10.1109/access.2023.3260835 article EN cc-by-nc-nd IEEE Access 2023-01-01

Named Entity Recognition (NER) is one of the most fundamental tasks in natural language processing (NLP). Unlike the widely used sequence labeling framework for NER, span prediction based methods are more naturally suited to the nested NER problem and have received a lot of attention recently. However, classifying the samples generated by traversing all sub-sequences is computationally expensive during training and very inefficient at inference. In this paper, we propose the FastSpanNER approach to reduce...

10.1109/icnlp58431.2023.00042 article EN 2022 4th International Conference on Natural Language Processing (ICNLP) 2023-03-01

Instruction tuning is crucial for enabling Large Language Models (LLMs) to respond to human instructions. The quality of the instruction pairs used greatly affects the performance of LLMs. However, manual creation of high-quality datasets is costly, leading to the adoption of automatic generation by LLMs as a popular alternative. To ensure the high quality of LLM-generated datasets, several approaches have been proposed. Nevertheless, existing methods either compromise dataset integrity by filtering a large proportion of samples,...

10.48550/arxiv.2311.13246 preprint EN other-oa arXiv (Cornell University) 2023-01-01

As Neural Machine Translation (NMT) heavily relies on training data, finding an effective method to help NMT make better use of limited data is of great significance. In this paper, inspired by Google's PageRank algorithm, we propose a novel unsupervised method, EntityRank, for mining bilingual named entity pairs from parallel corpora, which involves three critical components (Generator, Scorer and Filter). To apply the mined pairs to NMT, we design an augmentation strategy for a state-of-the-art (SOTA) model...

10.1109/bigdata55660.2022.10021032 article EN 2021 IEEE International Conference on Big Data (Big Data) 2022-12-17

Yuhao Xie, Zongyao Li, Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Hengchao Shang, Jiaxin Guo, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.32 article EN cc-by 2023-01-01

Entity correction is crucial in Automatic Speech Recognition (ASR), since erroneous entities seriously affect our understanding of ASR results. In this paper, in order to correct entity errors, we propose a knowledge prompt approach for Whisper (a recent ASR model trained with a corpus containing 680k hours of labeled speech recorded in various conditions). For a given audio, the approach consists of three steps: (1) obtaining its ASR result by Whisper; (2) fuzzy matching the result against the knowledge base to obtain candidate entities; (3) using the candidate entities as...

10.1109/bigdata59044.2023.10386366 article EN 2021 IEEE International Conference on Big Data (Big Data) 2023-12-15

Recently, ChatGPT has shown promising results for Machine Translation (MT). However, how to apply it to Automatic Post-Editing (APE) remains an open question. In this paper, we propose a novel zero-shot APE method by leveraging ChatGPT and a Multilingual Knowledge Graph (MKG). In this method, we use the MKG to find incorrectly translated entities, then generate prompts with these entities and their correct translations provided in the MKG, aiming to have ChatGPT automatically correct the mistranslations. To evaluate our method, we construct two test datasets...

10.26615/issn.2683-0078.2023_010 article EN 2023-01-01