- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Software System Performance and Reliability
- Speech Recognition and Synthesis
- Biomedical Text Mining and Ontologies
- Software Engineering Research
- Translation Studies and Practices
- Advanced Text Analysis Techniques
- Real-time simulation and control systems
- Semantic Web and Ontologies
- Medical Imaging and Analysis
- Software Reliability and Analysis Research
- Music and Audio Processing
- Handwritten Text Recognition Techniques
- Network Security and Intrusion Detection
- Robotics and Automated Systems
- Software Engineering Techniques and Practices
- Distributed and Parallel Computing Systems
- Software Testing and Debugging Techniques
- Multimodal Machine Learning Applications
- Data Quality and Management
Huawei Technologies (China)
2022-2024
Huawei Technologies (United States)
2024
Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT quality assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce the Knowledge-Prompted Estimator (KPE), a CoT prompting method...
Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial...
Zhanglin Wu, Zongyao Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, Xiaoyu Chen, Zhiqiang Rao, Zhengzhe Yu, Jinlong Yang, Shaojun Li, Yuhao Xie, Bin Wei, Jiawei Zheng, Ming Zhu, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). 2023.
Zhanglin Wu, Daimeng Wei, Zongyao Li, Zhengzhe Yu, Shaojun Li, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Yuhao Xie, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.
Zhanglin Wu, Yilun Liu, Min Zhang, Xiaofeng Zhao, Junhao Zhu, Ming Zhu, Xiaosong Qiao, Jingfei Zhang, Ma Miaomiao, Zhao Yanqing, Song Peng, Shimin Tao, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.
Automated log analysis with AI technologies is commonly used in network, system, and service operation and maintenance to ensure reliability and quality assurance. Log parsing serves as an essential primary stage in log analysis, where unstructured logs are transformed into structured data to facilitate subsequent downstream analysis. However, traditional log parsing algorithms designed for single-domain processing struggle to handle the challenges posed by multi-source log inputs, leading to a decline in parsing accuracy. Adapting these...
Machine translation evaluation is critical to machine translation research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and efficient evaluation method is necessary. Many researchers have raised questions about currently available evaluation metrics from various perspectives and proposed suggestions accordingly. However, to our knowledge, few researchers have analyzed the difficulty level of source sentences and its influence on evaluation results. This paper presents HW-TSC’s submission to the WMT23 MT Test Suites shared task. We...
ChatGPT has shown promising results for Machine Translation (MT). However, whether it is comparable to standard translation models and performs well in specific domains remains an open question. In this paper, we conduct human evaluations of its performance in three domains using the Direct Assessment (DA) method. The evaluation results show that ChatGPT as a whole achieves comparable performance with standard translation models, especially in the general domain. However, ChatGPT's performance is inferior in terms of translating domain-specific terminologies, and appears to be...
Named Entity Recognition (NER) from speech is usually implemented through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR output. In this paper, we incorporate pinyin, the spelled sounds of Chinese characters, into NER from speech, aiming to improve the performance of the two steps. First, we take the pretrained model ChineseBERT to embed...
As we know, cross-lingual word embedding alignment is critically important for reference-free machine translation evaluation, where source texts are directly compared with system translations. In this paper, it is revealed that multilingual knowledge distillation for sentence embedding alignment could achieve cross-lingual word embedding alignment implicitly. A simplified analysis is given to explain the implicit alignment reason. According to the analysis, it can be deduced that using the last layer embeddings of the distilled student model will have the best alignment effect, which is also validated by...
Named Entity Recognition (NER) is one of the most fundamental tasks in natural language processing (NLP). Unlike the widely used sequence labeling framework for NER, span prediction based methods are more naturally suited to the nested NER problem and have received a lot of attention recently. However, classifying the samples generated by traversing all sub-sequences is computationally expensive during training and very inefficient at inference. In this paper, we propose the FastSpanNER approach to reduce...
Instruction tuning is crucial for enabling Large Language Models (LLMs) to respond to human instructions. The quality of the instruction pairs used for tuning greatly affects the performance of LLMs. However, the manual creation of high-quality instruction datasets is costly, leading to the adoption of automatic generation of instruction pairs by LLMs as a popular alternative. To ensure the high quality of LLM-generated instruction datasets, several approaches have been proposed. Nevertheless, existing methods either compromise dataset integrity by filtering a large proportion of samples,...
As Neural Machine Translation (NMT) relies heavily on training data, finding an effective method to help NMT make better use of limited data is of great significance. In this paper, motivated by Google's famous PageRank algorithm, we propose a novel unsupervised method, EntityRank, for mining bilingual named entity pairs from parallel corpora, which involves three critical components (Generator, Scorer and Filter). To apply mined by NMT, design augmentation strategy state-of-the-art (SOTA) model...
Yuhao Xie, Zongyao Li, Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Hengchao Shang, Jiaxin Guo, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.
Entity correction is crucial in Automatic Speech Recognition (ASR), since erroneous entities seriously affect our understanding of ASR results. In this paper, in order to correct entity errors, we propose a knowledge prompt approach for Whisper (a recent ASR model trained with a corpus containing 680k hours of labeled speech recorded in various conditions). For a given audio, our approach consists of three steps: (1) obtaining its ASR result by Whisper; (2) fuzzy matching the ASR result with the knowledge base to obtain candidate entities; (3) using the candidate entities as...
Recently, ChatGPT has shown promising results for Machine Translation (MT). However, how to apply ChatGPT to Automatic Post-Editing (APE) remains an open question. In this paper, we propose a novel zero-shot APE method by leveraging ChatGPT and a Multilingual Knowledge Graph (MKG). In our method, we use the MKG to find incorrectly translated entities, then generate prompts with these entities and their correct translations provided in the MKG, aiming to have ChatGPT automatically correct the mistranslations. To evaluate our method, we construct two test datasets...