NFDI4DS | UHH-SEMS - Publication Details

Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment

OPENALEX - Publications

Hao Yang Min Zhang Shimin Tao Minghan Wang Daimeng Wei and 1 more

Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce Knowledge-Prompted Estimator (KPE), a CoT prompting method...

10.23919/icact60172.2024.10471974 article EN 2024-02-04

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

OPENALEX - Publications

Yilun Liu Shimin Tao Weibin Meng Jingyu Wang Wenbing Ma and 4 more

Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial...

10.1145/3643916.3644408 article EN 2024-04-15

CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning

OPENALEX - Publications

Yilun Liu Shimin Tao Xiaofeng Zhao Ming Zhu Wenbing Ma and 9 more

10.1109/icde60146.2024.00390 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

Improving Neural Machine Translation Formality Control with Domain Adaptation and Reranking-based Transductive Learning

OPENALEX - Publications

Zhanglin Wu Zongyao Li Daimeng Wei Hengchao Shang Jiaxin Guo and 12 more

Zhanglin Wu, Zongyao Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, Xiaoyu Chen, Zhiqiang Rao, Zhengzhe Yu, Jinlong Yang, Shaojun Li, Yuhao Xie, Bin Wei, Jiawei Zheng, Ming Zhu, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). 2023.

10.18653/v1/2023.iwslt-1.13 article EN cc-by 2023-01-01

LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis

OPENALEX - Publications

Yilun Liu Shimin Tao Weibin Meng Jingyu Wang Wenbin Ma and 5 more

Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial...

10.48550/arxiv.2308.07610 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Treating General MT Shared Task as a Multi-Domain Adaptation Problem: HW-TSC’s Submission to the WMT23 General MT Shared Task

OPENALEX - Publications

Zhanglin Wu Daimeng Wei Zongyao Li Zhengzhe Yu Shaojun Li and 7 more

Zhanglin Wu, Daimeng Wei, Zongyao Li, Zhengzhe Yu, Shaojun Li, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Yuhao Xie, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.16 article EN cc-by 2023-01-01

Empowering a Metric with LLM-assisted Named Entity Annotation: HW-TSC’s Submission to the WMT23 Metrics Shared Task

OPENALEX - Publications

Zhanglin Wu Yilun Liu Min Zhang Xiaofeng Zhao Junhao Zhu and 9 more

Zhanglin Wu, Yilun Liu, Min Zhang, Xiaofeng Zhao, Junhao Zhu, Ming Zhu, Xiaosong Qiao, Jingfei Zhang, Ma Miaomiao, Zhao Yanqing, Song Peng, Shimin Tao, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.70 article EN cc-by 2023-01-01

Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment

OPENALEX - Publications

Hao Yang Min Zhang Shimin Tao Minghan Wang Daimeng Wei and 1 more

Cross-lingual Machine Translation (MT) quality estimation plays a crucial role in evaluating translation performance. GEMBA, the first MT assessment metric based on Large Language Models (LLMs), employs one-step prompting to achieve state-of-the-art (SOTA) system-level estimation; however, it lacks segment-level analysis. In contrast, Chain-of-Thought (CoT) prompting outperforms one-step prompting by offering improved reasoning and explainability. In this paper, we introduce Knowledge-Prompted Estimator (KPE), a CoT prompting method...

10.48550/arxiv.2306.07486 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Multi-Source Log Parsing With Pre-Trained Domain Classifier

OPENALEX - Publications

Yilun Liu Shimin Tao Weibin Meng Jingyu Wang Hao Yang and 1 more

Automated log analysis with AI technologies is commonly used in network, system, and service operation and maintenance to ensure reliability and quality assurance. Log parsing serves as an essential primary stage in log analysis, where unstructured logs are transformed into structured data to facilitate subsequent downstream analysis. However, traditional log parsing algorithms designed for single-domain processing struggle to handle the challenges posed by multi-source log inputs, leading to a decline in parsing accuracy. Adapting these...

10.1109/tnsm.2023.3329144 article EN IEEE Transactions on Network and Service Management 2023-11-01

Multifaceted Challenge Set for Evaluating Machine Translation Performance

OPENALEX - Publications

Xiaoyu Chen Daimeng Wei Zhanglin Wu Ting Zhu Hengchao Shang and 6 more

Machine Translation Evaluation is critical to Machine Translation research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and efficient evaluation method is necessary. Many researchers have raised questions about currently available evaluation metrics from various perspectives and proposed suggestions accordingly. However, to our knowledge, few have analyzed the difficulty level of the source sentence and its influence on evaluation results. This paper presents HW-TSC’s submission to the WMT23 MT Test Suites shared task. We...

10.18653/v1/2023.wmt-1.22 article EN cc-by 2023-01-01

Human Evaluation for Translation Quality of ChatGPT: A Preliminary Study

OPENALEX - Publications

Yanqing Zhao Min Zhang Xiaoyu Chen Yadong Deng Aiju Geng and 11 more

ChatGPT has shown promising results for Machine Translation (MT). However, whether it is comparable to standard translation models and performs well in some specific domains remains an open question. In this paper, we conduct human evaluations of its performance in three domains using the Direct Assessment (DA) method. The evaluation result shows that ChatGPT as a whole achieves comparable performance with standard translation models, especially in the general domain. However, ChatGPT's performance is inferior in terms of translating domain-specific terminologies, which appears to be...

10.26615/issn.2683-0078.2023_023 article EN 2023-01-01

Incorporating Pinyin into Pipeline Named Entity Recognition from Chinese Speech

OPENALEX - Publications

Min Zhang Xiaosong Qiao Yanqing Zhao Chang Su Yinglu Li and 13 more

Named Entity Recognition (NER) from speech is usually implemented through a two-step pipeline that consists of (1) processing the audio using an Automatic Speech Recognition (ASR) system and (2) applying an NER tagger to the ASR output. In this paper, we incorporate pinyin (the spelled sounds of Chinese characters) into pipeline NER from Chinese speech, aiming to improve the performance of the two steps. First, we take the pretrained model ChineseBERT to embed...

10.1109/apsipaasc58517.2023.10317287 article EN 2023 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2023-10-31

The Path to Continuous Domain Adaptation Improvements by HW-TSC for the WMT23 Biomedical Translation Shared Task

OPENALEX - Publications

Zhanglin Wu Daimeng Wei Zongyao Li Zhengzhe Yu Shaojun Li and 7 more

Zhanglin Wu, Daimeng Wei, Zongyao Li, Zhengzhe Yu, Shaojun Li, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Yuhao Xie, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.27 article EN cc-by 2023-01-01

HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track

OPENALEX - Publications

Bin Wei Zongyao Li Jiaxin Guo Daimeng Wei Zhanglin Wu and 7 more

10.18653/v1/2024.iwslt-1.8 article EN 2024-01-01

Implicit Cross-Lingual Word Embedding Alignment for Reference-Free Machine Translation Evaluation

OPENALEX - Publications

Min Zhang Hao Yang Yanqing Zhao Xiaosong Qiao Shimin Tao and 3 more

As we know, cross-lingual word embedding alignment is critically important for reference-free machine translation evaluation, where source texts are directly compared with system translations. In this paper, it is revealed that multilingual knowledge distillation for sentence embedding alignment could achieve cross-lingual word embedding alignment implicitly. A simplified analysis is given to explain the reason for the implicit alignment. According to the analysis, it could be deduced that using the last layer embeddings of the distilled student model will have the best alignment effect, which is also validated by...

10.1109/access.2023.3260835 article EN cc-by-nc-nd IEEE Access 2023-01-01

FastSpanNER: Speeding up SpanNER by Named Entity Head Prediction

OPENALEX - Publications

Min Zhang Yanqing Zhao Xiaosong Qiao Song Peng Shimin Tao and 3 more

Named Entity Recognition (NER) is one of the most fundamental tasks in natural language processing (NLP). Different from the widely-used sequence labeling framework in NER, span prediction based methods are more naturally suitable for the nested NER problem and have received a lot of attention recently. However, classifying the samples generated by traversing all sub-sequences is computationally expensive during training and very ineffective at inference. In this paper, we propose the FastSpanNER approach to reduce...

10.1109/icnlp58431.2023.00042 article EN 2023 5th International Conference on Natural Language Processing (ICNLP) 2023-03-01

Automatic Instruction Optimization for Open-source LLM Instruction Tuning

OPENALEX - Publications

Yilun Liu Shimin Tao Xiaofeng Zhao Ming Zhu Wenbin Ma and 9 more

Instruction tuning is crucial for enabling Language Learning Models (LLMs) in responding to human instructions. The quality of the instruction pairs used for tuning greatly affects the performance of LLMs. However, the manual creation of high-quality instruction datasets is costly, leading to the adoption of automatic generation of instruction pairs by LLMs as a popular alternative. To ensure the high quality of LLM-generated instruction datasets, several approaches have been proposed. Nevertheless, existing methods either compromise dataset integrity by filtering a large proportion of samples,...

10.48550/arxiv.2311.13246 preprint EN other-oa arXiv (Cornell University) 2023-01-01

HW-TSC 2024 Submission for the Quality Estimation Shared Task

OPENALEX - Publications

Wei Shan Ming Zhu Yuang Li Mengyao Piao Xiaofeng Zhao and 4 more

10.18653/v1/2024.wmt-1.39 article EN 2024-01-01

EntityRank: Unsupervised Mining of Bilingual Named Entity Pairs from Parallel Corpora for Neural Machine Translation

OPENALEX - Publications

M. Zhang Song Peng Hao Yang Yanqing Zhao Xiaosong Qiao and 4 more

As Neural Machine Translation (NMT) heavily relies on training data, finding an effective method to help NMT make better use of limited training data is of great significance. In this paper, motivated by Google's famous PageRank algorithm, we propose a novel unsupervised method, EntityRank, for mining bilingual named entity pairs from parallel corpora, which involves three critical components (Generator, Scorer and Filter). To apply the pairs mined by EntityRank to NMT, we design an augmentation strategy for the state-of-the-art (SOTA) model...

10.1109/bigdata55660.2022.10021032 article EN 2022 IEEE International Conference on Big Data (Big Data) 2022-12-17

HW-TSC’s Submissions to the WMT23 Discourse-Level Literary Translation Shared Task

OPENALEX - Publications

Yuhao Xie Zongyao Li Zhanglin Wu Daimeng Wei Xiaoyu Chen and 7 more

Yuhao Xie, Zongyao Li, Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Hengchao Shang, Jiaxin Guo, Lizhi Lei, Hao Yang, Yanfei Jiang. Proceedings of the Eighth Conference on Machine Translation. 2023.

10.18653/v1/2023.wmt-1.32 article EN cc-by 2023-01-01

Knowledge Prompt for Whisper: An ASR Entity Correction Approach with Knowledge Base

OPENALEX - Publications

Min Zhang Xiaosong Qiao Yanqing Zhao Chang Su Yinglu Li and 7 more

Entity correction is crucial in Automatic Speech Recognition (ASR), since erroneous entities seriously affect our understanding of ASR results. In this paper, in order to correct entity errors, we propose a knowledge prompt approach for Whisper (a recent ASR model trained with a corpus containing 680k hours of labeled speech recorded in various conditions). For a given audio, the approach consists of three steps: (1) obtaining its ASR result by Whisper; (2) fuzzy matching the ASR result with the knowledge base to obtain candidate entities; (3) using the candidate entities as...

10.1109/bigdata59044.2023.10386366 article EN 2023 IEEE International Conference on Big Data (Big Data) 2023-12-15

Leveraging ChatGPT and Multilingual Knowledge Graph for Automatic Post-Editing

OPENALEX - Publications

Min Zhang Xiaofeng Zhao Yanqing Zhao Hao Yang Xiaosong Qiao and 9 more

Recently, ChatGPT has shown promising results for Machine Translation (MT). However, how to apply ChatGPT to Automatic Post-Editing (APE) remains an open question. In this paper, we propose a novel zero-shot APE method by leveraging ChatGPT and Multilingual Knowledge Graph (MKG). In this method, we use MKG to find incorrectly translated entities, then generate prompts with these entities and their correct translations provided in MKG, aiming to have ChatGPT automatically correct the mistranslations. To evaluate our method, we construct two test datasets...

10.26615/issn.2683-0078.2023_010 article EN 2023-01-01