An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

Keywords: Robustness, Empirical Research
DOI: 10.48550/arXiv.2402.01723 · Publication Date: 2024-01-26
ABSTRACT
Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on LLMs in the Chinese industrial context. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. We further designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 question variants, to evaluate LLM robustness. In total, we evaluated 9 LLMs developed by Chinese vendors, as well as LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all models scoring less than 0.6. (2) The robustness scores vary across sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities: global LLMs are more robust under logical-related variants, while local LLMs are more advanced in problems related to understanding Chinese industrial terminology. Our results provide valuable guidance for promoting LLMs' industrial domain capabilities from both development and enterprise perspectives, and further motivate possible research directions and tooling support.
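The metamorphic testing idea described above can be sketched in a few lines: apply a semantics-preserving transformation to a question and check that the model's answer is unchanged. This is a minimal illustration, not the paper's framework; the `answer` stub stands in for an LLM API call, and the irrelevant-sentence transformation is one assumed example of a stability-category variant.

```python
# Minimal metamorphic-testing sketch. `answer` is a toy stand-in for an
# LLM call; the variant generator is an illustrative assumption, not the
# paper's exact transformation set.

def answer(question: str) -> str:
    """Toy model: returns a canned answer keyed on the question's content."""
    return "4" if "2 + 2" in question else "unknown"

def add_irrelevant_sentence(question: str) -> str:
    """Variant: prepend a distractor sentence that must not change the answer."""
    return "The factory operates in three shifts. " + question

def is_robust(question: str, make_variant) -> bool:
    """Metamorphic relation: original and variant must yield the same answer."""
    return answer(question) == answer(make_variant(question))

base = "What is 2 + 2?"
print(is_robust(base, add_irrelevant_sentence))  # True for the toy model
```

A real evaluation would repeat this check over each question and each variant generator, and aggregate the pass rate per sector and per ability into a robustness score.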