MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

DOI: 10.48550/arxiv.2402.16389 Publication Date: 2024-02-26
ABSTRACT
Large language models (LLMs) have demonstrated impressive performance in various natural language processing (NLP) tasks. However, there is limited understanding of how well LLMs perform in specific domains (e.g., the intellectual property (IP) domain). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). In addition, we also develop a new IP-oriented multilingual large language model (called MoZi), which is a BLOOMZ-based model that has been supervised fine-tuned with multilingual IP-related text data. We evaluate our proposed MoZi model and four well-known LLMs (i.e., BLOOMZ, BELLE, ChatGLM, and ChatGPT) on the MoZIP benchmark. Experimental results demonstrate that MoZi outperforms BLOOMZ, BELLE, and ChatGLM by a noticeable margin, while it had lower scores compared with ChatGPT. Notably, the performance of current LLMs on the MoZIP benchmark has much room for improvement, and even the most powerful ChatGPT does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.
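
To make the "passing level" remark concrete, the following is a minimal sketch of how a multiple-choice task such as IPQuiz could be scored. It is not the authors' evaluation code: the function names, the toy answers, and the 60% passing threshold are all assumptions introduced here for illustration.

```python
from typing import Sequence

# Hypothetical passing threshold; the abstract does not state the actual value.
PASSING_THRESHOLD = 0.60


def quiz_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of multiple-choice items answered correctly.

    Option letters are compared case-insensitively after trimming whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no items to score")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def reaches_passing_level(accuracy: float,
                          threshold: float = PASSING_THRESHOLD) -> bool:
    """Check an accuracy against the (assumed) passing threshold."""
    return accuracy >= threshold


# Toy data, invented for illustration only (not taken from MoZIP).
model_predictions = ["A", "c", "B", "D", "A"]
gold_answers = ["A", "C", "D", "A", "B"]

accuracy = quiz_accuracy(model_predictions, gold_answers)
passed = reaches_passing_level(accuracy)
```

Under these assumptions, a model is said to reach the passing level only when its accuracy meets the chosen threshold. IPQA and PatentMatch would need their own metrics, which the abstract does not specify.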