Global Data Constraints: Ethical and Effectiveness Challenges in Large Language Models
FOS: Computer and information sciences
Computer Science - Computation and Language
Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2406.11214
Publication Date: 2024-06-17
AUTHORS (4)
ABSTRACT
The efficacy and ethical integrity of large language models (LLMs) are profoundly influenced by the diversity and quality of their training datasets. However, the global landscape of data accessibility presents significant challenges, particularly in regions with stringent privacy laws or limited open-source information. This paper examines the multifaceted challenges associated with acquiring high-quality training data for LLMs, focusing on data scarcity, bias, and low-quality content across various linguistic contexts. We highlight the technical implications of relying on publicly available but potentially biased or irrelevant sources, which can lead to the generation of hallucinatory content by LLMs. Through a series of evaluations using GPT-4 and GPT-4o, we demonstrate how these data constraints adversely affect model performance and alignment. We propose and validate several mitigation strategies designed to enhance data robustness, including advanced filtering techniques and improved data collection practices. Our findings underscore the need for a proactive approach to developing LLMs that considers both data effectiveness and ethical constraints, aiming to foster the creation of more reliable and universally applicable AI systems.
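The abstract refers to "advanced filtering techniques" without detailing them on this page. As a rough illustration only, the sketch below shows the kind of heuristic document-level quality filter commonly applied to LLM pretraining corpora; the thresholds, the `FilterThresholds` fields, and the `passes_quality_filter` function are illustrative assumptions and do not reproduce the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class FilterThresholds:
    # All values are illustrative assumptions, not the paper's settings.
    min_words: int = 20                # drop very short fragments
    max_symbol_ratio: float = 0.10     # drop markup- or debris-heavy text
    max_dup_line_ratio: float = 0.30   # drop pages dominated by repeated lines


def passes_quality_filter(text: str, th: FilterThresholds = FilterThresholds()) -> bool:
    """Return True if a document passes simple heuristic quality checks."""
    words = text.split()
    if len(words) < th.min_words:
        return False

    # Ratio of non-alphanumeric, non-whitespace characters (markup, encoding debris).
    symbols = sum(1 for c in text if not c.isalnum() and not c.isspace())
    if symbols / max(len(text), 1) > th.max_symbol_ratio:
        return False

    # Share of exact-duplicate lines (navigation menus, repeated headers/footers).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines:
        dup_ratio = 1 - len(set(lines)) / len(lines)
        if dup_ratio > th.max_dup_line_ratio:
            return False

    return True


if __name__ == "__main__":
    docs = [
        "A well-formed paragraph of natural language text. " * 5,
        "<div><div><div>menu | login | share</div></div></div>",
    ]
    kept = [d for d in docs if passes_quality_filter(d)]
    print(f"kept {len(kept)} of {len(docs)} documents")
```

In practice such rule-based filters are typically combined with language identification, deduplication, and model-based scoring; the sketch above is only the simplest heuristic layer.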