Global Data Constraints: Ethical and Effectiveness Challenges in Large Language Models
FOS: Computer and information sciences
Computer Science - Computation and Language
Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2406.11214
Publication Date: 2024-06-17
AUTHORS (4)
ABSTRACT
The efficacy and ethical integrity of large language models (LLMs) are profoundly influenced by the diversity and quality of their training datasets. However, the global landscape of data accessibility presents significant challenges, particularly in regions with stringent privacy laws or limited open-source information. This paper examines the multifaceted challenges associated with acquiring high-quality training data for LLMs, focusing on data scarcity, bias, and low-quality content across various linguistic contexts. We highlight the technical implications of relying on publicly available but potentially biased or irrelevant sources, which can lead to the generation of hallucinatory content by LLMs. Through a series of evaluations using GPT-4 and GPT-4o, we demonstrate how these data constraints adversely affect model performance and alignment. We propose and validate several mitigation strategies designed to enhance data robustness, including advanced filtering techniques and improved data collection practices. Our findings underscore the need for a proactive approach to developing LLMs that considers both data effectiveness and ethical constraints, aiming to foster the creation of more reliable and universally applicable AI systems.
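The abstract refers to "advanced filtering techniques" without detailing them on this page. As a rough illustration only, the sketch below shows the kind of heuristic document-level quality filter commonly applied to LLM pretraining corpora; the thresholds, the `FilterThresholds` fields, and the `passes_quality_filter` function are illustrative assumptions and do not reproduce the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class FilterThresholds:
    # All values are illustrative assumptions, not the paper's settings.
    min_words: int = 20                # drop very short fragments
    max_symbol_ratio: float = 0.10     # drop markup- or debris-heavy text
    max_dup_line_ratio: float = 0.30   # drop pages dominated by repeated lines


def passes_quality_filter(text: str, th: FilterThresholds = FilterThresholds()) -> bool:
    """Return True if a document passes simple heuristic quality checks."""
    words = text.split()
    if len(words) < th.min_words:
        return False

    # Ratio of non-alphanumeric, non-whitespace characters (markup, encoding debris).
    symbols = sum(1 for c in text if not c.isalnum() and not c.isspace())
    if symbols / max(len(text), 1) > th.max_symbol_ratio:
        return False

    # Share of exact-duplicate lines (navigation menus, repeated headers/footers).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines:
        dup_ratio = 1 - len(set(lines)) / len(lines)
        if dup_ratio > th.max_dup_line_ratio:
            return False

    return True


if __name__ == "__main__":
    docs = [
        "A well-formed paragraph of natural language text. " * 5,
        "<div><div><div>menu | login | share</div></div></div>",
    ]
    kept = [d for d in docs if passes_quality_filter(d)]
    print(f"kept {len(kept)} of {len(docs)} documents")
```

In practice such rule-based filters are typically combined with language identification, deduplication, and model-based scoring; the sketch above is only the simplest heuristic layer.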