Decoding AI Judgment: How LLMs Assess News Credibility and Bias
FOS: Computer and information sciences
Artificial Intelligence (cs.AI)
Computers and Society (cs.CY)
Computation and Language (cs.CL)
DOI:
10.48550/arxiv.2502.04426
Publication Date:
2025-02-06
AUTHORS (5)
ABSTRACT
Large Language Models (LLMs) are increasingly used to assess news credibility, yet little is known about how they make these judgments. While prior research has examined political bias in LLM outputs or their potential for automated fact-checking, their internal evaluation processes remain largely unexamined. Understanding how LLMs assess credibility provides insights into AI behavior and into how credibility judgments are structured and applied in large-scale language models. This study benchmarks the reliability classifications of state-of-the-art LLMs - Gemini 1.5 Flash (Google), GPT-4o mini (OpenAI), and LLaMA 3.1 (Meta) - against structured, expert-driven rating systems such as NewsGuard and Media Bias Fact Check. Beyond assessing classification performance, we analyze the linguistic markers that shape these decisions, identifying which words and concepts drive the models' evaluations. We uncover patterns in how credibility is associated with specific linguistic features by examining keyword frequency, contextual determinants, and rank distributions. Moving beyond static classification, we introduce a framework in which LLMs refine their assessments by retrieving external information, querying other models, and adapting their responses. This allows us to investigate whether their judgments reflect structured reasoning or rely primarily on learned associations.
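A minimal sketch of the kind of benchmarking the abstract describes: collecting reliability labels from an LLM for a set of news outlets and measuring agreement with an expert-driven rating system such as NewsGuard. This is not the authors' pipeline; the outlet names, the labels, and the binary reliable/unreliable scheme are illustrative assumptions, and the agreement metrics come from scikit-learn.

```python
# Hedged sketch (not the paper's code): compare LLM-assigned reliability
# labels with expert-driven ratings for the same outlets.
# All outlet names and labels below are hypothetical placeholders; in
# practice the LLM labels would come from prompting a model such as
# Gemini 1.5 Flash, GPT-4o mini, or LLaMA 3.1 with a fixed instruction
# like "Classify this outlet as reliable or unreliable."
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Expert ratings, binarized into reliable/unreliable (assumed scheme).
expert_labels = {
    "outlet_a": "reliable",
    "outlet_b": "unreliable",
    "outlet_c": "reliable",
}

# Labels an LLM might return for the same outlets (placeholder values).
llm_labels = {
    "outlet_a": "reliable",
    "outlet_b": "reliable",
    "outlet_c": "reliable",
}

outlets = sorted(expert_labels)
y_true = [expert_labels[o] for o in outlets]
y_pred = [llm_labels[o] for o in outlets]

print("accuracy:", accuracy_score(y_true, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```

Cohen's kappa corrects for chance agreement, which matters when one class dominates the expert ratings; accuracy alone can look high even if the model labels nearly every outlet the same way.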