The Mixed Subjects Design: Treating Large Language Models as Potentially Informative Observations

DOI: 10.31235/osf.io/j3bnt_v2
Publication Date: 2025-01-29T10:24:54Z
ABSTRACT
Large Language Models (LLMs) provide cost-effective but possibly inaccurate predictions of human behavior. Despite growing evidence that predicted and observed behavior are often not interchangeable, there is limited guidance on using LLMs to obtain valid estimates of causal effects and other parameters. We argue that LLM predictions should be treated as potentially informative observations, while human subjects serve as a gold standard in a mixed subjects design. This paradigm preserves validity and offers more precise estimates at a lower cost than experiments relying exclusively on human subjects. We demonstrate and extend prediction-powered inference (PPI), a method that combines predictions and observations. We define the PPI correlation as a measure of interchangeability and derive the effective sample size for PPI. We also introduce a power analysis to optimally choose between informative but costly human subjects and less informative but cheap predictions of human behavior. Mixed subjects designs could enhance scientific productivity and reduce inequality in access to costly evidence.
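As a rough illustration of how PPI can combine predictions and observations, the sketch below estimates a simple mean. It is not the authors' implementation: it assumes the standard PPI mean estimator with the tuning parameter fixed at 1, and the function names (`ppi_mean`, `classical_mean`, `simulate`) and the simulated data-generating process are hypothetical. The PPI correlation, effective sample size, and power analysis mentioned in the abstract are not reproduced here, since the abstract does not state their formulas.

```python
import random
import statistics


def ppi_mean(y_human, pred_human, pred_only):
    """PPI estimate of a mean and its standard error (tuning parameter fixed at 1).

    y_human    : outcomes observed from human subjects
    pred_human : LLM predictions for those same subjects
    pred_only  : LLM predictions for additional units with no human outcome
    """
    n, big_n = len(y_human), len(pred_only)
    # Rectifier: average gap between observed and predicted outcomes,
    # estimated on the human sample to correct bias in the predictions.
    residuals = [y - f for y, f in zip(y_human, pred_human)]
    estimate = statistics.fmean(pred_only) + statistics.fmean(residuals)
    # The two samples are assumed independent, so their variances add.
    variance = statistics.variance(pred_only) / big_n + statistics.variance(residuals) / n
    return estimate, variance ** 0.5


def classical_mean(y_human):
    """Human-subjects-only estimate of the mean and its standard error."""
    n = len(y_human)
    return statistics.fmean(y_human), (statistics.variance(y_human) / n) ** 0.5


def simulate(n=200, big_n=5000, seed=0):
    """Hypothetical data: predictions are biased (+0.3) and noisy but correlated with outcomes."""
    rng = random.Random(seed)

    def draw():
        y = rng.gauss(1.0, 1.0)              # simulated human outcome, true mean 1.0
        f = y + 0.3 + rng.gauss(0.0, 0.5)    # simulated LLM prediction of that outcome
        return y, f

    human = [draw() for _ in range(n)]
    y_human = [y for y, _ in human]
    pred_human = [f for _, f in human]
    pred_only = [draw()[1] for _ in range(big_n)]
    return y_human, pred_human, pred_only


if __name__ == "__main__":
    y_human, pred_human, pred_only = simulate()
    print("PPI estimate, SE:       ", ppi_mean(y_human, pred_human, pred_only))
    print("Classical estimate, SE: ", classical_mean(y_human))
```

In this simulated setting the predictions are systematically biased, yet the rectifier term removes that bias, and the standard error shrinks relative to the human-only estimate because the predictions track the outcomes closely. When predictions are only weakly related to observed behavior, the gain would be smaller or could vanish.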