ChatGPT: Jack of all trades, master of none

Master data
DOI: 10.1016/j.inffus.2023.101861 Publication Date: 2023-06-03T15:37:09Z
ABSTRACT
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning, like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss for semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. It especially refers to pragmatic NLP problems like emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established.
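The abstract reports an average quality loss relative to SOTA but does not state how that loss is computed. A minimal sketch, assuming the loss is the relative drop from the SOTA score expressed as a percentage, could look like the following. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def relative_loss(sota: float, model: float) -> float:
    """Percentage drop of a model score relative to the SOTA score.

    Assumed definition: 100 * (SOTA - model) / SOTA.
    """
    if sota <= 0:
        raise ValueError("SOTA score must be positive")
    return 100.0 * (sota - model) / sota


def average_loss(scores: Dict[str, Tuple[float, float]]) -> float:
    """Mean relative loss over tasks; each value is (sota_score, model_score)."""
    if not scores:
        raise ValueError("at least one task is required")
    losses = [relative_loss(sota, model) for sota, model in scores.values()]
    return sum(losses) / len(losses)


# Hypothetical scores for illustration only (not results from the paper).
example_scores = {
    "task_a": (80.0, 60.0),  # relative loss: 25%
    "task_b": (50.0, 40.0),  # relative loss: 20%
}

if __name__ == "__main__":
    for task, (sota, model) in example_scores.items():
        print(f"{task}: {relative_loss(sota, model):.1f}% loss")
    print(f"average: {average_loss(example_scores):.1f}% loss")
```

Under this assumed definition, a task with lower SOTA performance contributes a larger percentage loss for the same absolute gap, so averages over tasks should be read with that in mind.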
REFERENCES (105)
CITATIONS (347)