Are Deep Neural Networks SMARTer than Second Graders?

Abstraction · Benchmark · Deep Neural Networks
DOI: 10.48550/arxiv.2212.09993 Publication Date: 2022-01-01
ABSTRACT
Recent times have witnessed an increasing number of applications of deep neural networks to tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks on visuo-linguistic puzzles designed specifically for children in the 6--8 age group. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and their solution needs a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset towards training deep neural networks, we programmatically generate entirely new instances of each puzzle, while retaining its solution algorithm. To benchmark performances on SMART-101, we propose a vision-and-language meta-learning model using varied state-of-the-art backbones. Our experiments reveal that while powerful deep models offer reasonable performance in a supervised setting, they are not better than random accuracy when analyzed for generalization. We also evaluate the recent ChatGPT and other large language models on a subset of the puzzles and find that while these models show convincing reasoning, the answers are often incorrect.
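To illustrate the instance-generation idea described in the abstract, the following is a minimal sketch in Python, not the paper's actual generator: a toy arithmetic puzzle template whose concrete values are resampled to produce new instances while the solution algorithm (here, simple addition) stays fixed. The names PuzzleInstance and generate_counting_puzzle are hypothetical and introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class PuzzleInstance:
    """One programmatically generated instance of a puzzle template (illustrative only)."""
    question: str
    options: list
    answer: int


def generate_counting_puzzle(rng: random.Random) -> PuzzleInstance:
    """Resample the numbers in a fixed arithmetic puzzle template.

    The wording and the solution algorithm are held constant; only the
    concrete values change, mirroring the idea of scaling each root puzzle
    to many new instances. This is a sketch, not the SMART-101 generator.
    """
    a, b = rng.randint(2, 9), rng.randint(2, 9)
    answer = a + b
    # Distractor options are small perturbations of the correct answer.
    options = sorted({answer, answer - 1, answer + 1, answer + 2, answer + rng.randint(3, 5)})
    question = (f"A basket holds {a} red apples and {b} green apples. "
                f"How many apples are in the basket?")
    return PuzzleInstance(question=question, options=list(options), answer=answer)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        inst = generate_counting_puzzle(rng)
        print(inst.question, "->", inst.answer, "options:", inst.options)
```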